EPIAIDEA is a digital epidemiology and population analytics platform where epidemiologic rigor meets AI-scale methods — translating messy, real-world data into evidence that holds up under scrutiny and guides decisions that affect real populations.
EPIAIDEA — Epidemiology, AI, Data, Evidence, Action — is a framework for doing population health science in an era of abundant, imperfect data. The platform integrates traditional epidemiologic theory with modern AI-enabled methods to extract valid, decision-grade evidence from sources that conventional analysis cannot handle at scale.
The core premise is simple but demanding: AI applied to health data is only as good as the epidemiologic thinking behind it. Prediction accuracy means nothing if the model is trained on biased data, applied to a different population, or used to answer a question it was not designed for. Every analytic decision here begins with the question — not the algorithm — and works backward to the data and method that give the most defensible answer.
This work operates across geographies and health domains — from overdose prevention in rural North Dakota to heat resilience infrastructure in the Southwest US, from GLP-1 misinformation surveillance to fibroid care access mapping in California. What ties it together is a consistent commitment to causal reasoning, equity-conscious design, and outputs that are deployable by the institutions making health decisions.
EPIAIDEA is directed by Akshaya Bhagavathula, Professor of Epidemiology, promoted to full professor at age 39. The platform reflects a sustained body of methodologic and applied work across digital epidemiology, pharmacovigilance, geospatial AI, legal epidemiology, and global burden of disease research.
"The gap between knowing and acting in public health is not a data problem. It is an inference problem."
Health systems are drowning in data but starved for evidence. Electronic health records, insurance claims, syndromic surveillance feeds, digital search behavior — all of this exists, but almost none of it is structured in a way that supports valid causal inference without deliberate epidemiologic curation. EPIAIDEA exists to close that gap. Not by generating more data, but by extracting more signal from the data that already exists — and translating it into the kind of evidence that earns a seat at the policy table.
Each letter in EPIAIDEA names a pillar of the analytic framework. Together they define what it means to do this work with both scientific integrity and real-world impact.
Every analysis is structured around epidemiologic principles: confounding control, bias identification, temporality, and population heterogeneity. The question of whether an association is causal — or merely predictive, or artifactual — shapes every methodologic decision from the outset.
Scalable ML and NLP pipelines built with epidemiologic awareness from the ground up — not bolted on afterward. Models are interrogated for confounding, calibrated against external validation sets, and structured to be interpretable to the policy and clinical audiences who act on their outputs.
The platform draws from a deliberately broad data substrate — Google Search trends, social platform signals, EHR records, insurance claims, Medicare/Medicaid administrative data, and syndromic surveillance feeds — each validated for the specific inference task at hand before any modeling begins.
Health disparities are not discovered at the end of an analysis — they are built into or out of the design from the beginning. Every platform interrogates differential impact by race, geography, income, and structural access before a finding is considered complete or ready for dissemination.
Inference-ready outputs designed for the institutions that make health decisions — not just journal reviewers. This means dashboards that update automatically, maps that communicate access gaps without requiring statistical literacy, and summaries formatted for Medicaid directors, county health officers, and legislative staff.
Where something happens is often as important as what happens. County- and census tract-level spatial analysis surfaces access gaps, structural disparities, and environmental exposures that aggregate national statistics obscure. Geographic precision is what makes an analysis actionable for a local health department with a specific budget question.
The analytic pipeline is structured to preserve validity at every stage — from raw data ingestion through to the final evidence product that reaches a decision-maker.
Real-time tracking of health metrics, payment parity indices, and spatial-temporal epidemiology at the state and county level. Designed for health departments and Medicaid agencies that need continuous situational awareness — not quarterly reports that are obsolete before they are distributed.
County-level structural access mapping using geospatial distance metrics, service listing data, and population demand signals. The goal is to make the invisible visible — identifying where care need exists but care supply does not, and quantifying the magnitude of the gap in terms decision-makers can act on.
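The distance component of this mapping can be sketched with a straight-line (haversine) great-circle calculation from a county centroid to the nearest listed facility. The function names and coordinates below are illustrative, not the production pipeline:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_facility_miles(centroid, facilities):
    """Distance from a county centroid to the closest facility in a service listing."""
    return min(haversine_miles(*centroid, *f) for f in facilities)
```

Straight-line distance is a lower bound on the real access burden; in deployed analyses, road-network travel time is the more honest metric where routing data are available.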
Matching investment to need by identifying where digital demand signals and structural care access are most mismatched. Evidence that answers the question every health system planner is actually asking: where should the next dollar go to produce the greatest reduction in preventable harm?
These are not aspirational values — they are operational constraints that determine what gets built, how it gets validated, and what it takes for a finding to be called evidence.
Every project begins with a population health question stated in epidemiologic terms — not with an interesting dataset looking for a use case. The question defines data requirements, method selection, and the validity criteria an output must meet before it can be called evidence.
Digital and administrative data sources are not random samples of any population. Selection bias, measurement error, and differential missingness are the norm, not the exception. Every analysis begins by enumerating the threats to validity — and either addressing them or explicitly acknowledging what the analysis cannot claim.
A model that performs well on a held-out test set but cannot explain its predictions to a county health officer is not ready for deployment. Every output is built to be interrogated — which features matter, which populations drive the prediction, and what the confidence intervals actually mean for decision-making under uncertainty.
A finding that exists only in a PDF is not translational research — it is academic output with translational aspirations. I design for deployment from the start: live data connections, automated updates, and formats that reach the audiences who act on evidence rather than just the ones who evaluate it for publication.
Pharmacoepidemiology uses population-level data to study how medicines perform — and where they harm — outside the controlled conditions of a clinical trial. It is one of the most methodologically demanding areas in epidemiology because the data are almost always confounded by indication, and the signals are almost always rare.
Post-market drug safety is a surveillance problem that clinical trials are structurally incapable of solving. Trials are too small, too short, and too selective to detect adverse events that occur in 1 in 1,000 patients, emerge after years of exposure, or concentrate in subpopulations excluded from enrollment. Pharmacovigilance begins where the trial ends — drawing on spontaneous reporting databases, electronic health records, insurance claims, and increasingly, digital patient-reported data.
My work in this domain focuses on disproportionality analysis of spontaneous adverse event reports — specifically the FDA Adverse Event Reporting System (FAERS) — alongside causal modeling frameworks that go beyond association to ask whether a signal is real, how strong it is relative to comparator drugs, and what the plausible mechanism is. The GLP-1 receptor agonist class has been a central focus: a drug class that went from niche diabetes therapy to one of the most prescribed medication classes in history within five years, with a post-market safety profile that is still being characterized.
Disproportionality analysis without causal thinking produces noise. The Reporting Odds Ratio and Proportional Reporting Ratio are screening tools, not evidence — they tell you where to look, not what you have found. My analyses pair signal detection with structured evaluation of confounding by indication, Weber effect bias, notoriety bias, and the plausibility of the proposed mechanism before a signal is characterized as a concern worth communicating.
Calculating Reporting Odds Ratios and Proportional Reporting Ratios across drug–event pairs in the FDA Adverse Event Reporting System. Adjusted for concomitant medications, reporter type, and temporal reporting patterns to reduce Weber effect inflation and notoriety bias.
Applying directed acyclic graphs to the pharmacovigilance setting — specifying confounders, mediators, and colliders in spontaneous reporting data where confounding by indication is nearly universal and cannot be ignored.
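The backdoor logic can be illustrated on a toy three-node DAG for confounding by indication. The graph and helper functions below are hypothetical simplifications (the backdoor flag here ignores collider-closed paths, which real d-separation checks must handle):

```python
def undirected_paths(edges, start, end):
    """All simple paths between two nodes, ignoring edge direction."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = []
    def walk(node, path):
        if node == end:
            found.append(path)
            return
        for nxt in sorted(adj.get(node, ())):
            if nxt not in path:
                walk(nxt, path + [nxt])
    walk(start, [start])
    return found

def is_backdoor(path, edges):
    """A backdoor path starts with an arrow pointing INTO the exposure."""
    return (path[1], path[0]) in edges

# Hypothetical DAG: indication drives both prescribing and the event
edges = {("Indication", "Drug"), ("Indication", "Event"), ("Drug", "Event")}
paths = undirected_paths(edges, "Drug", "Event")
backdoors = [p for p in paths if is_backdoor(p, edges)]
# Drug <- Indication -> Event is the open backdoor path;
# adjusting for Indication blocks it.
```

In spontaneous reporting data the "Indication" node is usually unmeasured or crudely coded, which is exactly why a drawn DAG — making the required adjustment set explicit — matters before any disproportionality statistic is interpreted causally.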
Using Medicare and commercial insurance claims to conduct active drug safety surveillance — cohort studies with new-user designs, active comparator selection, and high-dimensional propensity score adjustment to address channeling bias.
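High-dimensional propensity scoring involves automated covariate prioritization across thousands of claims codes; as a much-reduced sketch of the weighting step only, here is inverse-probability-of-treatment weighting on a single simulated confounder (all data synthetic and illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic channeling: disease severity drives both who gets the drug
# and who has the event. Coefficients are invented for illustration.
rng = np.random.default_rng(42)
n = 20_000
severity = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-1.5 * severity))
p_event = 1 / (1 + np.exp(-(0.4 * treated + 1.0 * severity - 2.0)))
event = rng.random(n) < p_event

# Fit the propensity model and form IPTW weights
X = severity.reshape(-1, 1)
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
w = np.where(treated, 1 / ps, 1 / (1 - ps))

# Naive vs weighted (confounding-adjusted) risk difference
naive = event[treated].mean() - event[~treated].mean()
adjusted = (np.average(event[treated], weights=w[treated])
            - np.average(event[~treated], weights=w[~treated]))
```

Because sicker patients are channeled to the drug, the naive contrast overstates the drug's risk; the weighted contrast moves toward the true (smaller, still positive) effect. Real hdPS pipelines do this across hundreds of empirically selected claims covariates, with trimming and diagnostics on weight distributions.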
Translating pharmacovigilance findings into structured safety communications for clinical and regulatory audiences — framing absolute risk, number needed to harm, and clinical context rather than reporting raw disproportionality statistics.
A systematic disproportionality analysis of GI adverse event reports for semaglutide, liraglutide, tirzepatide, and dulaglutide in FAERS — covering nausea, vomiting, gastroparesis, ileus, and aspiration-related events. RORs were calculated with 95% confidence intervals and adjusted for concomitant use of other GI-active agents. The analysis characterized the differential signal strength across the GLP-1 class and identified gastroparesis as a disproportionately underreported signal relative to its clinical documentation rate in EHR data — suggesting systematic underreporting in spontaneous surveillance.
Spontaneous reporting databases are not designed for causal inference — they are designed for signal detection. This work applies DAG-based causal reasoning to FAERS analyses to distinguish genuine drug–event associations from artifacts of reporting behavior, notoriety bias, and indication-driven confounding. The structured approach identifies which signals from disproportionality analysis can be elevated to the level of a probable causal relationship and which require active surveillance in claims or EHR data before any clinical communication is warranted.
The GLP-1 safety discourse in public-facing media has diverged significantly from the regulatory and pharmacovigilance evidence base — amplifying rare signals while underreporting common GI tolerability issues that affect treatment adherence. Using the ClaimReview API and structured misinformation surveillance, I mapped the claim typology, source patterns, and temporal velocity of GLP-1 misinformation — particularly around muscle loss, hair thinning, and cardiac risk — providing a baseline for risk communication intervention design.
Browse active implementations and research outputs, or reach out to discuss collaboration, consulting, or speaking.