OHDSI GIS WG
The OMOP GIS extension enables new analytical capabilities that go beyond traditional clinical observational research. This section outlines current analysis infrastructure, future development directions, and emerging needs for transparency and ethical analysis in geospatially-informed health research.
ATLAS is the primary user interface for OMOP CDM analytics, providing cohort construction, characterization, and prediction capabilities. However, ATLAS currently has limited support for:
external_exposure table
While ATLAS development may eventually incorporate these capabilities, GIS use cases currently require direct interaction with the OHDSI Methods Library (HADES) and custom analytical scripts.
HADES (Health Analytics Data-to-Evidence Suite) is a collection of open-source R packages that provide the analytical foundation for OHDSI research. HADES packages support:
For GIS use cases, HADES provides:
- Direct access to external_exposure data via SQL queries
- Flexible feature engineering from spatial and environmental variables
- Robust statistical methods for causal inference and prediction
- Network study execution across distributed OMOP sites
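To make "direct access to external_exposure data via SQL queries" concrete, here is a minimal sketch in Python using an in-memory SQLite database. The reduced table layout, the column names, and the concept ID 9001 are assumptions chosen for illustration; a real deployment would query the site's own OMOP database, typically from R through HADES.

```python
import sqlite3


def mean_exposure_by_person(conn, exposure_concept_id):
    """Return {person_id: mean value_as_number} for one exposure concept."""
    rows = conn.execute(
        """
        SELECT person_id, AVG(value_as_number)
        FROM external_exposure
        WHERE exposure_concept_id = ?
        GROUP BY person_id
        ORDER BY person_id
        """,
        (exposure_concept_id,),
    ).fetchall()
    return {person_id: mean_value for person_id, mean_value in rows}


# Hypothetical, simplified schema: these columns are assumptions and
# may not match the external_exposure table at a given site.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE external_exposure (
        person_id INTEGER,
        exposure_concept_id INTEGER,
        exposure_start_date TEXT,
        value_as_number REAL
    )
    """
)
conn.executemany(
    "INSERT INTO external_exposure VALUES (?, ?, ?, ?)",
    [
        (1, 9001, "2020-01-01", 10.0),
        (1, 9001, "2020-01-02", 14.0),
        (2, 9001, "2020-01-01", 8.0),
        (2, 9002, "2020-01-01", 99.0),  # different concept, filtered out
    ],
)

# 9001 is a made-up concept ID standing in for one exposure measure.
exposure_means = mean_exposure_by_person(conn, 9001)
```

The query is parameterized so the concept ID is never interpolated into the SQL string.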
Current and proposed use cases (Section 7) are driving the development of HADES extensions to support geospatially-informed health research.
GeoFeatureExtraction
Purpose: Extract and engineer features from external_exposure and spatial data
Key Functions:
- extractSpatialExposures(): Link person-level clinical data to geospatial exposures
- aggregateTemporalExposures(): Create moving-window averages and cumulative exposures
- calculateSpatialLags(): Compute spatially-lagged exposure metrics for neighborhood effects
- imputeMissingExposures(): Handle missing spatial data using kriging or nearest-neighbor methods
Use Cases Driving Development: UC-001 (Air Quality), UC-005 (Residential History)
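The temporal aggregation described for aggregateTemporalExposures() can be illustrated with a short Python sketch. This is not the proposed package's implementation: the function names, the date-to-value dictionary, and the choice to skip missing days instead of imputing them are all assumptions made for the example.

```python
from datetime import date, timedelta


def trailing_window_mean(daily_values, end_day, window_days):
    """Mean of the values observed in the window_days ending on end_day.

    daily_values maps date -> exposure value. Days with no observation
    are skipped (an assumption; a real pipeline might impute them).
    Returns None when the window contains no observations.
    """
    start_day = end_day - timedelta(days=window_days - 1)
    in_window = [v for d, v in daily_values.items() if start_day <= d <= end_day]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


def cumulative_exposure(daily_values, start_day, end_day):
    """Sum of observed values between start_day and end_day, inclusive."""
    return sum(v for d, v in daily_values.items() if start_day <= d <= end_day)


# Five days of made-up daily exposure values.
daily_exposure = {
    date(2020, 1, 1) + timedelta(days=i): value
    for i, value in enumerate([10.0, 12.0, 14.0, 20.0, 8.0])
}

three_day_mean = trailing_window_mean(daily_exposure, date(2020, 1, 4), 3)
total = cumulative_exposure(daily_exposure, date(2020, 1, 1), date(2020, 1, 5))
```

Here `three_day_mean` averages the values for 2 to 4 January, and `total` sums all five days.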
SpatialCohortMethod
Purpose: Causal inference methods accounting for spatial confounding and clustering
Key Functions:
- fitSpatialPropensityModel(): Propensity score matching with spatial covariates
- adjustSpatialConfounding(): Restricted spatial regression, spatial+ methods
- assessSpatialUnmeasuredConfounding(): Sensitivity analyses for unmeasured spatial factors
- estimateSpatialEffects(): Spatial spillover and interaction effects
Use Cases Driving Development: UC-002 (Social Vulnerability), UC-004 (Environmental Justice)
MultiLevelEstimation
Purpose: Multilevel modeling for nested data structures (individuals within neighborhoods)
Key Functions:
- fitMultilevelModel(): Mixed effects regression with area-level random effects
- partitionVariance(): Intra-class correlation and variance decomposition
- crossLevelInteractions(): Test individual × area-level interaction effects
- spatialMediation(): Mediation analysis for place-based exposures
Use Cases Driving Development: UC-002 (Social Vulnerability), UC-007 (Food Deserts)
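As a worked illustration of the intra-class correlation mentioned for partitionVariance(), the sketch below uses the classical one-way ANOVA estimator for equally sized groups. This is a simpler stand-in for the mixed-effects estimate a multilevel model would give, and the function name and toy data are assumptions.

```python
def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for equally sized groups.

    groups is a list of lists, one inner list of outcome values per
    neighborhood. ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where k is
    the common group size.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("groups must share one size of at least two")

    n_groups = len(groups)
    group_means = [sum(g) / k for g in groups]
    grand_mean = sum(group_means) / n_groups

    # Between-group and within-group mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n_groups * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("outcome has no variation")
    return (ms_between - ms_within) / denominator


# Three made-up neighborhoods with three residents each.
icc = anova_icc([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

For this toy data MSB is 27 and MSW is 1, so the estimate is 26/29: most of the outcome variance lies between neighborhoods, not within them.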
GeoVisualization
Purpose: Privacy-preserving visualization of spatial patterns in health data
Key Functions:
- createChoroplethMap(): Thematic maps with small-number suppression
- plotSpatialClusters(): Identify and display spatial clusters (SaTScan, DBSCAN)
- visualizeExposureGradients(): Continuous surface maps of environmental exposures
- animateTemporalMaps(): Time-series visualization of changing spatial patterns
Use Cases Driving Development: UC-004 (Environmental Justice), UC-008 (Climate Change)
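The small-number suppression that createChoroplethMap() is meant to apply can be sketched as a preprocessing step on area-level counts. The function name, the default threshold of 11, and the choice to leave true zeros visible are assumptions for illustration; actual thresholds are set by each data partner's disclosure policy.

```python
def suppress_small_counts(area_counts, min_count=11):
    """Mask counts between 1 and min_count - 1 before mapping.

    Suppressed cells become None so a plotting layer can render them
    as "suppressed" instead of as a value. Zero counts are kept, which
    is a policy assumption and not a universal rule. Complementary
    suppression (hiding extra cells so masked values cannot be derived
    from totals) is not handled here.
    """
    return {
        area: (None if 0 < count < min_count else count)
        for area, count in area_counts.items()
    }


# Made-up case counts for four areas.
masked = suppress_small_counts({"A": 25, "B": 3, "C": 0, "D": 11})
```

Only area "B" is masked in this example: its count falls below the threshold while remaining nonzero.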
Proposed HADES packages developed through GIS use cases will follow the OHDSI contribution process:
Current Status:
- GeoFeatureExtraction: Prototype phase (UC-001, UC-002 pilots)
- SpatialCohortMethod: Design phase (community feedback solicited)
- MultiLevelEstimation: Concept phase (literature review ongoing)
- GeoVisualization: Prototype phase (privacy assessment required)
The GIS extension places three new (or expanded) research domains before OHDSI:
Analytical Needs:
- Spatiotemporal exposure assessment (linking monitoring data to individual locations over time)
- Exposure measurement error modeling (accounting for spatial interpolation uncertainty)
- Distributed lag models (delayed and cumulative effects of environmental exposures)
- Extreme value analysis (health impacts of rare environmental events like heat waves)
Proposed Analyses:
- Case-crossover designs for acute exposure effects (air pollution and MI, heat and ED visits)
- Spatial survival analysis for long-term exposure and chronic disease
- Bayesian hierarchical models for multi-pollutant exposures
- Environmental mixture analysis (quantile g-computation, weighted quantile sum regression)
HADES Gaps:
- No current support for distributed lag models
- Limited time-series analysis capabilities
- No spatial survival analysis methods
- Limited mixture analysis tools
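The case-crossover design listed among the proposed analyses compares each person's exposure on the event day with their own exposure on nearby referent days. One common way to choose referents is time-stratified selection: other days in the same calendar month that fall on the same weekday. The sketch below shows only that selection step, with a hypothetical function name, and leaves the conditional regression that would follow out of scope.

```python
import calendar
from datetime import date


def time_stratified_referents(event_day):
    """Other days in event_day's month that share its weekday."""
    days_in_month = calendar.monthrange(event_day.year, event_day.month)[1]
    candidates = (
        date(event_day.year, event_day.month, d)
        for d in range(1, days_in_month + 1)
    )
    return [
        day
        for day in candidates
        if day != event_day and day.weekday() == event_day.weekday()
    ]


# An event on Wednesday 15 January 2020 is matched to the other
# Wednesdays of that month.
referents = time_stratified_referents(date(2020, 1, 15))
```

Matching on weekday and month controls for day-of-week patterns and slow seasonal trends by design, which is why this scheme is a frequent choice for acute exposures.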
Analytical Needs:
- Multilevel modeling (individuals nested in neighborhoods, neighborhoods in regions)
- Structural confounding adjustment (unmeasured community-level factors)
- Intersectionality analysis (joint effects of multiple social identities and area characteristics)
- Longitudinal neighborhood effects (impact of residential mobility and changing contexts)
Proposed Analyses:
- Multilevel propensity score methods for neighborhood interventions
- Spatial mediation analysis (how place shapes health through intermediate factors)
- G-methods for time-varying area-level confounders
- Decomposition methods for disparity analysis (spatial Oaxaca-Blinder)
HADES Gaps:
- Limited multilevel modeling support
- No spatial mediation tools
- No dedicated disparity analysis methods
- Limited support for time-varying area-level exposures
Analytical Needs:
- Spatial cluster detection (identifying geographic areas with elevated disease rates)
- Spatial regression (accounting for spatial autocorrelation and boundary effects)
- Disease mapping (smoothing rates while preserving privacy)
- Spatial interaction modeling (understanding geographic access and utilization patterns)
Proposed Analyses:
- Bayesian disease mapping (BYM models, spatial smoothing)
- Spatial scan statistics (SaTScan integration)
- Geographically weighted regression (local spatial relationships)
- Spatial accessibility analysis (two-step floating catchment area methods)
HADES Gaps:
- No spatial autocorrelation diagnostics
- No spatial regression models
- Limited small-area estimation tools
- No accessibility analysis methods
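A spatial autocorrelation diagnostic of the kind listed as a gap could start from global Moran's I. The sketch below is plain Python with a hypothetical function name and a toy binary adjacency matrix; it computes only the statistic, without the permutation or analytical test that a full diagnostic would add.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and a spatial weights matrix.

    values is a list of n area-level values; weights is an n x n list
    of lists where weights[i][j] > 0 marks areas i and j as neighbors.
    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    with m the mean of the values and W the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]

    total_weight = sum(sum(row) for row in weights)
    sum_squares = sum(d * d for d in deviations)
    if total_weight == 0 or sum_squares == 0:
        raise ValueError("need nonzero weights and non-constant values")

    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * cross_products / sum_squares


# Four areas in a row; each area neighbors the one next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)
checkerboard = morans_i([1.0, -1.0, 1.0, -1.0], chain)
```

A smooth gradient gives a positive value (neighbors resemble each other), while the alternating pattern gives a negative one, the two cases a diagnostic needs to tell apart from spatial randomness.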
Geospatial health research raises unique ethical challenges that require enhanced transparency across the analytical toolchain.
Challenges:
- Small geographic areas (Census tracts) may enable re-identification of rare conditions
- Spatial patterns may reveal protected health information even with aggregation
- Federated analysis must protect sensitive location data while enabling spatial statistics
Current Toolchain Transparency:
- OMOP CDM provides data structure documentation
- HADES packages document statistical methods
- However, privacy risk assessment is typically left to individual researchers
Proposed Enhancements:
- Automated small-number suppression in geospatial outputs
- Differential privacy mechanisms for spatial aggregation
- Re-identification risk assessment tools for geographic data
- Spatial k-anonymity and l-diversity metrics
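One standard building block for the differential privacy mechanisms proposed above is the Laplace mechanism, which adds noise with scale sensitivity/epsilon to each released count. The sketch below is illustrative only: the function names are hypothetical, a sensitivity of 1 assumes each person contributes to exactly one area count, and a production release would also need privacy budget accounting.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(area_counts, epsilon, rng=None, sensitivity=1.0):
    """Add Laplace noise with scale sensitivity / epsilon to each count."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {
        area: count + laplace_noise(scale, rng)
        for area, count in area_counts.items()
    }


# Smaller epsilon means stronger privacy and noisier released counts.
released = noisy_counts({"tract_1": 120, "tract_2": 45}, epsilon=0.5)
```

The released values are real numbers and can be negative; whether to round or clamp them afterwards is a separate design decision.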
Challenges:
- Spatial analyses may inadvertently reinforce environmental injustice narratives
- Unequal data quality across communities (more monitoring in affluent areas)
- Analytical choices (e.g., boundary definitions) can affect equity assessments
Current Toolchain Transparency:
- Limited guidance on equity-focused analysis
- No standard metrics for environmental justice assessment
- Insufficient documentation of potential biases in geospatial data sources
Proposed Enhancements:
- Environmental justice impact checklists for spatial analyses
- Standardized EJ metrics (cumulative burden, disparate impact ratios)
- Data quality stratification reporting by neighborhood characteristics
- Community-engaged validation of spatial analyses
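The text does not define the disparate impact ratio, so the sketch below takes one simple reading as an assumption: the share of a group's residents whose exposure exceeds a threshold, divided by the same share in a reference group. The function names, threshold, and data are hypothetical, and a standardized EJ metric would need an agreed definition.

```python
def share_above(exposures, threshold):
    """Fraction of exposure values strictly above threshold."""
    if not exposures:
        raise ValueError("exposures must not be empty")
    return sum(1 for x in exposures if x > threshold) / len(exposures)


def disparate_impact_ratio(group_exposures, reference_exposures, threshold):
    """Ratio of the high-exposure share in a group to a reference group.

    A value above 1 means the group is more often exposed above the
    threshold than the reference group. This definition is one
    illustrative choice, not an established standard.
    """
    reference_share = share_above(reference_exposures, threshold)
    if reference_share == 0:
        raise ValueError("reference group has no one above the threshold")
    return share_above(group_exposures, threshold) / reference_share


# Made-up exposure values for two groups of four residents each.
ratio = disparate_impact_ratio(
    group_exposures=[14.0, 9.0, 16.0, 13.0],
    reference_exposures=[8.0, 13.0, 7.0, 10.0],
    threshold=12.0,
)
```

In this toy example three of four residents in the first group exceed the threshold against one of four in the reference group, giving a ratio of 3.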
Large Language Models (LLMs) and AI tools may facilitate transparency and ethical analysis through:
Risks:
- LLM hallucinations may introduce errors in analytical documentation
- Automated bias detection may miss context-specific equity issues
- Over-reliance on AI may reduce human engagement in ethical deliberation
Safeguards:
- Human review of all LLM-generated documentation
- Community validation of AI-assisted equity assessments
- Transparency about LLM use in analytical workflows
- Regular auditing of AI tools for algorithmic bias
The analytical capabilities of the OMOP GIS extension will continue to evolve through:
The GIS Working Group is committed to developing analytical infrastructure that is not only scientifically rigorous but also ethically grounded, transparent, and responsive to community needs.