Analysis Capabilities and Future Directions

The OMOP GIS extension enables new analytical capabilities that go beyond traditional clinical observational research. This section outlines current analysis infrastructure, future development directions, and emerging needs for transparency and ethical analysis in geospatially-informed health research.


8.1. Current Analysis Infrastructure

8.1.1. Limitations of ATLAS for GIS Use Cases

ATLAS is the primary user interface for OMOP CDM analytics, providing cohort construction, characterization, and prediction capabilities. However, ATLAS currently has limited support for:

  • Spatial exposure variables from the external_exposure table
  • Geographic visualization of exposure patterns and health outcomes
  • Multi-level modeling incorporating area-level social determinants
  • Temporal exposure sequences and longitudinal environmental data
  • Spatial statistics (clustering, autocorrelation, interpolation)

While ATLAS development may eventually incorporate these capabilities, GIS use cases currently require direct interaction with the OHDSI Methods Library (HADES) and custom analytical scripts.

8.1.2. HADES as Primary Analysis Framework

HADES (Health Analytics Data-to-Evidence Suite) is a collection of open-source R packages that provide the analytical foundation for OHDSI research. HADES packages support:

  • Cohort construction (CohortGenerator, Capr)
  • Descriptive statistics (CohortDiagnostics, FeatureExtraction)
  • Population-level estimation (CohortMethod, SelfControlledCaseSeries)
  • Patient-level prediction (PatientLevelPrediction)
  • Data quality assessment (DataQualityDashboard)
  • Network study orchestration (Strategus)

For GIS use cases, HADES provides:

  • Direct access to external_exposure data via SQL queries
  • Flexible feature engineering from spatial and environmental variables
  • Robust statistical methods for causal inference and prediction
  • Network study execution across distributed OMOP sites
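
As an illustration of direct SQL access, the sketch below queries a toy external_exposure table for each person's mean exposure. It is written in Python against an in-memory SQLite database only so that it runs standalone; in a HADES workflow the equivalent SQL would normally be issued from R (for example through DatabaseConnector and SqlRender). The column names, the concept ID 9001, and the helper mean_exposure_by_person() are assumptions made for this example and should be checked against the deployed GIS extension schema.

```python
import sqlite3

# Toy schema: the columns below are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE external_exposure (
    external_exposure_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    exposure_concept_id INTEGER,
    exposure_start_date TEXT,
    value_as_number REAL
);
INSERT INTO person VALUES (1, 1950), (2, 1980);
INSERT INTO external_exposure VALUES
    (10, 1, 9001, '2020-01-01', 12.0),
    (11, 1, 9001, '2020-02-01', 8.0),
    (12, 2, 9001, '2020-01-01', 20.0);
""")


def mean_exposure_by_person(connection, concept_id):
    """Return (person_id, mean value, record count) for one exposure concept."""
    return connection.execute(
        """
        SELECT p.person_id,
               AVG(e.value_as_number) AS mean_value,
               COUNT(*) AS n_records
        FROM person p
        JOIN external_exposure e ON e.person_id = p.person_id
        WHERE e.exposure_concept_id = ?
        GROUP BY p.person_id
        ORDER BY p.person_id
        """,
        (concept_id,),
    ).fetchall()


# 9001 is a made-up concept ID standing in for an exposure such as PM2.5.
result = mean_exposure_by_person(conn, 9001)
print(result)
```

The record count is returned alongside the mean so that downstream feature engineering can distinguish well-observed persons from those with a single exposure record.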


8.2. Extending HADES for Geospatial Analysis

Current and proposed use cases (Section 7) are driving the development of HADES extensions to support geospatially-informed health research.

8.2.1. Proposed HADES Packages

GeoFeatureExtraction

Purpose: Extract and engineer features from external_exposure and spatial data

Key Functions:

  • extractSpatialExposures(): Link person-level clinical data to geospatial exposures
  • aggregateTemporalExposures(): Create moving window averages, cumulative exposures
  • calculateSpatialLags(): Compute spatially-lagged exposure metrics for neighborhood effects
  • imputeMissingExposures(): Handle missing spatial data using kriging or nearest-neighbor methods

Use Cases Driving Development: UC-001 (Air Quality), UC-005 (Residential History)
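
The kind of temporal aggregation that aggregateTemporalExposures() is meant to cover can be sketched as follows. This is not the proposed package's implementation (the package is still a prototype); the function names trailing_mean() and cumulative_exposure(), the min_obs parameter, and the PM2.5 values are all hypothetical and chosen for illustration.

```python
from datetime import date, timedelta


def trailing_mean(daily_values, end_day, window_days, min_obs=1):
    """Mean of the values observed in the window ending on end_day.

    Returns None when fewer than min_obs days in the window have data,
    so sparse windows are flagged rather than silently averaged.
    """
    start = end_day - timedelta(days=window_days - 1)
    observed = [v for d, v in daily_values.items() if start <= d <= end_day]
    if len(observed) < max(min_obs, 1):
        return None
    return sum(observed) / len(observed)


def cumulative_exposure(daily_values, start_day, end_day):
    """Sum of observed daily values from start_day to end_day inclusive."""
    return sum(v for d, v in daily_values.items() if start_day <= d <= end_day)


# Hypothetical daily PM2.5 values for one location; 4 January is missing.
pm25 = {
    date(2020, 1, 1): 10.0,
    date(2020, 1, 2): 14.0,
    date(2020, 1, 3): 18.0,
    date(2020, 1, 5): 30.0,
}

mean_3d = trailing_mean(pm25, date(2020, 1, 3), window_days=3)
mean_sparse = trailing_mean(pm25, date(2020, 1, 5), window_days=3)
too_sparse = trailing_mean(pm25, date(2020, 1, 5), window_days=3, min_obs=3)
total = cumulative_exposure(pm25, date(2020, 1, 1), date(2020, 1, 5))
print(mean_3d, mean_sparse, too_sparse, total)
```

Making the minimum number of observed days explicit is one way to keep the handling of gaps in monitoring data visible in the analysis specification instead of hiding it inside the average.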

SpatialCohortMethod

Purpose: Causal inference methods accounting for spatial confounding and clustering

Key Functions:

  • fitSpatialPropensityModel(): Propensity score matching with spatial covariates
  • adjustSpatialConfounding(): Restricted spatial regression, spatial+ methods
  • assessSpatialUnmeasuredConfounding(): Sensitivity analyses for unmeasured spatial factors
  • estimateSpatialEffects(): Spatial spillover and interaction effects

Use Cases Driving Development: UC-002 (Social Vulnerability), UC-004 (Environmental Justice)

MultiLevelEstimation

Purpose: Multilevel modeling for nested data structures (individuals within neighborhoods)

Key Functions:

  • fitMultilevelModel(): Mixed effects regression with area-level random effects
  • partitionVariance(): Intra-class correlation and variance decomposition
  • crossLevelInteractions(): Test individual × area-level interaction effects
  • spatialMediation(): Mediation analysis for place-based exposures

Use Cases Driving Development: UC-002 (Social Vulnerability), UC-007 (Food Deserts)
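
To make the variance decomposition behind partitionVariance() concrete, the sketch below estimates between-area and within-area variance and the intra-class correlation, ICC = between / (between + within), using a one-way ANOVA estimator. This is a deliberately simple stand-in for a fitted mixed-effects model: it assumes equally sized groups and no covariates, and the function name anova_icc() and the neighborhood data are hypothetical.

```python
def anova_icc(groups):
    """One-way ANOVA estimates of (between variance, within variance, ICC).

    groups maps an area identifier to a list of individual outcomes.
    This sketch only handles balanced designs (equal group sizes).
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch requires equally sized groups")
    n = sizes.pop()          # individuals per area
    k = len(groups)          # number of areas
    if k < 2 or n < 2:
        raise ValueError("need at least two areas with two individuals each")

    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    group_means = {g: sum(values) / n for g, values in groups.items()}

    # Mean squares between and within areas.
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means.values()) / (k - 1)
    ms_within = sum(
        (x - group_means[g]) ** 2 for g, values in groups.items() for x in values
    ) / (k * (n - 1))

    # Method-of-moments variance components; truncate negative estimates at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    var_within = ms_within
    total = var_between + var_within
    icc = var_between / total if total > 0 else 0.0
    return var_between, var_within, icc


# Hypothetical outcome values for individuals in three neighborhoods.
neighborhoods = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
var_between, var_within, icc = anova_icc(neighborhoods)
print(var_between, var_within, icc)
```

An ICC close to 1, as in this toy example, indicates that most outcome variation lies between areas, which is the situation in which area-level random effects matter most.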

GeoVisualization

Purpose: Privacy-preserving visualization of spatial patterns in health data

Key Functions:

  • createChoroplethMap(): Thematic maps with small-number suppression
  • plotSpatialClusters(): Identify and display spatial clusters (SaTScan, DBSCAN)
  • visualizeExposureGradients(): Continuous surface maps of environmental exposures
  • animateTemporalMaps(): Time-series visualization of changing spatial patterns

Use Cases Driving Development: UC-004 (Environmental Justice), UC-008 (Climate Change)
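
The small-number suppression step that createChoroplethMap() would apply before rendering can be sketched as below. The function name suppress_small_counts(), the default threshold of 11, and the tract identifiers are assumptions for illustration; the appropriate minimum cell size depends on each site's data use agreement.

```python
def suppress_small_counts(area_counts, threshold=11, suppress_zero=True):
    """Replace counts below threshold with None so they are not mapped.

    area_counts maps an area identifier to a case count.
    suppress_zero controls whether true zeros are also hidden.
    """
    masked = {}
    for area, count in area_counts.items():
        if count == 0 and not suppress_zero:
            masked[area] = 0
        elif count < threshold:
            masked[area] = None   # rendered as "suppressed" on the map
        else:
            masked[area] = count
    return masked


# Hypothetical case counts by census tract.
counts = {"tract_001": 3, "tract_002": 25, "tract_003": 0}
masked = suppress_small_counts(counts)
masked_keep_zero = suppress_small_counts(counts, suppress_zero=False)
print(masked)
print(masked_keep_zero)
```

This sketch only performs primary suppression. If marginal totals are published alongside the map, complementary suppression is also needed, since a suppressed cell can otherwise be recovered by subtraction.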

8.2.2. Integration Pathway into HADES

Proposed HADES packages developed through GIS use cases will follow the OHDSI contribution process:

  1. Prototype Development: Initial package created in use case-specific repositories
  2. Community Review: Present at OHDSI Working Group meetings and symposia
  3. Validation: Apply to multiple use cases and datasets to demonstrate generalizability
  4. Documentation: Comprehensive vignettes, function reference, validation reports
  5. Testing: Unit tests, integration tests, compliance with HADES standards
  6. Submission: Propose inclusion in HADES suite through OHDSI governance process
  7. Maintenance: Long-term support through OHDSI developer community

Current Status:

  • GeoFeatureExtraction: Prototype phase (UC-001, UC-002 pilots)
  • SpatialCohortMethod: Design phase (community feedback solicited)
  • MultiLevelEstimation: Concept phase (literature review ongoing)
  • GeoVisualization: Prototype phase (privacy assessment required)


8.3. Analytical Methods for New Research Domains

The GIS extension opens three new or expanded research domains to the OHDSI community:

8.3.1. Environmental Epidemiology

Analytical Needs:

  • Spatiotemporal exposure assessment (linking monitoring data to individual locations over time)
  • Exposure measurement error modeling (accounting for spatial interpolation uncertainty)
  • Distributed lag models (delayed and cumulative effects of environmental exposures)
  • Extreme value analysis (health impacts of rare environmental events like heat waves)

Proposed Analyses:

  • Case-crossover designs for acute exposure effects (air pollution and MI, heat and ED visits)
  • Spatial survival analysis for long-term exposure and chronic disease
  • Bayesian hierarchical models for multi-pollutant exposures
  • Environmental mixture analysis (quantile g-computation, weighted quantile sum regression)

HADES Gaps:

  • No current support for distributed lag models
  • Limited time-series analysis capabilities
  • No spatial survival analysis methods
  • Limited mixture analysis tools
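
For the case-crossover designs listed under Proposed Analyses, a common choice is time-stratified referent selection: each event day is compared with the other days in the same calendar month that fall on the same weekday. The sketch below shows only that selection step, not the conditional regression that follows; the function name time_stratified_referents() and the event date are hypothetical.

```python
from datetime import date, timedelta


def time_stratified_referents(event_day):
    """Other days in the event's calendar month that share its weekday."""
    referents = []
    day = event_day.replace(day=1)
    # Walk through the month one day at a time.
    while day.month == event_day.month:
        if day.weekday() == event_day.weekday() and day != event_day:
            referents.append(day)
        day += timedelta(days=1)
    return referents


# Hypothetical emergency department visit on a Wednesday in July 2020.
event = date(2020, 7, 15)
referents = time_stratified_referents(event)
print(referents)
```

Exposure on the event day would then be contrasted with exposure on the referent days for the same person, so that time-invariant individual characteristics cancel out by design.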

8.3.2. Sociodemographic Epidemiology

Analytical Needs:

  • Multilevel modeling (individuals nested in neighborhoods, neighborhoods in regions)
  • Structural confounding adjustment (unmeasured community-level factors)
  • Intersectionality analysis (joint effects of multiple social identities and area characteristics)
  • Longitudinal neighborhood effects (impact of residential mobility and changing contexts)

Proposed Analyses:

  • Multilevel propensity score methods for neighborhood interventions
  • Spatial mediation analysis (how place shapes health through intermediate factors)
  • G-methods for time-varying area-level confounders
  • Decomposition methods for disparity analysis (spatial Oaxaca-Blinder)

HADES Gaps:

  • Limited multilevel modeling support
  • No spatial mediation tools
  • No dedicated disparity analysis methods
  • Limited support for time-varying area-level exposures

8.3.3. Spatial Epidemiology

Analytical Needs:

  • Spatial cluster detection (identifying geographic areas with elevated disease rates)
  • Spatial regression (accounting for spatial autocorrelation and boundary effects)
  • Disease mapping (smoothing rates while preserving privacy)
  • Spatial interaction modeling (understanding geographic access and utilization patterns)

Proposed Analyses:

  • Bayesian disease mapping (BYM models, spatial smoothing)
  • Spatial scan statistics (SaTScan integration)
  • Geographically weighted regression (local spatial relationships)
  • Spatial accessibility analysis (two-step floating catchment area methods)
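
The two-step floating catchment area (2SFCA) method mentioned for accessibility analysis can be sketched in a few lines. Step 1 computes a supply-to-demand ratio for each provider over the population within its catchment; step 2 sums, for each population location, the ratios of all providers within reach. The function name two_step_fca(), the binary catchment rule, and the input data are assumptions for illustration; variants with distance decay exist and are not shown.

```python
def two_step_fca(population, supply, distance, catchment):
    """Basic 2SFCA accessibility score per population location.

    population: {location: number of people}
    supply:     {provider: capacity, e.g. number of clinicians}
    distance:   {(location, provider): travel distance or time}
    catchment:  maximum distance at which a provider counts as reachable
    """
    # Step 1: provider-to-population ratio within each provider's catchment.
    ratio = {}
    for provider, capacity in supply.items():
        demand = sum(
            people
            for location, people in population.items()
            if distance[(location, provider)] <= catchment
        )
        ratio[provider] = capacity / demand if demand > 0 else 0.0

    # Step 2: sum the ratios of all providers reachable from each location.
    return {
        location: sum(
            r for provider, r in ratio.items()
            if distance[(location, provider)] <= catchment
        )
        for location in population
    }


# Hypothetical inputs: two tracts, two clinics, distances in kilometres.
population = {"a": 100, "b": 300}
supply = {"x": 4, "y": 2}
distance = {("a", "x"): 5, ("b", "x"): 8, ("a", "y"): 20, ("b", "y"): 3}
access = two_step_fca(population, supply, distance, catchment=10)
print(access)
```

In this toy example tract "b" scores higher because it can reach both clinics, while tract "a" shares clinic "x" with the full population of both tracts.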

HADES Gaps:

  • No spatial autocorrelation diagnostics
  • No spatial regression models
  • Limited small-area estimation tools
  • No accessibility analysis methods
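
As an example of the spatial autocorrelation diagnostics that are currently missing, the sketch below computes global Moran's I with binary contiguity weights. With n areas, deviations z_i from the mean, and weights w_ij summing to S0, the statistic is I = (n / S0) × (Σ_ij w_ij z_i z_j) / (Σ_i z_i²). The function name morans_i(), the neighbor structure, and the rates are hypothetical, and no significance test is included.

```python
def morans_i(values, neighbors):
    """Global Moran's I using binary (0/1) neighbor weights.

    values:    {area: observed value, e.g. a disease rate}
    neighbors: {area: list of adjacent areas}; adjacency should be symmetric
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {area: value - mean for area, value in values.items()}

    # S0 is the sum of all weights; with binary weights it is the link count.
    s0 = sum(len(adjacent) for adjacent in neighbors.values())
    cross = sum(
        dev[area] * dev[other]
        for area, adjacent in neighbors.items()
        for other in adjacent
    )
    denom = sum(d * d for d in dev.values())
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined without neighbors or variation")
    return (n / s0) * (cross / denom)


# Four hypothetical areas arranged in a line: a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

gradient = morans_i({"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}, chain)
alternating = morans_i({"a": 1.0, "b": 4.0, "c": 1.0, "d": 4.0}, chain)
print(gradient, alternating)
```

The smooth gradient gives a positive value (similar neighbors) and the alternating pattern gives a negative value (dissimilar neighbors), which is the behavior a residual diagnostic would rely on.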


8.4. Transparency and Ethical Analysis

Geospatial health research raises unique ethical challenges that require enhanced transparency across the analytical toolchain.

8.4.1. Privacy and Re-identification Risk

Challenges:

  • Small geographic areas (e.g., census tracts) may enable re-identification of individuals with rare conditions
  • Spatial patterns may reveal protected health information even with aggregation
  • Federated analysis must protect sensitive location data while enabling spatial statistics

Current Toolchain Transparency:

  • OMOP CDM provides data structure documentation
  • HADES packages document statistical methods
  • However, privacy risk assessment is typically left to individual researchers

Proposed Enhancements:

  • Automated small-number suppression in geospatial outputs
  • Differential privacy mechanisms for spatial aggregation
  • Re-identification risk assessment tools for geographic data
  • Spatial k-anonymity and l-diversity metrics
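
A basic re-identification risk check along the lines of the k-anonymity metric above can be sketched as follows. It treats the geographic unit as one quasi-identifier among others and reports every combination shared by fewer than k records. This is a simplification of spatial k-anonymity, which usually also considers how locations are generalized; the function name k_anonymity_violations(), the field names, and the records are hypothetical.

```python
from collections import Counter


def k_anonymity_violations(records, quasi_identifiers, k):
    """Return quasi-identifier combinations that occur in fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical analysis extract with tract, age band, and sex.
records = [
    {"tract": "T1", "age_band": "60-69", "sex": "F"},
    {"tract": "T1", "age_band": "60-69", "sex": "F"},
    {"tract": "T1", "age_band": "60-69", "sex": "F"},
    {"tract": "T2", "age_band": "30-39", "sex": "M"},
]

violations = k_anonymity_violations(records, ["tract", "age_band", "sex"], k=3)
coarse = k_anonymity_violations(records, ["sex"], k=1)
print(violations)
```

Combinations flagged here are candidates for coarser geography (for example county instead of tract), wider age bands, or suppression before results leave a site.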

8.4.2. Environmental Justice and Equity

Challenges:

  • Spatial analyses may inadvertently reinforce environmental injustice narratives
  • Unequal data quality across communities (more monitoring in affluent areas)
  • Analytical choices (e.g., boundary definitions) can affect equity assessments

Current Toolchain Transparency:

  • Limited guidance on equity-focused analysis
  • No standard metrics for environmental justice assessment
  • Insufficient documentation of potential biases in geospatial data sources

Proposed Enhancements:

  • Environmental justice impact checklists for spatial analyses
  • Standardized EJ metrics (cumulative burden, disparate impact ratios)
  • Data quality stratification reporting by neighborhood characteristics
  • Community-engaged validation of spatial analyses
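
One possible form of a disparity ratio is the ratio of population-weighted mean exposure between two groups. The sketch below shows that calculation only; this section does not define the standardized EJ metrics, so the metric choice, the function names, the group labels, and the tract data are all assumptions for illustration.

```python
def population_weighted_mean(tracts, group):
    """Mean tract exposure weighted by the group's population in each tract."""
    total_population = sum(tract["population"][group] for tract in tracts)
    if total_population == 0:
        raise ValueError(f"no population recorded for group {group!r}")
    weighted_sum = sum(tract["population"][group] * tract["exposure"] for tract in tracts)
    return weighted_sum / total_population


def exposure_ratio(tracts, group, reference):
    """Ratio of population-weighted mean exposure: group versus reference."""
    return population_weighted_mean(tracts, group) / population_weighted_mean(
        tracts, reference
    )


# Hypothetical tracts with an exposure level and group population counts.
tracts = [
    {"exposure": 10.0, "population": {"group_a": 100, "group_b": 300}},
    {"exposure": 20.0, "population": {"group_a": 300, "group_b": 100}},
]

mean_a = population_weighted_mean(tracts, "group_a")
mean_b = population_weighted_mean(tracts, "group_b")
ratio = exposure_ratio(tracts, "group_a", "group_b")
print(mean_a, mean_b, ratio)
```

A ratio above 1 means the first group lives, on average, in more exposed tracts than the reference group. The result depends on the boundary definitions used for the tracts, which is one of the analytical choices flagged under Challenges.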

8.4.3. Structural Racism and Social Determinants

Challenges:

  • Area-level social determinants are consequences of historical and structural racism
  • Analyses must avoid causal language that blames communities for health disparities
  • Confounding by unmeasured structural factors is pervasive

Current Toolchain Transparency:

  • OMOP vocabularies document SDOH concepts but not their structural origins
  • HADES methods assume exchangeability that may not hold for structural exposures
  • Limited guidance on interpreting area-level effect estimates

Proposed Enhancements:

  • Structural causal models for social determinants
  • Sensitivity analyses for unmeasured structural confounding
  • Explicit documentation of causal assumptions in spatial analyses
  • Anti-racist data science frameworks integrated into analytical workflows


8.5. Role of AI and LLMs in Transparency

Large Language Models (LLMs) and AI tools may facilitate transparency and ethical analysis through:

8.5.1. Automated Documentation

  • Analysis provenance tracking: LLMs generate human-readable summaries of analytical pipelines
  • Assumption documentation: Extract and document causal assumptions from code and comments
  • Methods translation: Convert technical HADES code into plain-language descriptions for community stakeholders

8.5.2. Bias Detection and Mitigation

  • Data quality assessment: LLMs identify patterns of missing or poor-quality data across communities
  • Equity auditing: Automated review of analytical choices for potential bias introduction
  • Sensitivity analysis generation: LLM-assisted creation of robustness checks for equity-critical analyses

8.5.3. Stakeholder Communication

  • Accessible reporting: Generate community-facing summaries of complex geospatial analyses
  • Interactive Q&A: LLM interfaces allow non-technical stakeholders to query analytical findings
  • Visualizations: LLM-assisted creation of privacy-preserving, equity-focused visualizations

8.5.4. Responsible AI Considerations

Risks:

  • LLM hallucinations may introduce errors in analytical documentation
  • Automated bias detection may miss context-specific equity issues
  • Over-reliance on AI may reduce human engagement in ethical deliberation

Safeguards:

  • Human review of all LLM-generated documentation
  • Community validation of AI-assisted equity assessments
  • Transparency about LLM use in analytical workflows
  • Regular auditing of AI tools for algorithmic bias


8.6. Future Directions

The analytical capabilities of the OMOP GIS extension will continue to evolve through:

8.6.1. Community-Driven Development

  • Use cases (Section 7) identify analytical needs
  • HADES package proposals emerge from validated use cases
  • Community review ensures methods meet scientific and ethical standards

8.6.2. Integration with Emerging Methods

  • Causal inference for spatial exposures (spatial instrumental variables, regression discontinuity)
  • Machine learning for exposure prediction (satellite imagery, land use data)
  • Agent-based modeling for neighborhood effects
  • Precision environmental health (personalized exposure assessment)

8.6.3. Cross-Disciplinary Collaboration

  • Environmental science (exposure modeling, measurement science)
  • Geography and GIS (spatial statistics, cartography)
  • Urban planning (built environment, accessibility analysis)
  • Social epidemiology (structural determinants, intersectionality)
  • Ethics and philosophy (environmental justice, data justice)

8.6.4. Open Science and Reproducibility

  • All HADES extensions will be open source (Apache 2.0 license)
  • Example analyses documented in vignettes with public data
  • Validation studies published in peer-reviewed literature
  • Community feedback loops through GitHub, forums, and working group meetings

8.7. Resources

Proposed GIS Analysis Extensions

  • GIS Working Group GitHub
  • Use case-specific repositories (linked from Section 7)
  • Method development discussions in WG meetings

Ethical Frameworks

  • OHDSI Code of Conduct
  • FAIR Principles for research data
  • NIH NOT-OD-23-053: Structural Racism and Discrimination
  • EPA EJ 2020 Action Agenda

The GIS Working Group is committed to developing analytical infrastructure that is not only scientifically rigorous but also ethically grounded, transparent, and responsive to community needs.