OHDSI GIS WG
Gaia is an integrated toolchain for combining geospatial data with OMOP Common Data Model (CDM) clinical data to enable population health research. Developed by the OHDSI GIS Working Group, Gaia provides infrastructure, software, standards, and workflows for integrating place-based datasets into patient-based health databases.
The name “Gaia” refers to the Greek personification of Earth, reflecting the toolchain’s focus on place-based health determinants and the spatial context of human health.
Improve the health of populations by generating reliable evidence from integrated geospatial and person-level health data.
Gaia currently supports:
Current limitations being addressed:
Gaia consists of multiple interconnected components:
gaiaDb - PostgreSQL/PostGIS database
- Core data repository with all geospatial processing logic
- SQL routines and PostGIS functions for spatial operations
- OMOP CDM integration and exposure calculations
- LinkML/JSON-LD metadata support
- Transformation recipe library
gaiaCore - Multi-language connector framework
- RESTful API access via PostgREST
- Direct database connection support
- Language-specific client libraries (R, Python)
- Orchestrates functions defined in gaiaDb
- Zero data processing logic (access layer only)
gaiaCatalog - Metadata catalog
- Functional metadata for publicly-hosted geospatial datasets
- Schema.org-compliant dataset descriptions
- Automated retrieval, extraction, transformation, loading (ETL) instructions
- Federated metadata sharing
- Version tracking for data sources
gaiaDocker - Deployment orchestration
- Coordinated image builds for entire Gaia stack
- Versioned releases with docker-compose profiles
- Integration with OHDSI Broadsea ecosystem
- Official deployment method for all environments
OMOP GIS Vocabulary Package - Custom vocabularies
- OMOP GIS Vocabulary (geographic entities, spatial relationships)
- OMOP Exposome Vocabulary (environmental toxins, pollutants)
- OMOP SDoH Vocabulary (social determinants from ADI, AHRQ, COI, EJI, SEDH, SVI, SDG, SDOHO)
- Developed via Custom Vocabulary Builder (CVB)
- Delta files available in TuftsCTSI/CVB
Gaia provides a standardized, automated, reproducible, and shareable means for integrating place-based datasets into longitudinal patient health databases.
Key capabilities:
- Harmonize disparate geospatial datasets into common format
- Link patient locations to environmental and social exposures
- Calculate spatiotemporal exposure metrics while preserving privacy
- Integrate exposures into OMOP CDM for analytics
- Enable federated network studies with reproducible workflows
A researcher deploys gaiaDocker locally and gains immediate access to curated geospatial data sources:
docker compose --profile gaia up -d
They can now:
- Load harmonized datasets across multiple domains (environment, demographics, SDoH)
- Perform spatial joins and exploratory analyses
- Create visualizations and geospatial applications
- Access data via API or direct database connection
Example: Link air quality monitoring data (EPA) with social vulnerability index (CDC) to study environmental justice.
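As a rough illustration of API access, the sketch below builds a read request URL in the query style PostgREST uses (`select=` for columns, `<column>=<operator>.<value>` for filters, `limit=` for row count). The base URL, the table name `svi_tract`, and the column names are hypothetical placeholders, not names confirmed by Gaia; the function only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode


def build_postgrest_url(base_url, table, filters=None, columns=None, limit=None):
    """Build a PostgREST-style read URL for one table.

    filters maps a column name to an (operator, value) pair, e.g.
    {"state_fips": ("eq", "25")} becomes state_fips=eq.25.
    """
    params = []
    if columns:
        params.append(("select", ",".join(columns)))
    for column, (operator, value) in (filters or {}).items():
        params.append((column, f"{operator}.{value}"))
    if limit is not None:
        params.append(("limit", str(limit)))
    # Keep commas readable in the select list instead of percent-encoding them.
    query = urlencode(params, safe=",")
    url = f"{base_url.rstrip('/')}/{table}"
    return f"{url}?{query}" if query else url


# Hypothetical endpoint and table; substitute the values from your deployment.
example_url = build_postgrest_url(
    "http://localhost:3000",
    "svi_tract",
    filters={"state_fips": ("eq", "25")},
    columns=["geoid", "svi_overall"],
    limit=5,
)
```

The resulting URL can be passed to any HTTP client. Keeping URL construction separate from the request makes the query easy to inspect and log before any data is fetched.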
A researcher with an OMOP CDM database integrates geospatial exposures:
All processing happens in gaiaDb with privacy preservation:
- Only aggregated geographic identifiers (census block, ZIP) leave site
- Exposure values contain no reverse-geocoding information
- Standard OMOP privacy protections apply
Example: Calculate neighborhood deprivation index exposure during critical developmental windows for pediatric asthma cohort.
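To make the idea of an exposure over a time window concrete, here is a minimal sketch of a time-weighted average: each residence period contributes its area-level index value in proportion to the days it overlaps the window. This is an illustration only, not Gaia's implementation (the document places that logic in gaiaDb); the `Residence` type, the function names, and the geographic identifiers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Residence:
    geo_id: str  # aggregated geographic identifier, e.g. a census tract (hypothetical)
    start: date  # first day at this location, inclusive
    end: date    # last day at this location, inclusive


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days shared by two inclusive date ranges (0 if disjoint)."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    return max((end - start).days + 1, 0)


def time_weighted_exposure(residences, index_by_geo, window_start, window_end):
    """Average index value over the window, weighted by days at each location.

    Returns None when no residence with a known index value overlaps the window.
    """
    weighted_sum = 0.0
    covered_days = 0
    for residence in residences:
        days = overlap_days(residence.start, residence.end, window_start, window_end)
        if days == 0 or residence.geo_id not in index_by_geo:
            continue
        weighted_sum += index_by_geo[residence.geo_id] * days
        covered_days += days
    if covered_days == 0:
        return None
    return weighted_sum / covered_days


# Illustrative values only: 10 days in tract_A, then 20 days in tract_B.
history = [
    Residence("tract_A", date(2020, 1, 1), date(2020, 1, 10)),
    Residence("tract_B", date(2020, 1, 11), date(2020, 1, 30)),
]
deprivation_index = {"tract_A": 30.0, "tract_B": 60.0}
exposure = time_weighted_exposure(
    history, deprivation_index, date(2020, 1, 1), date(2020, 1, 30)
)
```

Because the calculation uses only aggregated identifiers and dates, no coordinates or addresses need to appear in the result.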
Gaia enables reproducible workflows across OHDSI data networks:
All steps include detailed provenance metadata and transformation documentation.
Example: Multi-site study examining association between greenspace exposure and mental health outcomes across diverse urban environments.
The Gaia Framework consists of three main components working together:
gaiaCatalog provides data discovery and metadata management:
Data flow: External data sources → Catalog metadata → Automated ingestion
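A Schema.org-style dataset description of the kind a metadata catalog might hold could look roughly like the sketch below. It uses standard Schema.org terms (`Dataset`, `DataDownload`, `contentUrl`, `encodingFormat`, `temporalCoverage`), but the actual gaiaCatalog record layout may differ; the function name, dataset name, and download URL are hypothetical.

```python
import json


def dataset_record(name, description, download_url, encoding_format, temporal_coverage):
    """Return a minimal JSON-LD description of one downloadable dataset."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "temporalCoverage": temporal_coverage,
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": download_url,
                "encodingFormat": encoding_format,
            }
        ],
    }


# Placeholder values for illustration; not a real catalog entry.
record = dataset_record(
    name="Example social vulnerability index",
    description="Tract-level index values for one release year.",
    download_url="https://example.org/data/svi_2020.csv",
    encoding_format="text/csv",
    temporal_coverage="2020",
)
record_json = json.dumps(record, indent=2)
```

An ingestion step could read `distribution[0].contentUrl` and `encodingFormat` from such a record to decide where to fetch the file and how to parse it.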
gaiaDb provides storage and processing:
gaiaCore provides multi-language access:
Data flow: Raw geospatial data → gaiaDb processing → gaiaCore access → Client applications
Specialized functionality beyond core CRUD operations:
Extensions interface with gaiaCore to leverage gaiaDb functionality.
git clone https://github.com/OHDSI/gaiaDocker.git && cd gaiaDocker
docker compose --profile gaia up -d
See Get Started for detailed deployment and onboarding instructions.