OHDSI GIS WG
This page describes the technical architecture of the Gaia toolchain, including component interactions, data flows, and design principles.
The Gaia toolchain consists of multiple interconnected components that work together to integrate geospatial data with the OMOP Common Data Model.
┌─────────────────────────────────────────────────────────────────┐
│ External Data Sources │
│ (Census, EPA, Weather, Social Services, Geographic Boundaries) │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ gaiaCatalog │
│ • Data source discovery │
│ • Metadata management │
│ • Schema.org compliance │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ gaiaDb │
│ • PostgreSQL/PostGIS staging database │
│ • Transformation recipes │
│ • Spatial indexing │
│ • Raw geospatial data storage │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ gaiaCore │
│ • Multi-language connector framework │
│ • PostgREST API access │
│ • Database connection orchestration │
│ • Language-specific client libraries │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ OMOP CDM + GIS Extensions │
│ • Location_History table │
│ • External_Exposure table │
│ • Integrated with standard OMOP tables │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ HADES Analytics │
│ • Cohort definition │
│ • Patient-level prediction │
│ • Population-level estimation │
│ • Evidence generation │
└─────────────────────────────────────────────────────────────────┘
gaiaCatalog
Purpose: Data source discovery and metadata management
Technology: Schema.org, JSON-LD
I/O: External data URLs/metadata → Searchable catalog with standardized metadata
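As an illustration of the kind of entry a Schema.org/JSON-LD catalog might hold, the sketch below builds a `Dataset` description in Python. The dataset name, URL, and variable name are hypothetical, and the choice of properties (`distribution`, `variableMeasured`) is an assumption about what the catalog records, not the gaiaCatalog schema itself.

```python
import json

def make_catalog_entry(name, url, description, variables):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Where the raw data can be downloaded from
        "distribution": [
            {"@type": "DataDownload", "contentUrl": url}
        ],
        # Schema.org's variableMeasured lists what the dataset measures
        "variableMeasured": [
            {"@type": "PropertyValue", "name": v} for v in variables
        ],
    }

entry = make_catalog_entry(
    name="Example air quality dataset",        # hypothetical
    url="https://example.org/data/pm25.csv",   # hypothetical
    description="Annual mean PM2.5 by census tract (illustrative only).",
    variables=["pm25_annual_mean"],            # hypothetical variable name
)
print(json.dumps(entry, indent=2))
```

Because the entry is plain JSON-LD, it can be indexed for search or validated with generic Schema.org tooling.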
gaiaDb
Purpose: Core data repository holding all geospatial processing logic and OMOP integration
Technology: PostgreSQL 12+, PostGIS 3.0+, PostgREST, LinkML
Processing: SQL routines, PostGIS functions, exposure calculations, privacy-preserving aggregation, temporal alignment
I/O: Raw geospatial data + patient locations → OMOP External_Exposure and Location_History tables
Schema: backbone/ (core), working/ (OMOP tables), data_sources/, transformations/, functions/, api/
gaiaCore
Purpose: Multi-language connector framework (access layer only; no processing logic)
Technology: PostgREST, language-specific connectors (R, Python)
I/O: API/database requests → Routed to gaiaDb → Results in client format
Key: gaiaCore orchestrates access to gaiaDb functions. All processing happens in gaiaDb.
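To make the "access layer only" idea concrete, here is a minimal sketch of how a Python connector might prepare a call to a gaiaDb function through PostgREST. PostgREST exposes database functions under `/rpc/<function_name>`; the host, the function name `calculate_exposure`, and its parameters are assumptions for illustration and are not taken from gaiaCore.

```python
import json
from urllib.request import Request

def build_rpc_request(base_url, function_name, params):
    """Prepare (but do not send) a PostgREST RPC call.

    PostgREST exposes database functions under /rpc/<function_name>
    and accepts their arguments as a JSON object in a POST body.
    """
    url = f"{base_url.rstrip('/')}/rpc/{function_name}"
    body = json.dumps(params).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

req = build_rpc_request(
    "http://localhost:3000",   # assumed PostgREST address
    "calculate_exposure",      # hypothetical gaiaDb function
    {"variable_id": 42, "start_date": "2020-01-01", "end_date": "2020-12-31"},
)
print(req.get_method(), req.full_url)
```

The connector only builds and routes the request. The computation itself would run inside the database function, which matches the separation described above.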
External Source → gaiaCatalog
Input: Dataset URL, documentation
Process: Metadata extraction, cataloging
Output: Cataloged data source with metadata
Cataloged Source → gaiaDb
Input: Raw geospatial files
Process: Data loading, spatial indexing, transformation
Output: Staged spatial tables in PostGIS
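The loading step can be sketched for the simplest input type, a CSV file with coordinate columns. The example converts rows to WKT points and pairs them with a parameterized insert that uses the PostGIS function `ST_GeomFromText`. The table `staging.monitor_sites`, the column names, and SRID 4326 are assumptions, not the gaiaDb schema.

```python
import csv
import io

def csv_to_point_rows(csv_text, lon_col="longitude", lat_col="latitude"):
    """Turn CSV rows with coordinate columns into (attributes, WKT) pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        lon = float(record.pop(lon_col))
        lat = float(record.pop(lat_col))
        # WKT uses x y order, i.e. longitude first
        rows.append((record, f"POINT({lon} {lat})"))
    return rows

# Parameterized statement for a database driver;
# staging.monitor_sites is a hypothetical table.
INSERT_SQL = (
    "INSERT INTO staging.monitor_sites (site_id, geom) "
    "VALUES (%s, ST_GeomFromText(%s, 4326))"
)

sample = "site_id,longitude,latitude\nA1,-71.06,42.36\n"
points = csv_to_point_rows(sample)
print(points)
```

After loading, a GiST index on the geometry column is the usual way to get the spatial indexing mentioned in this step.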
Patient Locations → gaiaDb (via gaiaCore)
Input:
- Patient geocoded addresses
- Residence time periods
- Variable specifications
- Location_History records
Process (in gaiaDb):
- Spatial joins (PostGIS)
- Temporal alignment (SQL)
- Aggregation (e.g., average exposure during residence)
- Privacy preservation (no raw addresses exported)
Output:
- External_Exposure records
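The temporal alignment and aggregation in this step can be illustrated with a small in-memory sketch. In gaiaDb this logic runs as SQL and PostGIS routines; the Python below only shows the arithmetic of a time-weighted average, with a match on a hypothetical `area_id` standing in for the spatial join. All record shapes and values are invented for illustration.

```python
from datetime import date

def overlap_days(start_a, end_a, start_b, end_b):
    """Days shared by two closed date intervals (0 if disjoint)."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    return max((end - start).days + 1, 0)

def time_weighted_exposure(residence, measurements):
    """Average the measurements for a residence's area, weighted by overlap days."""
    total = 0.0
    days = 0
    for m in measurements:
        if m["area_id"] != residence["area_id"]:
            continue  # stands in for the PostGIS spatial join
        d = overlap_days(residence["start"], residence["end"], m["start"], m["end"])
        total += m["value"] * d
        days += d
    return total / days if days else None

residence = {"area_id": "tract-1", "start": date(2020, 7, 1), "end": date(2021, 6, 30)}
measurements = [
    {"area_id": "tract-1", "start": date(2020, 1, 1), "end": date(2020, 12, 31), "value": 10.0},
    {"area_id": "tract-1", "start": date(2021, 1, 1), "end": date(2021, 12, 31), "value": 12.0},
    {"area_id": "tract-2", "start": date(2020, 1, 1), "end": date(2020, 12, 31), "value": 99.0},
]
avg = time_weighted_exposure(residence, measurements)
print(avg)
```

Here the residence spans 184 days of the first measurement period and 181 days of the second, so the second value contributes slightly less to the average than the first.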
gaiaDb Output → OMOP CDM
Input: Processed exposure data
Process:
- Map to OMOP vocabulary concepts
- Link to Location table (privacy-preserved)
- Link to Person via Location_History
- Populate External_Exposure table
Output: OMOP CDM with integrated geospatial exposures
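The mapping step might look roughly like the following sketch, which turns one processed exposure value into an External_Exposure-style row. The column names are a subset assumed for illustration (the Schema Extensions page holds the authoritative DDL), and the concept ID is a made-up local value rather than a real vocabulary entry.

```python
from dataclasses import dataclass, asdict
from datetime import date

# Hypothetical mapping from source variable names to OMOP concept IDs.
# Real IDs would come from the OMOP vocabulary tables.
VARIABLE_TO_CONCEPT = {"pm25_annual_mean": 2000000001}

@dataclass
class ExternalExposureRow:
    # Assumed subset of External_Exposure columns
    external_exposure_id: int
    location_id: int
    person_id: int
    exposure_concept_id: int
    exposure_start_date: date
    exposure_end_date: date
    value_as_number: float

def to_external_exposure(row_id, person_id, location_id, variable, start, end, value):
    """Map one processed exposure value to an External_Exposure-style row."""
    # OMOP convention: concept ID 0 means no matching concept
    concept_id = VARIABLE_TO_CONCEPT.get(variable, 0)
    return ExternalExposureRow(row_id, location_id, person_id, concept_id, start, end, value)

row = to_external_exposure(1, 1001, 55, "pm25_annual_mean",
                           date(2020, 7, 1), date(2021, 6, 30), 10.99)
print(asdict(row))
```

The row carries a `location_id` rather than an address, which keeps the link to the Location table privacy-preserved as described above.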
OMOP CDM + GIS Extensions → HADES
Input: Integrated clinical + geospatial data
Process: Standard OHDSI analytics
Output: Evidence generation
Gaia enables geospatial analysis without sharing sensitive patient addresses:
Workflow: Raw Address (protected) → Geographic ID (can leave site) → Exposure Values → OMOP External_Exposure
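The workflow above can be sketched as two separate steps: an on-site step that swaps the raw address for a geographic ID, and a lookup that needs only that ID. The lookup tables, address, and field names are hypothetical; a real deployment would use geocoding and PostGIS joins rather than dictionaries.

```python
# On-site lookup from protected address to a geographic unit (hypothetical data).
ADDRESS_TO_AREA = {"12 Example St, Springfield": "tract-1"}

# Exposure values keyed by geographic unit; contains no patient data.
AREA_EXPOSURE = {"tract-1": 10.99}

def to_shareable(person_id, address):
    """Replace the raw address with its geographic ID before anything leaves the site."""
    return {"person_id": person_id, "area_id": ADDRESS_TO_AREA[address]}

def attach_exposure(record):
    """Look up the exposure for a shareable record using only the geographic ID."""
    return {**record, "exposure": AREA_EXPOSURE[record["area_id"]]}

shareable = to_shareable(1001, "12 Example St, Springfield")
result = attach_exposure(shareable)
print(result)
```

Only `to_shareable` ever sees the address, so everything downstream of it can run without access to protected data.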
gaiaCatalog inputs and outputs:

| Input | Input format | Output | Output format |
|---|---|---|---|
| Dataset URL | String | Catalog entry | JSON-LD |
| Metadata | Schema.org | Searchable index | Database |
| Variables | CSV/JSON | Variable catalog | JSON |
gaiaDb inputs and outputs:

| Input | Input format | Output | Output format |
|---|---|---|---|
| Shapefiles | .shp | PostGIS tables | SQL |
| GeoJSON | .geojson | Indexed geometries | PostGIS |
| Raster | .tif, .nc | Raster tables | PostGIS |
| CSV with coords | .csv | Point geometries | PostGIS |
| Transformation recipe | SQL | Transformed data | PostGIS |
| Patient locations and residence periods | Geocoded coords | External_Exposure | OMOP table |
| Variable IDs | Integer | Exposure values | Numeric |
| SQL function calls | SQL/PostgREST | Query results | JSON/Tabular |
gaiaCore inputs and outputs:

| Input | Input format | Output | Output format |
|---|---|---|---|
| API requests | HTTP/JSON | Exposure data | JSON |
| Database queries | SQL | Query results | Tabular |
| Function calls | REST/SQL | Processed data | Client format |
| Connection params | Config | Database access | Connection |
Docker Compose deployment inputs and outputs:

| Input | Input format | Output | Output format |
|---|---|---|---|
| docker-compose.yaml | YAML | Running containers | Docker services |
| Profile selection | CLI flag | Stack configuration | Deployed services |
| Environment config | .env | Service parameters | Runtime config |
| Component images | Docker | Orchestrated stack | Running Gaia toolchain |
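As a rough sketch of how the deployment inputs in the table combine, the helper below assembles a `docker compose` invocation from a compose file, an environment file, and a profile. The flags are standard Docker Compose options; the profile name `full` is hypothetical and the actual profile names would be defined in the project's compose file.

```python
import shlex

def compose_up_command(profile, env_file=".env", compose_file="docker-compose.yaml"):
    """Build a `docker compose up` command for one Compose profile."""
    return [
        "docker", "compose",
        "--file", compose_file,
        "--env-file", env_file,
        "--profile", profile,
        "up", "--detach",
    ]

cmd = compose_up_command("full")  # "full" is a hypothetical profile name
print(shlex.join(cmd))
```

Returning the command as a list makes it safe to pass directly to `subprocess.run` without shell quoting issues.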
Extension Tables: Location_History (patient-location-time), External_Exposure (place-based exposures)
HADES: Standard packages work with the extended CDM, and cohort definitions can include exposure criteria.
See Schema Extensions for complete DDL.
See Deployment Strategies for deployment options.
Planned: Real-time data streams, ML integration, multi-site federation, enhanced catalog, unified API gateway
Research: Differential privacy, federated learning, spatiotemporal modeling