AlgorithmsThis page documents some canonical algorithms used to compute distance from care metrics within the Gaia toolchain. These metrics quantify how far a patient resides from a healthcare facility and how long it takes to travel there — key inputs for access-to-care, health equity, and spatial epidemiology analyses.
Two metrics are defined:
| Metric | Definition |
|---|---|
| Travel Distance | Straight-line (geodesic) distance in kilometers between a patient’s geocoded residence and the nearest qualifying care facility |
| Travel Time | Estimated travel time in minutes, derived from travel distance using a mode-specific speed model |
The algorithm requires the following OMOP CDM / Gaia tables to be populated:
| Table | Required Fields | Notes |
|---|---|---|
LOCATION |
location_id, latitude,
longitude |
Both coordinates must be non-NULL |
PERSON |
person_id, location_id |
Current residence linkage |
LOCATION_HISTORY |
entity_id, location_id,
start_date, end_date,
relationship_type_concept_id |
Optional; enables point-in-time lookups (concept 581476 = “Lives at”) |
CARE_SITE |
care_site_id,
location_id |
Facility registry; requires geocoded
LOCATION rows |
Sufficient location data is defined as a
LOCATIONrecord for the patient (viaPERSON.location_idor activeLOCATION_HISTORYrow) with bothlatitudeandlongitudepopulated and within plausible geographic bounds (latitude −90 to 90, longitude −180 to 180).
The Haversine formula computes the great-circle distance between two points on a sphere given their latitudes and longitudes. It is accurate to within ~0.5% for distances under 500 km and requires no PostGIS extension, making it portable across all supported databases (PostgreSQL, SQL Server, BigQuery, Oracle).
Given patient location (φ₁, λ₁) and facility location (φ₂, λ₂) in decimal degrees:
Δφ = φ₂ − φ₁ (difference in latitude, radians)
Δλ = λ₂ − λ₁ (difference in longitude, radians)
a = sin²(Δφ/2) + cos(φ₁) · cos(φ₂) · sin²(Δλ/2)
c = 2 · atan2(√a, √(1−a))
d = R · c (R = 6371 km, Earth mean radius)
The function below is designed to operate entirely within the
omopgis schema and is compatible with the synthetic dataset
structure.
-- ============================================================
-- Function: calculate_haversine_distance_km
-- Returns geodesic distance in kilometers between two lat/lon
-- coordinate pairs using the Haversine formula.
-- ============================================================
CREATE OR REPLACE FUNCTION omopgis.calculate_haversine_distance_km(
lat1 DOUBLE PRECISION,
lon1 DOUBLE PRECISION,
lat2 DOUBLE PRECISION,
lon2 DOUBLE PRECISION
)
RETURNS DOUBLE PRECISION
LANGUAGE SQL
IMMUTABLE STRICT
AS $$
SELECT
6371.0 * 2.0 * ATAN2(
SQRT(
POWER(SIN(RADIANS(lat2 - lat1) / 2.0), 2)
+ COS(RADIANS(lat1))
* COS(RADIANS(lat2))
* POWER(SIN(RADIANS(lon2 - lon1) / 2.0), 2)
),
SQRT(1.0 - (
POWER(SIN(RADIANS(lat2 - lat1) / 2.0), 2)
+ COS(RADIANS(lat1))
* COS(RADIANS(lat2))
* POWER(SIN(RADIANS(lon2 - lon1) / 2.0), 2)
))
)
$$;
When PostGIS is available (default gaiaDb deployment),
ST_DistanceSphere provides equivalent accuracy with better
performance on large datasets via spatial indexing:
-- PostGIS equivalent — requires geography cast
SELECT ST_DistanceSphere(
ST_MakePoint(lon1, lat1), -- (longitude first in PostGIS)
ST_MakePoint(lon2, lat2)
) / 1000.0 AS distance_km; -- convert meters → km
Travel time is estimated from geodesic distance using a piecewise linear speed model stratified by assumed travel mode. This approach does not require a routing engine (e.g., OSRM, Google Maps) and is reproducible across federated sites.
| Travel Mode | Assumed Speed | Rationale |
|---|---|---|
| Driving (urban, < 5 km) | 25 km/h | Urban traffic, parking |
| Driving (suburban, 5–30 km) | 55 km/h | Mixed roads |
| Driving (rural, > 30 km) | 85 km/h | Highway-dominated |
| Walking (< 3 km) | 4.8 km/h | Average walking speed |
| Public transit | 20 km/h | Door-to-door average |
These defaults reflect national US averages and should be calibrated to local geographies when site-specific data are available. Researchers may substitute a routing API estimate stored in
EXTERNAL_EXPOSURE(using an appropriate vocabulary concept) in place of this model.
-- ============================================================
-- Function: estimate_travel_time_minutes
-- Returns estimated driving travel time in minutes given a
-- geodesic distance in kilometers, using a piecewise speed
-- model appropriate for US geographies.
-- ============================================================
CREATE OR REPLACE FUNCTION omopgis.estimate_travel_time_minutes(
distance_km DOUBLE PRECISION
)
RETURNS DOUBLE PRECISION
LANGUAGE SQL
IMMUTABLE STRICT
AS $$
SELECT CASE
WHEN distance_km < 5.0 THEN (distance_km / 25.0) * 60.0
WHEN distance_km < 30.0 THEN (distance_km / 55.0) * 60.0
ELSE (distance_km / 85.0) * 60.0
END
$$;
The routine:
LOCATION record containing valid latitude and
longitude.LOCATION_HISTORY record (concept 581476, “Lives at”)
at the index date; falling back to PERSON.location_id.CARE_SITE records with geocoded LOCATION
rows.EXTERNAL_EXPOSURE table using the standard Gaia vocabulary
concepts.-- ============================================================
-- Routine: compute_distance_from_care
-- Computes nearest-facility travel distance and estimated
-- travel time for all patients with sufficient location data.
-- Results are inserted into omopgis.EXTERNAL_EXPOSURE.
--
-- Parameters (set as session variables before calling):
-- index_date DATE — reference date for active
-- LOCATION_HISTORY lookup
-- distance_concept BIGINT — OMOP concept_id for
-- "Distance to nearest care site"
-- time_concept BIGINT — OMOP concept_id for
-- "Estimated travel time to care"
-- km_unit_concept BIGINT — OMOP concept_id for kilometers
-- min_unit_concept BIGINT — OMOP concept_id for minutes
-- ============================================================
DO $$
DECLARE
v_index_date DATE := CURRENT_DATE;
v_distance_cid BIGINT := 2000000100; -- placeholder concept
v_time_cid BIGINT := 2000000101; -- placeholder concept
v_km_unit BIGINT := 8582; -- OMOP: kilometer
v_min_unit BIGINT := 8550; -- OMOP: minute
BEGIN
-- ── Step 1: Resolve patient locations ─────────────────────────────────────────
-- Use LOCATION_HISTORY "Lives at" record active on index_date when available;
-- fall back to PERSON.location_id.
CREATE TEMP TABLE IF NOT EXISTS _patient_locations AS
WITH history_loc AS (
SELECT
lh.entity_id AS person_id,
lh.location_id,
ROW_NUMBER() OVER (
PARTITION BY lh.entity_id
ORDER BY lh.start_date DESC
) AS rn
FROM omopgis.LOCATION_HISTORY lh
JOIN omopgis.LOCATION l ON l.location_id = lh.location_id
WHERE lh.domain_id = 'Person'
AND lh.relationship_type_concept_id = 581476 -- "Lives at"
AND lh.start_date <= v_index_date
AND (lh.end_date IS NULL OR lh.end_date >= v_index_date)
AND l.latitude IS NOT NULL
AND l.longitude IS NOT NULL
AND l.latitude BETWEEN -90 AND 90
AND l.longitude BETWEEN -180 AND 180
),
person_loc AS (
SELECT
p.person_id,
p.location_id
FROM omopgis.PERSON p
JOIN omopgis.LOCATION l ON l.location_id = p.location_id
WHERE l.latitude IS NOT NULL
AND l.longitude IS NOT NULL
AND l.latitude BETWEEN -90 AND 90
AND l.longitude BETWEEN -180 AND 180
)
SELECT
COALESCE(hl.person_id, pl.person_id) AS person_id,
COALESCE(hl.location_id, pl.location_id) AS location_id
FROM person_loc pl
LEFT JOIN (SELECT * FROM history_loc WHERE rn = 1) hl
ON hl.person_id = pl.person_id;
-- ── Step 2: Build care facility coordinate table ───────────────────────────────
CREATE TEMP TABLE IF NOT EXISTS _facility_locations AS
SELECT
cs.care_site_id,
l.location_id,
l.latitude,
l.longitude
FROM omopgis.CARE_SITE cs
JOIN omopgis.LOCATION l ON l.location_id = cs.location_id
WHERE l.latitude IS NOT NULL
AND l.longitude IS NOT NULL
AND l.latitude BETWEEN -90 AND 90
AND l.longitude BETWEEN -180 AND 180;
-- ── Step 3: Compute nearest facility per patient ───────────────────────────────
CREATE TEMP TABLE IF NOT EXISTS _nearest_facility AS
WITH distances AS (
SELECT
pl.person_id,
fl.care_site_id,
omopgis.calculate_haversine_distance_km(
lp.latitude, lp.longitude,
fl.latitude, fl.longitude
) AS distance_km
FROM _patient_locations pl
JOIN omopgis.LOCATION lp ON lp.location_id = pl.location_id
CROSS JOIN _facility_locations fl
),
ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY person_id
ORDER BY distance_km ASC
) AS rn
FROM distances
)
SELECT
person_id,
care_site_id,
distance_km,
omopgis.estimate_travel_time_minutes(distance_km) AS travel_time_minutes
FROM ranked
WHERE rn = 1;
-- ── Step 4: Write distance results to EXTERNAL_EXPOSURE ──────────────────────
INSERT INTO omopgis.EXTERNAL_EXPOSURE (
external_exposure_id,
person_id,
location_id,
exposure_concept_id,
exposure_start_date,
exposure_end_date,
value_as_number,
unit_concept_id,
exposure_relationship_concept_id
)
SELECT
nextval('omopgis.external_exposure_id_seq'), -- adjust to site sequence
nf.person_id,
pl.location_id,
v_distance_cid,
v_index_date,
v_index_date,
nf.distance_km,
v_km_unit,
44818800 -- SNOMED: "Exposed to"
FROM _nearest_facility nf
JOIN _patient_locations pl ON pl.person_id = nf.person_id;
-- ── Step 5: Write travel time results to EXTERNAL_EXPOSURE ───────────────────
INSERT INTO omopgis.EXTERNAL_EXPOSURE (
external_exposure_id,
person_id,
location_id,
exposure_concept_id,
exposure_start_date,
exposure_end_date,
value_as_number,
unit_concept_id,
exposure_relationship_concept_id
)
SELECT
nextval('omopgis.external_exposure_id_seq'),
nf.person_id,
pl.location_id,
v_time_cid,
v_index_date,
v_index_date,
nf.travel_time_minutes,
v_min_unit,
44818800
FROM _nearest_facility nf
JOIN _patient_locations pl ON pl.person_id = nf.person_id;
-- ── Cleanup ───────────────────────────────────────────────────────────────────
DROP TABLE IF EXISTS _patient_locations;
DROP TABLE IF EXISTS _facility_locations;
DROP TABLE IF EXISTS _nearest_facility;
RAISE NOTICE 'Distance-from-care computation complete.';
END $$;
A patient is considered to have sufficient location data when all of the following conditions are met:
| Criterion | Check |
|---|---|
LOCATION record exists |
PERSON.location_id IS NOT NULL or active
LOCATION_HISTORY row |
| Latitude is populated | latitude IS NOT NULL |
| Longitude is populated | longitude IS NOT NULL |
| Coordinates are in range | latitude BETWEEN -90 AND 90 AND
longitude BETWEEN -180 AND 180 |
| Facility registry available | At least one CARE_SITE with geocoded
LOCATION exists |
Patients failing any criterion are excluded from
results (no row written to EXTERNAL_EXPOSURE) to prevent
silent errors from propagating into downstream analyses.
Results are written to omopgis.EXTERNAL_EXPOSURE using
the following field mapping:
| EXTERNAL_EXPOSURE Field | Distance Record | Travel Time Record |
|---|---|---|
person_id |
patient | patient |
location_id |
patient’s resolved location | patient’s resolved location |
exposure_concept_id |
“Distance to nearest care site” | “Estimated travel time to care” |
exposure_start_date |
index date | index date |
exposure_end_date |
index date | index date |
value_as_number |
distance in km | time in minutes |
unit_concept_id |
8582 (kilometer) | 8550 (minute) |
exposure_relationship_concept_id |
44818800 (Exposed to) | 44818800 (Exposed to) |
Vocabulary note: The concept IDs for “Distance to nearest care site” (placeholder 2000000100) and “Estimated travel time to care” (placeholder 2000000101) should be requested from the OMOP GIS Vocabulary. Pending formal registration, sites should use local extension concepts with negative IDs per OMOP convention.
LOCATION_HISTORY) is not yet aggregated into longitudinal
exposure windows.| Extension | Approach |
|---|---|
| Road-network routing | Store OSRM or routing-API estimates in
EXTERNAL_EXPOSURE; use vocabulary concept to flag
source |
| Mode-stratified time | Add separate records per mode (drive, walk, transit)
using distinct exposure_concept_id values |
| Longitudinal distance | Join LOCATION_HISTORY periods to compute
time-weighted mean distance over a study window |
| Facility-type filtering | Restrict _facility_locations by
CARE_SITE.place_of_service_concept_id (e.g., primary care
only) |
| Two-step floating catchment area | Extend to supply-demand ratio analysis for access equity studies |