# Gaia Data Models

This is the specification document for the gaiaDB data model. The data model is still under development and is likely to change. Each table is presented with a high-level description and the ETL conventions that should be followed. A field-by-field listing then covers each field, any conventions related to it, and the constraints that apply (primary key, foreign key, and so on). If you have questions, please visit the forums or the GitHub issue page.

## Backbone

### data_source

Table Description

This table contains records that catalog external (or local) web-hosted entities. All source data in gaiaDB must be referenced in this table.

User Guide

NA

ETL Conventions

NA

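| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| data_source_uuid | | | int4 | Yes | Yes | No | | |
| org_id | | | varchar(100) | Yes | No | No | | |
| org_set_id | | | varchar(100) | Yes | No | No | | |
| dataset_name | | | varchar(100) | Yes | No | No | | |
| dataset_version | | | varchar(100) | Yes | No | No | | |
| geom_type | | | varchar(100) | No | No | No | | |
| geom_spec | | | text | No | No | No | | |
| boundary_type | | | varchar(100) | No | No | No | | |
| has_attributes | | | int4 | No | No | No | | |
| download_method | | | varchar(100) | Yes | No | No | | |
| download_subtype | | | varchar(100) | Yes | No | No | | |
| download_data_standard | | | varchar(100) | Yes | No | No | | |
| download_filename | | | varchar(100) | Yes | No | No | | |
| download_url | | | varchar(100) | Yes | No | No | | |
| download_auth | | | varchar(100) | No | No | No | | |
| documentation_url | | | varchar(100) | No | No | No | | |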
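
As a rough illustration of how the Required and Primary Key columns above might translate into constraints, the sketch below creates a cut-down `data_source` table in an in-memory SQLite database using Python's standard `sqlite3` module. SQLite is used here only because it needs no setup; it is an assumption, not the project's actual database, so the datatypes are approximations. Only a subset of fields is shown, and the sample values (including the URL) are invented.

```python
import sqlite3

# Subset of the data_source fields; "Required = Yes" is modelled as NOT NULL.
# Types are SQLite approximations of the datatypes in the field list.
DDL = """
CREATE TABLE data_source (
    data_source_uuid  INTEGER PRIMARY KEY,
    org_id            VARCHAR(100) NOT NULL,
    org_set_id        VARCHAR(100) NOT NULL,
    dataset_name      VARCHAR(100) NOT NULL,
    dataset_version   VARCHAR(100) NOT NULL,
    geom_type         VARCHAR(100),
    download_method   VARCHAR(100) NOT NULL,
    download_url      VARCHAR(100) NOT NULL,
    documentation_url VARCHAR(100)
)
"""


def make_db():
    """Create an in-memory database holding the sketched table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def missing_required_is_rejected(conn):
    """Try to insert a row that omits required fields; report whether it failed."""
    try:
        conn.execute(
            "INSERT INTO data_source (data_source_uuid, org_id) VALUES (2, 'example_org')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


conn = make_db()
# Hypothetical record: every value here is a placeholder.
conn.execute(
    "INSERT INTO data_source VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "example_org", "example_set", "example_dataset", "2020",
     "polygon", "file", "https://example.org/data.zip", None),
)
print(missing_required_is_rejected(conn))  # True: required fields were left out
```

The optional fields (`geom_type`, `documentation_url`) accept NULL, while a row that omits any required field is rejected by the database rather than by application code.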

### variable_source

Table Description

This table contains records that describe the distinct variables in a data source, enabling downstream data integrations. All variables from attribute source data must be catalogued in this table.

User Guide

NA

ETL Conventions

NA

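| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| variable_source_id | | | serial4 | Yes | Yes | No | | |
| geom_dependency_uuid | | | int4 | No | No | Yes | data_source | |
| variable_name | | | varchar | Yes | No | No | | |
| variable_desc | | | text | Yes | No | No | | |
| data_source_uuid | | | int4 | Yes | No | Yes | data_source | |
| attr_spec | | | text | Yes | No | No | | |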
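
The foreign keys listed above mean that a variable cannot be catalogued unless its data source already exists. The following sketch shows that behaviour with Python's standard `sqlite3` module. It rests on several assumptions: SQLite stands in for the real database, `data_source` is reduced to two columns, `add_variable` is a hypothetical helper, and the variable name and the `"{}"` placeholder for `attr_spec` are invented, since the format of that field is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE data_source (
    data_source_uuid INTEGER PRIMARY KEY,
    dataset_name     VARCHAR(100) NOT NULL
);
CREATE TABLE variable_source (
    variable_source_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    geom_dependency_uuid INTEGER REFERENCES data_source (data_source_uuid),
    variable_name        VARCHAR NOT NULL,
    variable_desc        TEXT NOT NULL,
    data_source_uuid     INTEGER NOT NULL REFERENCES data_source (data_source_uuid),
    attr_spec            TEXT NOT NULL
);
""")


def add_variable(conn, name, desc, data_source_uuid, attr_spec,
                 geom_dependency_uuid=None):
    """Insert one variable_source row and return its generated id."""
    cur = conn.execute(
        "INSERT INTO variable_source "
        "(geom_dependency_uuid, variable_name, variable_desc, data_source_uuid, attr_spec) "
        "VALUES (?, ?, ?, ?, ?)",
        (geom_dependency_uuid, name, desc, data_source_uuid, attr_spec),
    )
    return cur.lastrowid


conn.execute("INSERT INTO data_source VALUES (1, 'example_attributes')")
first_id = add_variable(conn, "pm25_mean", "Mean PM2.5 concentration", 1, "{}")

# A variable pointing at a data source that was never registered is refused.
try:
    add_variable(conn, "orphan", "No matching data source", 999, "{}")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(first_id, orphan_rejected)  # 1 True
```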

### attr_index

Table Description

A programmatically derived index table of all the attribute source datasets included in the data_source table.

User Guide

NA

ETL Conventions

NA

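| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| attr_index_id | | | numeric | Yes | Yes | No | | |
| variable_source_id | | | numeric | Yes | No | Yes | variable_source | |
| attr_of_geom_index_id | | | numeric | Yes | No | Yes | geom_index | |
| database_schema | | | varchar(255) | Yes | No | No | | |
| table_name | | | varchar(255) | Yes | No | No | | |
| data_source_id | | | numeric | Yes | No | Yes | data_source | |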

### geom_index

Table Description

A programmatically derived index table of all the geometry source datasets included in the data_source table.

User Guide

NA

ETL Conventions

NA

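| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| geom_index_id | | | numeric | Yes | Yes | No | | |
| data_type_id | | | numeric | No | No | No | | |
| data_type_name | | | varchar(255) | Yes | No | No | | |
| geom_type_concept_id | | | numeric | No | No | Yes | concept | |
| geom_type_source_value | | | varchar(255) | No | No | No | | |
| database_schema | | | varchar(255) | Yes | No | No | | |
| table_name | | | varchar(255) | Yes | No | No | | |
| table_desc | | | varchar(255) | Yes | No | No | | |
| data_source_id | | | numeric | Yes | No | Yes | data_source | |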

### attr_template

Table Description

This table is a template for the standardized attribute tables that get created.

User Guide

NA

ETL Conventions

NA

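| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| attr_record_id | | | serial4 | Yes | Yes | No | | |
| geom_record_id | | | int4 | Yes | No | Yes | geom_template | |
| variable_source_record_id | | | int4 | Yes | No | Yes | variable_source | |
| attr_concept_id | | | int4 | No | No | Yes | concept | |
| attr_start_date | | | date | Yes | No | No | | |
| attr_end_date | | | date | Yes | No | No | | |
| value_as_number | | | float8 | No | No | No | | |
| value_as_string | | | varchar | No | No | No | | |
| value_as_concept_id | | | int4 | No | No | Yes | concept | |
| unit_concept_id | | | int4 | No | No | Yes | concept | |
| unit_source_value | | | varchar | No | No | No | | |
| qualifier_concept_id | | | int4 | No | No | Yes | concept | |
| qualifier_source_value | | | varchar | No | No | No | | |
| attr_source_concept_id | | | int4 | No | No | Yes | concept | |
| attr_source_value | | | varchar | Yes | No | No | | |
| value_source_value | | | varchar | Yes | No | No | | |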
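
Because `attr_template` is a template and not a populated table, each standardized attribute table presumably repeats this column layout under its own name. The sketch below shows one way that could work, again using Python's standard `sqlite3` module. How gaiaDB actually instantiates the template is not described here, so `create_attr_table`, the target table name, and the SQLite type approximations are all assumptions, and the foreign keys are left out for brevity.

```python
import re
import sqlite3

# Column layout copied from the attr_template field list (SQLite-flavoured types).
# "Required = Yes" is modelled as NOT NULL; foreign keys are omitted in this sketch.
ATTR_TEMPLATE_COLUMNS = [
    ("attr_record_id", "INTEGER PRIMARY KEY AUTOINCREMENT"),
    ("geom_record_id", "INTEGER NOT NULL"),
    ("variable_source_record_id", "INTEGER NOT NULL"),
    ("attr_concept_id", "INTEGER"),
    ("attr_start_date", "DATE NOT NULL"),
    ("attr_end_date", "DATE NOT NULL"),
    ("value_as_number", "REAL"),
    ("value_as_string", "VARCHAR"),
    ("value_as_concept_id", "INTEGER"),
    ("unit_concept_id", "INTEGER"),
    ("unit_source_value", "VARCHAR"),
    ("qualifier_concept_id", "INTEGER"),
    ("qualifier_source_value", "VARCHAR"),
    ("attr_source_concept_id", "INTEGER"),
    ("attr_source_value", "VARCHAR NOT NULL"),
    ("value_source_value", "VARCHAR NOT NULL"),
]


def create_attr_table(conn, table_name):
    """Create a new attribute table with the template's column layout."""
    # Table names cannot be bound as parameters, so restrict them to a safe pattern.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    columns = ",\n    ".join(f"{name} {decl}" for name, decl in ATTR_TEMPLATE_COLUMNS)
    conn.execute(f"CREATE TABLE {table_name} (\n    {columns}\n)")


conn = sqlite3.connect(":memory:")
create_attr_table(conn, "attr_example_dataset")  # hypothetical table name

# PRAGMA table_info yields one row per column; index 1 holds the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(attr_example_dataset)")]
print(len(column_names))  # 16
```

Keeping the column list in one place means every generated attribute table stays in step with the template.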

### geom_template

Table Description

This table is a template for the standardized geometry tables that get created.

User Guide

NA

ETL Conventions

NA

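| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| geom_record_id | | | serial4 | Yes | Yes | No | | |
| geom_name | | | varchar | Yes | No | No | | |
| geom_source_coding | | | varchar | Yes | No | No | | |
| geom_source_value | | | varchar | Yes | No | No | | |
| geom_wgs84 | | | geometry | No | No | No | | |
| geom_local_epsg | | | int4 | Yes | No | No | | |
| geom_local_value | | | geometry | Yes | No | No | | |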

## OMOP

### geom_omop_location

Table Description

This table contains the identifier and text address from OMOP Location table records, along with their associated geocoded point geometry.

User Guide

NA

ETL Conventions

NA

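| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| location_id | | | integer | Yes | Yes | No | | |
| address | | | varchar(255) | Yes | No | No | | |
| geometry | | | geometry | Yes | No | No | | |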

### omop_location_history

Table Description

This table is a copy of the OMOP Location_History table.

User Guide

NA

ETL Conventions

NA

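| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| location_id | | | integer | Yes | No | Yes | location | |
| relationship_type_concept_id | | | integer | Yes | No | Yes | concept | |
| domain_id | | | integer | Yes | No | No | | |
| entity_id | | | integer | Yes | No | No | | |
| start_date | | | date | Yes | No | No | | |
| end_date | | | date | No | No | No | | |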

## CDM Extension

### exposure_occurrence

Table Description

This CDM Extension table is used to represent person-related transformations of data from Gaia and to interface with ATLAS and the OHDSI tool stack from an OMOP CDM database.

User Guide

- exposure_occurrence_id: The unique key given to a social or environmental exposure for a Person.
- location_id: The LOCATION_ID of the Person with whom the exposure is associated.
- person_id: The PERSON_ID of the Person with whom the exposure is associated.
- exposure_concept_id: The EXPOSURE_CONCEPT_ID field is recommended for primary use in analyses, and must be used for network studies. This is the standard concept mapped from the source value which represents an exposure.
- exposure_start_date: Use this date to determine the start date of the exposure.
- exposure_end_date: Use this date to determine the end date of the exposure.
- exposure_type_concept_id: This field identifies the origin of the exposure record (e.g. Census, EHR, Environmental data, Geospatial data, Satellite imagery, GIS mapping, Sensor network, Mobile device geolocation, LiDAR).
- exposure_relationship_concept_id: This field can be used to determine the spatiotemporal relationship between the source Exposure and the Person.
- exposure_source_concept_id: This field can be used to determine the original source of place-based exposure data.
- exposure_source_value: This field houses the verbatim name of the original source of place-based exposure data.
- operator_concept_id: The meaning of Concept 4172703 for '=' is identical to omission of an OPERATOR_CONCEPT_ID value. Since the use of this field is rare, it's important when devising analyses not to forget testing for the content of this field for values different from =.
- value_as_number: This is the numerical value of the Exposure, if available.
- value_as_concept_id: If the raw data gives a categorical result for exposures, those values are captured and mapped to standard concepts in the 'Exposure Value' domain.
- unit_concept_id: UNIT_SOURCE_VALUES should be mapped to a Standard Concept in the Unit domain that best represents the unit as given in the source data.

The remaining fields have no entry (NA).

ETL Conventions

- exposure_occurrence_id: Each derived instance of an exposure should be assigned this unique key.
- exposure_concept_id: The CONCEPT_ID to which the source exposure is mapped. This mapping should be integrated into the variable_source record and automatically populated in this record.
- exposure_start_date: The date range of the exposure should represent the temporal overlap between the place-based exposure data point and the LOCATION_ID’s location_history record.
- exposure_end_date: The date range of the exposure should represent the temporal overlap between the place-based exposure data point and the LOCATION_ID’s location_history record.
- exposure_type_concept_id: The CONCEPT_ID to which the exposure’s data source type is mapped. This mapping should be integrated into the data_source record and automatically populated in this record.
- exposure_relationship_concept_id: The CONCEPT_ID to which the relationship between the Exposure and the Person is mapped. This mapping should be automatically populated in this record.
- exposure_source_concept_id: The CONCEPT_ID to which the exposure’s data source is mapped. This mapping should be integrated into the data_source record and automatically populated in this record.
- exposure_source_value: This name is mapped to a Standard Exposure Source Concept and the original name is stored here for reference.
- value_as_number: This value should be integrated into the variable_source record and automatically populated in this record.
- value_as_concept_id: This mapping should be integrated into the variable_source record and automatically populated in this record.
- unit_concept_id: This mapping should be integrated into the variable_source record and automatically populated in this record.

The remaining fields have no entry (NA).

| Gaia Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain |
|---|---|---|---|---|---|---|---|---|
| exposure_occurrence_id | The unique key given to a social or environmental exposure for a Person. | Each derived instance of an exposure should be assigned this unique key. | integer | Yes | Yes | No | | |
| location_id | The LOCATION_ID of the Person with whom the exposure is associated. | | integer | Yes | No | Yes | location | |
| person_id | The PERSON_ID of the Person with whom the exposure is associated. | | integer | Yes | No | Yes | person | |
| exposure_concept_id | The EXPOSURE_CONCEPT_ID field is recommended for primary use in analyses, and must be used for network studies. This is the standard concept mapped from the source value which represents an exposure. | The CONCEPT_ID to which the source exposure is mapped. This mapping should be integrated into the variable_source record and automatically populated in this record. | integer | Yes | No | Yes | concept | |
| exposure_start_date | Use this date to determine the start date of the exposure. | The date range of the exposure should represent the temporal overlap between the place-based exposure data point and the LOCATION_ID’s location_history record. | date | Yes | No | No | | |
| exposure_start_datetime | | | datetime | No | No | No | | |
| exposure_end_date | Use this date to determine the end date of the exposure. | The date range of the exposure should represent the temporal overlap between the place-based exposure data point and the LOCATION_ID’s location_history record. | date | Yes | No | No | | |
| exposure_end_datetime | | | datetime | No | No | No | | |
| exposure_type_concept_id | This field identifies the origin of the exposure record (e.g. Census, EHR, Environmental data, Geospatial data, Satellite imagery, GIS mapping, Sensor network, Mobile device geolocation, LiDAR). | The CONCEPT_ID to which the exposure’s data source type is mapped. This mapping should be integrated into the data_source record and automatically populated in this record. | integer | Yes | No | Yes | concept | Type Concept |
| exposure_relationship_concept_id | This field can be used to determine the spatiotemporal relationship between the source Exposure and the Person. | The CONCEPT_ID to which the relationship between the Exposure and the Person is mapped. This mapping should be automatically populated in this record. | integer | Yes | No | Yes | concept | |
| exposure_source_concept_id | This field can be used to determine the original source of place-based exposure data. | The CONCEPT_ID to which the exposure’s data source is mapped. This mapping should be integrated into the data_source record and automatically populated in this record. | integer | No | No | Yes | concept | |
| exposure_source_value | This field houses the verbatim name of the original source of place-based exposure data. | This name is mapped to a Standard Exposure Source Concept and the original name is stored here for reference. | varchar(50) | No | No | No | | |
| exposure_relationship_source_value | | | varchar(50) | No | No | No | | |
| dose_unit_source_value | | | varchar(50) | No | No | No | | |
| quantity | | | integer | No | No | No | | |
| modifier_source_value | | | varchar(50) | No | No | No | | |
| operator_concept_id | The meaning of Concept 4172703 for '=' is identical to omission of an OPERATOR_CONCEPT_ID value. Since the use of this field is rare, it's important when devising analyses not to forget testing for the content of this field for values different from =. | | integer | No | No | Yes | concept | |
| value_as_number | This is the numerical value of the Exposure, if available. | This value should be integrated into the variable_source record and automatically populated in this record. | float | No | No | No | | |
| value_as_concept_id | If the raw data gives a categorical result for exposures, those values are captured and mapped to standard concepts in the 'Exposure Value' domain. | This mapping should be integrated into the variable_source record and automatically populated in this record. | integer | No | No | Yes | concept | |
| unit_concept_id | UNIT_SOURCE_VALUES should be mapped to a Standard Concept in the Unit domain that best represents the unit as given in the source data. | This mapping should be integrated into the variable_source record and automatically populated in this record. | integer | No | No | Yes | concept | Unit |
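
The ETL convention for `exposure_start_date` and `exposure_end_date` asks for the temporal overlap between the place-based data point and the location_history record. A minimal sketch of that calculation follows. It assumes that both ranges are inclusive, that the attribute range comes from `attr_start_date` and `attr_end_date`, and that a missing location_history `end_date` (the field is not required) means the person still lives at the location. The function name `exposure_date_range` is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def exposure_date_range(
    attr_start: date,
    attr_end: date,
    loc_start: date,
    loc_end: Optional[date],
) -> Optional[Tuple[date, date]]:
    """Return the inclusive overlap of the two date ranges, or None if disjoint."""
    # Assumption: an open-ended residence lasts at least as long as the attribute.
    effective_loc_end = attr_end if loc_end is None else loc_end
    start = max(attr_start, loc_start)
    end = min(attr_end, effective_loc_end)
    if start > end:
        return None  # the person was not at this location while the value applied
    return start, end


# Attribute valid for all of 2020; person moved in mid-June and has not left.
print(exposure_date_range(date(2020, 1, 1), date(2020, 12, 31), date(2020, 6, 15), None))
# (datetime.date(2020, 6, 15), datetime.date(2020, 12, 31))

# Person left at the end of March 2020.
print(exposure_date_range(date(2020, 1, 1), date(2020, 12, 31), date(2019, 1, 1), date(2020, 3, 31)))
# (datetime.date(2020, 1, 1), datetime.date(2020, 3, 31))
```

When the ranges do not overlap the function returns `None`, which an ETL step could treat as "no exposure_occurrence record to write" for that pairing.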