Getting Started with Gaia

For background information on Gaia, see Design.

For details on the Gaia Data Model, see Data Model.

The quickest way to start using gaiaDB is to build or download the Docker image and run a container. Alternatively, you can execute the init.sql script against your own PostgreSQL database.

Dockerized gaiaDB

You can build and run gaiaDB as a docker container with the following commands:

git clone https://github.com/OHDSI/gaiaDB.git

cd gaiaDB

docker build -t gaia-db .

docker network create gaia

docker run -itd --rm -e POSTGRES_PASSWORD=SuperSecret -e POSTGRES_USER=postgres --network gaia -p 5432:5432 --name gaia-db gaia-db

Once deployed and (automatically) initialized, the containerized Postgres database includes:

- GIS Catalog (backbone schema)
- Constrained GIS vocabulary tables (vocabulary schema)
- PostGIS tools (native to image, tiger schema)
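One way to confirm that the three schemas listed above are in place is to list the schema names from inside the running container. The following is a minimal sketch, not part of gaiaDB itself. The container name and user are taken from the docker run command above. The database name (GAIA_DB, defaulting to postgres) is an assumption and may differ in your image, and schema_present is a hypothetical helper.

```shell
#!/bin/sh
# Container name and user come from the `docker run` command above.
# ASSUMPTION: the database name is a guess; override with GAIA_DB if needed.
CONTAINER="gaia-db"
DB_USER="postgres"
DB_NAME="${GAIA_DB:-postgres}"

# Hypothetical helper: succeeds if schema name $1 is one of the
# newline-separated names in $2.
schema_present() {
  printf '%s\n' "$2" | grep -qxF "$1"
}

# Print one schema name per line, queried from inside the container.
list_schemas() {
  docker exec "$CONTAINER" psql -U "$DB_USER" -d "$DB_NAME" -At \
    -c "SELECT schema_name FROM information_schema.schemata;"
}

# Only query when docker is installed and the container is running.
if command -v docker >/dev/null 2>&1 &&
   docker ps --format '{{.Names}}' 2>/dev/null | grep -qxF "$CONTAINER"; then
  schemas="$(list_schemas)"
  for s in backbone vocabulary tiger; do
    if schema_present "$s" "$schemas"; then
      echo "found: $s"
    else
      echo "missing: $s"
    fi
  done
else
  echo "container $CONTAINER is not running; skipping schema check"
fi
```

If a schema is reported as missing shortly after startup, the initialization may still be in progress; waiting a few seconds and running the check again is usually enough to tell.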

PostGIS Database (Catalog and Staging)

Once you have a postgres database with PostGIS installed, the helper script initialize.sql in the inst directory will do the rest of the setup. This includes:

1. Creating a schema named “backbone”
2. Creating the tables described in the backbone specification and sequences for autoincrementing identifiers
3. Inserting Gaia Catalog metadata (IMPORTANT: the script assumes the files ‘./csv/data_source.csv’ and ‘./csv/variable_source.csv’ exist relative to ‘initialize.sql’, as they do in the inst directory)
4. Populating the geom_index and attr_index
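Because the script depends on the two CSV files sitting next to it, it can help to check for them before running anything. The sketch below is one possible way to do that and then run the script with psql. The helper check_gaia_inputs is hypothetical, the connection settings (DB_HOST, DB_USER, DB_NAME) and the ./inst location are assumptions to adjust, and running psql from inside the inst directory assumes the relative ./csv paths are resolved from the client's working directory.

```shell
#!/bin/sh
# Hypothetical helper: verify that initialize.sql and the CSV files it expects
# are present under the directory given as $1.
check_gaia_inputs() {
  dir="$1"
  for f in initialize.sql csv/data_source.csv csv/variable_source.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# ASSUMPTIONS: location of the inst directory and connection settings.
INST_DIR="${INST_DIR:-./inst}"
DB_HOST="${DB_HOST:-localhost}"
DB_USER="${DB_USER:-postgres}"
DB_NAME="${DB_NAME:-gaiaDB}"

if check_gaia_inputs "$INST_DIR" && command -v psql >/dev/null 2>&1; then
  (
    # Run from inside inst so that ./csv/... resolves next to initialize.sql.
    cd "$INST_DIR" &&
    # Make sure PostGIS is enabled before the helper script runs.
    psql -w -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" \
      -c "CREATE EXTENSION IF NOT EXISTS postgis;" &&
    # Stop at the first SQL error instead of continuing with a partial setup.
    psql -w -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" \
      -v ON_ERROR_STOP=1 -f initialize.sql
  )
else
  echo "inputs or psql not available; nothing was run"
fi
```

The -w flag tells psql never to prompt for a password, so credentials would need to come from a ~/.pgpass file or the PGPASSWORD environment variable.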

Connecting the Database to R

Create a connection from your R environment to the PostGIS database. This can be done using the DatabaseConnector package:

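library(gaiaCore)

# Use the password set for your own database
# (POSTGRES_PASSWORD if you started the Docker container above).
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "localhost/gaiaDB",
  port = 5432,
  user = "postgres",
  password = "mysecretpassword")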
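For PostgreSQL, DatabaseConnector expects the server argument in the form host/database, so "localhost/gaiaDB" means host localhost and database gaiaDB. If the R connection fails, a quick check from the command line can show whether the database is reachable at all. The sketch below splits the same server string and, if the pg_isready utility from the PostgreSQL client tools happens to be installed, asks the server whether it is accepting connections. The variable names are illustrative only.

```shell
#!/bin/sh
# Same values as in the R connection details above.
SERVER="localhost/gaiaDB"
PORT=5432

# Split "host/database" into its two parts.
DB_HOST="${SERVER%%/*}"   # everything before the first slash
DB_NAME="${SERVER#*/}"    # everything after the first slash
echo "host=$DB_HOST database=$DB_NAME port=$PORT"

# pg_isready ships with the PostgreSQL client tools; skip if it is absent.
if command -v pg_isready >/dev/null 2>&1; then
  pg_isready -h "$DB_HOST" -p "$PORT" -d "$DB_NAME" ||
    echo "database is not accepting connections yet"
else
  echo "pg_isready not found; skipping connectivity check"
fi
```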

Adding a Data Source

Background

Data sources in gaiaDB are stored in the gaiaDB.backbone.data_source table in a Postgres database. Each row in this table is a single “data source”, though the name may be a bit misleading: a gaiaDB data source is a bundle of descriptive, technical, and operational metadata that refers to an externally hosted dataset. For example, the gaiaDB data source “daily_aqi_by_county_2020” is a single row that describes how to extract, transform, and load (ETL) the corresponding EPA dataset from the EPA's data catalog. The row includes all the information necessary to acquire and use this dataset (including the dataset URL, documentation URL, and R code to transform it to gaia’s own data standard), without storing the dataset itself. This keeps gaiaDB very lightweight (a data source takes only about 1 KB of storage) and lets it accommodate wide variability in source datasets.

The trade-off to this design is that each data source requires a certain degree of custom tailoring. Fortunately, many well-curated sources of geospatial data publish all of their own datasets with the same schema. So, while it may take a bit of effort to create the gaiaDB data source for “daily_aqi_by_county_2020”, the gaiaDB data source for “daily_aqi_by_county_2019”, and any other years, can be created in a matter of minutes by simply redirecting the data source to other years’ data.
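The idea of redirecting an existing data source to other years can be sketched with a small script that copies a template row and swaps the year. This is only an illustration: the two-column row layout is hypothetical and is not the real backbone.data_source schema, the example.org URL is a placeholder, and rows_for_years is a made-up helper.

```shell
#!/bin/sh
# HYPOTHETICAL template row (name, dataset URL). The real data_source table
# has more columns; the URL is a placeholder, not the EPA location.
TEMPLATE='daily_aqi_by_county_2020,https://example.org/data/daily_aqi_by_county_2020.zip'

# Emit one row per requested year by replacing the template year.
# Usage: rows_for_years TEMPLATE_YEAR YEAR [YEAR ...]
rows_for_years() {
  template_year="$1"
  shift
  for year in "$@"; do
    printf '%s\n' "$TEMPLATE" | sed "s/$template_year/$year/g"
  done
}

# Derive rows for 2018 and 2019 from the 2020 template.
rows_for_years 2020 2018 2019
```

A blanket substitution like this replaces every occurrence of the template year, so a real row should be reviewed afterwards in case the year also appears in fields that should not change, such as a creation date.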

Creating a Data Source

As of now, there are two main ways that gaiaDB data sources can be created:

1. Manually, by writing INSERT INTO SQL statements. Usually this is most easily done with a combination of spreadsheet software like MS Excel (for writing the original record), an R environment for creating the geom_spec (detailed below), and some sort of automation tool for recycling the original data source across multiple datasets.
2. Using the gaiaSourceCreator RShiny form, which can help guide the creation of data sources.

Creating the geom_spec

TODO

Sharing a Data Source

Motivation

If you have taken the time and effort to add a data source to your own local instance of gaiaDB, consider sharing it back with the rest of the OHDSI GIS community. By sharing your data source you reduce duplication of effort, help establish a single canonical “gaia version” of a web-hosted resource, and increase the size of the public gaiaDB repository. All of this helps to nourish the OHDSI GIS system of tools and data sources.

GitHub Pull Request

At this time, the most straightforward way to share your local data source is via a GitHub Pull Request:

1. The source file for the gaiaDB.backbone.data_source table is on the OHDSI GitHub at GIS/source. Make sure that your local gaiaDB instance reflects this source when you add data sources locally.
2. After you have added one or more data sources to your local gaiaDB instance, export the gaiaDB.backbone.data_source table as a CSV file.
3. Create a pull request using this template, attaching the CSV file that should replace the current source CSV.
4. The gaiaDB software maintainers will accept and merge your contributions or reply to your pull request with feedback.
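The export step can be done in several ways; one option is psql's client-side \copy command, which writes the file on the machine where psql runs. The sketch below assumes the database is named gaiaDB and is reachable on localhost as user postgres, and the output file name data_source.csv is an arbitrary choice. None of these values are prescribed by gaiaDB.

```shell
#!/bin/sh
# ASSUMPTIONS: connection settings and output file name; adjust as needed.
DB_HOST="${DB_HOST:-localhost}"
DB_USER="${DB_USER:-postgres}"
DB_NAME="${DB_NAME:-gaiaDB}"
OUT_FILE="data_source.csv"

# Client-side export of the whole table, with a header row.
COPY_CMD="\\copy backbone.data_source TO '$OUT_FILE' WITH (FORMAT csv, HEADER)"

if command -v psql >/dev/null 2>&1; then
  # -w: never prompt for a password (use ~/.pgpass or PGPASSWORD instead).
  psql -w -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "$COPY_CMD" ||
    echo "export failed; check the connection settings" >&2
else
  printf '%s\n' "psql not found; the export command would be:"
  printf '%s\n' "$COPY_CMD"
fi
```

Comparing the exported file against the current source CSV before opening the pull request makes it easier to confirm that only the intended rows were added.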

Geocoding

TODO

Using a Local Dataset

TODO

OHDSI GIS WG
