Adding a Data Source

Background

Data sources in gaiaDB are stored in the gaiaDB.backbone.data_source table in a Postgres database. Each row in this table is a single “data source”, though the name may be a bit misleading. All “data sources” in gaiaDB are actually bits of descriptive, technical, and operational metadata that refer to an externally hosted dataset. For example, the gaiaDB data source “daily_aqi_by_county_2020” is actually just a single row of data that contains information on how to extract, transform, and load (ETL) the EPA dataset located on their data catalog. The gaiaDB data source includes all the information necessary to acquire and use this dataset (including the dataset URL, documentation URL, and R code to transform it to gaia’s own data standard), without actually storing the dataset itself. Not only does this keep gaiaDB very lightweight (a data source takes only about 1 KB of storage), it also allows gaiaDB to handle a boundless amount of variability in source datasets.

The trade-off to this design is that each data source requires a certain degree of custom tailoring. Fortunately, many well-curated sources of geospatial data publish all of their own datasets with the same schema. So, while it may take a bit of effort to create the gaiaDB data source for “daily_aqi_by_county_2020”, the gaiaDB data source for “daily_aqi_by_county_2019”, and any other years, can be created in a matter of minutes by simply redirecting the data source to other years’ data.

Creating a Data Source

As of now, there are two main ways that gaiaDB data sources can be created:

  1. Manually, by writing INSERT INTO SQL statements. Usually this is most easily done with a combination of spreadsheet software like MS Excel (for writing the original record), an R environment for creating the geom_spec (detailed below), and some sort of automation tool for recycling the original data source across multiple datasets.
  2. Using the gaiaSourceCreator RShiny form, which can help guide the creation of data sources.

Creating the geom_spec

TODO