Terminology
Definitions and context for common terminology within the GIS
workgroup
Backbone
- Definition The name of the database schema that
contains the core components for sourcing and staging source data
- In Context
- This is where data_source, variable_source, attr_index, geom_index,
attr_template and geom_template live.
- The DDL statement for this schema as well as DML statement or CSV
file for the data_source and variable_source are lightweight ways for
anyone with Postgres to stand-up and run the database.
data_source
- Definition A table that stores the basic
information necessary to source and stage web-hosted geographic
datasets. The basis to a “catalog” of available source datasets
- In Context
- Lives in the backbone schema
- The data in this table requires some manual attention to create and
is valuable
- Contains the download URL for all datasets and declares whether they
contain attributes, a geometry, or both. For geometry datasets, also
contains the geom_spec
- This would likely be where more advanced source metadata could be
stored
variable_source
- Definition A table that stores supplemental
information necessary to extract individual variables from a source
attribute table. The basis to a “catalog” of available variables
- In Context
- Lives in the backbone schema
- The data in this table requires some manual attention to create and
is valuable
- Used in conjunction with the data_source table to download and stage
inidividual variables from a source attribute table.
- This table contains the attr_spec which contains the directions for
transforming the staged attribute table into a specific variable
geom_index
- Definition An index of source datasets containing a
geometry
- In Context
- Lives in the backbone schema
- The data in this table is generated programmatically from the
data_source table using a function called createIndices() in the R
package. It is fairly simple and cheap to regenerate.
geom_template
- Definition An empty table that provides the
structure (column name, type, and constraints) to geom_X tables.
- In Context
- This table should serve as the Single Source of Truth for geom_X
table structure. Functionality in the R package should defer to
this.
geom_X
- Definition Refers to a transformed, staged version
of a source geometry dataset. Essentially, a final data product that is
in a standardized form and is analysis ready.
- In Context
- An instance of this table will have the name “geom_” and
live in schema . Table name and database schema are
specified in geom_index.
- The data in an instance of this table is generated programmatically
using the loadGeometry() function in the R package. It is fairly simple
to create instances, but may be time intensive depending on the size of
the source dataset.
- Typically, these instances will be created in the background when a
variable that depends on the geometry is created.
geom_spec
- Definition A JSON-formatted string that contains a
list of R functions to be performed in R in order to transform a source
geometry dataset into the standardized geom_X
- In Context
- The geom_spec lives as an entry in a data source record.
- Only datasets with geometries have geom_spec entries in the
data_source table
attr_index
- Definition An index of source datasets containing a
n attribute
- In Context
- Lives in the backbone schema
- The data in this table is generated programmatically from the
data_source table using a function called createIndices() in the R
package. It is fairly simple and cheap to regenerate
attr_template
- Definition An empty table that provides the
structure (column name, type, and constraints) to attr_X tables
- In Context
- This table should serve as the Single Source of Truth for attr_X
table structure. Functionality in the R package should defer to
this.
attr_X
- Definition Refers to a transformed, staged version
of a source attribute dataset. Essentially, a final data product that is
in a standardized form and is analysis ready.
- In Context
- Attr_X tables are Entity-Attribute-Value (EAV) tables with
additional metadata columns such as start and end date for its
constituent variables
- An instance of this table will have the name “attr_” and
live in schema . Table name and database schema are
specified in attr_index.
- The data in an instance of this table is generated programmatically
using the loadVariable() function in the R package. It is fairly simple
to create instances, but may be time intensive depending on the size of
the source dataset.
- An attr_X table is created the first time a variable from an
attribute data source is loaded. Every subsequent variable loaded from
that data source is appended to that attr_X table
- The granularity of this table is a data point that is associated to
a single geometry at a time (I.e. a county for a given date, a
coordinate for a given year). Therefore, each entry has a value,
attribute, start and end date, and is associated to a geometry in a
geom_X instance via a geom_record_id
attr_spec
- Definition A JSON-formatted string that contains a
list of R functions to be performed in R in order to transform a source
attribute dataset into a single variable that will be part of the
standardized attr_X
- In Context
- The attr_spec lives as an entry in a variable source record.
- Unlike geom_spec, of which there is a one-to-one link with a source
dataset, there is likely more than one attr_spec with a source attribute
dataset
Geometry
- Definition The spatial component of data. A single
geometry value will typically be in the form of POINT, LINESTRING,
POLYGON, but can also be MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, or a
set of two or more types called GEOMETRYCOLLECTION
- In Context
- data_source
- Does a source of data contain a geometry? If yes, geom_type and
geom_spec will not be null
- Geom_type will list the data type (POINT, LINESTRING, etc)
- Geom_spec will contain a list of transformations to perform on the
source dataset *Geom_dependency_uuid is utilized by attribute sources to
link them to the data_source ID of the geometry on which they
depend
- geom_index
- An index for the geometries in data_source.
- Provides no “new” information about the geometry, but does contain
the reformatted database_schema and table_name where the geom_X table
will be populated
Attribute
- Definition A set or collection of variables that
are or can be associated to a geometry. An attribute almost always
refers to the entire source dataset, excluding the geometry and any
metadata columns (I.e. all variables).
- In Context
- data_source
- Does a source of data contain an attribute? If yes, the
has_attributes boolean column will have a value 1
- Geom_dependency_uuid must point to a geometry on which the attribute
depends. If the dependency is within the same source dataset,
geom_dependency_uuid and data_source_uuid will have the same value
- variable_source
- The individual variables within an attribute are described
here.
- A variable’s Data_source_uuid in the variable_source links to it’s
parent attribute in the data_source
- attr_index
- An index or the attributes in variable_source
- Links to a geometry dependency in the geom_index table
- contain the reformatted database_schema and table_name where the
attr_X table will be populated
Variable
- Definition A single descriptive element that can be
associated to a geometry. This can be thought of as a single column from
a source attribute.
- In Context
- variable_source
- Each variable is given a unique identifier, name, and description
here
- Linked back to their parent attribute via the data_source_uuid
column
- The procedure for extracting the variable from the source dataset is
stored in the attr_spec