Quick Start Guide
A Quick Start Guide is provided if you are only interested in running
the Ponos tool to create a test instance of the CDM in Databricks and/or
connect an existing instance of a CDM in Databricks to OHDSI. This quick
start guide is for a standalone Tomcat instance of OHDSI (i.e., not
Broadsea). For guidance on setting up a Databricks CDM instance using
Broadsea, consult the Broadsea Implementation Guide.
Introduction
This implementation guide demonstrates an end-to-end solution for
connecting an existing Common Data Model (CDM) to OHDSI. This guide is
based on an automated process implemented in the Ponos project. The
Ponos tool can be used to perform all of the steps required to
OHDSI-enable a CDM instance in Databricks. The Ponos tool also includes
a solution to create an instance of the Broadsea DEMO_CDM in Databricks.
This guide is intended to provide the following:
- An automated build: The Ponos tool automates the process of getting an
  OHDSI instance set up in Databricks. It can be used to create an
  instance of the Broadsea DEMO_CDM in Databricks, and to connect any
  instance of the CDM in Databricks to OHDSI, including development,
  test, and production instances.
- A reference implementation: The information provided here can be used
  as a reference implementation. The work done by the Ponos tool can be
  implemented in other ways; the tool represents a known working example
  of how to create an OHDSI instance from a CDM in Databricks.
- Testing/Validation: The Ponos tool creates a working OHDSI instance in
  Databricks and thereby provides a successful test and validation of
  the underlying tools used to do so.
- Insight into the process: The code used by Ponos is available on
  GitHub. The code can be run from an IDE such as Eclipse and can be
  reviewed and stepped through to gain insight into the process and
  tools used here to create an instance of OHDSI using Databricks.
Prerequisites
This guide is for a solution that is not based on Broadsea. This guide
assumes the following:
- An instance of a CDM in Databricks (Optional): This is optional. If
  you don’t have an existing CDM, the Ponos tool can be used to generate
  a demo CDM.
- PostgreSQL: A local instance of PostgreSQL is used for the webapi
  schema.
- Basic developer software: Java 11 is required. Other software such as
  Maven and an IDE is useful if you’re interested in the details of how
  everything works.
- A Full Install of Atlas and Underlying Software: A full install of
  Atlas is required as a prerequisite for this process. An automated
  process for installing Atlas and all of its dependencies is described
  here.
Overview
Code for the Ponos project is open source (Apache 2 license) and is
available on GitHub at
https://github.com/NACHC-CAD/ponos.
This project is only the user interface for the actual functionality.
The functionality for Ponos is implemented in the fhir-to-omop tool
suite, which is also open source (Apache 2 license) and is available at
https://github.com/NACHC-CAD/fhir-to-omop.
Enabling an instance of a CDM consists of the following high-level
steps, each of which is detailed in this guide. The Ponos tool automates
this entire process in two steps: one step to create the DEMO_DB
instance in Databricks (db-demo), and one step (db-init) to do all of
the work shown below that is required to OHDSI-enable a CDM instance.
- Create a test instance of the CDM in Databricks: To create a test
  instance of the CDM in Databricks, download and install Ponos and then
  run the following:
run-ponos.bat db-demo
The code that creates the demo_cdm in Databricks can be found in the
fhir-to-omop BuildDemoCdmInDatabricks class. The demo_cdm from the
Broadsea distribution is created in Databricks using the Ponos tool.
Data are sourced from .csv files included in the Ponos project that were
created as an extract from a PostgreSQL instance of the demo_cdm. This
install includes the following:
  - Upload of .csv files for the CDM to the Databricks FileStore
  - Creation of the CDM database in Databricks using the DDL files from
    the Common Data Model (CDM) (version 5.3 is used)
  - Population of the CDM (including vocabulary tables) from the
    uploaded .csv files
- Connect an Existing CDM Instance to OHDSI: To connect an existing CDM
  instance to OHDSI, run the following:
run-ponos.bat db-init
The code that connects an existing CDM to OHDSI can be found in the
OhdsiEnableExistingDatabricksCdm class. This code does the following:
  - Create the OHDSI database instance in PostgreSQL (this is the home
    for the webapi schema). The OHDSI PostgreSQL instance is dropped if
    it already exists.
  - Create the Atlas database users for the PostgreSQL database and
    webapi schema
  - Create the webapi schema
  - Create the webapi tables and other database objects
  - Create the Achilles results database in Databricks
  - Create the tables in the Achilles database in Databricks
  - Create the achilles_analysis table from the
    AchillesAnalysisDetails.csv file
  - Run Achilles
  - Create the appropriate source and source_daimon records in webapi
Note: The code executed here adds the “UseNative=1;” parameter to the
JDBC URL inserted into the source table if it is not already there (as
described in the Notes on the Databricks URL, SSL, and UseNative). This
allows the user to use the JDBC URL provided by Databricks “as-is”.
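The UseNative handling described in the note can be illustrated with a short Python sketch. This is not the Ponos implementation, which lives in the Java fhir-to-omop code. The function name ensure_use_native and the example host are hypothetical, and leaving an existing UseNative value untouched (whatever it is set to) is an assumption about what "if it is not already there" means.

```python
def ensure_use_native(jdbc_url: str) -> str:
    """Return jdbc_url with a UseNative property, adding "UseNative=1;" only if absent.

    Illustrative sketch only; the name and behavior are assumptions,
    not the actual Ponos/fhir-to-omop code.
    """
    # Everything after the first ';' is treated as ';'-separated key=value properties.
    properties = [p.strip().lower() for p in jdbc_url.split(";")[1:]]
    if any(p.startswith("usenative=") for p in properties):
        # Already present: return the URL unchanged.
        return jdbc_url
    # Avoid producing ';;' when the URL already ends with a separator.
    separator = "" if jdbc_url.endswith(";") else ";"
    return f"{jdbc_url}{separator}UseNative=1;"


# Hypothetical URL for demonstration; a real one comes from the Databricks UI.
example_url = "jdbc:spark://example-host:443/default;transportMode=http;ssl=1"
print(ensure_use_native(example_url))
```

Because the function returns the URL unchanged when the property is already present, it can be applied repeatedly without stacking duplicate parameters.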
Do It Yourself!
To get your instance of OHDSI on Databricks up and running: