OPTUM EHR Alzheimer’s Disease Enrichment ETL documentation

Background

Optum Alzheimer’s enriched clinical data is a quarterly data offering that includes patients with an AD or MCI diagnosis, or AD or MCI mention within provider notes, or are 55 years or older and have a symptom of interest mentioned within provider notes. The provider notes fed into the models are dated from January 1, 2007 to the most recent data available. The model output is then processed into a tabular format that can be easily joined to the structured data using structured query language (SQL) or other languages.

Optum Alzheimer’s enriched clinical data product is created by using NLP techniques catered specifically to the Alzheimer’s clinical domain to extract relevant concepts from unstructured provider notes. The data undergoes transformations allowing it to be usable for research, including value standardization, de-duplication, normalization, manual abstraction and some clinical validation. The overall goal of this data set is to provide data that is true to the notes and provides maximum information with high quality to meet users’ needs for research.

How to run this conversion

This conversion is run after the main corpus of the data is converted, then the results are inserted in CDM tables.

Table name: measurement

Reading from alz_assessment

The ALZ Cognitive Assessment table includes results from tests of interest that evaluate cognitive impairment, when present in the physician’s notes for a patient. Tests of interest include only: MMSE, MOCA, Mini-cog, SLUMS, 6CIT and AD8.

Mapping measurement_concept_id

system_name measurement_concept_id
6CIT 35609721 (6CIT - Six Item Cognitive Impairment Test)
AD8 36684956 (Eight-item Informant Interview to Differentiate Aging and Dementia screening score)
MMSE 4169175 (Mini-mental state examination)
MOCA 44808666 (Montreal cognitive assessment)
MOCA 2.1 44808666 (Montreal cognitive assessment)
MOCA 7.1 606673 (Montreal Cognitive Assessment version 7.1)
MOCA 7.2 606672 (Montreal Cognitive Assessment version 7.2)
MOCA 7.3 606670 (Montreal Cognitive Assessment version 7.3)
MOCA 8.1 606671 (Montreal Cognitive Assessment version 8.1)
MOCA 8.1 Blind 606671 (Montreal Cognitive Assessment version 8.1)
MOCA 8.2 44808666 (Montreal cognitive assessment)
MOCA 8.3 44808666 (Montreal cognitive assessment)
MOCA Blind 44808666 (Montreal cognitive assessment)
Minicog 37017073 (Mini-Cog)
SLUMS 605634 (Saint Louis University Mental Status)
Destination Field SourceField Logic Comment
measurement_id   Autogenerate  
visit_occurrence_id encid Lookup the VISIT_OCCURRENCE_ID based on the encid If encid is blank then use diag_date to determine which VISIT_OCCURRENCE_ID the diagnosis should be associated to
person_id ptid    
measurement_type_concept_id   32858 (NLP)  
measurement_date note_date    
measurement_concept_id system_name See mapping logic above  
measurement_source_concept_id   Set to 0  
measurement_source_value system_name    
provider_id encid Use the encid to lookup PROVIDER_ID in the VISIT_DETAIL table. If encid is blank then leave PROVIDER_ID blank.
observation_datetime      
value_as_number numeric_result    
value_source_value numeric_result    
value_as_concept_id      
unit_concept_id      
visit_detail_id      
unit_source_value      

Reading from alz_biomarker

The ALZ Biomarker table includes biomarkers associated with Alzheimer’s disease (AD), when present in the physician’s notes, for a specific patient. Biomarkers of interest include only amyloid beta, phosphorylated tau protein, and apolipoprotein. This table supplements the structured Lab table and can provide additional information regarding patient biomarker testing not found in the structured tables.

Mapping measurement_concept_id

biomarker biomarker_source measurement_concept_id Notes
amyloid beta 40 peptide plasma 42868556 (Amyloid beta 40 peptide [Mass/volume] in Plasma)  
amyloid beta 40 peptide cerebrospinal fluid 42868555 (Amyloid beta 40 peptide [Mass/volume] in CSF)  
amyloid beta 40 peptide   42868556 (Amyloid beta 40 peptide [Mass/volume] in Plasma)  
amyloid beta 42 peptide/amyloid beta 40 peptide blood   Mapping available in August 2025 release of vocabulary loinc code =106941-8
amyloid beta 42 peptide/amyloid beta 40 peptide cerebrospinal fluid 1617024 (Amyloid beta 42 peptide/Amyloid beta 40 peptide, Spinal fluid)  
amyloid beta 42 peptide/amyloid beta 40 peptide plasma   Mapping available in August 2025 release of vocabulary loinc code =106941-8
amyloid beta 42 peptide/amyloid beta 40 peptide     Mapping available in August 2025 release of vocabulary loinc code =106941-8
amyloid beta-peptides blood 1091801 (Beta-Amyloid 1-42 and 1-40 panel, Spinal fluid) Mapping to CSF in lieu of other concepts
amyloid beta-peptides cerebrospinal fluid 1091801 (Beta-Amyloid 1-42 and 1-40 panel, Spinal fluid) Mapping to CSF in lieu of other concepts
amyloid beta-peptides plasma 1091801 (Beta-Amyloid 1-42 and 1-40 panel, Spinal fluid) Mapping to CSF in lieu of other concepts
amyloid beta-peptides   1091801 (Beta-Amyloid 1-42 and 1-40 panel, Spinal fluid) Mapping to CSF in lieu of other concepts
amyloid beta-protein (1-42) blood 3043102 (Amyloid beta 42 peptide [Mass/volume] in Plasma)  
amyloid beta-protein (1-42) cerebrospinal fluid 3042810 (Amyloid beta 42 peptide [Mass/volume] in Cerebral spinal fluid)  
amyloid beta-protein (1-42) plasma 3043102 (Amyloid beta 42 peptide [Mass/volume] in Plasma)  
amyloid beta-protein (1-42)   3043102 (Amyloid beta 42 peptide [Mass/volume] in Plasma)  
amyloid tau index cerebrospinal fluid 3042151 (Tau protein/Amyloid beta 42 peptide [Ratio] in Cerebral spinal fluid)  
amyloid tau index   3042151 (Tau protein/Amyloid beta 42 peptide [Ratio] in Cerebral spinal fluid)  
apolipoprotein e blood 3044063 (Apolipoprotein E [Presence] in Serum or Plasma)  
apolipoprotein e cerebrospinal fluid 1617460 (Apolipoprotein E [Mass/volume] in Cerebral spinal fluid)  
apolipoprotein e plasma 3044063 (Apolipoprotein E [Presence] in Serum or Plasma)  
apolipoprotein e   3044063 (Apolipoprotein E [Presence] in Serum or Plasma)  
phosphorylated tau protein blood 3011901 (Tau protein [Mass/volume] in Serum) Mapping to general tau protein concept in lieu of phosphorylated concept
phosphorylated tau protein cerebrospinal fluid 3000242 (Tau protein [Mass/volume] in Cerebral spinal fluid) Mapping to general tau protein concept in lieu of phosphorylated concept
phosphorylated tau protein plasma 3011901 (Tau protein [Mass/volume] in Serum) Mapping to general tau protein concept in lieu of phosphorylated concept
phosphorylated tau protein   3011901 (Tau protein [Mass/volume] in Serum) Mapping to general tau protein concept in lieu of phosphorylated concept
tau protein, phosphorylated 181 cerebrospinal fluid 43055225 (Phosphorylated tau 181 [Mass/volume] in Cerebral spinal fluid by Immunoassay)  
tau protein, phosphorylated 181   1259491(Phosphorylated tau 181 [Mass/volume] in Plasma by Immunoassay)  
tau protein, phosphorylated 217   1092155 (Tau protein.phosphorylated 217 [Mass/volume] in Serum or Plasma by Immunoassay)  
tau protein/amyloid beta 42 peptide cerebrospinal fluid 3042151 (Tau protein/Amyloid beta 42 peptide [Ratio] in Cerebral spinal fluid)  
tau proteins blood 3011901 (Tau protein [Mass/volume] in Serum)  
tau proteins cerebrospinal fluid 3000242 (Tau protein [Mass/volume] in Cerebral spinal fluid)  
tau proteins   3011901 (Tau protein [Mass/volume] in Serum)  
total tau protein blood 3011901 (Tau protein [Mass/volume] in Serum)  
total tau protein cerebrospinal fluid 3000242 (Tau protein [Mass/volume] in Cerebral spinal fluid)  
total tau protein plasma 3011901 (Tau protein [Mass/volume] in Serum)  
total tau protein   3011901 (Tau protein [Mass/volume] in Serum)  
Destination Field SourceField Logic Comment  
measurement_id   Autogenerate    
visit_occurrence_id encid Lookup the VISIT_OCCURRENCE_ID based on the encid If encid is blank then use diag_date to determine which VISIT_OCCURRENCE_ID the diagnosis should be associated to  
person_id ptid      
measurement_type_concept_id   32858 (NLP)    
measurement_date note_date      
measurement_concept_id biomarker
biomarker_source
See mapping logic above    
measurement_source_concept_id   Set to 0    
measurement_source_value concat(biomarker,’|’,biomarker_source)      
provider_id encid Use the encid to lookup PROVIDER_ID in the VISIT_DETAIL table. If encid is blank then leave PROVIDER_ID blank.  
observation_datetime        
value_as_number numeric_result      
value_source_value numeric_result
narrative_result
variant
concat(coalesce(numeric_result,narrative_result),’|’, variant)    
value_as_concept_id narrative_result Map value_as_concept_id using the CONCEPT table where lower(narrative_result)=lower(concept_code) and vocabulary_id=’SNOMED’ and concept_class_id=’Qualifier Value’ and standard_concept=’S’    
unit_concept_id unit Map the following:
%= 8554 (Percent)
pg/ml= 8845 (picogram per milliliter)
   
visit_detail_id        
unit_source_value unit      

Querying the Biomarker Data

Because of the lack of specificity in the native data and granularity in the mapping of the vocabulary, it is recommended to search the measurement_source_value when looking to differentiate between tau, total tau, ptau, ptau 181 and ptau217.

Example query shown below:

    select measurement_concept_id,measurement_source_value,c.concept_name,c.concept_code,c.vocabulary_id,count(*)
    from measurement m 
    left join concept c on m.measurement_concept_id =c.concept_id 
    where measurement_type_concept_id =32858
    and measurement_source_value ~ 'tau'
    group by 1,2,3,4,5
    order by 2 asc;

Table name: procedure_occurrence

Reading from alz_imaging

The ALZ Radiology Report Findings table features provider-documented imaging procedure findings of Alzheimer’s disease (AD) or cognitive impairment progression and treatment efficacy, as noted in radiology reports. These findings are based on brain PET scans, MRI, MRS, CT and SPECT images. The table includes neuroimaging biomarker findings that help confirm AD diagnoses and monitor disease progression and treatment effectiveness.

Mapping procedure_concept_id

procedure contrast procedure_concept_id
computed tomography with 1073730 (CT with contrast)
computed tomography with and without 4300757 (Computed Tomography)
computed tomography without 4163903 (CT without contrast)
computed tomography   4300757 (Computed Tomography)
magnetic resonance imaging with 4198856 (MRI with contrast)
magnetic resonance imaging with and without 4013636 (magnetic resonance imaging)
magnetic resonance imaging without 4231864 (MRI without contrast)
magnetic resonance imaging   4013636 (magnetic resonance imaging)
magnetic resonance spectroscopy with 4082847 (magnetic resonance spectroscopy)
magnetic resonance spectroscopy with and without 4082847 (magnetic resonance spectroscopy)
magnetic resonance spectroscopy without 4082847 (magnetic resonance spectroscopy)
magnetic resonance spectroscopy   4082847 (magnetic resonance spectroscopy)
positron emission tomography with 4305790 (positron emission tomography)
positron emission tomography without 4305790 (positron emission tomography)
positron emission tomography   4305790 (positron emission tomography)
single-photon emission computed tomography with 4019823 (single photon emission computed tomography)
single-photon emission computed tomography   4019823 (single photon emission computed tomography)
Destination Field SourceField Logic Comment
procedure_occurrence_id   Autogenerate  
visit_occurrence_id encid Lookup the VISIT_OCCURRENCE_ID based on the encid If encid is blank then use diag_date to determine which VISIT_OCCURRENCE_ID the diagnosis should be associated to
person_id ptid    
procedure_type_concept_id   32858 (NLP)  
procedure_concept_id procedure
contrast
See Mapping Logic Above  
procedure_source_concept_id   Set to 0  
procedure_source_value procedure
contrast
concat(procedure,’|’,contrast)  
provider_id encid Use the encid to lookup PROVIDER_ID in the VISIT_DETAIL table. If encid is blank then leave PROVIDER_ID blank.
procedure_date procedure_date
encounter_date
coalesce(procedure_date,note_date)  
procedure_datetime procedure_date
encounter_date
coalesce(procedure_date,note_date)  
modifier_concept_id      
quantity      
visit_detail_id      
modifier_source_value      

Table name: note, note_nlp

Reading from alz_problem

The ALZ Symptoms and Problems table records patient cognitive impairment, health incidents, complaints, ailments and progression when present in the physician’s notes. Examples may include cognitive, behavioral, emotional, psychiatric and physical symptoms and problems. Additionally, this table includes records of provider documented Alzheimer’s or MCI diagnoses recorded in the provider note. This information offers an overview of a patient’s health over time

NOTE

Destination Field Source Field Logic Comment
note_id autogenerate    
person_id ptid    
nlp_date text_date
note_date
coalesce(text_date,encounter_date)  
Nlp_datetime text_date
note_date
coalesce(text_date,encounter_date) Set time to midnight  
note_type_concept_id 32858 (NLP)    
note_class_concept_id      
note_title section    
note_text problem concat(“problem:”,problem)  
Encoding_concept_id 0    
Language_concept_id 40639387 US English  
Provider_id encid Use the encid to lookup the PROVIDER_ID from the associated VISIT_DETAIL record If encid is blank then leave PROVIDER_ID blank
visit_occurrence_id encid Lookup the VISIT_OCCURRENCE_ID based on the encid If encid is blank then use diag_date to determine which VISIT_OCCURRENCE_ID the diagnosis should be associated to
note_source_value problem
section
concat(problem,’|’,section’)  

NOTE_NLP

Destination Field Source Field Logic Comment
Note_nlp_id autogenerate    
Note_id autogenerate    
person_id ptid    
nlp_date text_date
note_date
coalesce(text_date,encounter_date)  
Nlp_datetime text_date
note_date
coalesce(text_date,encounter_date) Set time to midnight  
section_concept_id 0    
lexical_variant problem    
term_exists qualifier    
term_temporal temporality    
term_modifiers severity
chronicity
stage
change
Concatenate a string with the applicable modifiers
“Severity=severity| Chronicity=chronicity| Stage=stage| Change=change
 

Reading from alz_imaging

NOTE

Destination Field Source Field Logic Comment
note_id autogenerate    
person_id ptid    
nlp_date procedure_date
note_date
coalesce(text_date,encounter_date)  
Nlp_datetime procedure_date
note_date
coalesce(text_date,encounter_date) Set time to midnight  
note_type_concept_id 32858 (NLP)    
note_class_concept_id      
note_title      
note_text findings concat(“imaging:”,findings)  
Encoding_concept_id 0    
Language_concept_id 40639387 US English  
Provider_id encid Use the encid to lookup the PROVIDER_ID from the associated VISIT_DETAIL record If encid is blank then leave PROVIDER_ID blank
visit_occurrence_id encid Lookup the VISIT_OCCURRENCE_ID based on the encid If encid is blank then use diag_date to determine which VISIT_OCCURRENCE_ID the diagnosis should be associated to
note_source_value findings
reasons
concat(findings,’|’,reasons’)  
note_event_id   Link to associated procedure_occurrence id from record that was created  

Table name: drug_exposure

Reading from alz_medication

The ALZ Medications table records Alzheimer’s disease (AD) and cognitive impairment drugs that have been documented in the unstructured physician notes. Medications of interest include donepezil (brand and generic), mematine (brand and generic), Rexulti®, Exelon® and others. This table supplements the structured medication data, such as written prescriptions, administered medications and procedures and can provide additional information regarding the actions and responses associated with medications not found in the structured tables.

drug drug_concept_id Notes
aducanumab 1537087 (aducanumab)  
aduhelm 1537087 (aducanumab) Aduhelm is the brand name of aducanumab
aricept 715997 (donepezil) Aricept is the brand name of donepezil
brexpiprazole 46275300 (brexpiprazole)  
donanemab 1734796 (donanemab)  
donepezil 715997 (donepezil)  
donepezil / memantine 715997 (donepezil)  
donepezil / memantine 701322 (memantine)  
exelon 733523 (risvastigmine) Exelon is the brand name of risvastigmine
galantamine 757627 (galantamine)  
gantenerumab 36851650 (gantenerumab)  
kisunla 1734796 (donanemab) Kisunla is the brand name of donanemab
lecanemab 1301377 (lecanemab)  
leqembi 1301377 (lecanemab) Leqembia is the brand name of lecanemab
memantine 701322 (memantine)  
namenda 701322 (memantine) Namenda is the brand name of memantine
namzaric 715997 (donepezil) Namzaric is the brand name of donepezil / memantine
namzaric 701322 (memantine) Namzaric is the brand name of donepezil / memantine
razadyne 757627 (galantamine) Razadyne is the brand of galantamine
rexulti 46275300 (brexpiprazole) Rexulti is the brand of brexpiprazole
risvastigmine 733523 (risvastigmine)  
Destination Field Source field Logic Comment field  
drug_exposure_id id      
person_id ptid      
drug_concept_id drug See mapping above    
drug_exposure_start_date encounter_date      
drug_exposure_start_datetime encounter_date      
drug_exposure_end_date encounter_date      
drug_exposure_end_datetime encounter_date      
verbatim_end_date        
drug_type_concept_id 32858 (NLP)      
refills        
quantity        
days_supply        
sig        
route_concept_id        
Provider_id encid Use the encid to lookup the PROVIDER_ID from the associated VISIT_DETAIL record If encid is blank then leave PROVIDER_ID blank  
visit_occurrence_id encid Lookup the VISIT_OCCURRENCE_ID based on the encid If encid is blank then use diag_date to determine which VISIT_OCCURRENCE_ID the diagnosis should be associated to  
visit_detail_id        
drug_source_value drug      
drug_source_concept_id 0      
route_source_value        

Please contact Clair Blacketer (https://github.com/clairblacketer) if you have any questions