parent_selector
Stated parent selection for SNOMED CT new concept classification.
Given the raw pipeline results (from hierarchy_results_raw.json), this
module derives a per-concept stated "Is a" parent list by voting across the
SNOMED parents of the retrieved reference terms.
No embeddings or LLM calls — operates purely on OMOP concept_ids
already stored in the cached JSON, plus SQL against concept_relationship.
Algorithm
For each source concept:
- Collect the reference examples stored in raw_results[i]["reference_examples"].
- Batch-query concept_relationship for the "Is a" parents of all unique reference term concept_ids in a single SQL call.
- For each reference term with similarity >= min_similarity, add its parent SCTIDs to a weighted vote (weight = similarity score).
- Filter out root/overly-generic parents (see _GENERIC_CONCEPT_IDS).
- Optionally apply an attribute subsumption filter: skip a reference term whose attribute concept_id_2 set is a strict subset of the source's predicted attribute set (that reference term is less specific → its parents would be too deep in the hierarchy).
- Return the top-k SCTIDs by weighted vote. Fall back to Clinical finding (404684003) if no candidates pass all filters.
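The voting steps above can be sketched as follows. This is a minimal illustration, not the module's actual code: `vote_parents`, `GENERIC_SCTIDS`, and the input shapes (pairs of reference concept ID and similarity, plus a mapping from reference concept ID to parent SCTIDs) are assumptions made for the example, and the real contents of `_GENERIC_CONCEPT_IDS` are not shown on this page.

```python
from collections import defaultdict

# Hypothetical stand-in for _GENERIC_CONCEPT_IDS; only the SNOMED CT root is listed here.
GENERIC_SCTIDS = {"138875005"}
# Clinical finding, the fallback named in the algorithm description.
FALLBACK_SCTID = "404684003"


def vote_parents(references, parents_by_ref, min_similarity=0.7, top_k=2):
    """Pick up to top_k parent SCTIDs by similarity-weighted vote.

    references:     list of (ref_concept_id, similarity) pairs (assumed shape)
    parents_by_ref: dict mapping ref_concept_id -> list of parent SCTIDs (assumed shape)
    """
    votes = defaultdict(float)
    for ref_id, similarity in references:
        if similarity < min_similarity:
            continue  # reference term is too dissimilar to count
        for sctid in parents_by_ref.get(ref_id, []):
            if sctid in GENERIC_SCTIDS:
                continue  # root or overly generic parent
            votes[sctid] += similarity  # weight = similarity score
    # Highest score first; ties are broken by SCTID so the result is deterministic.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    top = [sctid for sctid, _ in ranked[:top_k]]
    return top or [FALLBACK_SCTID]
```

Breaking ties by SCTID is a choice made for this sketch only; the source does not say how the module orders candidates with equal scores.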
Usage::

    import json, psycopg
    from ariadne.hierarchy.parent_selector import build_stated_parents_map

    with open("data/notebook_results/hierarchy_results_raw.json") as f:
        results = json.load(f)

    with psycopg.connect(conn_str) as conn:
        stated_parents = build_stated_parents_map(results, conn, schema)
    # {omop_concept_id: ["sctid1", "sctid2", ...]}

build_stated_parents_map(raw_results, conn, schema, *, top_k=2, min_similarity=0.7, use_attr_filter=True)
Build a per-concept stated parent map from cached pipeline results.
Reads source_concept_id and reference_examples from each entry in
raw_results, queries concept_relationship for reference parents in a
single batch call, then votes to select up to top_k parent SCTIDs per
source concept.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_results | list[dict] | List of result dicts from … | required |
| conn | Connection | Open psycopg connection to the vocabulary database. | required |
| schema | str | OMOP vocabulary schema name. | required |
| top_k | int | Maximum stated parents per concept (default 2). | 2 |
| min_similarity | float | Minimum reference similarity to be counted (default 0.7). | 0.7 |
| use_attr_filter | bool | Apply attribute-subsumption filter (default True). | True |
Returns:
| Type | Description |
|---|---|
| dict[int, list[str]] | Mapping from OMOP concept ID to its stated parent SCTIDs, e.g. {omop_concept_id: ["sctid1", "sctid2", ...]}. |
Source code in src/ariadne/hierarchy/parent_selector.py
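To make the "single batch call" concrete, here is a hedged sketch of gathering the unique reference concept IDs across all entries before querying. The `reference_examples` key comes from the description above; the `concept_id` key inside each reference example and the helper name `collect_reference_ids` are assumptions for illustration.

```python
def collect_reference_ids(raw_results):
    """Return the sorted, de-duplicated reference concept IDs across all entries."""
    ids = set()
    for entry in raw_results:
        # Entries without reference examples contribute nothing.
        for ref in entry.get("reference_examples", []):
            ids.add(ref["concept_id"])  # assumed field name
    return sorted(ids)
```

De-duplicating first means each reference term is looked up once, however many source concepts retrieved it.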
get_reference_parents(ref_concept_ids, conn, schema)
Return "Is a" parents for a batch of reference SNOMED concept IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref_concept_ids | list[int] | OMOP … | required |
| conn | Connection | Open psycopg connection to the vocabulary database. | required |
| schema | str | OMOP vocabulary schema name. | required |
Returns:
| Type | Description |
|---|---|
| dict[int, list[tuple[str, str]]] | … Generic parents (see …) |
Source code in src/ariadne/hierarchy/parent_selector.py
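A batch "Is a" lookup of this kind might look like the sketch below. It assumes the standard OMOP vocabulary tables `concept_relationship` and `concept`, assumes each returned tuple is (parent code, parent name), and uses the hypothetical name `fetch_is_a_parents`; it is not the module's actual query. `conn` only needs an `execute(query, params)` method that returns iterable rows, which a psycopg 3 connection provides.

```python
from collections import defaultdict

# Assumed query shape against the standard OMOP vocabulary tables.
IS_A_QUERY = """
    SELECT cr.concept_id_1, c.concept_code, c.concept_name
    FROM {schema}.concept_relationship AS cr
    JOIN {schema}.concept AS c ON c.concept_id = cr.concept_id_2
    WHERE cr.relationship_id = 'Is a'
      AND cr.invalid_reason IS NULL
      AND cr.concept_id_1 = ANY(%s)
"""


def fetch_is_a_parents(ref_concept_ids, conn, schema):
    """Return {ref_concept_id: [(parent_code, parent_name), ...]} using one query."""
    # The schema name is interpolated into the SQL text, so accept plain identifiers only.
    if not schema.isidentifier():
        raise ValueError(f"unexpected schema name: {schema!r}")
    unique_ids = sorted(set(ref_concept_ids))
    if not unique_ids:
        return {}
    parents = defaultdict(list)
    # The ID list is passed as one array parameter, so the database is hit once.
    rows = conn.execute(IS_A_QUERY.format(schema=schema), (unique_ids,))
    for concept_id, code, name in rows:
        parents[concept_id].append((str(code), name))
    return dict(parents)
```

With real psycopg, composing the schema through `psycopg.sql.Identifier` would be the more robust choice; the identifier check here is a simplification that keeps the sketch dependency-free.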
score_parent_candidates(reference_examples, reference_parents, source_attr_concept_ids, *, min_similarity=0.7, use_attr_filter=True, top_k=2)
Vote across reference term parents to find the best candidates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_examples | list[dict] | The … | required |
| reference_parents | dict[int, list[tuple[str, str]]] | Output of … | required |
| source_attr_concept_ids | set[int] | Set of … | required |
| min_similarity | float | Skip reference terms with similarity below this threshold. | 0.7 |
| use_attr_filter | bool | When True, skip reference terms whose attribute set is a strict subset of the source's predicted attributes; those terms are less specific and their parents would be too deep. | True |
| top_k | int | Maximum number of parent candidates to return. | 2 |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, str, float]] | List of … sorted descending by score, capped at … |
Source code in src/ariadne/hierarchy/parent_selector.py
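The similarity threshold and the attribute-subsumption filter described in the parameter table can be illustrated with Python's strict-subset operator on sets. This is a sketch under assumptions: the helper name `keep_references` and the per-reference keys `similarity` and `attr_concept_ids` are hypothetical, since the actual structure of a reference example is not shown on this page.

```python
def keep_references(reference_examples, source_attr_concept_ids,
                    min_similarity=0.7, use_attr_filter=True):
    """Return the reference examples that are allowed to vote."""
    kept = []
    for ref in reference_examples:
        if ref["similarity"] < min_similarity:
            continue  # below the similarity threshold
        ref_attrs = set(ref["attr_concept_ids"])  # assumed field name
        # `<` on sets is a strict-subset test: every reference attribute is in the
        # source's set, and the source has at least one attribute beyond them.
        if use_attr_filter and ref_attrs < source_attr_concept_ids:
            continue  # reference term is less specific than the source
        kept.append(ref)
    return kept
```

Note that an empty attribute set is a strict subset of any non-empty set, so under this reading a reference term with no attributes is skipped whenever the source has at least one predicted attribute. A reference whose attribute set equals the source's is kept, because equality is not a strict subset.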