Ch4 EHR

Ch4 Deep Dive – Electronic Health Records (EHR) #

Q1: What is the central focus of Chapter 4? #

Chapter 4 focuses on working with electronic health record (EHR) data using the MIMIC-III dataset, and explores medication harmonization using SQL, Neo4j (property graph), and TypeDB (typed hypergraph).


Q2: What makes working with EHR data complex? #

  • EHRs are highly structured but vary between implementations.
  • Data is often redundant, inconsistent, or missing.
  • Clinical context and domain knowledge are crucial for correct interpretation.

Q3: Why was medication harmonization chosen as the use case? #

Because medications are objective and widely used in EHRs, but the same drug can appear under multiple names or codes (e.g. NDCs). Harmonization is necessary to:

  • Normalize names or codes.
  • Filter out forms like “heparin flush” vs “therapeutic heparin”.

Q4: How is this implemented using SQL (SQLite)? #

  • CSVs are loaded into SQLite tables.
  • Free-text string search is done across drug name columns (e.g., drug_name_generic LIKE “%heparin%”).
  • NDC-based harmonization is done by:
    • Extracting distinct NDCs.
    • Filtering to include only valid/therapeutic ones.
    • Rewriting queries using explicit WHERE ndc IN (...) clauses.

Q5: What are the challenges of the SQL approach? #

  • Hard to maintain mappings (e.g., repeated NDC lists).
  • Poor separation of concerns (clinical logic leaks into queries).
  • Not reusable across projects or datasets.

Q6: How does Neo4j (property graph) improve the workflow? #

  • Drug instances are modeled as nodes, and a new “heparin (non-flush)” concept is created.
  • Relationships connect drug nodes to patients and prescriptions.
  • Reusability improves because mappings are stored inside the graph.
  • Queries become more semantic, e.g., follow a concept node rather than duplicating NDCs in every query.

Q7: How are concepts created in Neo4j? #

  • A node like Drug:Knowledge { drug: "Heparin (non-flush)", ... } is created.
  • Drug nodes are linked to it via [:for_drug {derived: true}] relationships.
  • This enables querying “all patients on this drug concept.”

Q8: How does TypeDB enhance this model further? #

  • TypeDB uses strong typing and roles to model relations:
    • patient plays prescription:patient
    • druginstance plays prescription:prescribed_drug
  • Separate druginstance entity and prescription relation.
  • Can add inference rules to dynamically associate drugs with higher-level concepts.

Q9: What are the two approaches to harmonization in TypeDB? #

  1. Persisted:
    • Insert actual hierarchy facts: (parent: heparin, child: drug) isa hierarchy.
  2. Rule-based:
    • Use rules like:
      rule heparin-rule:
      when { $d has ndc "xxxx"; $c has purl "..."; } then { (parent: $c, child: $d) isa hierarchy; }
      

Q10: What are the tradeoffs between approaches? #

Model Pros Cons
SQL Ubiquitous, accessible Poor semantics, duplication of logic
Neo4j Intuitive graph model, reusable concepts Fragile schemas, performance concerns
TypeDB Semantic precision, rule engine Newer ecosystem, complexity, fewer tools
RDF Graph Web standard, portable Very steep learning curve

Q11: What are the broader lessons? #

  • Separate clinical knowledge from code.
  • Choose modeling strategies based on:
    • Project duration
    • Tooling maturity
    • Frequency of schema changes
    • Collaboration needs
  • Use graphs or TypeDB to encode reusable logic and keep queries clean.

🧠 Curriculum Task-Based Summary (Chapter 4) #

🔹 1. Understanding EHR Data Models #

  • Compare OMOP, FHIR, i2b2, PCORnet, ADaM, SDTM.
  • Explore role of implementation guides and FHIR profiles.

🔹 2. Setup and Load EHR Data (MIMIC-III) #

  • Use SQLite to ingest .csv files (Example 4-1).
  • Use Neo4j or TypeDB containers (Docker).
  • Load and explore data with basic queries.

🔹 3. Medication Harmonization Use Case #

  • Focus on heparin, and identify pitfalls with NDC codes.
  • Extract and deduplicate NDCs from prescriptions.
  • Build queries that target therapeutic use only.

🔹 4. Query and Harmonize in Three Paradigms #

Task SQL Neo4j TypeDB
Load data pandas + sqlite3 pandas + NeoInterface pandas + TypeDB client
Query for “heparin” LIKE on drug names toLower(drug_name) CONTAINS drug has name contains
Harmonization Filter with ndc IN (...) Create concept node + edges Insert facts or define rules

🔹 5. Linkage and Reasoning #

  • Create custom drug concepts.
  • Track prescriptions per patient.
  • Link concepts using:
    • SQL joins
    • Neo4j (:Patient)-[:has_prescription]->(:Drug)
    • TypeDB roles + rules.

🔹 6. Evaluate Tradeoffs and Performance #

  • Review table of pros/cons (Table 4-2).
  • Balance:
    • Query simplicity vs data model reusability.
    • Rule-driven inference vs static mapping.
    • Ecosystem maturity.

✅ End of Chapter Outcome #

You should now be able to:

  • Choose the right data model (SQL, graph, hypergraph) for your RWD task.
  • Implement and harmonize medication concepts.
  • Balance engineering choices with clinical accuracy and long-term maintainability.
  • Begin thinking about integrating terminologies (UMLS, SNOMED CT) into your models.