Healthcare Data Sources


Phenotype KnowledgeBase (PheKB)

Description:
A collaborative portal for sharing and validating electronic phenotype definitions used in observational health research.

Tags: phenotyping, EHR, cohort definitions

Use Cases:

  • Standardized phenotype definitions for conditions such as diabetes and asthma
  • Sharing phenotype algorithms across institutions
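Electronic phenotype algorithms usually combine several data types (diagnosis codes, medications, lab results, clinical notes) into explicit inclusion logic. The sketch below is a hypothetical, simplified rule of that shape, not an algorithm taken from PheKB: a patient counts as a case if a type 2 diabetes ICD-10-CM code (`E11` prefix) appears on at least two distinct dates and a qualifying diabetes medication is recorded. The `Patient` record, the drug list, and the thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical, simplified patient record; real EHR extracts carry far more detail.
@dataclass
class Patient:
    patient_id: int
    diagnoses: list[tuple[str, date]]  # (ICD-10-CM code, date recorded)
    medications: set[str]             # normalized drug names

# Illustrative criteria only; NOT a validated PheKB definition.
DX_PREFIX = "E11"                     # ICD-10-CM category for type 2 diabetes
DRUGS = {"metformin", "glipizide", "sitagliptin"}
MIN_DISTINCT_DX_DATES = 2

def is_case(p: Patient) -> bool:
    """Case if the diagnosis appears on >= 2 distinct dates AND a qualifying drug is present."""
    dx_dates = {d for code, d in p.diagnoses if code.startswith(DX_PREFIX)}
    return len(dx_dates) >= MIN_DISTINCT_DX_DATES and bool(p.medications & DRUGS)

patients = [
    Patient(1, [("E11.9", date(2021, 3, 1)), ("E11.9", date(2021, 9, 12))], {"metformin"}),
    Patient(2, [("E11.9", date(2021, 3, 1))], {"metformin"}),  # only one diagnosis date
    Patient(3, [("E11.65", date(2020, 1, 5)), ("E11.9", date(2020, 6, 2))], set()),  # no drug
]
cases = [p.patient_id for p in patients if is_case(p)]
print(cases)  # [1]
```

Requiring codes on distinct dates is a common way to screen out one-off or rule-out diagnoses; a shared definition would also document how it was validated against chart review.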

MIMIC-IV (Medical Information Mart for Intensive Care)

Description:
A large, publicly available critical care database containing de-identified health data from ICU patients at the Beth Israel Deaconess Medical Center.

Tags: ICU, de-identified data, clinical research

Use Cases:

  • Predictive modeling in critical care
  • Benchmarking clinical algorithms
  • Training deep learning models

Access Requirements:
Requires a credentialed PhysioNet account, completion of human subjects research training, and a signed data use agreement


OHDSI / OMOP Common Data Model

Description:
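OHDSI (Observational Health Data Sciences and Informatics) is an open-science community; the OMOP Common Data Model it maintains is a standard schema and vocabulary for organizing observational health data across institutions and studies.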

Tags: standardization, EHR, interoperability, CDM

Use Cases:

  • Converting disparate data sources into a consistent format
  • Enabling federated analysis across healthcare systems
  • Supporting tools like ATLAS for cohort building
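The central idea of the CDM is that each site maps its local codes to standard concepts once, after which the same analysis query runs unchanged at any site. The sketch below uses Python's built-in `sqlite3` with a heavily simplified subset of the `person` and `condition_occurrence` tables (the real tables have many more columns and sit alongside the vocabulary tables). The local extract and the `SOURCE_TO_CONCEPT` lookup are hypothetical; in a real ETL, source codes are mapped through the OMOP vocabulary (for example, `Maps to` relationships in `CONCEPT_RELATIONSHIP`), and the concept IDs shown should be verified against your vocabulary release.

```python
import sqlite3

# Simplified subset of two OMOP CDM tables; real tables have many more columns.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    condition_concept_id INTEGER,
    condition_start_date TEXT,
    condition_source_value TEXT
);
"""

# Hypothetical source-to-standard lookup (illustrative IDs; verify against your vocabulary).
SOURCE_TO_CONCEPT = {"E11.9": 201826, "J45.909": 317009}
GENDER_CONCEPT = {"F": 8532, "M": 8507}  # OMOP standard gender concepts

# Hypothetical local EHR extract: (patient_id, sex, birth_year, dx_code, dx_date)
local_rows = [
    (10, "F", 1960, "E11.9", "2022-04-01"),
    (11, "M", 1975, "J45.909", "2022-05-15"),
    (12, "F", 1982, "E11.9", "2023-01-20"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for i, (pid, sex, yob, code, dx_date) in enumerate(local_rows, start=1):
    conn.execute("INSERT OR IGNORE INTO person VALUES (?, ?, ?)",
                 (pid, GENDER_CONCEPT[sex], yob))
    # Unmapped codes get concept_id 0, the OMOP convention for "no matching concept";
    # the original code is kept in condition_source_value.
    concept_id = SOURCE_TO_CONCEPT.get(code, 0)
    conn.execute("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?, ?)",
                 (i, pid, concept_id, dx_date, code))

# Once data is in CDM form, this query is identical at every site.
n_t2dm = conn.execute(
    "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
    "WHERE condition_concept_id = ?",
    (201826,),
).fetchone()[0]
print(n_t2dm)  # 2
```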

National COVID Cohort Collaborative (N3C)

Description:
A centralized, secure platform for analyzing harmonized COVID-19 clinical data from dozens of healthcare providers across the US.

Tags: COVID-19, centralized data enclave, clinical data

Use Cases:

  • Studying disease trajectories and treatment effects
  • Multisite analytics using harmonized EHR data
  • Evaluating outcomes for long COVID

Access Requirements:
Application and institutional affiliation required


BioPortal

Description:
A comprehensive repository of biomedical ontologies from the National Center for Biomedical Ontology.

Tags: ontologies, terminology, semantic web, linked data

Use Cases:

  • Accessing ontologies such as SNOMED CT, ICD, LOINC, and RxNorm
  • Mapping data to standard vocabularies
  • Enabling semantic interoperability
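Ontology terms can also be looked up programmatically through BioPortal's REST API. The sketch below assumes the search endpoint `https://data.bioontology.org/search` accepts `q`, `ontologies` (comma-separated acronyms), and `apikey` query parameters and returns JSON whose `collection` entries carry `prefLabel` and `@id`; treat those details as assumptions to check against the current BioPortal API documentation. The helper names `build_search_url` and `search` are hypothetical, and an API key from a BioPortal account is assumed.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed public search endpoint; confirm against the BioPortal REST API docs.
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"

def build_search_url(term: str, ontologies: list[str], api_key: str) -> str:
    """Build a term-search URL restricted to the given ontology acronyms."""
    params = {"q": term, "ontologies": ",".join(ontologies), "apikey": api_key}
    return f"{BIOPORTAL_SEARCH}?{urlencode(params)}"

def search(term: str, ontologies: list[str], api_key: str) -> list[tuple[str, str]]:
    """Run the search and return (preferred label, class IRI) pairs."""
    with urllib.request.urlopen(build_search_url(term, ontologies, api_key)) as resp:
        data = json.load(resp)
    # Assumed response shape: {"collection": [{"prefLabel": ..., "@id": ...}, ...]}
    return [(hit.get("prefLabel"), hit.get("@id")) for hit in data.get("collection", [])]

# Building the URL needs no network access; calling search() does.
url = build_search_url("asthma", ["SNOMEDCT", "ICD10CM"], "YOUR_API_KEY")
print(url)
```

Some ontologies hosted on BioPortal, SNOMED CT among them, may carry their own license terms that apply on top of BioPortal access.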

Unified Medical Language System (UMLS)

Description:
A resource from the U.S. National Library of Medicine (NLM) that integrates over 200 biomedical vocabularies to support natural language processing, terminology mapping, and EHR data harmonization.

Tags: NLP, standard vocabularies, concept mapping

Use Cases:

  • Linking clinical terms to standard codes
  • Enhancing search and retrieval in clinical systems
  • Supporting NLP tools like MetaMap and cTAKES

Access Requirements:
Free license from the National Library of Medicine (NLM); licensees must submit an annual usage report to keep the license active
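Concept mapping in the UMLS works through concept unique identifiers (CUIs): the Metathesaurus groups synonymous terms from different source vocabularies under one CUI, so two codes that share a CUI can be linked. The sketch below builds such a crosswalk from rows in the pipe-delimited `MRCONSO.RRF` layout (CUI in column 0, language in 1, source abbreviation `SAB` in 11, `CODE` in 13, `SUPPRESS` in 16). The three rows are synthetic and use a placeholder CUI, and `build_crosswalk` is a hypothetical helper; a production crosswalk would also account for term types and one-to-many mappings.

```python
from collections import defaultdict

# MRCONSO.RRF column positions (pipe-delimited, 18 fields per row).
CUI, LAT, SAB, CODE, STR, SUPPRESS = 0, 1, 11, 13, 14, 16

def build_crosswalk(lines, source_sab, target_sab):
    """Map each code in source_sab to the target_sab codes that share its CUI."""
    by_cui = defaultdict(lambda: {"src": set(), "tgt": set()})
    for line in lines:
        f = line.rstrip("\n").split("|")
        if f[LAT] != "ENG" or f[SUPPRESS] != "N":
            continue  # keep English, non-suppressed atoms only
        if f[SAB] == source_sab:
            by_cui[f[CUI]]["src"].add(f[CODE])
        elif f[SAB] == target_sab:
            by_cui[f[CUI]]["tgt"].add(f[CODE])
    crosswalk = defaultdict(set)
    for entry in by_cui.values():
        for code in entry["src"]:
            crosswalk[code] |= entry["tgt"]  # empty set if no target code shares the CUI
    return dict(crosswalk)

# Synthetic rows in MRCONSO layout; the CUIs are placeholders, not real UMLS concepts.
SAMPLE = [
    "C9999999|ENG|P|L0000001|PF|S0000001|Y|A0000001||||ICD10CM|PT|E11|Type 2 diabetes mellitus|0|N||",
    "C9999999|ENG|P|L0000002|PF|S0000002|Y|A0000002||||SNOMEDCT_US|PT|44054006|Diabetes mellitus type 2|9|N||",
    "C9999998|ENG|P|L0000003|PF|S0000003|Y|A0000003||||ICD10CM|PT|J45|Asthma|0|N||",
]
xwalk = build_crosswalk(SAMPLE, "ICD10CM", "SNOMEDCT_US")
print(xwalk)
```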


Aphrodite

Description:
An R package from the OHDSI community that supports semi-supervised phenotype algorithm development using feature engineering and machine learning methods on OMOP Common Data Model (CDM) datasets.

Tags: phenotyping, machine learning, semi-supervised, OMOP, OHDSI

Use Cases:

  • Rapid development of phenotype classifiers using imperfectly labeled data
  • Applying machine learning models to predict phenotypes based on structured EHR data
  • Feature extraction from OMOP CDM to support supervised or semi-supervised learning tasks
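Learning from imperfectly labeled data typically starts with noisy, or "silver standard", labels: patients are labeled from the presence of keyword concepts rather than by manual chart review, and a classifier is then trained on the remaining features. The sketch below illustrates that idea in Python; it is not Aphrodite's R API. It assumes a single hypothetical anchor concept (`t2dm_dx`), toy patient records, and a hand-rolled L2-regularized logistic regression standing in for whatever models Aphrodite actually fits.

```python
import math

def silver_labels(patients, anchor):
    """Noisy label: 1 if the anchor concept appears in the record, else 0."""
    return [1 if anchor in p else 0 for p in patients]

def vectorize(patient, cols):
    """Binary presence/absence features over a fixed column order."""
    return [1.0 if c in patient else 0.0 for c in cols]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on L2-regularized logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def score(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy records: each patient is the set of (hypothetical) concepts in their chart.
patients = [
    {"t2dm_dx", "metformin", "hba1c_high"},
    {"t2dm_dx", "metformin"},
    {"t2dm_dx", "hba1c_high"},
    {"asthma_dx", "albuterol"},
    {"albuterol"},
    {"asthma_dx"},
]
ANCHOR = "t2dm_dx"
y = silver_labels(patients, ANCHOR)
# Drop the anchor from the features so the model must learn from correlated concepts.
cols = sorted(set().union(*patients) - {ANCHOR})
w, b = train_logreg([vectorize(p, cols) for p in patients], y)

# Score patients who lack the anchor code entirely.
p_diabetic_like = score(w, b, vectorize({"metformin", "hba1c_high"}, cols))
p_asthma_like = score(w, b, vectorize({"albuterol"}, cols))
print(round(p_diabetic_like, 2), round(p_asthma_like, 2))
```

Removing the anchor from the feature set is what lets the trained model flag likely cases whose records never contain the keyword itself.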