Healthcare Data Sources #
Phenotype KnowledgeBase (PheKB) #
Description:
A collaborative portal for sharing and validating electronic phenotype definitions used in observational health research.
Tags: phenotyping, EHR, cohort definitions
Use Cases:
- Standardized phenotype definitions for conditions such as diabetes and asthma
- Sharing phenotype algorithms across institutions
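Example:
A shared phenotype definition is often a set of inclusion rules over diagnosis codes, medications, and lab values. The sketch below shows that shape in Python. The `Patient` fields, the code prefix list, the medication list, and the threshold are assumptions made for illustration and are not taken from any published PheKB algorithm.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Patient:
    """Toy patient record; the field names are assumptions for this sketch."""
    patient_id: str
    diagnosis_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    max_hba1c: Optional[float] = None  # highest recorded HbA1c in percent, if any


# Illustrative criteria only, chosen to show the shape of a rule-based definition.
T2D_CODE_PREFIXES = ("E11",)   # ICD-10-CM type 2 diabetes family
T2D_MEDICATIONS = {"metformin"}
HBA1C_THRESHOLD = 6.5          # percent


def has_t2d_code(patient: Patient) -> bool:
    """True if any diagnosis code starts with one of the listed prefixes."""
    return any(code.startswith(T2D_CODE_PREFIXES) for code in patient.diagnosis_codes)


def is_case(patient: Patient) -> bool:
    """Case if a diagnosis code is supported by a medication or a lab value."""
    has_med = bool(patient.medications & T2D_MEDICATIONS)
    has_lab = patient.max_hba1c is not None and patient.max_hba1c >= HBA1C_THRESHOLD
    return has_t2d_code(patient) and (has_med or has_lab)


def select_cohort(patients):
    """Return the ids of all patients that satisfy the definition."""
    return [p.patient_id for p in patients if is_case(p)]
```

Keeping the criteria as named constants separate from the logic is one way to make a definition easier to review and to port between institutions.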
MIMIC-IV (Medical Information Mart for Intensive Care) #
Description:
A large, publicly available critical care database containing de-identified health data from ICU patients at the Beth Israel Deaconess Medical Center.
Tags: ICU, de-identified data, clinical research
Use Cases:
- Predictive modeling in critical care
- Benchmarking clinical algorithms
- Training deep learning models
Access Requirements:
Requires a credentialed PhysioNet account, completion of the required human-subjects research training, and a signed data use agreement
OHDSI / OMOP Common Data Model #
Description:
OHDSI is an open community initiative; its OMOP Common Data Model (CDM) is a standard schema for organizing observational health data across institutions and studies.
Tags: standardization, EHR, interoperability, CDM
Use Cases:
- Converting disparate data sources into a consistent format
- Enabling federated analysis across healthcare systems
- Supporting tools like ATLAS for cohort building
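Example:
Converting disparate sources into a consistent format can be pictured as mapping each site's records onto one shared row shape. The sketch below uses Python and a row modeled on the OMOP `condition_occurrence` table. The two site layouts, the `LOCAL_A` vocabulary, and the concept ids in `CONCEPT_MAP` are made up for illustration; a real ETL would resolve codes against the OMOP vocabulary tables.

```python
from datetime import date

# Placeholder map: (source vocabulary, source code) -> standard concept id.
# These ids are NOT real OMOP concept ids; they only illustrate the lookup.
CONCEPT_MAP = {
    ("ICD10CM", "E11.9"): 1001,
    ("LOCAL_A", "DM2"): 1001,
    ("ICD10CM", "J45.909"): 1002,
}
UNMAPPED_CONCEPT_ID = 0  # OMOP convention for "no matching concept"


def to_condition_occurrence(person_id, vocabulary, code, start_date):
    """Build one row using column names from the condition_occurrence table."""
    return {
        "person_id": person_id,
        "condition_concept_id": CONCEPT_MAP.get((vocabulary, code), UNMAPPED_CONCEPT_ID),
        "condition_start_date": start_date,
        "condition_source_value": code,
    }


def from_site_a(record):
    """Site A (hypothetical) stores local codes and ISO date strings."""
    return to_condition_occurrence(
        record["mrn"], "LOCAL_A", record["dx"], date.fromisoformat(record["dx_date"])
    )


def from_site_b(record):
    """Site B (hypothetical) stores ICD-10-CM codes and (year, month, day) tuples."""
    return to_condition_occurrence(
        record["patient"], "ICD10CM", record["icd10"], date(*record["onset"])
    )
```

Once both sites emit the same columns and concept ids, the same cohort query can run against either dataset without site-specific logic.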
National COVID Cohort Collaborative (N3C) #
Description:
A centralized, secure platform for analyzing harmonized COVID-19 clinical data from dozens of healthcare providers across the US.
Tags: COVID-19, federated research, clinical data
Use Cases:
- Studying disease trajectories and treatment effects
- Multisite analytics using harmonized EHR data
- Evaluating outcomes for long COVID
Access Requirements:
Application and institutional affiliation required
BioPortal #
Description:
A comprehensive repository of biomedical ontologies from the National Center for Biomedical Ontology.
Tags: ontologies, terminology, semantic web, linked data
Use Cases:
- Accessing ontologies and terminologies such as SNOMED CT, ICD, LOINC, and RxNorm
- Mapping data to standard vocabularies
- Enabling semantic interoperability
Unified Medical Language System (UMLS) #
Description:
Integrates over 200 biomedical vocabularies to support natural language processing, terminology mapping, and EHR data harmonization.
Tags: NLP, standard vocabularies, concept mapping
Use Cases:
- Linking clinical terms to standard codes
- Enhancing search and retrieval in clinical systems
- Supporting NLP tools like MetaMap and cTAKES
Access Requirements:
Free license from the National Library of Medicine (NLM); requires an annual agreement
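Example:
Linking clinical terms to standard codes usually starts with normalizing the free-text term and then looking it up among known synonyms of each concept. The Python sketch below shows that idea with a toy in-memory table. The `SYNONYMS` entries and the concept identifiers are placeholders, not real UMLS CUIs, and the functions are not part of any UMLS tool; a real lookup would draw on the licensed Metathesaurus files.

```python
import re

# Toy synonym table. The identifiers are placeholders, not real UMLS CUIs.
SYNONYMS = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "high blood pressure": "CONCEPT_HTN",
    "hypertension": "CONCEPT_HTN",
}


def normalize(term):
    """Lowercase the term and turn runs of non-alphanumerics into one space."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", term.lower())
    return cleaned.strip()


def map_term(term):
    """Return the placeholder concept id for a term, or None if unmapped."""
    return SYNONYMS.get(normalize(term))
```

Because several surface forms resolve to one identifier, downstream search and NLP steps can compare concepts rather than strings.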
Aphrodite #
Description:
An R package developed by OHDSI that supports semi-supervised phenotype algorithm development using feature engineering and machine learning methods on OMOP Common Data Model (CDM) datasets.
Tags: phenotyping, machine learning, semi-supervised, OMOP, OHDSI
Use Cases:
- Rapid development of phenotype classifiers using imperfectly labeled data
- Applying machine learning models to predict phenotypes based on structured EHR data
- Feature extraction from OMOP CDM to support supervised or semi-supervised learning tasks
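Example:
The idea of training a phenotype classifier on imperfect labels can be sketched as follows: a cheap heuristic assigns noisy labels, and a model then learns which structured features go with them. This is not Aphrodite's API (the package is written in R) and not necessarily its method. It is a small Python illustration that swaps in a Bernoulli naive Bayes model, with a hypothetical keyword heuristic and made-up feature names.

```python
import math
from collections import Counter


def noisy_label(patient):
    """Imperfect labeling heuristic (hypothetical): a keyword mention in the note."""
    return int("diabetes" in patient["note"].lower())


def train(patients, features):
    """Fit a Bernoulli naive Bayes model on the noisy labels (Laplace smoothing)."""
    labels = [noisy_label(p) for p in patients]
    class_counts = Counter(labels)
    feature_counts = {0: Counter(), 1: Counter()}
    for patient, label in zip(patients, labels):
        feature_counts[label].update(f for f in features if f in patient["codes"])
    model = {"features": list(features), "prior": {}, "likelihood": {}}
    for label in (0, 1):
        model["prior"][label] = (class_counts[label] + 1) / (len(patients) + 2)
        model["likelihood"][label] = {
            f: (feature_counts[label][f] + 1) / (class_counts[label] + 2)
            for f in features
        }
    return model


def predict(model, codes):
    """Return the more probable class (0 or 1) for a set of structured codes."""
    scores = {}
    for label in (0, 1):
        score = math.log(model["prior"][label])
        for f in model["features"]:
            p = model["likelihood"][label][f]
            score += math.log(p if f in codes else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)
```

The trained model scores patients from their structured codes alone, so it can flag a likely case even when the note lacks the keyword the heuristic relied on.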