agent medical skill risk: medium
PyHealth Clinical Pipeline Builder
Instructs the model to act as a PyHealth expert that builds clinical deep-learning pipelines following the Dataset → Task → Model → Trainer → Metrics pattern for EHR, signal, and i…
- Policy sensitive
- Human review
SKILL 8 files · 2 folders
SKILL.md
---
name: pyhealth
description: "Build clinical/healthcare deep-learning pipelines with PyHealth — loading EHR/signal/imaging datasets (MIMIC-III/IV, eICU, OMOP, SleepEDF, ChestXray14, EHRShot), defining tasks (mortality, readmission, length-of-stay, drug recommendation, sleep staging, ICD coding, EEG events), instantiating models"
---
# PyHealth
PyHealth (https://pyhealth.dev/) is a Python toolkit for clinical deep learning. It provides a unified, modular pipeline across electronic health records (EHR), physiological signals, and medical imaging.
The library is built around a **5-stage pipeline** — `Dataset → Task → Model → Trainer → Metrics` — where each stage is replaceable and the interfaces between stages are stable. Code that follows this pipeline shape composes well; code that bypasses it usually fights the library.
## When to use this skill
Use this skill whenever the user is doing clinical/healthcare ML and any of the following are true:
- They mention PyHealth, MIMIC-III/IV, eICU, OMOP-CDM, EHRShot, SleepEDF, SHHS, ISRUC, COVID19-CXR, ChestX-ray14, TUEV/TUAB.
- They want to predict mortality, readmission, length of stay, drug recommendations, sleep stages, ICD codes, EEG events, or de-identification.
- They need to look up or cross-map medical codes (ICD-9-CM, ICD-10-CM, ATC, NDC, RxNorm, CCS).
- They have EHR-shaped data and want to train a clinical model without writing the plumbing themselves.
PyHealth is the right tool when the workflow fits its 5 stages. If the user just wants generic PyTorch on tabular data, this skill is not necessary.
## Installation (uv)
PyHealth 2.0 requires Python ≥ 3.12, < 3.14. Use `uv` for environment management — it's faster and reproducible.
```bash
# Create a project with the right Python
uv init my-pyhealth-project
cd my-pyhealth-project
uv python pin 3.12
# Add PyHealth (this also pulls in PyTorch and friends)
uv add pyhealth
# Run scripts inside the env
uv run python train.py
```
For a one-off script without a project, use `uv run --with pyhealth python script.py`. For the legacy 1.x line (Python 3.9+), `uv add pyhealth==1.16`. Detailed install notes, MIMIC access, and GPU/CPU device tips are in `references/installation.md`.
## The 5-stage pipeline
A complete pipeline is typically <20 lines. This is the canonical shape — start here and modify pieces:
```python
from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MortalityPredictionMIMIC3
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer
from pyhealth.metrics.binary import binary_metrics_fn
# 1. Dataset — raw patient registry
base = MIMIC3Dataset(
root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
)
# 2. Task — converts patients into supervised samples
samples = base.set_task(MortalityPredictionMIMIC3())
# 3. Split + DataLoaders (split by patient to avoid leakage)
train_ds, val_ds, test_ds = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
# 4. Model — must be passed the SampleDataset, not the BaseDataset
model = Transformer(dataset=samples)
# 5. Train + evaluate
trainer = Trainer(model=model)
trainer.train(
train_dataloader=train_loader,
val_dataloader=val_loader,
epochs=50,
monitor="pr_auc",
)
y_true, y_prob, _ = trainer.inference(test_loader)
print(binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"]))
```
A copy-pasteable starter is in `assets/starter_pipeline.py`.
## Critical things to get right
These are the mistakes that PyHealth code most commonly trips on. Internalize them before writing pipelines:
1. **Models take a `SampleDataset`, not a `BaseDataset`.** `MIMIC3Dataset(...)` returns a `BaseDataset` (a queryable patient registry). Only after `.set_task(task)` do you get a `SampleDataset`, which is what models, splitters, and DataLoaders expect. If you pass `base` to a model, it will fail or behave wrong.
2. **Always split by patient (or visit), not by sample.** Random sample-level splits leak information across train/test because the same patient can appear in both. Use `split_by_patient` for patient-level prediction, `split_by_visit` only when visits are independent.
3. **Match the task to the dataset.** Tasks are dataset-specific: `MortalityPredictionMIMIC3` won't work on MIMIC-IV — use `MortalityPredictionMIMIC4` or `InHospitalMortalityMIMIC4`. The full mapping is in `references/tasks.md`.
4. **Pick `monitor` to match the task type.** For binary classification use `"pr_auc"` or `"roc_auc"`. For multilabel (drug rec) use `"pr_auc_samples"` or `"jaccard_samples"`. For multiclass use `"accuracy"` or `"f1_macro"`. Wrong monitor → checkpoint selection saves the wrong epoch.
5. **MIMIC-IV uses `ehr_root=`, not `root=`.** This is the one inconsistency in the dataset constructors.
6. **For reproducible work, point `cache_dir=` somewhere persistent.** PyHealth caches the parsed dataset; without `cache_dir`, you re-parse every run.
## How to use this skill
PyHealth has a large API surface — there's no point loading it all at once. Read the reference file that matches the user's task:
| If the user is asking about… | Read |
|---|---|
| Installing, env setup, MIMIC access, GPU | `references/installation.md` |
| Which dataset class to use, loading patterns, splitting | `references/datasets.md` |
| What prediction task to choose (mortality, readmission, drug rec, sleep…) | `references/tasks.md` |
| Picking a model architecture, model-specific arguments | `references/models.md` |
| Looking up or cross-mapping ICD/ATC/NDC/RxNorm/CCS codes, tokenizers | `references/medcode.md` |
| End-to-end recipes for common scenarios | `references/examples.md` |
For multi-step tasks (e.g., "build a drug recommendation pipeline on MIMIC-IV"), read `tasks.md` + `models.md` + `examples.md` together — they cross-reference each other.
## A note on style
Write minimal, idiomatic PyHealth. The library is opinionated; lean into its abstractions instead of reimplementing them in raw PyTorch. If you find yourself writing a custom training loop, ask whether `Trainer` would do the job — it almost always will, and it handles checkpointing, logging, and best-model selection for free.
When the user has private MIMIC access, point them at the local CSV root; for demos and learning, the synthetic MIMIC-III bucket (`https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/`) is fine and works without credentialing.
REQUIRED CONTEXT
- user query mentioning PyHealth, MIMIC, clinical prediction tasks, or EHR data
ROLES & RULES
- Models take a SampleDataset, not a BaseDataset.
- Always split by patient (or visit), not by sample.
- Match the task to the dataset.
- Pick monitor to match the task type.
- MIMIC-IV uses ehr_root=, not root=.
- For reproducible work, point cache_dir= somewhere persistent.
- Write minimal, idiomatic PyHealth.
EXPECTED OUTPUT
- Format
- markdown
- Constraints
- follow 5-stage pipeline exactly
- use minimal idiomatic PyHealth code
- enforce critical rules like patient-level splits and SampleDataset usage
- reference specific docs for tasks
SUCCESS CRITERIA
- Use this skill for clinical/healthcare ML involving PyHealth or listed datasets/tasks.
- Follow the 5-stage Dataset → Task → Model → Trainer → Metrics pipeline.
- Read the matching reference file for the user's task.
FAILURE MODES
- Passing BaseDataset instead of SampleDataset to a model.
- Using random sample-level splits instead of patient-level splits.
- Using the wrong task class for a dataset.
- Choosing the wrong monitor metric for the task type.
EXAMPLES
Includes installation commands with uv and a complete 5-stage Python pipeline example using MIMIC3Dataset.
CAVEATS
- Dependencies
- references/installation.md
- references/datasets.md
- references/tasks.md
- references/models.md
- references/medcode.md
- references/examples.md
QUALITY
- OVERALL
- 0.90
- CLARITY
- 0.90
- SPECIFICITY
- 0.95
- REUSABILITY
- 0.85
- COMPLETENESS
- 0.90
IMPROVEMENT SUGGESTIONS
- Add a short section on common error messages and their fixes to increase robustness.
USAGE
Copy the prompt above and paste it into your AI of choice — Claude, ChatGPT, Gemini, or anywhere else you're working. Replace any placeholder sections with your own context, then ask for the output.
MORE FOR AGENT
- DepMap Cancer Dependency Data Analyzeragentmedical
- FDA MedTech Compliance Auditoragentmedical
- FDA MedTech Compliance Auditoragentmedical
- FDA MedTech Compliance Auditoragentmedical
- Pydicom DICOM Medical Imaging Guideagentmedical
- PrimeKG Knowledge Graph Query Skillagentmedical
- PathML Computational Pathology Toolkitagentmedical
- Nurse Skill Description Generatoragentmedical
- Health Assistant Medical Guidance Skillagentmedical
- Claude Ally Health Assistantagentmedical
- Health Medical Analysis Assistant Skillagentmedical
- Comprehensive Codebase Bug Analysis and Fixeragentanalysis
- Xcode MCP Usage Guidelines for Agentsagenttool_use
- Xcode MCP Usage Guidelinesagenttool_use
- Rapid App MVP Prototyperagentcoding
- Local Documentation Online Sync Automatoragentoperations
- HashiCorp Packer Golden Image Expertagentoperations
- Xquik X/Twitter API Integration Skillagenttool_use
- MoltPass Client for AI Agent Identitiesagentsecurity
- AI-First Design Handoff Specs Generatoragentcoding
- Consciousness Council Multi-Perspective Deliberationagentplanning
- Creative Thinking Frameworks for CS Researchagentresearch
- Filesystem Agent Context Engineeringagenttool_use
- Academic Paper Figure Generatoragentresearch
- Multi-Agent Architecture Patterns Guideagentplanning