Writing
Technical deep-dives on RAG, clinical NLP, constrained decoding, and production ML systems.
Building an ML Pipeline for Microcalcification Classification in OMOP
A technical walkthrough of an end-to-end ML pipeline that classifies radiology reports and integrates structured findings into an OMOP database.
On Building Augmented Datasets: A Practical Case Study
Why purely synthetic data failed for medical NER, and how augmenting real PHI with GPT-4-generated context achieved a 75% F1 score.
Road to a SOTA PII Model
Yes, I’m using PHI and PII interchangeably. Yes, I know they’re not the same. No, I don’t care. It’s my blog.
Using Decoder-Only LLMs for PHI De-Identification: A Minimal Setup
This was originally a notebook I built for a summer student. The goal: test if decoder-style LLMs can help extract PHI from clinical text — and wrap their output with simple postprocessing to recov...
RAG Experiments: Chunking, Retrieval, Reformulation
Chunking, hybrid retrieval, and reformulation strategies for clinical RAG pipelines, including Dense X Retrieval and proposition-based methods.