Available for Applied AI roles

I ship production LLM systems
in healthcare.

Research Software Engineer specializing in clinical NLP. 3-week concept-to-production cycles. Pipeline speedups measured in orders of magnitude. Code running in clinical trials.

Get in touch View resume

700×

Pipeline speedup
120 days → 4 hours

17K

Pathology reports
100% schema validity

3 wk

Concept to production
now in clinical trial

90%

PHI detection F1
via synthetic data

Featured Work

Constrained Decoding • 2025

CRANE-style Structured Medical JSON

Built a constrained decoding pipeline that converts free-text pathology reports to schema-valid CAPeCC JSON. Implemented a three-phase generation pattern: free-text reasoning, then a switch token, then a constrained JSON window. Diagnosed and patched an Outlines → llama.cpp logits incompatibility that broke FSM-based token masking. Ported the pipeline from Hugging Face to llama.cpp (GGUF) for large-model inference on A100s.

View the repo →

100% schema validity

17K reports approved

<1 min target latency
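
To make the free-reason → switch-token → JSON window pattern concrete, here is a minimal toy sketch, not the project's code. It assumes a hypothetical switch token `<json>`, a hand-written three-state FSM, and per-step score dictionaries standing in for model logits. The real pipeline uses Outlines and llama.cpp, neither of which is called here.

```python
import json

SWITCH = "<json>"  # hypothetical switch token, chosen for illustration

# Toy FSM for the JSON window: state -> {allowed token: next state}.
JSON_FSM = {
    0: {'{"grade":': 1},
    1: {'"low"': 2, '"high"': 2},
    2: {"}": 3},
}
FINAL_STATE = 3


def allowed_tokens(state, vocab):
    """Tokens permitted in this state; state None means free reasoning."""
    if state is None:
        return set(vocab)
    return set(JSON_FSM[state])


def decode(scores_per_step, vocab):
    """Greedy decode: take the best-scoring token among those the mask allows."""
    state, out = None, []
    for scores in scores_per_step:
        mask = allowed_tokens(state, vocab)
        token = max(mask, key=lambda t: scores.get(t, float("-inf")))
        out.append(token)
        if state is None:
            if token == SWITCH:
                state = 0  # switch token seen: enter the constrained window
        else:
            state = JSON_FSM[state][token]
            if state == FINAL_STATE:
                break
    return out


vocab = ["tumour", "looks", "maybe", SWITCH, '{"grade":', '"low"', '"high"', "}"]
steps = [
    {"tumour": 2.0},                               # free reasoning
    {SWITCH: 3.0, "looks": 1.0},                   # model emits the switch token
    {"maybe": 9.0, '{"grade":': 0.1},              # "maybe" is masked out
    {"maybe": 9.0, '"high"': 1.0, '"low"': 0.5},   # only enum values allowed
    {"tumour": 4.0, "}": 0.2},                     # only the closing brace allowed
]
tokens = decode(steps, vocab)
window = "".join(tokens[tokens.index(SWITCH) + 1:])
record = json.loads(window)
```

Inside the window, the mask guarantees that the output parses even when the model scores an out-of-schema token highest, which is what makes a 100% validity figure possible for this style of decoding.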

Clinical RAG • 2024

Caring Contacts — Psychiatric Discharge

Collaborated with Psychiatry to build a RAG system that generates personalized post-discharge hope letters from patient charts. Combines proposition-level chunking and two-stage retrieval (BM25 → Qwen reranker) with clinician-defined safety guardrails. Shipped from concept to production in 3 weeks.

3 weeks to production

ISBD 2025 poster presentation

Active clinical trial
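
The two-stage retrieval idea (BM25 → reranker) can be sketched in plain Python as below. This is an illustrative sketch under assumptions, not the deployed system: `bm25_scores`, `retrieve`, and `toy_rerank` are hypothetical names, the chunks are invented examples rather than patient data, and `toy_rerank` is a keyword stand-in for where a Qwen reranker would score each (query, chunk) pair.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against the tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, chunks, rerank_fn, first_stage_k=3, final_k=1):
    """Stage 1: cheap BM25 shortlist. Stage 2: rerank the shortlist only."""
    tokenized = [c.lower().split() for c in chunks]
    scores = bm25_scores(query.lower().split(), tokenized)
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    candidates = order[:first_stage_k]
    reranked = sorted(candidates, key=lambda i: rerank_fn(query, chunks[i]),
                      reverse=True)
    return [chunks[i] for i in reranked[:final_k]]


def toy_rerank(query, chunk):
    """Placeholder scorer; a real cross-encoder reranker would go here."""
    return 0.0 if "stressful" in chunk.lower() else 1.0


# Invented example chunks, one proposition each.
chunks = [
    "Patient enjoys gardening with her daughter on weekends",
    "Patient reports gardening was stressful this spring",
    "Discharge medications were reviewed with pharmacy",
    "Patient plays guitar",
]
bm25_only = retrieve("gardening", chunks, lambda q, c: 0.0, first_stage_k=2)
two_stage = retrieve("gardening", chunks, toy_rerank, first_stage_k=2)
```

Here BM25 alone ranks the shorter chunk first because of length normalization, while the second stage promotes the chunk the reranker prefers. The design keeps the expensive scorer off the full corpus and applies it to a short candidate list only.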

Writing

All posts →

Oct 2025 Building a ML Pipeline for Microcalcification Classification in OMOP
Sep 2025 On Augmented Data: Synthetic PHI for De-identification
Jun 2025 Road to SOTA: Building a PII Detection Model
Jun 2025 RAG Experiments: Chunking, Retrieval, Reformulation

About

I'm a Research Software Engineer at Sunnybrook Research Institute in Toronto, where I'm the sole NLP specialist building production AI systems for clinical applications.

My work sits at the intersection of LLMs and healthcare: constrained decoding for structured report generation, RAG systems with clinical safety guardrails, and large-scale PHI de-identification pipelines.

Background in biomedical engineering. I learn by building — typically implementing research papers within days of reading them. I optimize for asymmetric upside and ship fast to learn fast.

LLMs & NLP

Hugging Face Transformers, llama.cpp, RAG pipelines, Constrained decoding (Outlines/XGrammar/GBNF), Knowledge distillation, Model quantization

Infrastructure

FastAPI, Docker, AWS/Azure, Postgres, FHIR/OMOP

Core

PyTorch, Python, Pydantic, SQL

Let's talk

Looking for Applied AI / Research Engineer roles at top-tier labs.

zardar.khan@icloud.com LinkedIn GitHub