Making clinical AI trustworthy through knowledge architecture & reasoning

“Olivia” Zheng Milgrom, MD, MPH

Combining the best of LLMs and symbolic semantics to produce explainable clinical knowledge — at the speed of AI, with the rigor of curated expert knowledge.

I design and ship knowledge architecture and evaluation frameworks at enterprise scale — catching where AI drifts from domain truth, working across healthcare stakeholders, engineering teams, and end users.

Clinical Knowledge Architecture · LLM Grounding & Evaluation · Neuro-Symbolic AI · Semantic AI, Knowledge Graphs & Formal Reasoning · MCP Tool Building · Enterprise AI — From Design to Production · Explainable AI & Visual Reasoning Design · Clinical Background (MD) · National Library of Medicine Informatics Fellow · Recognized Expertise Across Industry & Academia
Problems that keep showing up when AI meets high-stakes domains

LLMs are powerful, but in high-stakes areas like healthcare, sounding right isn’t the same as being right — and even when AI can explain itself, that doesn’t mean experts can verify it.

🔍

When Retrieval Misleads

The model finds a concept that looks like a match based on word overlap, but means something clinically different. Keyword similarity masks semantic divergence.
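A minimal sketch of this failure mode. The terms and concept IDs below are made up for illustration, not drawn from any real terminology: word overlap between two clinical phrases can be high even when they resolve to different concepts.

```python
# Toy illustration: high lexical similarity, different clinical concepts.
# The phrases and concept IDs are invented for demonstration.

def token_jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, the kind keyword retrieval relies on."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Stand-in for a real terminology lookup.
concept_of = {
    "chronic kidney disease stage 3": "CONCEPT-A",
    "chronic kidney disease stage 5": "CONCEPT-B",  # clinically very different
}

query = "chronic kidney disease stage 3"
candidate = "chronic kidney disease stage 5"

sim = token_jaccard(query, candidate)                      # 4/6 ≈ 0.67: looks like a match
same_concept = concept_of[query] == concept_of[candidate]  # False

print(f"lexical similarity: {sim:.2f}, same concept: {same_concept}")
```

The retrieval score says "match"; the terminology says otherwise. Only a check against the concept layer exposes the divergence.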

🌀

Semantic Drift

The answer starts in the right clinical territory but gradually shifts meaning through plausible-sounding steps — ending somewhere subtly wrong.

Read my case study →
💭

Hallucination

The model generates clinical codes, classifications, or relationships that don't exist. Without a formal knowledge base to check against, fabrications are invisible.
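One way to make fabrications visible is to validate every generated code against a closed code set before it reaches a user. A minimal sketch, assuming a tiny toy code list standing in for a real terminology release:

```python
# Sketch: flag generated codes that do not exist in a known code set.
# VALID_CODES is a toy stand-in for a full terminology load (e.g., ICD-10-CM).

VALID_CODES = {"I10", "E11.9", "N18.3"}  # illustrative subset only

def audit_codes(generated: list[str]) -> dict[str, list[str]]:
    """Partition model output into verifiable and fabricated codes."""
    verified = [c for c in generated if c in VALID_CODES]
    fabricated = [c for c in generated if c not in VALID_CODES]
    return {"verified": verified, "fabricated": fabricated}

# "Z99.99x" is an invented code: without this check, it would pass unnoticed.
result = audit_codes(["I10", "Z99.99x", "N18.3"])
print(result)
```

The point is not the lookup itself but where it sits: a formal knowledge base gives the system something to be wrong against.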

🧩

The Explainability Gap

AI can show a reasoning path, but domain experts need to verify, explore, compare, and generate hypotheses — not just read explanations. That’s what it takes to keep the AI accountable.

Bidirectional grounding — reasoning built in, not patched on

Designing systems where clinical knowledge and AI inform each other continuously — not as a pipeline, but as a layered architecture.

[Diagram: bidirectional grounding architecture. Human ↔ LLM (semantic interpreter + hypothesis generator) ↔ Knowledge Layers (Ontology: how the domain is defined · Knowledge Graph: how things relate · Rules & Constraints: what must hold) ↔ Reasoning (knowledge computation + verification) ↔ Human (visual reasoning + hypothesis exploration)]
01

Layered Knowledge Architecture

‘RAG is dead’ — when voices across the industry say this, what they mean is that store-and-retrieve was never enough.

Even knowledge graphs — now the fastest-rising trusted context layer for enterprise GenAI, according to Gartner — are only one layer. I specialize in the deeper conceptual infrastructure where clinical logic can be proven, not just looked up.

02

Hybrid Retrieval + Reasoning Loop

Most AI systems stop at retrieval; others still rely on black-box machine learning. Neither LLM nor symbolic reasoning alone is sufficient. I design hybrid approaches where clinical knowledge and AI inform each other continuously.
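A hedged sketch of what such a loop can look like. The tiny graph, the mock generation step, and the check are all placeholders for real components, not an actual implementation:

```python
# Sketch of a retrieve → generate → verify loop.
# tiny_graph, generate(), and verify() are illustrative placeholders.

tiny_graph = {  # subject -> {relation: objects}
    "drugX": {"contraindicated_in": {"renal failure"}},
}

def retrieve(entity: str) -> dict:
    """Symbolic step: pull the entity's known relations from the graph."""
    return tiny_graph.get(entity, {})

def generate(entity: str, facts: dict) -> str:
    """Stand-in for an LLM call, grounded in the retrieved facts."""
    contra = ", ".join(sorted(facts.get("contraindicated_in", set())))
    return f"{entity}: contraindicated in {contra}"

def verify(answer: str, facts: dict) -> bool:
    """Symbolic check: every known contraindication must appear in the answer."""
    return all(c in answer for c in facts.get("contraindicated_in", set()))

facts = retrieve("drugX")
answer = generate("drugX", facts)
assert verify(answer, facts)  # in a real loop: reject and regenerate on failure
print(answer)
```

The design choice this illustrates: the symbolic layer brackets the generative step on both sides, supplying grounded context before generation and checking the output after it.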

03

Automatic Evaluation at Scale

Every answer traced back to its contexts and original intent — built to be:

Configurable · Auditable · Explainable · Enterprise-scale
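A minimal sketch of such a trace record (all field names are illustrative): each answer carries the contexts and intent it was derived from, so an auditor can walk the chain backwards.

```python
# Sketch of an auditable answer trace; field names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class AnswerTrace:
    question: str                                           # original intent
    contexts: list[str]                                     # evidence the answer relied on
    answer: str
    checks_passed: list[str] = field(default_factory=list)  # which rules held

trace = AnswerTrace(
    question="Is drugX safe in renal failure?",
    contexts=["KG: drugX contraindicated_in renal failure"],
    answer="No: drugX is contraindicated in renal failure.",
    checks_passed=["contraindication-consistency"],
)
print(trace.contexts)  # an auditor can walk back from answer to evidence
```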
04

Visual Reasoning & Sensemaking

Explainability, taken all the way to its visual form.

A paradigm shift in how auditing happens — empowering domain experts to go from open-ended investigation to focused hypotheses.

Surface inconsistencies · Expose assumptions · Explore hypotheses
Writing, speaking, and presenting

A domain expert, builder, and designer. An advocate for trustworthy clinical AI.

[Photo: Olivia Milgrom interviewed at HIMSS AI Pavilion by Healthcare IT Today]
Healthcare IT Today · Mar 2026 · HIMSS26 AI Pavilion

Sharing the Vision: Combining LLMs and Symbolic Semantics for Producing Explainable Clinical Knowledge at Scale

Watch on X → View on LinkedIn →
[Photo: Olivia Milgrom behind the scenes at HIMSS24 Healthcare IT Today interview]
Healthcare IT Today · Mar 2024 · HIMSS24 AI Pavilion

Interview: Why Knowledge Management Should Be the #2 Topic After AI

When the industry was just beginning to grapple with LLMs in healthcare — on why high-quality clinical knowledge management is the foundation for reliable AI models.

Watch on X → View on LinkedIn →
[Photo: Olivia Milgrom presenting her poster at AMIA 2025 Annual Symposium]
American Medical Informatics Association (AMIA) · Nov 2025 · Atlanta, GA · Poster Presentation at Annual Symposium

Knowledge Graph for Propositional Reasoning: A Multi-Case Study of CCSR (Clinical Classifications Software Refined) for ICD-10-PCS (Procedure Coding System)

When classification rules (inclusion criteria) quietly break, how do you find out? Using knowledge graph reasoning to surface three types of logical inconsistency — the kind conventional methods miss.

View poster →
[Photo: Olivia Milgrom presenting at AMIA 2025 Annual Symposium]
American Medical Informatics Association (AMIA) · Nov 2025 · Atlanta, GA · Podium Presentation at Annual Symposium

Hybrid Retrieval and Reasoning to Predict Procedure Codes from Key Terms Using Knowledge Graphs and LLMs

Exploring six hybrid approaches that combine knowledge graph traversal with LLM capabilities — showing when structured retrieval before generation outperforms LLMs alone.

Medium · Dec 2025 · 4 min read

A ‘Semantic Drift’ Perfect Storm: How Medical AI Chatbots Get It Wrong

A real-world case study dissecting how a medical AI chatbot confidently gave wrong dosing advice for dialysis patients.

Featured in AI 101: A Self-Paced Guide to AI in Medicine, a resource for physicians

Read on Medium →
Where the Expertise Comes From

The full evidence-based medicine cycle — lived, practiced, and now built into AI

[Diagram: Learning Health System cycle — Bedside Care, Real-World Evidence, Evidence Synthesis, Knowledge Dissemination, Closing the Loop — all feeding into Knowledge Architecture and AI at the center]
Growing minds — including my own

My Unconventional Path

Growing up, I was drawn to two things — art and math.

Drawing, calligraphy, stage performance… and the satisfaction of a proof that holds without a crack.

My math teacher told me I was the best student he’d ever had, and he believed I could make a difference if I stayed with STEM.

But I wanted something totally practical — something that would make a tangible difference in people’s lives. I chose medicine.

While navigating the unsustainable realities of medicine, I had my aha moment. Encountering the work of Yuval Noah Harari, I saw how technology reshapes humanity, and well before large language models arrived, I realized the power of combining clinical knowledge with AI.

AI fails not for lack of data, but for lack of understanding.

Philosophy — the questions it asks, and the ontological frameworks it clarifies — is the key to fixing it.

As MIT Sloan put it, ‘Philosophy eats AI.’

Educating the Next Generation

Teaching kids to tell fake from true — in art and in AI, the same question applies.

I bring that question to elementary school kids: can you tell what’s real and what’s AI?

🤖

AI Dance Room

Real or Fake Animals? You Guess!
🐾 🐷 🕊 🐾 🐷
🔍 You Are AI Detectives!

▶ Play ‘I Need to Zone Out (and Dance Now)’ — Real or Fake?

👀 Round 1 · 0:00–0:30 · Real or fake animals?
👀 Round 2 · 0:30–1:00 · Real or fake animals?
👀 Round 3 · 1:00–1:30 · Real or fake animals?

🎤 Bonus: Who Sang This Song?

A) Olivia Milgrom - me
B) AI / Robot
C) Taylor Swift
D) The principal from your elementary school
E) Rumi from K-pop Demon Hunter
F) The pelican/bird in the video

“I Need to Zone Out (and Dance Now)”

I made this song and music video — and played it at an elementary school festival where kids tried the guessing game above. ▶ Press play on the music video and try the guessing game yourself!

Robot Dance Performance

Guess correctly, and a real robot dances to the song — showing kids how AI, music, and robotics connect.

Let’s talk about trustworthy clinical AI

Whether you're working on clinical AI, exploring how knowledge representation can improve LLM reliability, or just curious about this space — I'm always happy to connect and exchange ideas.