Your Doctor’s AI Can’t Read Your Medical Record. That’s a Bigger Problem Than You Think.
This article was originally published on Medium.

By Luis Cisneros, CEO of Serelora
Imagine you walk into a hospital. You’ve been alive for 55 years. You’ve seen dozens of doctors. You’ve had blood drawn, prescriptions filled, surgeries scheduled, insurance claims filed, imaging done. All of that information, every note a doctor ever scribbled, every lab result, every claim your insurer processed, lives somewhere in your electronic health record.
Now here’s the part nobody tells you.
That record is roughly the length of Moby Dick.
Not a pamphlet. Not a summary. A novel. Sometimes two novels. The average patient’s longitudinal health record runs between 450,000 and 500,000 tokens. Patients with complex histories at major hospitals? One to two million tokens of raw medical text. And when a hospital tries to point an AI at that record to help your doctor make a decision, the AI chokes. Not gracefully, not partially. It chokes the way anyone would if you handed them two thousand pages and said “read this in ten seconds and tell me what’s wrong with the patient.”
The dirty secret behind every hospital AI pilot program
The technology behind ChatGPT and every other AI assistant making headlines has a fundamental constraint that rarely comes up in the breathless coverage. It’s called a context window. Think of it like the AI’s short-term memory. The biggest models today can hold somewhere between 128,000 and two million tokens in that window at once, which sounds like a lot until you realize your medical record is already pushing those limits before the AI has even started thinking.
So the system does what any overwhelmed reader would do. It skims. It summarizes the summary. It cuts corners. It drops the lab result from three years ago that actually explains why your kidneys are struggling today. It loses the temporal thread, the story of you over time, and starts guessing to fill the gaps.
Those guesses have a name in AI research: hallucinations. And in medicine, a hallucination isn’t a quirky chatbot moment. It’s a misdiagnosis. It’s the wrong drug. It’s a missed cancer screening. This is the context-window crisis. It affects virtually every hospital system experimenting with AI-assisted care right now, and almost nobody is talking about it honestly. The industry keeps reaching for bigger models with longer windows, as if the solution to drowning in text is a bigger swimming pool.
But what if the problem isn’t the size of the pool? What if it’s that we’re swimming in the wrong ocean entirely?
What if the AI didn’t have to read at all?
That’s the question that changed everything for us at Serelora. And it’s so simple it almost sounds naive. Why are we asking AI to read novels when medicine already has its own language?
Think about it. Doctors don’t actually communicate in paragraphs when precision matters. They use codes. When your doctor says you have Type 2 diabetes, somewhere in the system that becomes ICD-10 code E11. Your blood glucose test? That’s a LOINC code. Your metformin prescription? RxNorm. The procedure they billed your insurer for? CPT-4. The clinical description of what’s actually happening in your body? SNOMED CT.
These codes already exist. They’ve been standardized for decades. They’re universal across every hospital, every insurer, every country that practices modern medicine. They’re precise. And perhaps most importantly for the problem we’re solving, they’re tiny.
So the insight behind Serelora was almost stupidly elegant. Instead of feeding the AI your entire Moby Dick-length medical record in plain English, translate the whole thing into clinical codes first. Store it that way. Let the AI think in codes, reason in codes, and only translate back into human language when it needs to talk to your doctor. That 450,000-token record collapses to somewhere between 50,000 and 150,000 tokens. A 70 to 85 percent reduction, not by deleting information, but by saying the same things in a language that was designed to be compact and unambiguous.
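To make the compression idea concrete, here is a minimal sketch comparing a prose record entry with a coded equivalent. The specific code values are illustrative examples, not authoritative mappings (E11 is the ICD-10 family for Type 2 diabetes, as noted above; the LOINC and RxNorm identifiers are shown for shape only), and the whitespace word count is a crude stand-in for real tokenization.

```python
# A prose record entry versus its coded equivalent. Code values are
# illustrative, not authoritative mappings.

prose = (
    "Patient presents with poorly controlled Type 2 diabetes mellitus. "
    "Fasting serum glucose measured at 182 mg/dL. Continuing metformin "
    "500 mg twice daily as previously prescribed."
)

coded = [
    ("ICD-10", "E11.65"),        # Type 2 diabetes with hyperglycemia
    ("LOINC", "2345-7", 182.0),  # glucose measurement plus its value
    ("RxNorm", "861007"),        # a metformin 500 mg product (illustrative)
]

# Crude size comparison: whitespace words vs. fields in the coded tuples.
prose_tokens = len(prose.split())
coded_tokens = sum(len(t) for t in coded)
print(prose_tokens, coded_tokens)  # 25 7
```

The same facts survive the translation; only the redundant narrative scaffolding disappears.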
It’s the same leap mathematics made centuries ago. You can describe the trajectory of a cannonball in three paragraphs of flowery English, or you can write one equation. The equation isn’t a simplification. It’s a better representation of reality. Clinical codes do the same thing for medicine. And once you accept that premise, the entire architecture of the system follows naturally.
From chaos to codes, one patient at a time
It starts with the mess. PDFs, scanned notes, insurance forms, lab reports, discharge summaries. The unstructured chaos of a real medical file, the kind that lands on a desk and makes a physician sigh. The system ingests all of it through a secure upload pipeline with multimodal extraction, meaning it can read typed text, interpret scanned images, and pull structured data from forms simultaneously, regardless of format.
From that raw chaos, normalization begins. Every medical concept the system extracts gets mapped to its canonical clinical code. A diagnosis becomes an ICD-10 code. A medication becomes an RxNorm identifier. A lab observation becomes a LOINC code. A procedure becomes a CPT-4 code. The clinical concept itself, the actual meaning, gets anchored to SNOMED CT. Custom extensions handle the inevitable edge cases, but the principle is absolute. Every piece of information gets one unambiguous code. No synonyms. No paraphrasing. No ambiguity.
And then something that might sound radical happens. The narratives go away. Once every concept has been coded and mapped, the original prose is no longer the source of truth. The data lives as nodes and edges in a structured knowledge graph, with clinical concepts as nodes connected by typed relationships. Temporal relationships capture what happened when. Causal relationships capture what caused what. Administrative relationships capture what was billed and what was covered.
This is the compression engine. This is how Moby Dick becomes a short story without losing a single plot point. And critically, when the AI later needs to turn a code back into a medical concept, it isn’t guessing. Decompression is deterministic and lossless. One code, one meaning, every single time. There’s no generative step, no probabilistic interpretation, no room for the AI to improvise its way into an error.
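The graph-plus-lookup structure described above can be sketched in a few lines. The node labels, relation names, and code-to-meaning table here are hypothetical placeholders, not Serelora’s actual schema; the point is that decompression is a plain dictionary lookup with no generative step.

```python
from dataclasses import dataclass, field

# One fixed meaning per code: decompression is an exact-match lookup,
# never a probabilistic interpretation. Entries are illustrative.
CODE_MEANINGS = {
    "ICD-10:E11": "Type 2 diabetes mellitus",
    "LOINC:2345-7": "Glucose [Mass/volume] in Serum or Plasma",
    "RxNorm:6809": "metformin",
}

@dataclass
class KnowledgeGraph:
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def decompress(self, code):
        # Deterministic and lossless: one code, one meaning, every time.
        return CODE_MEANINGS[code]

g = KnowledgeGraph()
g.add("ICD-10:E11", "TREATED_WITH", "RxNorm:6809")   # therapeutic link
g.add("ICD-10:E11", "MONITORED_BY", "LOINC:2345-7")  # observation link

print(g.decompress("RxNorm:6809"))  # metformin
```

An unknown code raises a `KeyError` rather than inviting the model to improvise, which is exactly the failure mode you want.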
But compression alone, even compression this dramatic, doesn’t fully solve the problem. A compressed knowledge graph is still a vast landscape of interconnected clinical information. Reasoning across it all at once would still push the boundaries of what any single AI process can handle reliably. Which is why Serelora doesn’t try to do it all at once. It unleashes the gremlins.
Six gremlins, four agents, one coherent answer
We call them gremlins. Six parallel modules that tear through a patient’s knowledge graph simultaneously, each one devouring a different slice of the record. We finalized this architecture just this week.
One gremlin handles medications, mapping every prescription through RxNorm and flagging interactions. Another tracks labs and vitals over time, using LOINC codes to catch trends a human might miss across years of scattered data. A third processes the qualitative assessments doctors have written in clinical notes. A fourth chews through the insurance landscape, what’s covered, what’s denied, where financial leakage is hiding. A fifth works in ICD-10 and SNOMED CT to map the full diagnostic journey and flag condition trajectories. And a sixth digs into social determinants of health, the non-clinical factors like housing stability, employment, and food access that shape outcomes as powerfully as any prescription.
Each gremlin sees only its slice. None is overwhelmed. None is skimming. And because they’re all running at the same time rather than sequentially, the system processes a patient’s entire clinical history in a fraction of the time it would take a single process to wade through even the compressed version.
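The fan-out pattern behind the gremlins can be sketched with standard-library concurrency. The module names mirror the six slices described above, but the record contents and per-module logic are toy placeholders, assumed for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Each module receives only its own slice of the record. Contents are
# placeholder identifiers, not real patient data.
record_slices = {
    "medications": ["RxNorm:6809"],
    "labs_vitals": ["LOINC:2345-7", "LOINC:2160-0"],
    "clinical_notes": ["note-0042"],
    "insurance": ["claim-2023-117"],
    "diagnoses": ["ICD-10:E11", "SNOMED:44054006"],
    "social_determinants": ["housing-instability-flag"],
}

def gremlin(name, items):
    # A stand-in for real slice-specific analysis: each module sees
    # only its own items, never the whole record.
    return name, f"processed {len(items)} item(s)"

with ThreadPoolExecutor(max_workers=6) as pool:
    futures = [pool.submit(gremlin, n, s) for n, s in record_slices.items()]
    findings = dict(f.result() for f in futures)

print(findings["medications"])  # processed 1 item(s)
```

Because the six calls are independent, wall-clock time approaches that of the slowest single slice rather than the sum of all six.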
But gremlins don’t make decisions. They decompose and retrieve. The reasoning happens one layer up, across four specialized agents.
The Graph RAG Retrieval agent pulls structured information from the knowledge graph in response to queries, ensuring that every piece of data the system reasons about is anchored to its source in the graph rather than hallucinated from general training. The Clinical Reasoning agent handles differential diagnosis, treatment planning, and risk projection, including Gompertz-curve modeling that shows how different interventions might alter a patient’s health trajectory over time. The Administrative agent translates clinical intent into reimbursement logic, maps procedures to billing codes, identifies leakage, and navigates the liability landscape. And the Orchestrator sits above all three, reconciling their outputs, running cross-agent consistency checks, resolving conflicts, and synthesizing everything into one coherent narrative tailored to whatever question was actually asked.
That is the architecture in full. Six gremlins decomposing the record in parallel. Four agents reasoning across different domains. An Orchestrator that brings it all together only when needed, only for the specific context being queried. The complete picture never has to exist in a single context window at any point. The system never drowns because it never tries to drink the whole ocean at once.
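The layering just described can be sketched as three stub agents and an orchestrator that reconciles them. All agent logic and the sample CPT citation here are hypothetical stand-ins; the structural point is that the orchestrator surfaces cross-agent conflicts instead of papering over them.

```python
def retrieval_agent(question, graph):
    # Anchors evidence to the graph; an empty result stays empty
    # rather than being filled in from general training.
    return {"evidence": graph.get(question, [])}

def clinical_agent(evidence):
    # Placeholder reasoning: recommend only when evidence exists.
    rec = "annual retinopathy screening" if evidence else None
    return {"recommendation": rec}

def administrative_agent(recommendation):
    # Placeholder payer logic keyed on the recommendation.
    coverage = {"annual retinopathy screening": "covered (CPT illustrative)"}
    return {"billing": coverage.get(recommendation)}

def orchestrator(question, graph):
    ev = retrieval_agent(question, graph)
    clin = clinical_agent(ev["evidence"])
    admin = administrative_agent(clin["recommendation"])
    # Consistency check: a recommendation with no billing pathway is
    # flagged as a conflict, not silently dropped.
    conflict = clin["recommendation"] is not None and admin["billing"] is None
    return {**ev, **clin, **admin, "conflict": conflict}

graph = {"diabetes follow-up": ["ICD-10:E11"]}
answer = orchestrator("diabetes follow-up", graph)
print(answer["recommendation"], answer["conflict"])
```

Only the final synthesis touches all three outputs at once, and only for the question actually asked.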
That alone would be a meaningful advance. But in medicine, a correct answer that can’t be verified is almost as dangerous as a wrong one. Which brings us to what might be the most important piece of the entire architecture.
Two receipts for every answer
Every recommendation Serelora surfaces to a physician comes with dual provenance. Two kinds of proof, visible by default, not hidden in a settings menu.
The first is document provenance. Click on any sentence in the AI’s output and the original source document pops up. Not a summary of the document. Not a paraphrase. The actual document, with the exact relevant passage highlighted. You see where the information came from the same way you’d check a footnote in a research paper, except the footnote takes you directly to the primary source rather than making you hunt for it.
The second is guideline provenance. Every treatment suggestion, every administrative recommendation, every diagnostic consideration appears alongside the specific evidence-based clinical guideline the AI used to reach that conclusion. If the system recommends a particular screening protocol, the AHA or NCCN or CMS citation is right there, not buried in a methodology section somewhere, but right next to the recommendation where the doctor can evaluate it in real time.
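One way to picture dual provenance is as a record type where no statement can exist without both receipts. The field names and the sample citation below are illustrative assumptions, not Serelora’s actual data model.

```python
from dataclasses import dataclass

@dataclass
class Provenance:
    statement: str    # the sentence surfaced to the physician
    source_doc: str   # which original document the fact came from
    span: tuple       # character offsets to highlight in that document
    guideline: str    # the evidence-based guideline backing it

rec = Provenance(
    statement="Patient is due for diabetic retinopathy screening.",
    source_doc="discharge_summary_2023-04.pdf",
    span=(1042, 1108),
    guideline="ADA Standards of Care (illustrative citation)",
)

# A renderer can show the claim, the highlighted source passage, and
# the guideline side by side -- no statement ships without both.
print(rec.source_doc, rec.span)
```

Making both fields mandatory at the type level is what turns “provenance by default” from a UI promise into a structural guarantee.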
This dual layer is what separates a tool doctors will actually trust from one that collects dust in the IT department. It’s also what natively satisfies regulatory requirements, malpractice documentation standards, and reimbursement audit trails, not as compliance features bolted on after the fact, but as consequences of how the system was designed from the ground up.
And because every piece of provenance traces back through the same clinical code substrate, the system can do something else that has historically been one of medicine’s most expensive unsolved problems.
Medicine’s split personality, and why codes finally bridge both sides
Medicine has always lived in two worlds simultaneously. There’s the clinical world, where doctors think about what’s wrong with a patient and how to fix it. And there’s the administrative world, where insurers think about what’s covered, what’s billable, and what documentation is required to justify a claim. These two worlds describe the same patient in different languages, and the translation gap between them costs the American healthcare system billions of dollars a year in denied claims, administrative overhead, revenue leakage, and liability exposure.
When your entire substrate is built on standardized clinical codes, that translation becomes native for the first time. The same ICD-10 code that describes a patient’s condition maps directly to reimbursement categories. The same CPT-4 code that describes a procedure maps to billing requirements. There’s no “interpretation” step where an AI reads a doctor’s note and hopes it captures the right billing nuance. The code is the nuance. Clinical intent and administrative logic finally share a common language because the language was always there, just never used as the foundation for reasoning itself.
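The “translation becomes native” claim can be sketched as two exact-match lookups keyed on the same codes. The mapping tables below are toy placeholders, not real payer rules; what matters is that there is no free-text interpretation step between the clinical code and its administrative consequences.

```python
# Placeholder payer logic keyed directly on standardized codes.
BILLING_RULES = {
    "CPT-4:99214": {
        "category": "established patient visit",
        "documentation": ["history", "exam", "medical decision making"],
    },
}
DIAGNOSIS_COVERAGE = {
    "ICD-10:E11": {"reimbursement_category": "chronic disease management"},
}

def translate(diagnosis_code, procedure_code):
    # Both lookups are exact-match on the codes themselves; the code
    # carries the billing nuance, so nothing is "interpreted".
    return {
        "coverage": DIAGNOSIS_COVERAGE[diagnosis_code],
        "billing": BILLING_RULES[procedure_code],
    }

claim = translate("ICD-10:E11", "CPT-4:99214")
print(claim["billing"]["category"])  # established patient visit
```

A code missing from either table fails loudly instead of producing a plausible-sounding but unbillable claim.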
This is also where Serelora’s RAF scoring layer comes in, a retrieval-augmented fine-tuning approach we finished integrating this week that grounds every piece of reasoning in domain-specific, retrieval-verified knowledge rather than general-purpose pattern matching. Alongside it, an advanced scribe capability produces physician-ready notes with inline citations, turning the AI’s analysis into documentation that’s immediately usable rather than a draft that needs another hour of human cleanup.
Together, these capabilities make the treatment planning work in a way that feels qualitatively different from anything else in the market. The Clinical Reasoning agent returns ranked differential diagnosis options with projected risk trajectories, Gompertz-curve modeling that shows how different interventions might alter a patient’s health arc over time. It doesn’t just say “here are three options.” It lays out the evidence for each, shows how each one changes the long-term picture, and the Administrative agent maps the pathway to make each one actually happen. Clinical reasoning and operational execution, unified through the same code substrate, delivered with full provenance on both sides.
Which brings us back to the three problems everyone said couldn’t be solved at the same time.
Three problems that were supposed to be unsolvable
The technical literature calls it the context-compute-hallucination trilemma, and every serious effort in healthcare AI has treated these as tradeoffs to be managed rather than problems to be eliminated.
The context problem is that your medical record is too big for AI to hold in working memory. Clinical code compression reduces the raw token count by 70 to 85 percent, and the gremlin architecture means each parallel module only needs a fraction of what remains. A gremlin handling medications never sees the insurance documents. The full picture only gets assembled by the Orchestrator at synthesis time, and only for the specific question being asked.
The compute problem is that processing millions of tokens is expensive, because compute scales with the square of input length in traditional transformer architectures. Cutting tokens linearly cuts costs dramatically, and running six gremlins in parallel on modern orchestration frameworks approaches linear speedup. Less data, processed simultaneously, means radically cheaper inference at every scale.
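The arithmetic behind that claim is worth making explicit. Assuming the quadratic attention cost stated above and the article’s own figures (a 450,000-token record, a roughly 75 percent token cut, six parallel slices), the payoff compounds:

```python
# Back-of-envelope arithmetic: self-attention cost grows with the
# square of sequence length, so a 4x token cut gives a 16x cost cut,
# and splitting the remainder into six slices saves another 6x.

raw_tokens = 450_000
compressed = int(raw_tokens * 0.25)  # ~75% reduction via coding
per_slice = compressed // 6          # six gremlins, one slice each

def attention_cost(n):
    # Proportional cost only; constant factors dropped.
    return n * n

full = attention_cost(raw_tokens)
coded = attention_cost(compressed)
sliced = 6 * attention_cost(per_slice)  # six slices, run in parallel

print(full // coded)    # 16 -- quadratic payoff of a 4x token cut
print(coded // sliced)  # 6  -- further savings from parallel slicing
```

Those two factors multiply: relative to brute-forcing the raw record through one context window, the compressed-and-sliced workload is roughly two orders of magnitude cheaper before any hardware parallelism is even counted.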
The hallucination problem is that AI fabricates information when it’s uncertain. But when an AI can only reason about concepts that exist in a verified clinical ontology, when decompression from code to concept is deterministic rather than generative, when the Graph RAG agent ensures every retrieval is anchored to the knowledge graph, and when every output must point back to both a source document and a clinical guideline, the space for fabrication doesn’t shrink. It collapses. Published research shows 50 to 80 percent reductions in hallucination rates with knowledge-graph grounding alone. Serelora bakes that grounding into the substrate itself, into the very language the AI speaks.
These three problems aren’t actually a trilemma. They’re three symptoms of one underlying mistake, which is trying to make AI think in a language that was never designed for machine reasoning. Fix the language, and all three resolve together.
The answer was never a bigger brain
We are in a strange moment in healthcare AI. The technology is powerful enough to help. The data exists to make it work. The clinical coding standards that make all of this possible have been around for decades. And yet most hospital AI deployments are still trying to brute-force their way through medical records like a college student cramming a textbook the night before an exam. Reading everything, remembering almost nothing, and making things up when the details get fuzzy.
The answer was never bigger models with longer context windows. It was a better language. A language medicine already invented but never taught its machines to speak.
Anchored clinical codes as the native vocabulary for AI reasoning. A knowledge graph that compresses Moby Dick into a short story that loses nothing. Six gremlins tearing through the record in parallel so no single process ever drowns. Four agents reasoning across clinical, administrative, and retrieval domains. An Orchestrator that reconciles it all into one coherent answer. Dual provenance that lets every doctor verify every answer against both the source material and the clinical evidence, right there on screen. RAF scoring that locks reasoning to retrieval-verified knowledge. And a translation layer that finally bridges the gap between what a clinician means and what an administrator needs, because both sides were always speaking the same language underneath.
The architecture is live. The gremlins are running. The agents are reasoning. The provenance clicks through to real documents and real guidelines. And for the first time, the AI isn’t trying to read your medical record at all.
It’s speaking it.