Saran Siva is the co-founder and CTO at Insight Health.
The opportunity in voice AI for healthcare is huge. The global market for AI voice agents in healthcare is projected to grow from roughly $468 million in 2024 to more than $3 billion by 2030.
Clinics are dealing with staffing shortages, appointment backlogs and a system buckling under administrative weight. Physicians report spending only 27 hours per week on direct patient care out of a nearly 58-hour work week, with the remainder consumed by documentation, order entry and administrative tasks. In a survey my company sponsored with MGMA of 302 medical practice leaders, 59% of practices reported handling more than 300 inbound calls per business day, and more than one in three said they miss 11% or more of calls at peak times.
Voice AI can change that math, but only if patients actually engage with it. Right now, that’s not a given. Research found that patients may be skeptical of voice AI in healthcare because of prior experiences with the spam calls, robocalls and poorly functioning chatbots that have become commonplace.
We’ve seen this pattern before. A new modality arrives, the demos look great and the industry moves fast. Then, real people start using it, and the trust issues surface at once. The question isn’t which voice AI platform will win, but what actually earns a patient’s trust in the first place.
As a CTO, I work on the trust challenge with my team every day. The conclusion we keep coming back to is that trust is a product design problem as much as a technology one.
Latency is a trust signal.
In chat or email AI, a two-second delay is unremarkable. In voice, it can be a conversation-ender. Silence on a phone call doesn’t read as “the system is thinking.” It reads as broken or incompetent, which a patient calling about a health concern cannot afford to feel.
Voice AI has to be both instant and smart, which pushes today’s models to their limits. The answer is to design around that constraint from day one.
One approach that I’ve found works well is to use conversational fillers and micro-responses that acknowledge the patient while reasoning runs in parallel. But generic filler (“Mm-hmm, one moment”) can erode trust as fast as silence, because it reveals the machinery. Filler that works acknowledges the specific thing the patient just said and transitions seamlessly into the answer.
For example, if a caller asks to reschedule, “Got it, looking at next week’s openings” can land first while the scheduling lookup runs against the EHR. When someone calls about a symptom that started that morning, “Sorry to hear that, let me see who can see you today” can go out ahead of the triage check. The acknowledgment fits the moment, and the system gets the half-second it needs to think.
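As a rough illustration of that pattern, here is a minimal Python sketch, not a description of any production system. It assumes the caller's intent has already been classified; `FILLERS`, `lookup` and `speak` are hypothetical names, and the sleep stands in for a slow EHR or triage call. The slow work starts first, the context-specific acknowledgment is spoken while it runs, and an unrecognized intent gets no filler at all rather than a generic one.

```python
import asyncio

# Hypothetical intent-to-filler map, using the two phrases from the examples above.
FILLERS = {
    "reschedule": "Got it, looking at next week's openings.",
    "symptom": "Sorry to hear that, let me see who can see you today.",
}


async def lookup(intent: str) -> str:
    """Stand-in for a slow EHR or triage lookup (assumed, not a real API)."""
    await asyncio.sleep(0.05)
    return f"result for {intent}"


async def respond(intent: str, speak) -> str:
    # Start the slow lookup first so it runs while the filler is being spoken.
    task = asyncio.create_task(lookup(intent))

    # Only speak a filler that fits what the caller said; skip generic ones.
    filler = FILLERS.get(intent)
    if filler is not None:
        await speak(filler)

    # By the time the filler has played, the lookup is done or nearly done.
    answer = await task
    await speak(answer)
    return answer
```

Here `speak` is any async callable that sends text to the text-to-speech layer. In a real deployment the filler's playback time is what hides the lookup latency.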
The demo-to-production gap is huge.
Every voice AI product sounds polished in a controlled demo. Demos are scripted, the audio is clean and the conversations stay on-rails. But a production environment is different. Patients call from noisy environments, talk over the system, go on tangents and don’t finish sentences. They’re sometimes confused, sometimes anxious and sometimes in pain.
Building sustainable trust requires the agent to identify itself as AI when asked, refuse to improvise on anything it can’t verify and perform consistently across thousands of calls a day.
The gap between demo and production is where most voice AI deployments struggle and where patient trust is won or lost. Teams that underestimate it ship something that performs beautifully in testing and breaks the first time a patient doesn’t follow the script.
Specialized systems are needed, not general ones.
One pattern that I’ve seen work is routing each turn of a conversation to specialized agents running in parallel rather than asking one model to do everything. Think of it less like one AI on the line and more like a small team of focused specialists, each good at a particular kind of reasoning.
In practice, every turn fans out to focused workers running side by side: scheduling against the EHR, eligibility verification, patient matching, intent classification and a safety check for clinical escalation cues. After the call ends, another set runs in parallel: structured summary, urgency, chief complaint, transfer reason and routing tags. Each worker is small enough to reason about on its own, easy to evaluate in isolation and easy to swap when a better model comes along.
When something breaks, it breaks in a defined way, not in a way that cascades unpredictably through the whole system.
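To make the fan-out concrete, here is a minimal sketch under stated assumptions. The three workers are toy stand-ins invented for illustration (the keyword list is not a clinical rule set, and the eligibility worker is hard-coded to fail). The point is the shape: every worker runs concurrently on the same turn, and a failure is recorded against that worker alone instead of taking down the turn.

```python
import asyncio


async def classify_intent(utterance: str) -> str:
    # Toy classifier for illustration only.
    return "reschedule" if "reschedule" in utterance.lower() else "other"


async def safety_check(utterance: str) -> bool:
    # Toy escalation cues for illustration only; not a clinical rule set.
    cues = ("chest pain", "can't breathe")
    return any(cue in utterance.lower() for cue in cues)


async def verify_eligibility(utterance: str) -> str:
    # Simulate one worker failing, e.g. a payer lookup that times out.
    raise TimeoutError("payer lookup timed out")


# Hypothetical registry of per-turn workers.
WORKERS = {
    "intent": classify_intent,
    "escalate": safety_check,
    "eligibility": verify_eligibility,
}


async def run_turn(utterance: str) -> dict:
    names = list(WORKERS)
    # return_exceptions=True keeps one failure from cancelling the others.
    results = await asyncio.gather(
        *(WORKERS[name](utterance) for name in names),
        return_exceptions=True,
    )
    turn = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            turn[name] = {"ok": False, "error": type(result).__name__}
        else:
            turn[name] = {"ok": True, "value": result}
    return turn
```

Because each worker reports its own status, the caller can decide what a failed eligibility check means for this turn (for example, handing off to staff) while still acting on the intent and safety results.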
Personalization is the long game.
Patients tend to be more comfortable with AI when it frees up doctors for personal interactions rather than replacing them, and they want AI that supports better outcomes and reduces errors. Voice AI that feels impersonal will always fight an uphill battle for engagement.
Personalization in voice AI isn’t just about using someone’s name. It’s about whether the system remembers prior interactions and adapts to how a patient actually speaks. When someone calls back about a concern they raised last week, the AI should pick up where the prior conversation ended rather than starting from scratch.
Research has found that AI-labeled advice is rated less reliable and less empathetic than advice labeled as coming from a human, and that participants are less willing to follow it. Closing that gap requires building systems that feel human, not systems that merely sound human.
The trust problem is ours to solve.
Sixty-eight percent of U.S. adults fear that AI could weaken the patient-provider relationship, leading to less human interaction in healthcare. Patients aren’t opposed to voice AI in the abstract, but they are opposed to experiences that remind them of the automated phone scammers they’ve been fighting with for years. Build something that feels different, and trust is more likely to follow.
The builders who get this right won’t just win on performance benchmarks. They’ll build the systems patients actually want to use, and that’s where the real healthcare impact lives.

