20 August, 2025

Voice AI Thesis: Building defensibility against commoditization


the gist

1. Voice AI has shifted from pilots to embedded workflows across industries, moving from experimentation to operational reality.
2. There is, however, a risk of commoditization: as model providers democratize access, defensibility no longer lies in building the AI agents themselves but in deep workflow integration and domain expertise.
3. Product strategy hinges on whether a company builds voice agents to replace existing spend, which demonstrates ROI quickly, or to create new workflows.
4. Successful startups in this space will evolve from narrow point solutions into workflow owners.
5. PMF resets with each workflow expansion, while integrations with VoIP and CRM systems create an early moat.

context

Over the past three years, Voice AI has moved from experimentation to operational reality. What was once an area of speculative demos and pilots has now embedded itself into real-world workflows across industries. Healthcare providers rely on AI-powered scribes to capture clinical conversations. Logistics firms use AI agents to give real-time shipment updates. Recruiters conduct initial candidate screenings with automated interview bots. Even financial services and customer support centers now outsource significant volumes of voice interactions to voice bots that answer calls, schedule appointments, and qualify leads.

This shift raises both excitement and critical questions. Will voice AI agents ultimately become a commodity? Do they need to be vertical-first? What key design choices must founders make? Is it better to replace an existing budget or to create entirely new workflows? And perhaps most crucially, how does a voice AI company know it has hit product-market fit (PMF)?

To explore these questions, we studied 216 companies, mapped funding flows, examined adoption trends, and spoke with multiple founders. The result is a market guide that sheds light on where voice AI stands today, where it is headed, and how founders can build defensible businesses in this rapidly expanding ecosystem.

commoditization risk: why Voice AI agents alone won’t win

The technical barriers to building a functional voice AI agent are collapsing. Not long ago, startups needed to invest millions into proprietary speech recognition and natural language models just to get to a usable product. Today, commercial APIs from providers such as OpenAI, Deepgram, and ElevenLabs have democratized access. A small team can now analyze a set of customer conversations, build tailored prompts, and spin up a working prototype in a matter of days. Accuracy rates often improve dramatically within the first month, making deployment easier than ever.
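To make the low barrier concrete, the core loop of a prototype can be sketched as three chained stages: speech-to-text, a prompted language model, and text-to-speech. This is a minimal sketch under stated assumptions, not any provider's actual API: `VoiceAgent`, the stage signatures, and the stand-in functions are hypothetical names, and a real prototype would replace the stand-ins with calls to hosted services.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures. In a real prototype each would wrap a hosted
# speech-to-text, language-model, or text-to-speech service.
Transcriber = Callable[[bytes], str]
Responder = Callable[[str, List[str]], str]
Synthesizer = Callable[[str], bytes]


@dataclass
class VoiceAgent:
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer
    system_prompt: str  # the tailored prompt built from customer conversations
    history: List[str] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one caller turn: audio in, audio out."""
        text = self.transcribe(audio_in)
        self.history.append(f"caller: {text}")
        reply = self.respond(self.system_prompt, self.history)
        self.history.append(f"agent: {reply}")
        return self.synthesize(reply)


# Stand-in stages so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(system_prompt: str, history: List[str]) -> str:
    last_caller_line = history[-1].split(": ", 1)[1]
    return f"You said: {last_caller_line}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(fake_transcribe, fake_respond, fake_synthesize,
                   system_prompt="You are a shipment-status assistant.")
audio_out = agent.handle_turn(b"where is my shipment")
```

Because each stage is just a callable, swapping one provider for another is a small change. The loop itself offers little to defend, which is the commoditization concern in miniature.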

This accessibility is both a blessing and a curse. On the one hand, it fuels innovation and lowers the threshold for experimentation. On the other hand, it raises the specter of commoditization. If every company is building on the same foundation models, what prevents customers from switching to the cheapest provider? Competing on “better transcription” or “more natural voice” is no longer sustainable.

The companies that thrive will be those that differentiate not at the level of core conversation engines, but in how those engines are embedded into workflows, customized at scale, and continuously improved through feedback loops. Domain expertise and deep vertical integration create stickiness that generic voice bots cannot replicate. Long, context-rich conversations and domain-specific tasks – the kinds that cannot be easily standardized – remain the least prone to commoditization.

the budget question: replace or create?

One of the most fundamental go-to-market decisions in Voice AI is whether to replace an existing budget or create a new one.

Our analysis of 216 companies revealed that 66% target existing spend, such as IVR systems, call centers, or receptionist roles. Another 20% are inventing entirely new workflows, while 13% occupy a gray zone between the two.

Replacing existing budgets allows Voice AI companies to demonstrate ROI more quickly. Since customers already acknowledge the problem and dedicate resources to it, adoption is easier and sales cycles are shorter. Companies that position themselves in these spaces succeed by delivering the same outcomes at lower cost, higher speed, or greater accuracy, sometimes even layering in a services component to boost reliability. Common examples include after-hours call handling, outbound lead qualification, and automated appointment confirmations in sectors like healthcare and automotive.

By contrast, creating a new workflow often involves slower adoption, because customers must first be educated about the problem and then persuaded to create a new budget for it. The payoff, however, can be significant. Companies like Domu (AI-driven landlord-tenant communication) and Happy Robot (AI “worker” for freight brokers) illustrate how entirely new workflows can reshape industries.

from point solution to workflow owner

The evolution of Voice AI companies often begins with narrow, highly targeted solutions. Many startups in this space emerge as point solutions, addressing a single and urgent pain point. This approach offers a clear entry into the market, as customers can easily grasp the immediate ROI. Yet the long-term trajectory of the most successful firms involves moving beyond that initial wedge to become workflow owners – products that embed themselves so deeply into their customers’ workflows that they effectively shape and control the processes themselves.

Two archetypes tend to dominate this strategy: workflow initiators and communication interfaces. Winners start as initiators and eventually expand into adjacent workflows.

Some companies begin as workflow initiators, kicking off valuable processes that set other tasks into motion. A common example is the AI scribe, which generates structured notes that then trigger follow-up actions like scheduling appointments. Others emerge as communication interfaces, offering voice as an alternative input method for existing applications. While both approaches can gain traction, the current winners in this market usually begin as initiators and then expand outward into broader ownership of the same or adjacent workflows.

The trajectories of companies such as Mercor and Abridge illustrate this pattern well. Mercor began as a tool for conducting AI-led interviews, solving an immediate problem for recruiters. Over time, the company layered on scheduling functionality and eventually moved into managing candidate contracting end-to-end. Abridge followed a similar arc in healthcare. Initially positioned as a physician’s scribe, it expanded into scheduling follow-ups and now tackles revenue cycle management through medical coding. In both cases, the companies did not remain confined to their wedge solution; instead, they moved deliberately downstream into processes adjacent to their starting point, embedding themselves more deeply into their customers’ workflows.

The growth pattern that emerges is consistent across successful Voice AI firms.

In Act One, a company introduces a wedge product that demonstrates clear, immediate ROI. In Act Two, it ties downstream automation to that wedge, expanding its utility and value. In Act Three, it achieves deep integration with core systems, raising the costs of switching and ensuring long-term defensibility. This transition – from point solution to workflow owner – marks the difference between tools that can be easily replaced and enduring platforms that become indispensable.

horizontal vs. vertical strategies

While this evolution unfolds, voice AI companies also face a strategic choice about where to focus: horizontal problem statements or a vertical focus. Historically, horizontal applications such as customer support have captured around 57% of all voice AI funding. Healthcare has accounted for about 24%, while infrastructure tools have received 11%, and other verticals such as finance, logistics, insurance, and restaurants collectively make up the remaining share (roughly 8%).

Yet vertically focused opportunities are increasingly compelling. A vertical-first approach allows companies to solve domain-specific problems in a voice-first way, creating solutions that generic platforms cannot easily replicate. In these niches, voice AI can unlock bottlenecks tied to human labor and process inefficiency. Just as importantly, vertical solutions can accumulate specialized datasets that form defensible data moats, making each customer deployment smarter and more accurate over time. Over the long run, these products can evolve into the system of record for their industry, positioning themselves as critical infrastructure rather than optional add-ons.

measuring PMF in Voice AI

Product-market fit in Voice AI is not a single milestone. Each time a product expands into a new workflow, the PMF test resets. An AI scribe that nails note-taking must again prove its value when it moves into coding or scheduling.

Signals to track include:

  • Time to value: Does ROI appear within days or weeks?

  • Usage patterns: Is adoption sustained beyond pilots?

  • Revenue model fit: Is per-usage pricing sustainable, or does volatility creep in?

  • Budget retention: Are customers moving from trial budgets to long-term commitments?

The insight is simple: PMF in Voice AI is dynamic. Founders must treat it as a moving target rather than a static milestone.
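One way to keep the "PMF resets" idea honest is to score the four signals above separately for each workflow rather than for the product as a whole. The following is a minimal sketch under stated assumptions: the class, field names, the 30-day cutoff for "days or weeks", and the sample values are all hypothetical illustrations, not figures from our research.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkflowSignals:
    """PMF signals for ONE workflow. All thresholds are illustrative."""
    name: str
    days_to_first_roi: int       # time to value
    active_after_pilot: bool     # usage sustained beyond the pilot
    pricing_sustainable: bool    # per-usage pricing holds up without volatility
    on_long_term_budget: bool    # trial budget converted to a commitment

    def signals_met(self) -> int:
        """Count how many of the four signals this workflow satisfies."""
        return sum([
            self.days_to_first_roi <= 30,  # hypothetical cutoff
            self.active_after_pilot,
            self.pricing_sustainable,
            self.on_long_term_budget,
        ])


def pmf_by_workflow(workflows: List[WorkflowSignals],
                    required: int = 4) -> Dict[str, bool]:
    """Evaluate each workflow on its own; one passing does not carry the next."""
    return {w.name: w.signals_met() >= required for w in workflows}


# Hypothetical example: a scribe that has nailed note-taking but has only
# just expanded into medical coding.
note_taking = WorkflowSignals("note-taking", 7, True, True, True)
coding = WorkflowSignals("medical coding", 60, True, False, False)
status = pmf_by_workflow([note_taking, coding])
```

Here `status` marks note-taking as passing and medical coding as not yet passing, even though both belong to the same product, which mirrors the reset described in this section.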

integration as a defensible moat

Finally, integration emerges as one of the most important moats in Voice AI. Early integration decisions often determine whether a product will remain peripheral or become central to a customer’s operations. Two types of integration dominate adoption. The first is with VoIP systems such as GoTo and Dialpad, which are notoriously difficult to connect due to legacy design but essential for handling voice communication at scale. The second is with CRM and scheduling systems, which are fragmented but unavoidable for any serious workflow automation.

Once a Voice AI product embeds into both telephony and CRM systems, switching costs rise sharply. What begins as a replaceable tool becomes mission-critical infrastructure. Integration, therefore, is not just a technical choice but a strategic one – turning adoption into long-term defensibility.

In conclusion, these dynamics reveal the contours of success in Voice AI. Startups that begin with sharp, narrow solutions must quickly find pathways to broaden their role, deepen their integrations, and build data moats that shield them from commoditization. They must navigate the trade-offs between horizontal reach and vertical depth, and they must recognize that PMF is never static but always a moving target. Ultimately, the companies that thrive will be those that embed themselves so completely into workflows that their removal would feel impossible.

If you’re building a company in voice AI, we’d love to hear from you. Please reach out to us at sayantan@stellarisvp.com.
