Speech recognition: Our view on how the market will evolve for enterprise and consumer use cases
Alexa – Please repeat my standard grocery order
OK Google – Play the latest Arijit Singh songs
Siri – Is it raining in Mumbai today?
Such commands have become a common part of our day-to-day lives. But it doesn’t stop there. We were far more intrigued when we found out that our office boy gives voice commands in Hindi to search for his favorite videos on YouTube, and that our household help uses only WhatsApp’s audio feature to communicate because she doesn’t know English and Kannada/Hindi keyboards are difficult to use. Speaking is more natural, more intuitive, more human. Rising smart speaker penetration, voice commerce via Alexa, and high adoption of voice search among vernacular users show the significant value the speech interface adds for consumers.
On the technology front, access to large amounts of data, the digital speech prints of millions of consumers, and advances in AI are the key drivers of high accuracy in speech recognition. The ability of voice assistants to fulfill various tasks has crossed the threshold for large-scale acceptance, especially in navigation, search, and command, with commerce catching up closely. In other use cases, the ability will only get better with more data.
This technology advancement coupled with the convenience of speaking has made it possible to build intelligent actionable insights using voice.
New voice-based platforms competing with Google and Amazon will be hard to create
Globally, voice technology development has been led by big tech companies: Google, Amazon, Microsoft, and Apple in the US, and Baidu, Tencent, and Alibaba in China. Google offers Speech-to-Text (STT) APIs in 130 languages (12 of them Indian) with fairly high accuracy for general-purpose use cases. Google also lets users upload their own vocabulary and training data for better results in more specific use cases. Alexa offers 80,000+ skills built by developers, of which the most popular are in gaming, music, and lifestyle. A number of Indian players have also built platforms with their own in-house ASR and NLP (Automatic Speech Recognition and Natural Language Processing) engines for vernacular language communication.
The problem of understanding and processing speech is no longer as hard as it once was, with increasingly powerful NLP models available as open source. Today it takes a few months to build an in-house ASR engine for a new language. We can reasonably conclude that speech recognition technology has been commoditized by the likes of Google and Amazon. Building a platform requires a huge amount of trust and long-term visibility, so it would be difficult for a new player to compete with the global tech giants in this category.
The commoditization of the speech technology, however, opens an entirely new universe of opportunities for applications, both B2B and B2C.
There is a huge opportunity in creating B2B SaaS applications in areas where voice conversations are core to the business
- Ever heard "All our agents are currently busy, please stay on the line"?
- Ever got a sales call from an insurance agent speaking without context?
- Ever spoken with customer support and explained the entire problem again because the previous agent you spoke with was on a different shift?
and the list goes on…
We speak with businesses on a daily basis, sometimes multiple times a day. There are 2.6M inside sales professionals in the US alone, making at least 50M sales calls per day. According to Forbes, after-sales customer support is a $350B industry in itself. Even a small productivity improvement can lead to significant financial benefits for businesses and a much better experience for their customers.
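A quick back-of-the-envelope check on those figures can be sketched as below. The two constants come from the paragraph above; the 5% productivity gain is purely an illustrative assumption, not a number from any study, and the function names are our own.

```python
# Figures quoted in the text above.
INSIDE_SALES_REPS = 2_600_000       # inside sales professionals in the US
SALES_CALLS_PER_DAY = 50_000_000    # "at least" this many calls per day


def calls_per_rep_per_day(calls=SALES_CALLS_PER_DAY, reps=INSIDE_SALES_REPS):
    """Average number of calls each rep makes per day."""
    return calls / reps


def call_equivalents_gained(gain_fraction, calls=SALES_CALLS_PER_DAY):
    """Daily call volume that a given productivity gain corresponds to.

    gain_fraction is a hypothetical input (0.05 means a 5% improvement).
    """
    return calls * gain_fraction


if __name__ == "__main__":
    print(f"Calls per rep per day: {calls_per_rep_per_day():.1f}")
    # 0.05 is an assumed gain, chosen only to illustrate the scale.
    print(f"Call-equivalents from a 5% gain: {call_equivalents_gained(0.05):,.0f}")
```

On these numbers each rep averages roughly 19 calls a day, so even a low single-digit percentage gain translates into millions of call-equivalents daily across the market.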
The Stellaris team further applied the following 2 lenses to categorize the B2B consumer-facing applications:
i) Real-time vs Non-real time: Real-time suggestions provide far more value than post-call analytics
If you are a business doing inside sales, imagine a coach sitting with each of your sales agents, giving suggestions, feedback, and facts during the call in real time. To top it off, all these coaches communicate with each other and can replicate the performance of your best sales agent across the floor. The results can be massive. Higher accuracy in speech recognition and advanced AI can make real-time suggestions, coaching, cross-selling, and up-selling possible just by listening to the conversation with the customer and making sense of it. Though this is a much harder problem to crack from a technology standpoint, it adds significantly more value to businesses, since inside sales and support agents often forget their training within weeks. The market is already crowded with companies doing (non-real-time) post-call analytics, and very recently we have started seeing a few promising real-time conversational AI companies making a mark.
ii) Horizontal vs Vertical: Both horizontal and vertical domain-specific companies have huge potential
Some applications will find use cases across multiple domains, while others can be used only in a specific domain. This is because the vocabulary and workflows of some domains are so different that they cannot be generalized, and the market depth of those vertical domains is big enough to create a large opportunity. While most of the companies we came across addressed horizontal use cases, primarily in sales enablement and customer support, there were a few vertical, domain-specific companies in fields where conversations are unique and critical. We believe that healthcare, education, and recruitment can be good starting points for vertical-focused conversational AI companies.
Most consumer applications will build a voice layer to reach the vernacular user base
Unlike India, the diverse Bharat is 500M people with smartphones, speaking 22+ regional languages with more than a thousand dialects. Most of them cannot type using vernacular keyboards (which are a nightmare to use) and some of them cannot even write. How does one sell new-age services and products, primarily built for English-speaking consumers, to this Bharat? The answer is speech. Whether it is quality education, quality healthcare, e-commerce, skilling, recruitment, or financial services, a vernacular speech layer will be built over existing, widely used applications to make them accessible to the non-English-speaking user base. This, however, will be more like a feature, and the potential to build a large independent company will be limited. We believe that the large platforms will provide APIs enabling existing applications to add this capability. Some of the larger players might even acquire smaller companies to accomplish this, e.g. Flipkart’s acquisition of Liv.ai.
Limited number of consumer applications where speech recognition is critical to consumer needs
There are, however, a few areas where speech is a core part of the consumer need and not just a layer built on top. Search and command are the most widely used consumer use cases, but it is hard to build a large company competing with Google, Alexa, and Siri with minimal differentiation. A few other areas are language learning, improving English communication and fluency for a specific profession (relevant for blue-collar workers or call center agents), and checking the psychological health of a patient using his/her voice. However, the number of B2C companies with such use cases turns out to be more limited than we had originally envisaged, both in India and in the US. There can be many more consumer applications where speech plays a critical role; for now, we leave that to entrepreneurs’ creative imagination. The rise in speech-based consumer startups is yet to be seen globally.
- The technology play is fairly commoditized in this space, and it will be difficult to differentiate and build a large independent company as a platform on which developers build apps.
- There is an opportunity to create B2B SaaS applications targeting global businesses where conversations are a critical part of customer experience.
- A speech recognition layer will be built on top of existing, highly used B2C applications, but the number of use cases where voice is core to solving a consumer’s problem is fairly limited.
If you are working in the speech recognition space, please feel free to write to us at firstname.lastname@example.org and we would be happy to chat.