State of the Market Update: Speech and Voice Recognition

UC analyst shares how AI-driven speech tech has brought new forms of value to the workplace in this Enterprise Connect 2022 session preview.

Jon Arnold

March 1, 2022

Image: Sergey Oplanchuk - Alamy Stock Vector

Enterprise Connect 2022 is a few short weeks away. Barring any last-minute pandemic flare-ups, we can finally shift from virtual to in-person. Aside from getting to see colleagues and clients again, there’s work to be done. I’ll be presenting the fifth installment of my Speech and Voice Recognition state of the market update, covering speech technology and artificial intelligence (AI), with a particular focus on the enterprise. AI is everywhere these days, and when applied to speech tech, some interesting things start to happen.

 

While most of the attention has been on customer experience, AI-driven speech tech has brought new forms of value to the workplace, especially for collaboration. That’s the ground I’ll be covering at Enterprise Connect, and here’s a preview of my session, Where Speech Tech Is Today—and Where It’s Heading. I hope you’ll join me to see the full presentation. As a heads-up, I’m in the kickoff slot Monday, March 21, at 8 a.m.

 

While you’re there, I encourage you to check out other sessions from my BCStrategies colleagues: Blair Pleasant, Michael Finneran, Thomas Brannen, Kevin Kieller, Dave Michels, and others.

 

State of the Market Update

Analysts love to talk about this, and although AI is moving a mile a minute, there isn’t a lot that’s new for enterprise speech tech. That doesn’t mean you shouldn’t come see my session, as interesting things are happening; they’re just not as sexy or dynamic as what the contact center space is going through. More importantly, I’ll continue a recent theme of my talks, namely how speech tech is just part of a bigger story for how AI is transforming everything about work, including collaboration.

 

The first state of the market message is that we’re simply seeing more of the same, but better. Most of the innovation around enterprise speech tech came during the last two years (real-time transcription, translation, captioning, summary notes, voice biometrics, noise suppression, etc.), and I covered it extensively in my previous talks.

 

Not much to see here, as these have largely become mainstream UCaaS applications. More important, though, is how these applications keep improving, and that’s what makes this more about AI than speech tech. AI technologies are iterative by nature, meaning that the more we use them, and the larger the data sets become, the more accurate the applications become at mimicking human behavior.

 

That’s what the “learning” in machine learning and the “intelligence” in AI are all about, and for enterprise use cases, the performance has become good enough for everyday use. The best indicator of that is merger and acquisition activity, where speech tech start-ups are constantly being acquired. I’ll have an update on that during my talk.

 

At the risk of boring you to tears, there is one big change from last year in this space, and it’s where we all must pay attention. CAI, or conversational AI, is the acronym du jour for speech tech. If you don’t believe me, consider this: Gartner now has a CAI Magic Quadrant, so it must be true, right? CAI represents a major evolution for chatbots. It also takes speech tech overall to a new level.

 

As AI keeps improving the capabilities of speech tech, we now have a two-way dialog with bots, greatly increasing their utility. Rather than engaging with bots to issue commands or respond to closed-ended prompts, the dialog becomes conversational, with bots trying to emulate human speech, even trying to inject empathy and emotion. These bots aren’t trying to dupe us into thinking we’re talking to another human. However, the thinking is that the more human-like the conversation, the more likely we’ll express our true feelings. Not only does this yield better outcomes, but as trust builds when using CAI, so does the potential for AI to automate more workflows, tasks, interactions, etc.

 

In the contact center, we call these chatbots or virtual agents. In the workplace, we call them digital assistants. In either case, the potential for task automation and personal productivity becomes much greater, allowing us to finally move on from the bad rap that first-generation chatbots carry with them.

 

Where Speech Tech Is Heading: All Bets Are On

If you like the current state of enterprise speech tech, you might love where it’s heading, although that probably depends on which side of the digital immigrant/native spectrum you fall on. In short, the future belongs to gamers, and that’s a pretty good clue as to what’s coming.

 

A core theme of this year’s talk is that speech tech is just one of many AI applications, and at this point, we have solved the big problems around speech recognition. The story is similar to that of first-generation unified communications, where the big challenge was getting all the disparate communications applications to interwork on a common platform. Once that was solved, thanks primarily to the cloud, anybody could offer UCaaS, and now you can embed real-time voice in just about anything. Nobody talks about the challenges of supporting telephony from the cloud anymore.

 

The same holds for enterprise speech tech, where voice is becoming more of a means than an end. Just as voice over Internet Protocol (VoIP) made telephony another data application, the broader digital transformation trend makes voice more valuable as a source of metadata around conversations, and as a way to interact with “machines,” than as a medium for person-to-person communications.

 

In this context, enterprise speech tech will become part of a bigger, AI-led transition from in-person work to virtual and immersive experiences. I’ll be citing Webex Hologram, Mesh for Teams, and yes, the “metaverse” as leading examples of this brave new world. While these may be highly visual forms of collaboration, speech tech will absolutely be central to the experience. These new models may or may not succeed. But the major players are betting heavily on them, and where they go, speech tech will follow.

 

Picture1_1.png

 

As a music enthusiast, I couldn’t end my article without mentioning how music intertwines with speech technologies. But what on earth could the Rolling Stones have to do with enterprise speech tech? You’ll have to attend my session for details, but I’ll give you a clue. Recently, the iconic band partnered with robotics company Boston Dynamics to commemorate the 40th anniversary of its album Tattoo You (see image above).

 

As always, I’ll end my update with some questions about how AI can head in the right and wrong directions; we need to consider both when making technology decisions. As Mick Jagger once sang, “what’s puzzling you is the nature of my game.” That’s a pretty good reflection of the cautions I’ll be stressing as key takeaways. See you at Enterprise Connect!

This post is written on behalf of BCStrategies, an industry resource for enterprises, vendors, system integrators, and anyone interested in the growing business communications arena. A supplier of objective information on business communications, BCStrategies is supported by an alliance of leading communication industry advisors, analysts, and consultants who have worked in the various segments of the dynamic business communications market.

About the Author

Jon Arnold

Jon Arnold is Principal of J Arnold & Associates, an independent analyst providing thought leadership and go-to-market counsel with a focus on the business-level impact of digital transformation in the workplace. Core areas of expertise include unified communications, cloud services, collaboration, Internet of Things, future of work, contact centers, customer experience, video, VoIP, and social media.

 

He has been consulting in many of these areas since 2001, and his independent practice was founded in 2005. JAA is based in Toronto, Ontario, and serves clients across North America as well as in Europe.

 

Jon’s thought leadership can be followed on his widely read JAA’s Analyst Blog, his monthly Communications and Collaboration Review, and ongoing commentary on Twitter and LinkedIn. His thought leadership is also regularly published across the communications industry, including here on No Jitter as well as on BCStrategies, Ziff Davis B2B/Toolbox.com, TechTarget, and Internet Telephony Magazine.

 

In 2019, Jon was named a “Top 30 Contact Center Influencer,” and in 2018 he was included in listings of “Top 10 Telecoms Influencers” and “TOP VoIP Bloggers to Follow.” Previously, in both March 2017 and January 2016, Jon was cited among the Top Analysts Covering the Contact Center Industry. Also in 2017, Jon was cited as a Top 10 Telecom Expert and as one of Six Business Communications Thought Leaders to Follow. Before that, GetVoIP.com named Jon among its Top 50 UC Experts to Follow in 2015, as well as a Top 100 Tech Podcaster in 2014. JAA’s blog was recognized as a Top Tech Blog in 2016 and 2015, and has received similar accolades going back to 2008.

 

Additionally, Jon is a UC Expert with BCStrategies, a long-serving Council Member with the Gerson Lehrman Group, speaks regularly at industry events, and accepts public speaking invitations. He is frequently cited in both the trade press and mainstream business press, serves as an Advisor to emerging technology/telecom companies, and is a member of the U.S.-based SCTC.