Speech Tech for Enterprise... With a Caveat or Two
Many signs point to speech tech heading into the enterprise, but is it really ready?
February 14, 2018
With some technologies, sometimes it feels like you snap your fingers and they're everywhere. This is how it's been with consumer-focused speech technologies -- suddenly, all my friends seem to have Amazon Alexa or Google Home voice-activated assistants. My younger sister, a proud new owner of an Apple Watch 3, now regularly confuses me into thinking she's talking to me when she's really speaking into her wrist, dictating a text for Siri to send on her behalf. Even I've begun to shift my behavior away from text searches and toward voice queries through Google Assistant.
But as we know all too well, the enterprise market is a different beast than the consumer world. Are speech technologies ready to be put to work, at work? That's the question we're aiming to answer next month at Enterprise Connect Orlando 2018 in our new Speech Technologies track, and elsewhere throughout the program.
How Enterprises Think About Speech
Activity around speech technology is rampant, but when it comes to the enterprise, speech tech is still fairly immature. As we've previously seen with the consumerization of IT in other technology areas, enterprises today are interested in learning how they can apply speech technologies to deliver real business value, whether that be through the application of enterprise-focused voice assistants like Alexa for Business and Cisco Spark Assistant, or through the application of natural language processing (NLP) to turn voice interactions into structured data for analysis.
A sweet spot for speech technologies within the enterprise today is in facilitating customer transactions, such as making a payment or checking into a flight, said Robert Harris, president of UC consulting firm Communications Advantage and moderator of the upcoming Enterprise Connect session, "Are Speech Technologies Ready for the Enterprise?" For such use cases, speech technology may actually provide better service than a human, he added.
"As much as people say a live person is always better, there are some businesses -- like banking -- where you [often] really just want to talk to a machine and get it done."
Outside customer-facing operations, applications of speech technology for business users tend to center on productivity, collaboration, and workflows to improve the meeting experience, Jon Arnold, principal at J Arnold & Associates, told me in a recent phone briefing. More and more, they're exploring the cool things they're able to do with voice now, and are starting to realize they can use voice in ways they couldn't before, he added. Eventually, we'll see less use of PCs, no use of phones, and more use of Alexa on desks, he said. Arnold will be presenting an Enterprise Connect tech tutorial on speech technologies on Monday, March 12, at 8:00 a.m. ET. (Incidentally, I've seen his presentation, and if your enterprise is trying to get its arms around speech tech, trust me, this is one session you won't want to miss.)
But, he noted, the speech technology space is particularly messy, comprising everything from voice recognition to NLP, speech-to-text transcription, text-to-speech, and voice analytics. And within each of these subcategories, a variety of providers is vying for mindshare, promising unique differentiators and applications. That makes sorting out what's what a bit of a challenge.
Alexa for Business and Spark Assistant
Among the well-known enterprise contenders vying for attention are Amazon, for Alexa for Business; and Cisco, for Spark Assistant.
Amazon Echo Dot
In November 2017, Amazon Web Services (AWS) announced it was taking its consumer-oriented Alexa virtual assistant to business school, revealing Alexa for Business, a fully managed service aimed at allowing companies to deploy Echo devices at scale throughout the workplace. As reported at the time, AWS is targeting improved meeting experiences, backed up by integrations with the likes of Cisco, Crestron, and Polycom for their in-room conferencing systems, RingCentral for its cloud-based meetings, Microsoft for Office 365 and on-premises Exchange servers, and Google with G Suite.
The power of voice isn't to be ignored, as Amazon CTO Werner Vogels said in introducing Alexa for Business. "It's the natural way of interacting with your systems, ... [and] it's the first disruption by the deep learning capabilities of the tools we're giving you." The next generation of systems will be built using conversational interfaces, he asserted.
Amazon and its partner ecosystem certainly aren't alone in this belief, and despite this being early days for speech enablement of business processes, "when AWS puts its voice to a cause, we've got to listen up," as No Jitter editor Beth Schultz wrote at the time.
Cisco announced its AI-powered, meetings-oriented Spark Assistant around the same time AWS made its enterprise move, relying on technology from MindMeld, a May 2017 acquisition. The goal is to make Spark Assistant, which will move into Spark Room endpoint field trials next month, really good at helping users set up meetings with coworkers, invite people to meetings, share documents, and such, Timothy Tuttle, CTO of Cisco's Cognitive Collaboration Group, and former CEO and founder of MindMeld, told me in a recent interview.
Tim Tuttle demoing Spark Assistant at Cisco event for press/analysts
These are skills, of course, that consumer-oriented voice assistants like Siri and Google Assistant "have no idea about," Tuttle added. An enterprise voice assistant needs to know about "things like who are your coworkers, what meetings you've had recently, what are all the meeting rooms and shared spaces that might be available in the future for a meeting. Maybe the assistant even has access to a company file system or Box folder, and based on that it will have the ability to find documents that may be helpful."
Long term -- 10 years out or so -- Tuttle said he envisions Spark Assistant becoming even more helpful, not just setting up meetings but taking notes, circulating materials, and making sure all collaboration equipment is working properly.
An Evolving Market: Some Players
Harris cautioned, however, that enterprise expectations might be ill-aligned with reality. "I think the more interested and optimistic a customer is, the more disappointed it often ends up being -- because it's really difficult stuff. There's no silver bullet that's going to make it all come together."
For example, ambiguous words still trip up natural language processors, and most voice-based assistants still can't handle compound commands such as "Alexa, start my meeting and open my presentation deck." "No one has demonstrated a product that you can have a conversation with about anything," Harris said.
Speech tech holds its true value for tactical and logistical tasks, he added. "When you try to create more of a human element with speech tech, that's when it leads to frustration."
A slew of other companies are carving out niches in enterprise speech with an eye on improving the overall experience. In speech recognition, for example, Fluent.AI offers what it calls "acoustic speech recognition." By this, the company means it bypasses speech-to-text transcription and "goes directly from speech to intent," company CEO Niraj Bhargava told me. This differs from other speech recognition services, which require speech to be converted to text first before the AI system can learn intents and understand what's being said, he said.
Voice analytics players like VoiceBase and TalkIQ, which do things like real-time transcription and sentiment analysis, also aim to deliver insight from voice -- often in conjunction with other products.
Indeed, most enterprise speech offerings take the form of add-ons to existing solutions or are platforms purpose-built for other applications, Arnold pointed out. AISense, for example, provides the speech-to-text transcription capability that cloud video provider Zoom has incorporated in its platform for meeting recordings. And Gong.io uses VoiceBase transcription technology to deliver conversational intelligence to sales teams.
One beneficiary of Gong.io's speech integration is Allbound, which provides software aimed at helping businesses build successful partner and referral programs. Allbound's sales team uses Gong.io to improve sales conversions and guide sales members with intelligence on conversations they are having with prospects, Greg Reffner, VP of sales at Allbound, told me in a briefing last fall.
With the introduction of Gong.io's speech technology, Allbound has brought consistency, process, and transparency to its sales operations, leading it to cut its sales cycle by 60 days, Reffner said. Not only that, but Allbound has quadrupled its sales rate. "These changes happened when we started having visibility, and we didn't get that visibility until Gong," he said.
Along these same lines, Genesys has integrated its PureCloud contact center solution with Amazon Lex, a service for building conversational interfaces. Genesys is using Lex to create a more conversational and intelligent IVR so that customers can speak more naturally when navigating support options. And real-time cloud communications platform Voximplant leverages the Google Cloud Speech API to help other businesses build voice and video applications.
"Before the Speech API, and speech to text, you could only get digit responses from callers [press 1 for X, press 2 for Y...]," Voximplant CEO Alexey Aylarov told me in a briefing last fall. "Now people can talk to a bot going through a scenario, or script -- voice bots are really valuable. It's a new experience, so some people don't even understand they are talking to a robot."
But it's still early days for Voximplant and its use of speech tech. "We started building an analytics platform around it as well, but right now it's not a big part of our business. But I have a strong feeling that this [speech tech] part will be growing much faster than other parts," Aylarov said. "It's not a free service, so we need to be able to sell this to our customers; eventually, I believe usage will be growing exponentially."
Indeed, the skepticism that abounded five years ago about speech technology in the enterprise has dissipated, MindMeld's Tuttle said, recounting how at that time "accuracy was very hit or miss, and people didn't feel comfortable or were creeped out by having a machine listening."
With people using speech technology on a regular basis in their personal lives, "we are rapidly moving towards a world where most users will expect to be able to talk to a device if they prefer that mode of interaction," he added. "Those same behaviors will come into the workplace."
Come explore speech tech with us at Enterprise Connect 2018, taking place March 12-15 in Orlando, Fla. In addition to Robert Harris's session, "Are Speech Technologies Ready for the Enterprise?," and Jon Arnold's session, "Tech Tutorial: Speech Technologies for the Enterprise," we have a whole track on Speech Technologies for you to explore, featuring a deep-dive tutorial, enterprise end user panel, and our annual Innovation Showcase. Register now using the code NOJITTER to save an additional $200 off the Early Bird Pricing or get a free Expo Plus pass.
Follow Michelle Burbick and No Jitter on Twitter!
@nojitter
@MBurbick