Google Ramps Up Speech Capabilities for Enterprises

Aims to make speech products more accessible to companies across the globe.

February 20, 2019

Google today announced new features, support for more voices and languages, and lower prices for its Cloud Speech-to-Text and Text-to-Speech products.

Google has been steadily improving its speech technology offerings over the last several years. After opening up its Cloud Speech API to the masses in April 2017, the company upgraded the API with new features and language support in August 2017. Then in March 2018, it debuted Cloud Text-to-Speech -- technology that it had been using in its own products for years.

With the enhancements announced today, Google is aiming to make speech technologies more accessible for enterprises across the globe. To start, it has improved the accuracy of its Cloud Speech-to-Text capabilities. “Unfortunately, many companies build speech applications that need to run on phone lines and that produce noisy results, and that data has historically been hard for AI-based speech technologies to interpret,” Dan Aharon, product manager for Cloud AI at Google, wrote in a Google blog post.

To address situations that produce “less than pristine” data, as Aharon wrote, Google has been working with beta customers’ usage data to refine the accuracy of its models. As a result of these beta tests, Google now offers an enhanced phone model that it claims has 62% fewer transcription errors and a video model with 64% fewer errors compared to its previously available technology.

Google is now allowing any enterprise to use the enhanced phone and video models without mandating that they also opt in to data logging, which had been a requirement during the beta testing. Customers that do opt in, however, will pay a lower rate.

Further, multi-channel recognition, which helps the Cloud Speech-to-Text API distinguish between multiple audio channels, is now generally available. This functionality is useful for call and meeting analytics. This and other new features qualify for a service-level agreement and other enterprise-level guarantees, Aharon said.
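
For readers who want a sense of what this looks like in practice, the following is a minimal sketch of a request body for the Cloud Speech-to-Text REST `speech:recognize` method that asks for the enhanced phone model and per-channel recognition. The helper name `build_recognize_request` and the bucket path are hypothetical, and the snippet only builds and prints the JSON; it does not call the service or handle authentication.

```python
import json


def build_recognize_request(gcs_uri, channels=2):
    """Build a JSON-style body for a speech:recognize call (illustrative helper)."""
    return {
        "config": {
            "languageCode": "en-US",
            # Select the phone-call model and opt in to its enhanced variant.
            "model": "phone_call",
            "useEnhanced": True,
            # Multi-channel recognition: transcribe each audio channel separately.
            "audioChannelCount": channels,
            "enableSeparateRecognitionPerChannel": True,
        },
        # Hypothetical Cloud Storage path to a two-channel call recording.
        "audio": {"uri": gcs_uri},
    }


request_body = build_recognize_request("gs://example-bucket/support-call.wav")
print(json.dumps(request_body, indent=2))
```

With per-channel recognition enabled, each result in the response is tagged with the channel it came from, which is what makes this useful for separating an agent from a caller.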

Reducing Costs

Another way Google aims to make its speech technologies more accessible to enterprises is with lower pricing. As mentioned above, customers that opt in to Google’s data logging program will receive a 33% discount on all standard models and the premium video model, Aharon wrote. Further, Google has cut pricing for its premium video model by 25%, which, when combined with the discounted rate for opting in to the data logging program, brings the total savings to 50%, he said.
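
The 50% figure follows if the two reductions compound rather than add up (25% plus 33% would be 58%). The short sketch below works through that arithmetic; treating the discounts as multiplicative is an assumption that happens to match the quoted total, not a statement of how Google bills, and `combined_discount` is an illustrative helper name.

```python
def combined_discount(*discounts):
    """Return the overall discount when each discount applies to the already-reduced price."""
    remaining = 1.0
    for d in discounts:
        remaining *= 1.0 - d  # fraction of the price left after this discount
    return 1.0 - remaining


price_cut = 0.25         # reduction on the premium video model
logging_discount = 0.33  # reduction for opting in to data logging

total = combined_discount(price_cut, logging_discount)
print(f"Combined savings: {total:.2%}")
```

Under that assumption the combined savings come to just under 50%, in line with the rounded figure Aharon cited.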

Text-to-Speech: Expanding the Scope

On the speech synthesis front, Google announced that it has doubled the number of overall voices, WaveNet voices, and WaveNet languages for its Cloud Text-to-Speech offering. For those who may need a reminder, WaveNet is Google’s generative model for raw audio.

Cloud Text-to-Speech now includes support for seven additional languages or variants -- Danish, Portuguese/Portugal, Russian, Polish, Slovak, Ukrainian, and Norwegian Bokmål -- all of which are currently in beta. Once generally available, this will bring the number of supported languages to 21. Google has also unveiled 31 new WaveNet voices and 24 new standard voices across the newly supported languages, Aharon wrote.

Finally, Google has announced general availability of a Cloud Text-to-Speech Device Profiles feature, which optimizes audio playback on different types of hardware, Aharon said. “For example, some customers with call center applications optimize for interactive voice response (IVR), whereas others that focus on content and media (e.g. podcasts) optimize for headphones,” he explained.
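
As a rough illustration of how a device profile is selected, the sketch below builds a request body for the Cloud Text-to-Speech REST `text:synthesize` method and sets an audio effects profile for each of the two cases Aharon mentions. The helper name `build_synthesize_request` and the sample text are hypothetical, and the snippet only constructs the JSON; it does not call the service.

```python
import json


def build_synthesize_request(text, profile_id):
    """Build a JSON-style body for a text:synthesize call (illustrative helper)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US"},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Device profile: post-processes the audio for the target hardware.
            "effectsProfileId": [profile_id],
        },
    }


# Profile tuned for phone lines, e.g. an IVR prompt.
ivr_request = build_synthesize_request(
    "Thank you for calling. Please hold.", "telephony-class-application"
)
# Profile tuned for headphone listening, e.g. podcast narration.
podcast_request = build_synthesize_request(
    "Welcome back to the show.", "headphone-class-device"
)

print(json.dumps(ivr_request, indent=2))
```

The same text can be synthesized with different profiles, so one application can serve both phone and headphone listeners by changing a single field.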

Enterprises can view demos and give Google’s Cloud Speech products a try today. Google is offering the first 60 minutes of audio processing each month for Cloud Speech-to-Text for free, as well as a $300 credit to start testing.

Learn more about AI and Speech Technologies at Enterprise Connect 2019, coming to Orlando the week of March 18. Check out our AI & Speech Technologies track and all the sessions we are offering that week. If you haven’t yet signed up for your Enterprise Connect pass, register now to take advantage of our Early Bird Rate -- ending tomorrow! As a No Jitter reader, you can save an extra $200 off your pass by using the code NJPOSTS at checkout!

About the Author

Michelle Burbick

Michelle Burbick is the Special Content Editor and a blogger for No Jitter, Informa Tech's online community for news and analysis of the enterprise convergence/unified communications industry, and the editorial arm of the Enterprise Connect event, for which she serves as the Program Coordinator. In this dual role, Michelle is responsible for curating content and managing the No Jitter website, and managing its variety of sponsored programs from whitepapers to research reports. On the Enterprise Connect side, she plans the conference program content and runs special content programs for the event.

Michelle also moderates Enterprise Connect sessions and webinars that cover a broad range of technology topics. In her tenure on the No Jitter and Enterprise Connect teams, she has managed the webinar program, coordinated and run the Best of Enterprise Connect awards program, and taken on special projects related to advancing women in the technology industry and promoting diversity and inclusion.

Prior to coming to No Jitter, Michelle worked as a writer and editor, producing content for technology companies for several years. In an agency environment, she worked with companies in the unified communications, data storage and IT security industries, and has developed content for some of the most prominent companies in the technology sector.

Michelle has also worked in the events and tradeshows industry, primarily as a journalist for the Trade Show Exhibitors Association. She earned her Bachelor's degree from the University of Illinois at Chicago. She is an animal lover and likes to spend her free time bird watching, hiking, and cycling.
