
Michelle Burbick | March 27, 2018


Google Debuts Text-to-Speech for App Integration

Cloud Text-to-Speech offering targets IVR, IoT, and spoken text applications.

Google has had text-to-speech synthesis technology in its own products for years -- think Google Assistant, Search, Maps, and Home -- as Dan Aharon, product manager for Cloud AI at Google, told me in a No Jitter briefing. And Google Cloud customers have long been asking for access to the technology so that developers can add text-to-speech capabilities to their own applications, he added. But the company has taken its time delivering because "we wanted to make sure that the voices we produce for Cloud are different than the voices we produce for our Google products." The goal, he said, was to eliminate any consumer confusion about what is and isn't a Google product.

Well, the wait is over. Today, via a Google Blog post, Aharon announced that the company is bringing its text-to-speech synthesis technology to the Google Cloud Platform with Cloud Text-to-Speech.

Cloud Text-to-Speech lets developers choose from 32 different voices in 12 language variants, and it supports a variety of audio formats, including MP3 and WAV. Developers can also customize pitch, speaking rate, and volume gain.
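To make those options concrete, here is a minimal sketch of how a synthesis request body might be assembled. The field names (`input`, `voice`, `audioConfig`, `speakingRate`, `volumeGainDb`) and the voice name `en-US-Wavenet-A` are assumptions based on Google's public REST reference, not details from the briefing, so check them against the current documentation. The helper `build_synthesize_request` is hypothetical, and the sketch only builds the JSON payload; it makes no network call.

```python
import json


def build_synthesize_request(
    text,
    language_code="en-US",
    voice_name="en-US-Wavenet-A",  # assumed voice name, for illustration only
    audio_encoding="MP3",          # assumed enum value; WAV output is assumed to map to "LINEAR16"
    pitch=0.0,                     # 0.0 leaves the voice's default pitch unchanged
    speaking_rate=1.0,             # 1.0 is the voice's normal speed
    volume_gain_db=0.0,            # 0.0 applies no gain
):
    """Assemble a request body with the tunable settings the article lists."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "pitch": pitch,
            "speakingRate": speaking_rate,
            "volumeGainDb": volume_gain_db,
        },
    }


# Example: a slightly slower IVR-style prompt.
body = build_synthesize_request("Your order has shipped.", speaking_rate=0.9)
print(json.dumps(body, indent=2))
```

Keeping the three tunable settings as plain keyword arguments makes it easy to see which parts of a request a developer would actually vary from prompt to prompt.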

Cloud Text-to-Speech includes a number of high-fidelity voices that were built using WaveNet, which is a neural network for raw audio that was created by DeepMind, a Google subsidiary focused on long-term research in machine learning and artificial intelligence (AI). WaveNet came out of a research paper published roughly a year and a half ago, Aharon told me.

"In late 2016, DeepMind introduced the first version of WaveNet -- a neural network trained with a large volume of speech samples that is able to create raw audio waveforms from scratch," Aharon wrote in the Google Blog post. "During training, the network extracts the underlying structure of the speech, for example which tones follow one another and what shape a realistic speech waveform should have. When given text input, the trained WaveNet model generates the corresponding speech waveforms, one sample at a time, achieving higher accuracy than alternative approaches."

In the year and a half since, the Google Speech team has been investing heavily and working closely with DeepMind to productize the WaveNet model. The resulting improvements allow the model to generate raw waveforms 1,000 times faster than the original model and to create waveforms with 24,000 samples a second. Additionally, Google has increased the resolution of each sample from 8 bits to 16 bits, which results in higher quality audio and a more human-like sound, Aharon said.

The company is touting the new WaveNet model as producing the most human-like, natural-sounding speech available today. As shown in the Google graphic below, testing groups gave the U.S. English WaveNet voices a mean opinion score (MOS) of 4.1 on a scale of 1 to 5, which is more than 20% better than the MOS given for standard (non-WaveNet) voices -- "closing the gap to human speech by over 70%," Aharon said. And because the WaveNet model requires less recorded audio input, Google expects to continue improving the quality and variety of voices it makes available to Cloud customers over the next several months.


Mean Opinion Scores -- Graphic from Google

Google shared Cloud Text-to-Speech with alpha customers privately under NDA. "One of the things they like about the product is it's really good at pronunciation -- names, dates, times, etc.," Aharon said. "Other [text-to-speech] systems require customers to go reformat text to make it pronounce properly."

Cisco and Dolphin ONE are among the customers already using the service, the blog post states.

"As the leading provider of collaboration solutions, Cisco has a long history of bringing the latest technology advances into the enterprise," said Tim Tuttle, CTO of Cognitive Collaboration at Cisco, in a prepared statement. "Google's Cloud Text-to-Speech has enabled us to achieve the natural sound quality that our customers desire."

To start, Google is targeting three main use cases with Cloud Text-to-Speech: intelligent IVRs in call centers, speech-enabling IoT devices, and converting text-based media into a spoken format. For call centers, enterprises can leverage Cloud Text-to-Speech to reduce or eliminate their reliance on pre-recorded human audio samples. In a customer service context, if a customer calls in for more information on toasters, for example, an IVR system using text-to-speech will be able to respond in natural language, Aharon said. "Imagine it's replacing an IVR."

With IoT devices, the use case is very similar to IVR, Aharon said. Users want to be able to talk to a device and ask it to lower the volume, for example, and have it respond in a natural voice. For the third use case, look at your favorite news site and imagine that you could click a button and have your news articles read to you in an audio format, Aharon explained.

Cloud Text-to-Speech is priced per 1 million characters of text processed. For standard (non-WaveNet) voices, the first 4 million characters are free and then it's $4 per 1 million characters, and for WaveNet voices, the first 1 million characters are free after which it's $16 per 1 million characters. To put this into perspective, each million characters is equivalent to roughly 23 to 24 hours of audio, so 4 million characters would be around 90-100 hours of speech, Aharon said.
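The pricing arithmetic above can be sketched in a few lines. The rates and free allowances come from the figures quoted here; everything else is an assumption made for illustration: the function names are hypothetical, usage beyond the free allowance is billed in linear proportion, the free allowance applies once per billing period, and 23.5 hours per million characters is simply the midpoint of Aharon's 23-to-24-hour estimate.

```python
# Rates and free allowances as quoted in the article (USD per 1 million characters).
PRICING = {
    "standard": {"free_chars": 4_000_000, "usd_per_million": 4.0},
    "wavenet": {"free_chars": 1_000_000, "usd_per_million": 16.0},
}


def estimate_cost(chars, voice_type):
    """Estimate the charge for `chars` characters in one billing period.

    Assumes linear proration beyond the free allowance (an assumption,
    not a billing rule confirmed in the article).
    """
    tier = PRICING[voice_type]
    billable = max(0, chars - tier["free_chars"])
    return billable / 1_000_000 * tier["usd_per_million"]


def estimate_audio_hours(chars, hours_per_million=23.5):
    """Rough audio duration, using the midpoint of the 23-24 hour estimate."""
    return chars / 1_000_000 * hours_per_million


print(estimate_cost(10_000_000, "standard"))  # 6 million billable characters at $4
print(estimate_cost(10_000_000, "wavenet"))   # 9 million billable characters at $16
print(estimate_audio_hours(4_000_000))        # the standard free allowance, in hours
```

Under these assumptions, 10 million characters a month would cost $24 with standard voices and $144 with WaveNet voices, and the standard free allowance alone covers roughly 94 hours of speech.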

For those who may have missed it, at Enterprise Connect 2018 earlier this month, Diane Chaleff, a Google Cloud Office of the CTO executive, gave an Industry Vision Address about how machine learning technologies will become core to the communications tools of the future.
