Google Boosts Cloud Speech API

New features and additional language support extend the capabilities and reach of the company's speech recognition technology.

Michelle Burbick

August 14, 2017


Google has been busy upgrading its Cloud Speech API to better meet a growing imperative for converting speech to text at cloud scale for improved user experiences.

Google previously used its speech-to-text technology exclusively in Google Now, other of its own applications, and search. But the company has since opened up its speech technology to the masses, beginning with alpha testing of the Cloud Speech API in April 2016 and moving to general availability this past April.

"Through the cloud, we're democratizing speech for businesses, from small startups to large enterprises," said Dan Aharon, Google Cloud product manager.

Thousands of businesses already are using the Cloud Speech API, and the number is growing every month, Aharon said. Based on customer feedback collected over the past year through various channels, Google today announced a few enhancements meant to address the most common requests.

The enhancements come in three parts. First, Google has added timestamp information for each word in a transcript -- the most requested feature, Aharon said. Word-level timestamps allow users to jump to the moment in the audio where the text was spoken, or to display the relevant text while the audio is playing, he explained.
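Both uses become simple lookups once each word carries its start and end offset. A minimal sketch, assuming transcript data shaped as (word, start_seconds, end_seconds) tuples like the word offsets the API returns; the sample words, times, and helper names here are illustrative, not real API output:

```python
# Hypothetical word-level timestamp data, shaped like the per-word
# offsets Cloud Speech API returns when word timestamps are enabled.
WORDS = [
    ("hello", 0.0, 0.4),
    ("and", 0.4, 0.6),
    ("welcome", 0.6, 1.1),
    ("to", 1.1, 1.3),
    ("the", 1.3, 1.4),
    ("demo", 1.4, 1.9),
]

def word_at(seconds, words=WORDS):
    """Return the word being spoken at a playback offset (for captioning)."""
    for word, start, end in words:
        if start <= seconds < end:
            return word
    return None

def start_of(target, words=WORDS):
    """Return the audio offset to seek to for a given word, or None."""
    for word, start, _end in words:
        if word == target:
            return start
    return None
```

With data like this, displaying text in sync with playback is a `word_at` call per frame, and click-to-seek is a `start_of` lookup.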

A couple of businesses already making use of this functionality are Happy Scribe, which uses the API to provide a voice-to-text transcription service, and Voximplant, which uses the API to help other companies build voice and video applications, Google announced. In prepared statements, both companies said the feature saves them considerable time in working with speech-to-text transcriptions, savings they can pass along to their business customers.

In addition, Google has extended the maximum length of audio file supported. The Cloud Speech API previously worked with audio files that ran no longer than 80 minutes, but now supports files of up to three hours in length. Further, customers with even longer files can apply for a quota extension through Cloud Support, with approval granted on a case-by-case basis.
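The limits described above can be expressed as a small routing check; the numbers come from the article, while the function name and return labels are illustrative:

```python
# Audio-length limits as described in the article: files up to three
# hours are now supported directly (up from 80 minutes previously);
# anything longer requires a case-by-case quota extension.
OLD_LIMIT_MINUTES = 80
NEW_LIMIT_MINUTES = 3 * 60  # three hours

def audio_path(duration_minutes):
    """Classify a file by how the extended limits would handle it."""
    if duration_minutes <= NEW_LIMIT_MINUTES:
        return "supported"
    return "needs quota extension"
```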

Lastly, Google has added support for 30 additional languages, including Bengali, Latvian, and Swahili, bringing the total to 119 and representing a billion additional potential users, Aharon said.

A motivating factor for the language extension is to aid in the economic development of more countries, Aharon said. By exposing an additional billion people to the product, Google is hopeful it can help existing customers reach new audiences and expose people in those countries to technology previously unavailable to them, he added.

In a demo of the technology Aharon gave, the Cloud Speech API transcribed speech in real time with 100% accuracy.

Going forward, Google will have additional speech technology news, including market innovations, later this year, Aharon hinted.

Overall, Google is working on "strengthening our cloud portfolio with an eye on bringing AI to customers," Aharon said. He explained that there are two main spaces Google sees for the use of AI through speech recognition: human-computer interaction and speech analytics on human-to-human interactions.

Regarding AI, Google sees a wide range of enterprise readiness, often dependent on how regulated an industry is. Highly regulated businesses like those in healthcare and financial services tend to be more conservative adopters, while startups and those in less regulated spaces are more likely to be first to market with AI-enabled solutions. In particular, Google sees high demand for AI and speech technology in the contact center, where it is being leveraged to improve the customer experience.

Follow Michelle Burbick and No Jitter on Twitter!
@nojitter
@MBurbick

About the Author

Michelle Burbick

Michelle Burbick is the Special Content Editor and a blogger for No Jitter, Informa Tech's online community for news and analysis of the enterprise convergence/unified communications industry, and the editorial arm of the Enterprise Connect event, for which she serves as the Program Coordinator. In this dual role, Michelle is responsible for curating content for and managing the No Jitter website, along with its variety of sponsored programs, from whitepapers to research reports. On the Enterprise Connect side, she plans the conference program content and runs special content programs for the event.

Michelle also moderates Enterprise Connect sessions and virtual webinars covering a broad range of technology topics. In her tenure on the No Jitter and Enterprise Connect teams, she has managed the webinar program, coordinated and run the Best of Enterprise Connect awards program, and taken on special projects related to advancing women in the technology industry and promoting diversity and inclusion.

Prior to coming to No Jitter, Michelle worked as a writer and editor, producing content for technology companies for several years. In an agency environment, she worked with companies in the unified communications, data storage and IT security industries, and has developed content for some of the most prominent companies in the technology sector.

Michelle has also worked in the events and tradeshows industry, primarily as a journalist for the Trade Show Exhibitors Association. She earned her Bachelor's degree from the University of Illinois at Chicago. She is an animal lover and likes to spend her free time bird watching, hiking, and cycling.