
How to Master the Artful Science of Speech Queries

Done correctly, speech recognition can help reduce call time, increase customer satisfaction, and automate manual processes.

Andrew Prokop

May 9, 2016


  • With the new day comes new strength and new thoughts. -- Eleanor Roosevelt

It has been said that everything old is new again, and that has certainly been true for me. I began my career as a software designer, walked away from while-loops and if-statements about 12 years ago, and recently (and quite unexpectedly) found myself back knee-deep in the world of Java programming and webpage design. Although today's technology is significantly different from what it was when I boxed up my programming books, logic is still logic, and the methodologies programmers use to solve today's problems are the same as when I wrote assembler language for Intel 8085 processors.

In my recent No Jitter article, "My Life as an Avaya Breeze Boot Camp Recruit," I wrote about writing applications that lived in the middle of incoming and outgoing calls. Using many Breeze APIs, I captured calls, played prompts, collected DTMF digits, and routed those calls to the appropriate destinations.

As fun as all that was, the excitement level went up several notches when I learned how to listen programmatically to those calls with the Avaya speech engine. Now, while two people are talking, I am able to detect when particular words and phrases are spoken and take immediate action upon them. For instance, I wrote a sample bank application that automatically sends a text message with the caller's account information whenever it hears the words "account balance" or "text message."

I also use the speech engine to score agent performance. For example, did the agent properly welcome the caller within the first three seconds of the call? Information such as this is useful for training purposes and in efforts to drive up customer satisfaction scores.
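A check like the greeting example can be sketched in a few lines. This is only an illustration, not the Avaya API: the `Hit` record, its fields, and the `greeted_in_time` function are hypothetical names, and the sketch assumes the speech engine reports each match with the phrase, the channel it was heard on, and an offset in seconds from the start of the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One phrase match reported by a speech engine (hypothetical shape)."""
    phrase: str
    channel: str           # "agent" or "caller"
    offset_seconds: float  # time elapsed since the call was answered


def greeted_in_time(hits, greetings, window_seconds=3.0):
    """Return True if the agent spoke any greeting within the window."""
    wanted = {g.lower() for g in greetings}
    return any(
        h.channel == "agent"
        and h.phrase.lower() in wanted
        and h.offset_seconds <= window_seconds
        for h in hits
    )


# Example call: the agent greets at 1.4 s, the caller speaks later.
hits = [
    Hit("thank you for calling", "agent", 1.4),
    Hit("account balance", "caller", 9.8),
]
print(greeted_in_time(hits, ["Thank you for calling", "Good morning"]))
```

Restricting the check to the agent's channel keeps a caller's "good morning" from being counted as the agent's welcome.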

While adding speech recognition to my application was easy, creating effective speech queries is an art. False positives are not only cumbersome for an application to deal with, but also lead to incorrect application behavior and inaccurate call statistics. So, to make sure I followed best practices for speech, I did a little research and came up with a number of guidelines.

Please note: While the following information is essential to programming effective speech searches, it's also applicable outside the world of software development. Business unit leaders developing requirements for their contact centers need to understand these best practices before a single line of code is ever written.

Search Query Guidelines

  • In most cases, parsing search terms involves up to two stages. In the first stage, the speech engine looks up each word in a pronunciation dictionary, which may contain multiple pronunciations for a word. This stage may itself have two sub-phases: a case-sensitive lookup followed by a case-insensitive lookup. If the speech engine doesn't find the word in the pronunciation dictionary, it moves to the second stage and applies a series of letter-to-sound (LTS) rules. LTS rules produce only a single pronunciation.

  • Capitalization matters. The chances are slim that you are reading this article on a train to Reading (pronounced "Redding").

  • Using longer words and phrases in search queries is generally preferable to using shorter words. "Cat" will match people referring to their pets as well as words such as "category" and "catastrophic." This leads to a large number of false positives. It would be much better to use "cat" in a phrase such as "my cat is sick" or "I am calling about my cat."

  • Be careful with abbreviations. Searching for the words "self-contained underwater breathing apparatus" would be odd when you know nearly everyone in the English-speaking world will say "scuba." On the other side of the coin, you are more apt to hear the letters "I-B-M" and not "International Business Machines." If the abbreviation is typically spoken as a word, search for that word. If the abbreviation is typically spoken as individual letters, search for those letters. For example, a search for "IBM" would be created as "I B M" or "I.B.M."

  • Some abbreviations swing both ways. VAT might be said as "V A T" or "value-added tax." In these cases, add both to your search queries.

  • Different people say the same thing in different ways. For example, "two weeks," "14 days," and "a fortnight" all mean the same thing, so you should include every likely variation in your search criteria.

  • Spelling matters, and incorrectly spelled words will lead to erroneous matches.

  • Confidence values can widen or narrow the search process. For instance, a confidence percentage of 90% will require a very precise match while 30% allows for loose pronunciations.

  • Timing is very important. Some phrases are probably best searched for at the beginning of a conversation (e.g., "Good morning") and others at the end (e.g., "Have a nice day").

  • In some cases, the exact phrase is important, but sometimes you only need to know when a few of the words in a phrase are heard. Most speech search engines will allow search qualifiers such as ALL, ANY, or AT LEAST to provide match flexibility.

  • Be careful about the duration of searches. Speech recognition consumes software- or hardware-based media resources, which are typically scarce. Searching the calling and called parties separately consumes twice the resources of searching a single channel.
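The two-stage term parsing described in the first guideline can be sketched as follows. This is a toy model, not any particular engine's implementation: the dictionary contents, the phoneme strings, and the `lookup` and `letter_to_sound` names are all made up for illustration, and the choice to return every pronunciation that matches case-insensitively is an assumption.

```python
# Toy pronunciation dictionary. Keys are spellings; values list one or more
# pronunciations. The phoneme strings are illustrative, not a real lexicon.
PRONUNCIATIONS = {
    "Reading": ["R EH D IH NG"],        # the town ("Redding")
    "reading": ["R IY D IH NG"],        # the verb
    "either": ["IY DH ER", "AY DH ER"],
}


def letter_to_sound(word):
    """Stand-in for LTS rules: always yields exactly one pronunciation."""
    return " ".join(word.upper())


def lookup(word):
    """Return candidate pronunciations for a search term."""
    # Stage 1a: case-sensitive dictionary lookup.
    if word in PRONUNCIATIONS:
        return list(PRONUNCIATIONS[word])
    # Stage 1b: case-insensitive lookup (assumed to collect every match).
    folded = word.lower()
    found = [
        p
        for spelling, prons in PRONUNCIATIONS.items()
        if spelling.lower() == folded
        for p in prons
    ]
    if found:
        return found
    # Stage 2: fall back to letter-to-sound rules (single pronunciation).
    return [letter_to_sound(word)]


for term in ["Reading", "reading", "READING", "Breeze"]:
    print(term, "->", lookup(term))
```

In this sketch "Reading" and "reading" resolve to different pronunciations, which is why capitalization in a search term matters.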
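Several of the guidelines above (spoken abbreviations, phrase variations, confidence values, and match qualifiers) can be combined in one small sketch. The function names, the `AT_LEAST` spelling of the qualifier, and the `heard` mapping of phrase to confidence are assumptions made for illustration; a real speech engine will expose its own query syntax and result format.

```python
def spell_out(abbreviation):
    """Turn 'IBM' or 'I.B.M.' into 'I B M' for letter-by-letter speech."""
    return " ".join(abbreviation.replace(".", "").upper())


def matches(heard, terms, qualifier="ANY", at_least=1, min_confidence=0.5):
    """Decide whether a query matched.

    heard maps lower-case phrases to the engine's confidence (0.0 to 1.0).
    A term counts only if its confidence reaches min_confidence.
    """
    found = sum(
        1 for t in terms if heard.get(t.lower(), 0.0) >= min_confidence
    )
    if qualifier == "ALL":
        return found == len(terms)
    if qualifier == "ANY":
        return found >= 1
    if qualifier == "AT_LEAST":
        return found >= at_least
    raise ValueError(f"unknown qualifier: {qualifier}")


# An abbreviation said both ways: search for the letters and the long form.
vat_terms = [spell_out("VAT"), "value-added tax"]
print(matches({"v a t": 0.82}, vat_terms, qualifier="ANY"))

# Variations of the same idea, heard with low confidence.
duration_terms = ["two weeks", "14 days", "a fortnight"]
heard = {"a fortnight": 0.35}
print(matches(heard, duration_terms, min_confidence=0.9))  # strict
print(matches(heard, duration_terms, min_confidence=0.3))  # loose
```

Raising `min_confidence` trades missed matches for fewer false positives, so the right value depends on how costly a wrong match is for the application.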

Mischief Managed

As I delve deeper into the world of speech recognition, I am certain that I will expand my list of dos and don'ts, but this is a good start. In my limited experience, I've learned that speech recognition is both a science and an art. As such, enterprises may need to involve very different kinds of people as they derive and deploy search queries. Clever, technical people will handle the programming and scaling tasks, while business unit leaders and their specialists will decide which words and phrases matter, when they matter, and how much.

When correctly implemented, speech recognition provides significant value to businesses and their customers. It can reduce call time, increase customer satisfaction, and automate manual processes. Done incorrectly, the only search query that matters just might be "garbage in, garbage out."

Andrew Prokop writes about all things unified communications on his popular blog, SIP Adventures.


About the Author

Andrew Prokop

Andrew Prokop has been involved in the world of communications since the early 1980s. He holds six United States patents in SIP technologies and was on the teams that developed Nortel's carrier-grade SIP soft switch and SIP-based contact center.

 

Through customer engagements, user groups, podcasts, proof-of-concept software development, trade shows, and webinars, Andrew has been an evangelist for digital transformation technologies for enterprises and their customers. Andrew understands the needs of the enterprise and has the background and skills necessary to assist companies as they drive towards a world of dynamic and immersive communications.

 

Andrew is an active blogger, and his widely read blog, Tao, Zen, and Tomorrow (formerly SIP Adventures), discusses every imaginable topic in the world of unified communications. He is just as comfortable writing at the 50,000-foot level as he is discussing natural language processing or the subtle nuances of a particular SIP header.