ABOUT THE AUTHOR


Brent Kelly
Brent Kelly is president and principal analyst at KelCor, Inc., where he provides strategy and counsel to key client types...
Brent Kelly | November 02, 2017

 
   

Where's the AI in Cisco Spark Assistant?

Under the covers, there is a lot of AI in play to make Cisco Spark Assistant for meetings work.

As covered elsewhere on No Jitter, Cisco today introduced Cisco Spark Assistant, an AI-powered assistant aimed sharply at improving the efficiency of meetings. Rowan Trollope, SVP & GM of IoT and Applications at Cisco, and his team are on a relentless pursuit to make the meeting room experience so easy and convenient that the technology simply disappears, allowing participants to focus on the content of their meeting. Today's Spark Assistant announcement is another in a long line of innovations Cisco is bringing to the meeting experience, and to our industry overall.

Cisco is positioning Spark Assistant as an AI-powered voice assistant; let's decompose how Spark Assistant works so that we can identify where the artificial intelligence actually resides in this product.

At a high level, Spark Assistant can be diagrammed as a series of processes beginning with a user's voice command and ending with the assistant performing an action and informing the user of that action.

Figure 1. A block diagram of how Cisco Spark Assistant works


In this five-step process, there are three areas where Cisco has invoked the use of artificial intelligence: speech-to-text, natural language processing, and text-to-speech. Because speech-to-text and natural language processing are CPU-intensive, these functions run on processors in the Spark cloud rather than on the Spark Room endpoint.
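As a rough sketch, the five steps can be written down as data. The stage names, the `Stage` type, and the `location` labels below are assumptions made for illustration; the article only states that speech-to-text and natural language processing run in the cloud, so the location of text-to-speech is marked as unspecified.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    location: str  # "endpoint", "cloud", or "unspecified" (labels are assumptions)
    uses_ai: bool


# Illustrative model of the five-step flow described above.
PIPELINE: List[Stage] = [
    Stage("voice capture", "endpoint", False),
    Stage("speech-to-text", "cloud", True),
    Stage("natural language processing", "cloud", True),
    Stage("action performed", "endpoint", False),
    Stage("text-to-speech", "unspecified", True),
]

ai_stage_names = [s.name for s in PIPELINE if s.uses_ai]
cloud_stage_names = [s.name for s in PIPELINE if s.location == "cloud"]

if __name__ == "__main__":
    print("AI stages:", ai_stage_names)
    print("Cloud stages:", cloud_stage_names)
```

Filtering on `uses_ai` yields the three AI areas named above, and filtering on `location` yields the two steps the article places in the cloud.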

Speech to Text
As a user is speaking, an analog-to-digital converter in the newly announced Spark Room 70 (see "Hey Spark, How Is Cisco Partner Summit?") samples the soundwave at 44 kHz, and each measurement is assigned a number that reflects the amplitude of the soundwave at a given point in the utterance.

This digital string of numbers is then sent to the Cisco Spark cloud, where additional filtering and normalization occur. It is after this step that the artificial intelligence begins.
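To make the sampling step concrete, here is a minimal sketch that samples a pure tone at the 44 kHz rate quoted above, quantizes each measurement to an integer, and then peak-normalizes the result. The 16-bit depth, the sine-wave input, and the choice of peak normalization are assumptions for illustration; the article does not say what bit depth or normalization method Cisco uses.

```python
import math
from typing import List

SAMPLE_RATE_HZ = 44_000  # sampling rate quoted in the article
BIT_DEPTH = 16           # assumption: not stated in the article


def sample_tone(freq_hz: float, duration_s: float, amplitude: float = 0.5) -> List[int]:
    """Sample a sine wave and quantize each measurement to a signed integer."""
    n_samples = round(SAMPLE_RATE_HZ * duration_s)
    max_int = 2 ** (BIT_DEPTH - 1) - 1
    return [
        round(amplitude * max_int * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(n_samples)
    ]


def peak_normalize(samples: List[int]) -> List[float]:
    """Scale samples so the loudest one has magnitude 1.0."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return [0.0 for _ in samples]
    return [s / peak for s in samples]


if __name__ == "__main__":
    raw = sample_tone(freq_hz=440.0, duration_s=0.01)
    print(len(raw), "samples; first five:", raw[:5])
    print("peak after normalization:", max(abs(x) for x in peak_normalize(raw)))
```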

When speech is processed, it is broken up into phonemes, which are the elemental building blocks of sound that make up spoken language. In the English language, there are approximately 40 phonemes. These phonemes are then processed, with the algorithm looking at what came before and what came after a particular phoneme. The AI algorithm does this to put the phonemes into context so that it can determine what words, phrases, and sentences a person has spoken. It turns out that Cisco is using several third-party algorithms in Spark Assistant to convert the speech to text, and Cisco will decide which one to use closer to when the product becomes generally available.
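The role of context can be shown with a toy decoder. This sketch uses greedy longest-match lookup against a tiny hand-written lexicon, which is far simpler than the statistical or neural models a real recognizer would use and is not a description of the third-party algorithms mentioned above. The ARPAbet-style phoneme symbols and the `LEXICON` entries are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon mapping phoneme sequences to words.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("S", "T", "AA", "R", "T"): "start",
    ("M", "AY"): "my",
    ("M", "IY", "T"): "meet",
    ("M", "IY", "T", "IH", "NG"): "meeting",
    ("K", "AO", "L"): "call",
    ("EH", "N", "D"): "end",
}
MAX_WORD_LEN = max(len(key) for key in LEXICON)


def decode(phonemes: List[str]) -> List[str]:
    """Greedy longest-match decoding: look ahead before committing to a word."""
    words: List[str] = []
    i = 0
    while i < len(phonemes):
        for length in range(min(MAX_WORD_LEN, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + length])
            if candidate in LEXICON:
                words.append(LEXICON[candidate])
                i += length
                break
        else:
            words.append("<unk>")  # no lexicon entry starts here
            i += 1
    return words


if __name__ == "__main__":
    print(decode("S T AA R T M AY M IY T IH NG".split()))
```

Because the decoder checks what follows "M IY T" before committing, it picks "meeting" rather than stopping at "meet", a small-scale version of using surrounding phonemes as context.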

This speech-to-text algorithm must be trained so that it understands words that are specific to a particular domain. For Cisco Spark Assistant, the domain is clearly one of meetings. Thus, the algorithm will be able to recognize phrases such as "Start my meeting," "Call Michael's meeting room," "Call Sidney," or "End my meeting."

The output from the speech-to-text functional block is a string of text that should be what the user has spoken. At this point in the process, the system has simply converted speech to text, but it has not determined the intent of what the user actually wants it to do. This happens in the natural language processing block.

Natural Language Processing
The goal of the natural language processing block is to determine the intent of the user, along with any entities or objects that this intent must act upon. This is another significant artificial intelligence processing step. This is the area where the MindMeld software acquired by Cisco earlier this year comes into play.

In its initial debut, Cisco Spark Assistant has four primary "intents," or actions, that it can invoke:

  1. Start a meeting
  2. Call into somebody else's meeting room
  3. Call another video endpoint or telephone
  4. End a call
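To illustrate what "intent plus entities" means for these four actions, here is a rule-based sketch that maps the example phrases from this article to an intent label and a dictionary of entities. The intent names, entity keys, and regular expressions are assumptions chosen for the example; a production system would presumably rely on trained classifiers rather than hand-written patterns.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Order matters: "call X's meeting room" must be tried before the generic "call X".
INTENT_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("start_meeting", re.compile(r"^start (?P<owner>my) meeting$")),
    ("join_meeting_room", re.compile(r"^call (?P<owner>[\w-]+)'s meeting room$")),
    ("call", re.compile(r"^call (?P<target>.+)$")),
    ("end_call", re.compile(r"^end (?:my |the )?(?:meeting|call)$")),
]


def parse(utterance: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, entities) for a recognized phrase, or (None, {})."""
    text = utterance.strip().lower().rstrip(".!?")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return None, {}


if __name__ == "__main__":
    for phrase in ("Start my meeting", "Call Michael's meeting room",
                   "Call Sidney", "End my meeting"):
        print(phrase, "->", parse(phrase))
```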

Each of these intents necessarily involves other entities or objects that must be acted upon. For example, if a person says, "Start my meeting," the system can figure out that it's supposed to start a meeting, but it also has to figure out who "my" is. Consequently, there must be some type of identification of who is speaking to the system. Spark Assistant will try to identify who is in a room using Cisco's proximity mechanism, which involves the use of ultrasonic signaling between a person's mobile device and the video endpoint. If there are multiple people in the room, the system may need to ask which person is speaking so that it can properly identify who the word "my" is referring to. In future versions of the Spark Assistant, it will be able to determine who is speaking through facial recognition or through other means that may be added later, such as voice fingerprinting.
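The logic for resolving "my" can be sketched as a small decision function. This is an illustration under stated assumptions: `people_in_room` stands for whatever list of users the proximity mechanism reports, and the function name and prompt wording are invented for the example.

```python
from typing import List, Optional, Tuple


def resolve_my(people_in_room: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (owner, follow_up_question); exactly one of the two is None."""
    if len(people_in_room) == 1:
        # Only one paired device in the room, so "my" is unambiguous.
        return people_in_room[0], None
    if not people_in_room:
        return None, "I couldn't tell who is in the room. Who is speaking?"
    names = ", ".join(people_in_room)
    return None, f"Whose meeting should I start: {names}?"


if __name__ == "__main__":
    print(resolve_my(["Dana"]))
    print(resolve_my(["Dana", "Lee"]))
```

With one person detected, the owner is resolved directly; with several, the sketch falls back to asking, which mirrors the behavior described above.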

The system must also figure out which other endpoints to call. For example, if a person says, "Call Michael's phone," the system must have some integration with the speaker's contacts, probably through Active Directory or another LDAP directory, so that it can query which "Michaels" are in the person's contact list and what their contact parameters are.
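A minimal sketch of that lookup, with an in-memory list standing in for the directory. The `Contact` fields, the sample entries, and `find_contacts` are hypothetical; a real integration would query Active Directory or another LDAP server instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Contact:
    name: str
    phone: str
    video_address: str


# Hypothetical sample data standing in for a directory query result.
DIRECTORY: List[Contact] = [
    Contact("Michael Adams", "+1-555-0100", "madams@example.com"),
    Contact("Michael Brown", "+1-555-0101", "mbrown@example.com"),
    Contact("Sidney Clark", "+1-555-0102", "sclark@example.com"),
]


def find_contacts(first_name: str, directory: List[Contact] = DIRECTORY) -> List[Contact]:
    """Return every contact whose first name matches, ignoring case."""
    needle = first_name.casefold()
    return [c for c in directory if c.name.split()[0].casefold() == needle]


if __name__ == "__main__":
    matches = find_contacts("Michael")
    if len(matches) > 1:
        print("Which Michael?", [c.name for c in matches])
```

When more than one "Michael" comes back, the assistant would need a follow-up question before it can place the call.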

As you can see, there is a lot of background processing that must occur to determine what the user really wants the system to do. The good news is that this first version of Spark Assistant focuses on the meeting domain, so the system logic is constrained to identify intents and entities that may be involved in some type of video meeting. Additional domains may be added at a future time, which would further increase the utility of Spark Assistant.

Once the Assistant's artificial intelligence software has determined the intent of what the user wants to do, along with the entities involved to accomplish that intent, it then must interface with some type of mechanism that will invoke an action.

Action Performed
Spark Assistant's artificial intelligence processing is performed in the cloud, but once the intent and the entities are identified, this information is sent back to the Spark Room 70 video endpoint. The endpoint accepts this intent along with the entity involved, and will use it to launch a call to a specified person, start a particular user's meeting, or end a call. These are functions that a Spark Room system already does, but heretofore, they have had to be done in a more manual fashion. Of course, Spark Room systems integrate with Spark's call control mechanism to launch calls and to end calls. This particular step of the Spark Assistant process does not involve artificial intelligence, but rather it uses the AI information from previous steps to automate existing functions that have typically been done manually.
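This hand-off involves no AI and amounts to a dispatch table: the endpoint receives an intent label plus its entities and calls the matching existing function. The `FakeEndpoint` class, its method names, and the intent labels are assumptions for illustration, not Cisco's actual endpoint API.

```python
from typing import Dict, List


class FakeEndpoint:
    """Stand-in for a room endpoint; it only records what it was asked to do."""

    def __init__(self) -> None:
        self.actions: List[str] = []

    def start_meeting(self, owner: str) -> None:
        self.actions.append(f"start meeting for {owner}")

    def join_meeting_room(self, owner: str) -> None:
        self.actions.append(f"join {owner}'s meeting room")

    def call(self, target: str) -> None:
        self.actions.append(f"call {target}")

    def end_call(self) -> None:
        self.actions.append("end call")


def dispatch(endpoint: FakeEndpoint, intent: str, entities: Dict[str, str]) -> None:
    """Invoke the endpoint function that corresponds to the identified intent."""
    handlers = {
        "start_meeting": endpoint.start_meeting,
        "join_meeting_room": endpoint.join_meeting_room,
        "call": endpoint.call,
        "end_call": endpoint.end_call,
    }
    if intent not in handlers:
        raise ValueError(f"unsupported intent: {intent}")
    handlers[intent](**entities)


if __name__ == "__main__":
    room = FakeEndpoint()
    dispatch(room, "call", {"target": "Sidney"})
    dispatch(room, "end_call", {})
    print(room.actions)
```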

Text to Speech
Once Spark Assistant has identified what the user wants to do, and has invoked that action on the Spark Room 70 endpoint, it plays synthesized speech telling the user what action it is taking and, if necessary, who it is calling. This step of the process uses the intents that were found earlier along with the entities. The entities are important here because if the user said to call Michael, the text-to-speech output would need to synthesize the name Michael in its response. Although text-to-speech is still considered an artificial intelligence process, it is one of the easier AI processes because there is really no machine learning or deep learning required.
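Before any speech is synthesized, the assistant needs the sentence to speak. One simple way to build it is a template per intent with the entities filled in, sketched below. The template wording and intent labels are assumptions, and the speech synthesis step itself is left out.

```python
from typing import Dict

# Hypothetical response templates, one per intent.
RESPONSES: Dict[str, str] = {
    "start_meeting": "Starting {owner}'s meeting.",
    "join_meeting_room": "Calling {owner}'s meeting room.",
    "call": "Calling {target}.",
    "end_call": "Ending the call.",
}


def response_text(intent: str, entities: Dict[str, str]) -> str:
    """Fill the intent's template with the entities found earlier."""
    return RESPONSES[intent].format(**entities)


if __name__ == "__main__":
    print(response_text("call", {"target": "Michael"}))
```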

Conclusion
Cisco Spark Assistant is the latest in a series of enhancements Cisco is making to the meeting room experience. This particular assistant relies heavily on artificial intelligence processing in the Cisco Spark cloud. It also integrates with local processing capability on the endpoint to execute the command the user has spoken to the assistant.

We should expect to see a broadening of the capabilities found in Cisco Spark Assistant over time to cover additional intents that may be found within the overall meeting paradigm, such as scheduling, recording, summarizing, and tagging. These later capabilities are simply speculation on my part, but they would make sense given where Cisco is trying to go with enhancing ease-of-use and the overall quality of the meeting room experience.

We should also expect to see other examples of business-focused AI systems from Cisco and others becoming pervasive throughout enterprises, impacting the efficiency with which we work and perform. We won't really see Spark Assistant in a production mode until sometime next year. Cisco explained that making the Assistant smart enough to do a simple demo is one thing, but making it a robust, secure business tool that works every time is a much more difficult task. Hence, the delivery date sometime in 2018.





