What Do You Buy with 5k and AI?

As we move into a world of 5k video endpoints, what are we going to get for our money?

Brent Kelly

September 27, 2017


Cisco has been actively marketing video solutions that pair 4k and 5k high-resolution cameras with artificial intelligence (AI) capabilities. While other vendors have high-resolution cameras, I believe Cisco leads the market today in what it does with 4k/5k camera data, particularly when that data is combined with information from other room devices. In Cisco's case, those would be Spark Board and Spark Room systems.

Quite frankly, most people can't tell the difference between a 720p video image and a 1080p video image, much less a 4k image. So why does Cisco think we need even higher-resolution -- 4k or 5k -- cameras?

I'll focus first on what Spark devices do with the 5k camera data, then I'll speculate about what we'll see in the coming months and years, not just from camera data, but from these systems in general.

5k Data Now
Cisco supports 5k cameras in the Spark Board 55 and 70, as well as in the Spark Room 55, Spark Room Kit, and Spark Room Kit Plus (with four 5k cameras). Today these systems primarily use 5k data to frame the video meeting properly:

  • The 5k camera data gives a wide view of the entire room. The systems (Spark Room today and Spark Boards soon) crop the image by framing active speakers in the field of view sent to the far side. The image size sent to the far side is still 1080p, not 5k.

  • The Spark Board has a 4k screen that avoids the "up close" pixelation one would otherwise see when a person on the far side is writing on the capacitive touch whiteboard.
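
The cropping step in the first bullet can be illustrated with a rough sketch. It is not Cisco's implementation. It assumes a 5120x2880 sensor (the article only says "5k"), a speaker bounding box supplied by some detector, and hypothetical names such as `Box` and `crop_for_speaker`. It picks a 16:9 window around the speaker that could then be scaled to the 1080p image sent to the far side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    width: int
    height: int


# Assumed dimensions: the article says "5k" and "1080p" but gives no pixel counts.
FRAME_W, FRAME_H = 5120, 2880
OUT_W, OUT_H = 1920, 1080


def crop_for_speaker(speaker: Box, margin: float = 0.5) -> Box:
    """Return a 16:9 crop window inside the frame, centered on the speaker."""
    # Pad the speaker box so the framing is not uncomfortably tight.
    want_w = speaker.width * (1 + margin)
    want_h = speaker.height * (1 + margin)

    # Smallest 16:9 window covering the padded box, but never smaller than
    # the output size (to avoid upscaling) and never taller than the frame.
    crop_h = max(want_h, want_w * OUT_H / OUT_W, OUT_H)
    crop_h = int(round(min(crop_h, FRAME_H)))
    crop_w = crop_h * OUT_W // OUT_H

    # Center on the speaker, then clamp so the window stays inside the frame.
    center_x = speaker.left + speaker.width / 2
    center_y = speaker.top + speaker.height / 2
    left = int(round(center_x - crop_w / 2))
    top = int(round(center_y - crop_h / 2))
    left = max(0, min(left, FRAME_W - crop_w))
    top = max(0, min(top, FRAME_H - crop_h))
    return Box(left, top, crop_w, crop_h)


# A speaker near the top-right corner: the window is pushed back into the frame.
window = crop_for_speaker(Box(left=4800, top=100, width=200, height=400))
```

The point of the extra resolution shows up here: a 1920x1080 window cut from a much larger frame needs no upscaling, so a zoomed-in speaker can still be sent at full 1080p quality.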

AI in Spark Devices
The Spark devices use machine learning (ML), a branch of AI, to help them zoom in and out intelligently and to frame the images properly. ML algorithms process the 5k video images, allowing the systems to detect who is moving and who is speaking. While a speaker is in motion, the ML algorithms keep the speaker properly framed in the video sent to remote participants.

When multiple participants are speaking, Spark devices know to zoom out. During a meeting, Spark devices measure "ball possession," learning who is speaking most using a combination of visual recognition, voice recognition, and triangulation of voice location. In an experimental mode, Spark Board uses a "grid of dots," eye and nose placement, and other techniques to identify the speaker or speakers -- and then change the focus based on talking pattern.

For example, if one person is dominant, the system can make that speaker more prominent, based on the ML algorithm. If two people sitting across the table from one another dominate the conversation, the Spark device is intelligent enough to frame both of them in the video image it sends to remote participants, as opposed to ping-ponging back and forth between the two most active speakers.
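
To make the "ball possession" idea concrete, here is a minimal sketch of how talk time might drive the framing decision. Everything in it is an assumption for illustration: the function names, the (name, seconds) input format, and the 70% and 80% thresholds are hypothetical, and the real devices rely on ML models rather than fixed rules like these.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def talk_share(segments: Iterable[Tuple[str, float]]) -> Counter:
    """Sum speaking seconds per participant from (name, seconds) segments."""
    totals: Counter = Counter()
    for name, seconds in segments:
        totals[name] += seconds
    return totals


def choose_framing(
    segments: Iterable[Tuple[str, float]],
    solo_share: float = 0.7,   # hypothetical threshold for one dominant speaker
    pair_share: float = 0.8,   # hypothetical threshold for two dominant speakers
) -> List[str]:
    """Return the participants to frame; an empty list means a wide room shot."""
    totals = talk_share(segments)
    total = sum(totals.values())
    if total <= 0:
        return []

    ranked = totals.most_common()
    top_name, top_seconds = ranked[0]

    # One person holds the "ball" most of the time: frame that speaker.
    if top_seconds / total >= solo_share:
        return [top_name]

    # Two people dominate together: frame both instead of switching between them.
    if len(ranked) >= 2 and (top_seconds + ranked[1][1]) / total >= pair_share:
        return sorted([top_name, ranked[1][0]])

    # Conversation is spread around the table: zoom out.
    return []


recent = [("Ana", 40.0), ("Ben", 35.0), ("Ana", 10.0), ("Cy", 5.0)]
framed = choose_framing(recent)
```

In this example Ana and Ben together account for most of the talk time but neither dominates alone, so both are framed in one shot.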

The four-camera version of the Spark Room Kit Plus uses one of the 5k cameras to frame the entire room while the other three do the zooming and framing on portions of the room. The same algorithms referenced above come into play here.

Looking Toward Future Capabilities
As Apple does in its newly introduced high-end iPhone X, Cisco supports facial recognition for authentication. It's shipping facial recognition in Spark Boards now, but only as an experimental capability that a system administrator must turn on and train. Today it simply puts a name on the screen in front of a person recognized through facial recognition.

So, why is facial recognition in a video conference important? In the future, it may provide:

  1. Rosters of who was or is in a room at any given time.

  2. Visual sentiment analysis as opposed to simple voice sentiment analysis. This is important given the high percentage of communication contained in the visual cues we all give off. The system may be able to discern engagement level.

  3. An even easier way to enable meetings to start and end on time. As opposed to just an ultrasonic signal that the Spark Board sends to a smartphone to detect personal presence, facial recognition allows the system to authenticate a user visually. Coupled with the smartphone app, or even a lowly passcode, it can enable two-factor authentication. It would also authenticate participants who have forgotten to bring their smartphones to meetings, or it could authenticate authorized third parties or guest presenters.

Additional capabilities will be coming:

  • The use of intelligent assistants along with voice recognition will allow people to start and control a meeting using natural language, much as Scotty in Star Trek did when talking to a computer: "Computer, start my conference" or "Computer, call so-and-so." Cisco demoed this during its Enterprise Connect 2017 keynote, as shown in the screen capture below. Keynoter Jens Meggers, SVP and GM for Cisco's Cloud Collaboration Technology Group, launched a meeting via voice command to a virtual assistant. Notice the facial recognition at work: The system knows the faces of the three remote participants and superimposes their names on the video image.

  • The ability to "zoom in" on who is speaking at any given moment, coupled with AI and speech recognition, may allow a virtual meeting assistant to detect important decisions and zero in on a time stamp for them. This, in turn, can help to document what happened in important meetings and allow those not in attendance to "fast-forward" to the important parts when viewing the replay. This is similar to the IBM Watson Workspace "moments" capability that automatically summarizes team conversations and surfaces key action items. Down the road, I predict that the virtual assistant will be able to use body language to tell who is really engaged and supportive of meeting conclusions, and who isn't.

  • Virtual reality (VR) and augmented reality (AR) aren't really new, but if people can begin using the Spark Board framework, along with tools provided by Cisco and third parties, then these technologies can potentially become compelling and useful. At Enterprise Connect, Cisco demoed some VR/AR research it's doing along with Oculus and others. Note that previous VR attempts by Nortel/Avaya and IBM didn't take off, although a small set of users really liked these products.

While whiteboarding with Spark Board is interesting, I've not found it particularly compelling. Most of the whiteboards I've used while practicing as an engineer were far bigger than the Spark Board screens. Plus, whiteboarding can be done using a variety of other, less expensive methods.

The combined image below illustrates my point: on the left is an image Cisco's Meggers drew during the keynote mentioned above, while the one on the right shows a similar image created at Cisco Live on a touch-enabled tablet running OneNote and displaying its screen on a Spark Board. Yes, I get that Spark Board is multiparty and OneNote is "mostly" single party, but other collaboration tools have multiparty whiteboarding that can display on a screen at the front of the room, and people don't even have to leave their seats or offices to participate. Thus, for me, digital whiteboarding is interesting, but not really groundbreaking.

Far more useful and compelling, I think, are the capabilities that come along with the intelligence, video resolution, and processing power Cisco is adding to Spark Board:

  1. Far better video meetings in which the cameras zoom and frame automatically (today)

  2. The ability to do facial recognition (in experimental mode today) and facial authentication (future)

  3. Counting the number of participants in a meeting and identifying who they are (today)

  4. Reading participant body language to help meetings become even more effective (future)

  5. Very smart and useful digital assistants that can summarize the important points in a video meeting and do visual sentiment analysis (future)

  6. The ability to speak to a meeting room technology and have it intelligently start meetings and otherwise act as an intelligent assistant (future)

  7. VR/AR embedded right into our communications experiences, as appropriate and useful (future)

These kinds of advances require hardware and software working together along with advanced computing techniques that bring AI via ML into every office and workspace. So, what will you buy with 5k and AI? A far more productive future enabled through machine-assisted communications and collaboration.

About the Author

Brent Kelly

Brent Kelly is a principal analyst for unified communication and collaboration within Omdia’s Digital Workplace team.

Since 1998, Brent has been the principal analyst at KelCor, Inc., where he provided strategy and counsel to CxOs, investment analysts, VCs, technology policy executives, sell-side firms, and technology buyers. He also provided full-time consultancy to Wainhouse Research and Constellation Research. With a Ph.D. in chemical engineering, Brent has a strong data background in numerical methods and applied artificial intelligence with significant experience developing IoT and AI solutions.

Brent has a Ph.D. in chemical engineering from Texas A&M University and a B.S. in chemical engineering from Brigham Young University. He has served two terms as a city councilman in his Utah community. He is an avid outdoorsman participating in cycling, backpacking, hiking, fishing, and skiing. He and his wife own and operate a gourmet chocolates manufacturing company.