Computer Vision: What We See for Business Video Calling

While consumer video calls are about addition, their enterprise counterparts are probably more about subtraction.

October 7, 2018

3 Min Read

If you look at the use of computer vision in video calling, you'll see a distinct difference between consumer and business examples. With consumers, computer vision is about additions to the video, whereas in the enterprise it's about subtracting from the video. More on that in a second... I am getting ahead of myself on this one.

Back to machine learning and real time.

What do you do with machine learning and artificial intelligence (AI) in real-time video calls? That was a big question for us when we set out to analyze AI in real-time communications recently. Last week, I described the various vendor archetypes we found on the market. This time, I want to take a peek at the computer vision component of our research.

Here's the challenge: They say an image is worth 1,000 words, and a video quite a bit more. There's a lot of data to process to do anything substantial related to computer vision on video. And doing that on real-time video is even harder. This is probably why most of the use cases we see today around machine learning in real-time communications lean toward voice and text-to-speech scenarios.

And still. There are things you can do with video. Things we've identified in our research. One thing we immediately identified is the machine learning-based filters you have today in messaging services. Snapchat, Instagram, and Facebook Messenger are leading examples -- you put silly hats on people, add stuff on the video once you find the location of the person's face, and you're done.

As fun as this is, it isn't something you'll likely see when trying to work with a business partner in an important video conference.

What would work then?

Many things. One of them would be the opposite of adding and enriching an image with virtual objects. It would be subtracting content, like filtering the background. This is exactly what Microsoft has introduced to Microsoft Teams: the ability to blur the background. And what better example to show it on than the poor BBC interviewee from last year?
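To make the "subtraction" idea concrete, here is a minimal sketch in plain Python. It assumes some segmentation model has already produced a per-pixel mask marking where the person is; that mask, the `box_blur` and `blur_background` helpers, and the tiny grayscale frame are all hypothetical illustrations, not how Microsoft Teams or any other product implements the feature.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities
Mask = List[List[bool]]    # True where a (hypothetical) segmenter found a person


def box_blur(frame: Frame, radius: int = 1) -> Frame:
    """Blur by averaging each pixel with its neighbors within `radius`."""
    height, width = len(frame), len(frame[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            # Clamp the window at the frame edges.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += frame[ny][nx]
                    count += 1
            blurred[y][x] = total / count
    return blurred


def blur_background(frame: Frame, person_mask: Mask, radius: int = 1) -> Frame:
    """Keep person pixels sharp; take every other pixel from the blurred frame."""
    blurred = box_blur(frame, radius)
    return [
        [frame[y][x] if person_mask[y][x] else blurred[y][x]
         for x in range(len(frame[0]))]
        for y in range(len(frame))
    ]


# Toy 3x3 frame: one bright "person" pixel in the middle of a dark background.
frame = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]
person_mask = [[False, False, False],
               [False, True, False],
               [False, False, False]]
result = blur_background(frame, person_mask)
```

In this toy frame the masked center pixel keeps its original value, while the surrounding background pixels are smoothed. In a real call, the hard part is the step this sketch skips: producing an accurate person mask for every frame, fast enough to keep up with live video.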


Background blurring can be quite useful for conferences. We've identified nine different use cases where computer vision can be of use to real-time communications. Silly hats and image enhancements are two of them.

What is interesting in all this is that it is still very early days for computer vision finding its place in real-time communications. Different vendors have invested in different areas: Microsoft went for background blurring while Facebook went for silly hats. Some use cases will get to market faster than others.

The endgame? Maybe getting AI to synthesize most of the video out of thin air, based on the characteristics of what needs to be seen. There are things you can do today with computer vision and real-time communications. It doesn't need to be super complicated and computationally intensive to bring value. The idea is to identify the benefit of a certain algorithm/capability and then apply it in the right context.

Looking to learn more about computer vision and its place in real-time communications? Interested in how machine learning fits into your strategy? Check out our report on AI in real-time communications.

For the first piece in this ongoing series, read "Machine Learning: Coming to a Communications Service Near You," and stay tuned for more on topics around speech analytics, voice bots, computer vision, and quality optimization. If you want to learn more, contact us at [email protected].

About the Author

Tsahi Levent-Levi

Tsahi Levent-Levi is an independent analyst and consultant for WebRTC.

Tsahi has over 15 years of experience in the telecommunications, VoIP, and 3G industry as an engineer, manager, marketer, and CTO. Tsahi is an entrepreneur, independent analyst, and consultant, helping companies form a bridge between technologies and business strategy in the domain of telecommunications.

Tsahi has a master's in computer science and an MBA specializing in entrepreneurship and strategy. Tsahi has been granted three patents related to 3G-324M and VoIP. He acted as the chairman of various activity groups within the IMTC, an organization focusing on interoperability of multimedia communications.

What Tsahi can do for you:

Show you how to take your company to the forefront of technology
Connect you to virtually anyone in the industry
Give you relevant, out-of-the-box advice
Give you the assurance and validity you are looking for

Tsahi is the author and editor of bloggeek.me, which focuses on the ecosystem and business opportunities around WebRTC.
