How AI Powers Your Video Conferencing Without You Knowing

Optimization of image and sound quality, automatic translation, creation of shared virtual spaces… Artificial intelligence improves the user experience.

For nearly two years, since remote work became widespread, professionals around the world have been stringing together videoconferences from morning to night. At the start of the health crisis, they had to put up with repeated bugs, audio feedback and pixelated images from tools that did not offer the comfort expected for intensive use. Since then, the main market players, such as Zoom, Microsoft, Google and Cisco Webex, have made their tools far more professional by making extensive use of AI technologies. From optimizing image and sound quality to improving the user experience, their approaches have a great deal in common.

Audio: reduce extraneous noise

The first contribution of AI? Sound optimization. What could be more distracting than hearing a participant typing on a keyboard, the noisy environment of an open-plan office or the hum of the air conditioning in a meeting room? Deep learning algorithms reduce this unwanted noise by eliminating all the one-off or continuous sounds that fall outside the frequency spectrum of the human voice.
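The idea of discarding sounds outside the voice spectrum can be illustrated with a plain band-pass filter. This is only a minimal sketch, not the deep learning approach the vendors use: the 300 to 3,400 Hz band, the function names and the synthetic test signal are all assumptions chosen for illustration.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def keep_voice_band(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero every frequency bin outside an assumed voice band."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies of a real signal.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Synthetic example: a 500 Hz "voice" tone plus a 50 Hz low-frequency hum.
SAMPLE_RATE = 8000
N = 160
voice = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]
hum = [0.5 * math.sin(2 * math.pi * 50 * t / SAMPLE_RATE) for t in range(N)]
noisy = [v + h for v, h in zip(voice, hum)]
cleaned = keep_voice_band(noisy, SAMPLE_RATE)
```

A fixed filter like this cannot remove noise that overlaps the voice band, such as keyboard clicks, which is one reason production systems rely on trained models instead.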

“AI helps to focus the audio stream on the person speaking or to harmonize the voice volume of the participants”

“This sound optimization applies to both individual workstations and meeting rooms,” says Xavier Hemery, head of technical expertise for collaboration architecture at Cisco. “AI also makes it possible to focus the audio stream on the person speaking or to harmonize the voice volume of participants, regardless of their distance from a conference call device.”
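Harmonizing voice volume can be pictured as bringing each speaker to the same loudness level. The sketch below is a simplified illustration, not Cisco’s method: it scales each speaker’s samples to a common root-mean-square (RMS) level, and the target level, gain cap and function names are assumptions.

```python
import math

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def harmonize(samples, target_rms=0.1, max_gain=20.0):
    """Scale a speaker's samples toward a shared target level.

    max_gain caps the boost so that near-silence is not amplified into noise.
    """
    level = rms(samples)
    if level == 0:
        return list(samples)
    gain = min(target_rms / level, max_gain)
    return [s * gain for s in samples]

# One speaker close to the device, one far away (synthetic 200 Hz tones).
SAMPLE_RATE = 8000
near = [0.8 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(400)]
far = [0.05 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(400)]

near_out = harmonize(near)
far_out = harmonize(far)
```

After the call, both signals sit at the same RMS level even though the raw amplitudes differed by a factor of 16.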

To strengthen its expertise on the subject, Cisco acquired BabbleLabs, an American specialist in audio experience, in August 2020.

Webex AI equalizes all participants’ voices, regardless of their distance from the videoconferencing device. It also distinguishes a speaker’s voice from background noise. © Cisco

Videoconferencing solutions also use voice assistants to dial a number or start a meeting. Zoom offers its own personal agent while supporting connected devices from Google Nest or Amazon Alexa. Microsoft, for its part, uses its in-house assistant, Cortana, to handle voice control of Teams-certified devices in meeting rooms. The Cisco Webex assistant borrows the notion of skills, popularized by Amazon, to interact with third-party systems.

Video: quality for all

After sound comes the image. AI must guarantee users the best possible video quality regardless of their equipment and the quality of the network. Video stream compression and optimization algorithms come into play to compensate for any technical glitches. “To save bandwidth, the video stream focuses on the people and not on the static background,” explains François Familiari, senior sales engineer at Zoom.

Zoom’s “smart gallery” function. © JDN / Capture

Google Meet, for its part, automatically adjusts the brightness of the image if the environment is poorly lit. Its Autozoom function, as its name suggests, zooms in on the user’s face if the AI judges that they are sitting too far from the camera. In a meeting room, facial recognition frames the face of the participant who is speaking to better capture their expressions. Video tracking can also follow them if they move around the room.
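The brightness adjustment described above can be sketched in a few lines. This is a deliberately naive illustration, not Google Meet’s algorithm: it treats a frame as a grid of grayscale values from 0 to 255, and the threshold, target level and function names are assumptions.

```python
def mean_luma(frame):
    """Average brightness of a grayscale frame (list of rows of 0-255 values)."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def brighten_if_dark(frame, dark_threshold=80, target=120):
    """Scale pixel values up only when the frame is darker than the threshold."""
    level = mean_luma(frame)
    if level >= dark_threshold or level == 0:
        return [row[:] for row in frame]  # already bright enough, or fully black
    gain = target / level
    # Clip at 255 so boosted highlights stay within the valid pixel range.
    return [[min(255, round(p * gain)) for p in row] for row in frame]

dark_frame = [[20, 40], [60, 40]]
bright_frame = [[100, 200], [150, 150]]

corrected = brighten_if_dark(dark_frame)
untouched = brighten_if_dark(bright_frame)
```

A real system would work on color video and adapt gradually between frames to avoid flicker; the sketch only shows the decision to correct when the average level is low.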

“Facial recognition can identify a user, even if they have a mask, based on the company directory”

“Facial recognition will be able to identify a user, even if they are wearing a mask, based on the company directory. This will prove useful in an international context where participants do not always know each other,” adds Xavier Hemery. As part of health protocols, counting the people present in the room also makes it possible to check compliance with capacity limits.

Other algorithms detect the outline of participants’ bodies to place them on virtual backgrounds, while filters can adorn faces with a pair of glasses or a virtual beard. Zoom even offers a “touch up my appearance” function that smooths facial skin so that users appear at their best. In the same playful spirit, pattern recognition can automatically display an emoticon associated with a gesture, such as a thumbs up to approve a statement or a raised hand to ask for the floor.

AI also plays a key role in hybrid work. Paradoxically, participants who have made the effort to come to the office are de facto disadvantaged. They all appear in the same video stream, sometimes reduced to pinheads if there are many of them in the frame. The function called “people focus” at Cisco Webex or “smart gallery” at Zoom corrects the problem. It “cuts out” the participants in the room and then places each of them in an individual thumbnail, as if they were sitting at their own PC.

Zoom ‘immersive view’ function. © JDN / Capture

In the same spirit, another view, called “immersive view” in Zoom and “together mode” in Microsoft Teams, brings all the participants together in the same virtual setting, such as a classroom or an amphitheater. Teams also offers a “dynamic view” that automatically arranges the display between the thumbnails of the speakers and the content they share.

Automatic translation and note taking

Videoconferencing platforms are set to become real towers of Babel by allowing participants to choose the common language of the meeting (typically English), which will be subtitled in each person’s mother tongue. Last September, Zoom announced that its automatic transcription and live translation services would initially support a dozen languages, then around 30.

Technologies related to natural language processing also make it possible to identify the key moments (or highlights) of a meeting by detecting keywords such as “decision” or “agenda”. This chaptering system allows a user who is reviewing a recording to go directly to the passage that interests them. Integrated into a chat module, AI can, following the same logic, serve as a moderator by censoring inappropriate terms or confidential information in order to comply with the legal and regulatory framework.
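Keyword-based chaptering can be sketched as a simple scan of a timestamped transcript. This is an illustrative simplification, since real platforms rely on richer language models; the transcript format, the sample sentences and the function name are assumptions, and only the keywords “decision” and “agenda” come from the example above.

```python
import re

KEYWORDS = ("decision", "agenda")

def find_highlights(transcript, keywords=KEYWORDS):
    """Return (start_seconds, keyword) for each segment that mentions a keyword.

    transcript is a list of (start_seconds, text) tuples.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in keywords) + r")\b",
        re.IGNORECASE,
    )
    chapters = []
    for start_seconds, text in transcript:
        match = pattern.search(text)
        if match:
            chapters.append((start_seconds, match.group(1).lower()))
    return chapters

# Hypothetical transcript segments with their start times in seconds.
transcript = [
    (0, "Welcome everyone, here is today's agenda."),
    (95, "Sales were up last quarter."),
    (340, "The decision is to ship in March."),
]

chapters = find_highlights(transcript)
```

Each returned timestamp could then serve as a chapter marker that lets a viewer jump straight to that point in the recording.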

To prevent zoombombing (i.e. the intrusion of a hacker or troll into a supposedly private videoconference), Zoom has developed an AI that continuously scans social networks to detect whether the link to a session has been shared publicly. “The administrator is immediately alerted,” says François Familiari at Zoom. “It’s up to them to determine whether this sharing is intentional and, if not, whether it points to a risk of intrusion. If so, they can remind users of the importance of using a password to secure access and of activating the waiting room function.”

Waiting for the metaverse…

The future of videoconferencing may well lie in the metaverse: immersive worlds that would make meetings more engaging and inclusive by erasing physical distance even further. Microsoft and Cisco are both preparing updates to their solutions in this area. For its part, Zoom announced last September a partnership with Oculus, owned by Meta. The publisher aims to bring its virtual whiteboard function to Horizon Workrooms, Meta’s remote meeting tool. Wearing the Oculus headset and holding its controllers, users will then be able to interact by gesture via the Zoom whiteboard.

Zoom’s Oculus feature will support gesture recognition. © JDN / Capture
