Photographs are made using light, but what if portraits of people could be made with the sound of their voice? Well, researchers in artificial intelligence have worked on reconstructing a person’s face from a short audio recording of their voice, and the results are stunning.
Artificial intelligence scientists at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) first published a paper on an algorithm called “Speech2Face” in 2019. The researchers designed and trained a deep neural network using millions of videos from YouTube and elsewhere on the Internet showing people talking. During this training, the artificial intelligence learned correlations between the sound of voices and the appearance of the person speaking. These correlations allowed it to guess the age, sex and ethnic origin of the speaker.
“To what extent can we deduce a person’s appearance from the way he speaks? This is the question we attempted to answer by studying the task of reconstructing a person’s facial image from a short audio recording of that person speaking,” the study reads. No human intervention was needed in the learning process, as the researchers did not have to manually label the data. The algorithm simply received a huge amount of video and was tasked with finding correlations between voice and facial characteristics.
Once trained, it proved remarkably skilled and efficient at creating portraits based solely on voice recordings, which often ended up looking very much like the person speaking. To further analyze the accuracy of the facial reconstructions, the scientists built a “face decoder” that creates a standardized reconstruction of a person’s face from a still image, disregarding variations such as pose and lighting. This made it easier for the scientists to compare the voice-based reconstructions with each speaker’s actual features. And again, the AI results were surprisingly close to the real faces in a large percentage of cases.
Still room for improvement
Despite the impressive abilities of Speech2Face, the algorithm could be further improved in the coming months. Indeed, it showed some weaknesses. In some cases, the artificial intelligence had difficulty determining what the speaker might look like. Factors such as accent, spoken language, and voice pitch caused speech-to-face mismatches in which gender, age, or ethnicity were incorrect.
For example, males with a particularly high-pitched voice were often identified as female, while females with a deep voice were seen as male. Spoken language skewed the results too: an Asian male speaking English appeared less Asian than when the same speaker spoke Chinese. “In a way, the system is a bit like your racist uncle. He feels like he can always tell a person’s race or ethnicity based on the way they speak, but he’s often wrong,” explained photographer Thomas Smith.
That being said, the capabilities of the app are already insane!