AI-Powered Speech Infuses a Personal Touch into Literature


A groundbreaking development in the field of audiobooks has emerged. Researchers at Microsoft and MIT have collaborated with Project Gutenberg, a venerable repository of public domain literature that predates the Internet, on a pioneering initiative. It brings lifelike AI-generated speech to a vast array of literary works, ranging from Randall Garrett’s “After a Few Words” to “Zut and Other Parisians.”


The concept of automated audiobook production is not new; it has existed for quite some time. What is new, as detailed in the arXiv preprint “Large-Scale Automatic Audiobook Creation,” is a generation of audiobooks whose voices come from today’s cutting-edge neural text-to-speech systems. The approach not only makes narration more realistic but also streamlines production, saving both time and cost.


Presently, public domain audiobooks often suffer from mechanical-sounding narrations. The new approach promises to infuse narrations with distinct emotional nuances.


Brendan Walsh, a software engineer at Microsoft, explains, “We employ an automated speaker and emotion-inference system to dynamically adapt the reading voice and tone according to the context.”


In this approach, the narration keeps a consistent voice, while character dialogue within the narrative is spoken in different voices. The neural inference system determines the tone and style of speech.
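To make the idea of separate narrator and character voices concrete, here is a minimal sketch in Python. It is not the researchers' method: their system infers speakers and emotions with a neural model, whereas this example only separates quoted dialogue from narration with a regular expression. The function name `split_segments` and the voice labels are hypothetical, chosen purely for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical voice labels for illustration; not taken from the paper.
NARRATOR_VOICE = "narrator"
DIALOGUE_VOICE = "character"

# Matches text inside straight ("...") or curly (“...”) double quotes.
QUOTE_PATTERN = re.compile(r'"([^"]*)"|“([^”]*)”')


def split_segments(passage: str) -> List[Tuple[str, str]]:
    """Split a passage into (voice, text) pairs.

    Quoted text is labeled as dialogue; everything else is narration.
    A real system would also decide which character is speaking.
    """
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in QUOTE_PATTERN.finditer(passage):
        # Narration that precedes this quotation.
        before = passage[pos:match.start()].strip()
        if before:
            segments.append((NARRATOR_VOICE, before))
        # Exactly one of the two groups matched, depending on quote style.
        quoted = match.group(1) if match.group(1) is not None else match.group(2)
        quoted = quoted.strip()
        if quoted:
            segments.append((DIALOGUE_VOICE, quoted))
        pos = match.end()
    # Any narration left after the last quotation.
    tail = passage[pos:].strip()
    if tail:
        segments.append((NARRATOR_VOICE, tail))
    return segments
```

Each pair could then be sent to a text-to-speech engine with the matching voice. Assigning a distinct voice to each character would require speaker attribution, which this sketch does not attempt.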


Walsh emphasizes, “This approach breathes life into passages featuring multiple characters and emotional dialogues, making them more captivating and lifelike.”


Users can customize the voice’s sound, pitch, speed, and intonation to suit their personal preferences.
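Settings such as pitch and speed are commonly passed to speech engines through SSML, the W3C Speech Synthesis Markup Language. The sketch below assumes an SSML-capable engine; the article does not say how the project exposes these settings, and the helper name `build_ssml` is hypothetical. It wraps text in a `prosody` element, which is the standard SSML way to request a pitch and speaking rate.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, pitch: str = "medium", rate: str = "medium") -> str:
    """Wrap text in an SSML <prosody> element (hypothetical helper).

    pitch and rate accept SSML values such as "low", "high", "slow",
    "fast", or relative changes like "+10%".
    """
    return (
        '<speak version="1.1" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        # quoteattr adds the surrounding quotes and escapes the value.
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        # escape protects characters such as & and < in the book text.
        f"{escape(text)}"
        "</prosody></speak>"
    )
```

For example, `build_ssml("Once upon a time", pitch="+10%", rate="slow")` produces markup asking for a slightly higher, slower reading. How closely an engine honors these values varies by implementation.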


The research team anticipates a forthcoming live demonstration that will let members of the public create audiobooks in their own voices. The process will require only brief voice samples, which will be used to generate complete audiobooks.


An interesting precedent was set when DeepZen Ltd. used generative AI technology to create audiobook narrations from samples of the voice of the late actor Edward Herrmann, who passed away nearly a decade ago. The result was seamless dialogue with natural intonation, virtually indistinguishable from recordings of the actor’s actual voice.


Project Gutenberg, already home to around 5,000 audiobooks totaling 35,000 hours of speech, offers free access to anyone interested in listening. The project plans to introduce a feature that allows users to create audiobooks in their own voices. Users will establish a voice profile by reciting a few sentences, after which Project Gutenberg will generate a synthetic version of their voice that is promptly available for listening.


Users will also have the option to personally narrate a preface or dedication before uploading the complete text of their book. Once the audiobook has been generated, users will receive an email containing a link to it.


In the near future, when a busy mother can’t read a bedtime story to her 7-year-old son due to work commitments, he can simply play his favorite audiobook and relish the comforting sound of his mother’s voice narrating thrilling tales.


Aspiring actors can swiftly create unique gifts for their friends by providing voice samples for various roles in Shakespearean plays, thus bringing the characters to life with their own voices.


With the legal agreement of willing parties, individuals may soon be able to choose the voice of Taylor Swift, Arnold Schwarzenegger, or Morgan Freeman to narrate their very own novels.