The Future of E-Reading: Integrating AI Voice Cloning for Multilingual Audio Content

August 3, 2024


For years, people around the world have wanted popular audio material to be available in their local languages. Audio material has typically been widely available only in the language it was created in, and translation efforts have been costly and rare.

Beyond this, when audio material does get translated, it is often done in a very generic-sounding voice that doesn’t match the intended tone of the book (article, etc) at all.

Well, thanks to groundbreaking new AI technology, this is all changing now. Because of advancements in artificial intelligence, audiences everywhere are now able to enjoy authentic-sounding audio content in any number of languages. It is truly revolutionizing the world of audio material.

Benefits of AI voice cloning for e-readers

Before we get into the technical details of audio technology for e-reading, it is worth considering what exactly the benefits of voice cloning are for people globally. These benefits include:

Translating into other languages with an AI voice generator

The most obvious benefit of voice cloning for e-reading is that it brings the world of audio books to different language groups. In the past, people who were not native speakers of a given language had to adjust to the sound and emotions of whatever language an audio book came in.

Theoretically the process of cloning voices for e-reading can apply to any language group. Although it is less likely that programmers will make the effort for some of the world’s really obscure languages, it can be done. And this can help to make people in other countries feel that they are a part of the global readership.

Giving people a feel for the author’s intention with AI voices

Consciously or not, people are simply less interested in material that they don’t feel speaks to them directly. Even if an audio version of a book was available in a foreign language, until recently the narration would likely have been flat, standardized, and nowhere near what the original author would have wanted it to be.

Now, though, thanks to voice cloning technology this is all changing. Voice cloning technology can replicate either the precise sound of a given author’s voice or another type of voice entirely, to give books the sound and emotion that people in different language groups want from them.

Awareness building

When e-books are able to reach greater segments of the global population, it helps to cement the names of the authors in the societies where the books have been translated. This makes the authors become parts of those societies in ways that would never previously have been possible without this technology.

The larger effect that results from this is that authors are able to grow their reputations much more easily than they would have otherwise. Although there have always been a few select authors whose works are so famous that they are beloved globally, these people are in the vast minority. Now, gaining global popularity is possible for many more writers.

Inclusion for the visually impaired

Beyond reaching people who simply prefer listening to e-books over reading their text-based counterparts, audio production is critical to reaching visually impaired audiences. And for people who have to rely on audio for everything that they do, it is especially important to create authentic-sounding voices.

With the ability to clone voices and translate them into other languages, the visually impaired are now able to envision the things that authors write about with far greater accuracy.

This is especially important when it comes to e-learning. For people who need educational materials in audio form, it is essential that the audio sound authentic and realistic so that it conveys information correctly.

How does the technology work?

The ability to produce audio content in different languages and custom voices in realistic speech is possible thanks to sophisticated voice generation AI tools, such as the Rask AI video translator.

Text-to-speech technology

When people read a book in text form, they create an idea in their minds of what the author’s voice would sound like.

Voice generation tools use a number of different technologies to reproduce that voice in audio. One of them is text-to-speech technology. As the name suggests, it converts written text into synthesized speech that can sound almost exactly like a human reading aloud.

Creating your own audio with new apps

Another benefit of AI voice cloning is the ability to create audio in whatever way you choose. There are apps available now that will allow you to insert text and then choose from an array of options with regard to numerous aspects of speech.

Language and dialect

The most basic feature of these apps is language choice. Some apps are capable of producing sound for multiple different language groups, including some relatively obscure ones. People from minority language groups no longer have to rely on a colonial language or global language such as English to make text accessible for them.

Once a user selects a language in one of these text-to-speech apps, they are sometimes given the further option to choose from different dialects. This can make a very important difference in the way a given text sounds. If you want to produce audio content about the Wild West, simply being able to produce it in English is not enough. If what comes out is old-style British English, the text will lose a lot of its original meaning.
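To make the language and dialect choice concrete, here is a minimal sketch of how an app might pick a voice. The voice names and the catalog are made up for illustration and do not belong to any real product; the only assumption is that each voice is labeled with a standard locale tag such as "en-US" or "en-GB". The function tries the exact dialect first and only then falls back to any voice in the same language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str    # hypothetical voice identifier
    locale: str  # language-region tag such as "en-US"


# Hypothetical catalog; a real app would load this from its voice provider.
CATALOG = [
    Voice("narrator_uk", "en-GB"),
    Voice("narrator_us", "en-US"),
    Voice("narrateur_fr", "fr-FR"),
]


def pick_voice(requested, catalog=CATALOG):
    """Return an exact dialect match, else any voice in the same language, else None."""
    wanted = requested.lower()
    for voice in catalog:
        if voice.locale.lower() == wanted:
            return voice
    language = wanted.split("-")[0]
    for voice in catalog:
        if voice.locale.lower().split("-")[0] == language:
            return voice
    return None
```

With this catalog, a request for "en-US" gets the American narrator for the Wild West story, while a request for a dialect that is not available, such as "en-AU", falls back to the first English voice rather than failing outright.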

Age, gender, and emotion

One of the problems with old-style audio books is that they are often narrated by one man with a generic-sounding voice. The ability to choose the gender of the voice in an audio clip is important because it makes a big difference in how the text is portrayed.

Similarly, the “age” of the voice makes a big difference. If you are producing an audio clip of a children’s story, you can’t have the same type of voice that you would have for an adult romance novel.

You can also bring different kinds of emotion into your audio clips. One of the biggest criticisms of traditional audio books is that they have tended to be extremely monotone. With some of the more sophisticated apps, you can choose from a wide range of emotions and other advanced features when creating your audio content. Sound effects and other features are also often possible.
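One common way to pass choices like these to a speech engine is SSML, the W3C's Speech Synthesis Markup Language. The sketch below builds an SSML string from a gender, an age, and an emotion. Standard SSML has no emotion setting, so the mapping from emotion names to speaking rate and pitch is an assumption made for illustration, and individual engines may ignore some of these attributes or offer their own extensions.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping from an emotion label to standard SSML prosody values.
EMOTION_PROSODY = {
    "neutral": {"rate": "medium", "pitch": "medium"},
    "excited": {"rate": "fast", "pitch": "high"},
    "somber": {"rate": "slow", "pitch": "low"},
}


def build_ssml(text, locale="en-US", gender="female", age=30, emotion="neutral"):
    """Wrap text in SSML voice and prosody elements for the chosen options."""
    prosody = EMOTION_PROSODY[emotion]
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(locale)}>'
        f'<voice gender={quoteattr(gender)} age={quoteattr(str(age))}>'
        f'<prosody rate={quoteattr(prosody["rate"])} pitch={quoteattr(prosody["pitch"])}>'
        f'{escape(text)}'
        '</prosody></voice></speak>'
    )


# Example: a lively, child-like voice for a children's story.
story_ssml = build_ssml("Once upon a time...", gender="female", age=8, emotion="excited")
```

The resulting string would then be handed to whichever text-to-speech service the app uses; that step is left out here because it differs from provider to provider.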

Challenges being faced by the industry

For all the benefits that they provide, there are also a fair number of challenges that AI voice cloning faces. These need to be considered and properly addressed in order for the industry to move forward responsibly.

Consent

One of the biggest concerns across the voice cloning world – both in AI narrated audio books and other types of audio content – is author consent. AI can do a remarkably good job of reproducing people’s voices, but this does not necessarily mean that the authors in question actually want their voices reproduced.

Deep fakes

In the worst-case scenarios, voice cloning can be used to produce fake audio in the voices of people who did not actually write or say the text that is being read. This can result in material that is inauthentic and can harm people’s reputations.

Translating accurately

The ability to translate into other languages – even in text form – can be extremely challenging for a number of reasons. Beyond the literal translation of words themselves, translators struggle to find the right kind of phrases, tone, etc for books in order to preserve the author’s own voice style and original intention.

This challenge is further complicated when it comes to voice cloning. Not only is it difficult to gauge the tone of an author in foreign languages; in many cases it isn’t even technically possible to carry that tone over. If a Chinese author writes a book about life in rural China and the book gets voice cloned into French, the result might be something that sounds very beautiful but is not at all what the author intended.

How to address these issues

The issues mentioned above are serious but not impossible to address. There are specific things that need to be done to maintain integrity in the industry.

Collaboration with experts

When books are translated into other languages, publishing companies almost always seek native speakers for the languages they translate into. The same needs to be the case for audio books.

Voice cloning producers need to work closely with native speakers of other languages to test and verify not only the language that is used in translations, but the tone, emotion, speed, and everything else that goes into a given audio translation.

Specific, enforceable regulation

As with other industries that handle biometric and personal data, regulators need to create laws that govern the use of voice cloning. There need to be specific provisions for consent and copyright issues, and they need to be strictly enforced. This can be a major challenge considering that authorship is global, and national governments can only do so much to control what happens in other countries.

Multi-factor authentication

Again, as with other biometric-based technologies, there should be several layers of authentication before anyone can create an AI voice. This makes it much more difficult for people to create unauthorized copies of voices that then circulate in the public realm.
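As a rough illustration of the idea, the sketch below only allows a voice to be created when consent is on file and the requester has passed at least two separate checks. The class, the factor names, and the two-factor threshold are all hypothetical; a real service would verify each factor properly and would likely follow stricter rules.

```python
from dataclasses import dataclass, field

REQUIRED_FACTORS = 2  # assumed threshold for illustration


@dataclass
class VoiceCloneRequest:
    requester: str
    voice_owner: str
    consent_on_file: bool  # has the voice owner agreed to the cloning?
    verified_factors: set = field(default_factory=set)  # e.g. {"password", "one_time_code"}


def may_create_voice(request, required=REQUIRED_FACTORS):
    """Allow cloning only with consent and enough distinct verified factors."""
    if not request.consent_on_file:
        return False
    return len(request.verified_factors) >= required
```

Because the factors are stored as a set, passing the same check twice does not count as two factors, and a missing consent record blocks the request no matter how many checks were passed.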

Conclusion

The future of e-reading with the inclusion of AI voice cloning technology is very promising. With the help of these tools, authors and publishers will be able to reach much wider audiences, and speak to people more effectively than ever before. As with many other new technologies, creators should take care to respect the rights of authors and to preserve the integrity of their works as much as possible. Governments and publishers also need to do their part to ensure that translation and voice cloning are conducted in a responsible manner.

Markus lives in San Francisco, California and is the video game and audio expert on Good e-Reader! He has a huge interest in new e-readers and tablets, and gaming.