It was criticized by Scarlett Johansson. It was delayed by more than a month. And now that it’s finally here, only a select few customers in an “alpha” group have access to the new ChatGPT Advanced Voice Mode from OpenAI, a more naturalistic, human-like audio conversational mode for the hit chatbot available through the official ChatGPT app for iOS and Android.
Yet already, just days after the first alpha testers got their hands on ChatGPT Advanced Voice Mode, people are posting examples of it engaging in fantastically expressive and impressive utterances, impersonating Looney Tunes characters and counting so fast it runs out of “breath” just like a human would.
Here are some of the more interesting examples shared by initial alpha users on X, with the caveat that we don’t yet have access ourselves and so can’t verify their authenticity.
Language instruction and translation
Several users on X noted that popular language learning app Duolingo might be in trouble, given that ChatGPT Advanced Voice Mode can perform interactive, “hands on” (or is that “voice on”?) instruction custom-tailored to an individual attempting to learn or practice another language.
Advanced Voice Mode is powered by OpenAI’s new GPT-4o model, the company’s first natively multimodal large model, designed to handle vision and audio inputs and outputs directly rather than routing them through separate specialized models (unlike GPT-4, which relied on other domain-specific OpenAI models for these media).
As such, Advanced Voice Mode can speak about what ChatGPT is able to see through the user’s phone camera if they grant the app access to it. In one example, McGill University mixed reality design instructor Manuel Sainsily posted how Advanced Voice Mode was able to use this capability to translate screens from a Japanese version of Pokémon Yellow running on a Game Boy Advance SP:
Humanlike utterances
Cristiano Giardina, an Italian-American AI writer, has posted a number of examples of tests with the new ChatGPT Advanced Voice Mode, including one viral demo in which he asks it to count up to 50 faster and faster. It dutifully does so, even stopping to catch its breath near the end.
Giardina later followed up with a post on X noting that the transcript of that counting experiment didn’t showcase any breaths, indicating ChatGPT’s Advanced Voice Mode “has simply learned natural speaking patterns, which includes breathing pauses. Uncanny.”
ChatGPT Advanced Voice Mode can also clear its throat and mimic applause, as seen in the below video on YouTube:
Beatboxing
Startup founder Ethan Sutin posted a video to X showing how he was able to get ChatGPT Advanced Voice Mode to beatbox fluidly and convincingly like a human MC:
Audio storytelling and roleplaying
ChatGPT can also roleplay (the SFW kind) if the user asks it to “play along” and invents a fictitious scenario, such as going back in time to Ancient Rome, as University of Pennsylvania Wharton School of Business professor Ethan Mollick showed in a video posted to X:
If the user just wants to listen, they can ask ChatGPT Advanced Voice Mode to tell a story, and it will do so complete with its own AI-generated sound effects such as thunder and footsteps, as in this example taken from Reddit and reposted on X:
It can also reproduce the sounds of an intercom voice:
Mimicking and reproducing distinct accents
Giardina showed how ChatGPT Advanced Voice Mode can be used to mimic a vast variety of regional British accents:
…as well as impersonate a soccer commentator across languages:
Sutin showed how it can attempt to reproduce different U.S. regional accents including Bostonian, Cajun, Minnesotan/Midwestern, and Southern Californian, though to this writer’s Midwestern ear, that last one sounded almost more Japanese American:
And it can imitate fictional characters, too…
Finally, Giardina showed that ChatGPT Advanced Voice Mode not only knows and understands the difference between how different fictional characters speak, but can imitate them as well:
The alpha test continues, with OpenAI having earlier promised that Advanced Voice Mode will roll out to all paying ChatGPT Plus subscribers by the fall.
The real question is: what is this mode good for in a practical sense? Beyond fun and interesting demos and experiments, will it make ChatGPT more useful or appealing to a wider audience? Will it result in more audio-based scams? As the company expands access, we’re sure to find out.