Alexis “Lexi” Bogan’s voice was full of energy last summer.
She loved singing Taylor Swift and Zach Bryan ballads in the car. She was always laughing—even while corralling misbehaving preschoolers or debating politics with friends over the backyard fire pit. In high school, she was a soprano in the choir.
Then the voice disappeared.
In August, doctors removed a life-threatening tumor near the back of her brain. When the breathing tube came out a month later, Bogan had difficulty swallowing and struggled to greet her parents. Months of rehab helped her recover, but her speech is still impaired. Friends, strangers and her own family struggle to understand what she is trying to tell them.
In April, at the age of 21, she regained her former voice. Not a real voice, but an AI-generated voice clone that she could summon from an app on her phone. Trained on a 15-second time capsule of her teenage voice (from a cooking demonstration video she recorded for a high school project), her synthetic yet realistic-sounding AI voice can now say almost anything she wants to say.
She types a few words or sentences into her phone, and the app immediately reads them aloud.
“Hey, can I get a special Iced Brown Sugar Oatmilk Shaken Espresso?” said Bogan’s AI voice as she held her phone out the window of a Starbucks drive-thru.
Experts warn that rapidly improving artificial intelligence voice cloning technology could amplify phone scams, disrupt democratic elections and violate the dignity of people, living or dead, who never consented to having their voices recreated to say things they never said.
It has been used to create deepfake robocalls imitating President Joe Biden to New Hampshire voters. In Maryland, authorities recently charged a high school athletic director with using artificial intelligence to generate false audio clips of the school principal making racist comments.
But Bogan and a team of doctors at Rhode Island’s Lifespan hospital group believe they have found a use that justifies the risks. Bogan is one of the first people, and the only one with her condition, to recreate a lost voice using OpenAI’s new Voice Engine. Several other AI vendors, such as the startup ElevenLabs, have tested similar technology on people with speech impairments and voice loss, including a lawyer who now uses a clone of her voice in court.
“As technology evolves, we want Lexi to be a trailblazer,” said Dr. Rohaid Ali, a neurosurgery resident at Brown University School of Medicine and Rhode Island Hospital. He said millions of people suffering from debilitating strokes, throat cancer or neurological diseases could benefit.
“We should be aware of the risks, but we cannot forget the patient and social benefits,” said Dr. Fatima Mirza, another resident physician involved in the pilot program. “We were able to help Lexi find her true voice and she was able to speak in the most authentic way.”
Mirza and Ali, who are married, came to the attention of ChatGPT maker OpenAI because of their previous research project at Lifespan using AI chatbots to simplify medical consent forms for patients. The San Francisco company reached out earlier this year as it looked for promising medical applications for its new artificial intelligence speech generator.
Bogan is still recovering slowly from the surgery. The illness, which began last summer with headaches, blurred vision and a drooping face, alarmed doctors at Hasbro Children’s Hospital in Providence. They found a vascular tumor the size of a golf ball pressing against her brain stem and entangled with blood vessels and cranial nerves.
“It was a battle to control the bleeding and remove the tumor,” said pediatric neurosurgeon Dr. Konstantina Svokos.
Svokos said the 10-hour surgery, combined with the location and severity of the tumor, damaged Bogan’s tongue muscles and vocal cords, hampering her ability to eat and speak.
“When I lost my voice, it was like a part of my identity was taken away,” Bogan said.
Her feeding tube came out this year. Speech therapy continues, enabling her to speak clearly in a quiet room, but there is no sign that she will fully regain the clarity of her natural voice.
“At some point, I started forgetting what my voice sounded like,” Bogan said. “I’m used to the sound of my voice now.”
Whenever the phone rang at her home in the Providence suburb of North Smithfield, she would hand it to her mother to answer. Whenever her friends went to noisy restaurants, she felt like she was a burden to them. Her father suffers from hearing loss and has difficulty understanding her.
Back at the hospital, doctors were looking for a pilot patient to test OpenAI’s technology.
“The first person Dr. Svokos thought of was Lexi,” Ali said. “We reached out to Lexi to see if she would be interested, but didn’t know how she would react. She was happy to give it a try and see how it went.”
Bogan had to go back several years to find a suitable recording of her voice to “train” the AI system on how she spoke. It was a video in which she explained how to make pasta salad.
Her doctors deliberately fed only a 15-second clip into the artificial intelligence system, because cooking sounds spoil the rest of the video. It was also all that OpenAI needed, an improvement over previous techniques that required much longer samples.
They also knew that getting something useful out of 15 seconds could be crucial for future patients who can’t find a trace of their voice online. A brief voicemail left for a relative might have to be enough.
During the first test, everyone was stunned by the quality of the voice clone. The occasional glitches (a mispronounced word, a missing intonation) were mostly imperceptible. In April, doctors equipped Bogan with a customized mobile app that only she can use.
“Every time I hear her voice, I get so emotional,” her mother, Pamela Bogan, said with tears in her eyes.
“I think it’s awesome that I can hear that sound again,” added Lexi Bogan, who said it helped “boost my confidence to where it was before all of this happened.”
She now uses the app about 40 times a day and sends feedback in the hope of helping future patients. Her first experiment was talking to children at the preschool where she worked as a teaching assistant. She typed “hahahaha,” expecting a robotic response. To her surprise, it sounded like her old laugh.
She has used it at Target and Marshalls to ask where she can find items. It has helped her reconnect with her father. And it has made it easier for her to order fast food.
Bogan’s doctors have begun cloning the voices of other volunteer patients in Rhode Island and hope to bring the technology to hospitals around the world. OpenAI says it is moving cautiously in expanding the use of Voice Engine, which is not yet publicly available.
Many smaller AI startups are already selling voice cloning services to entertainment studios or making them more widely available. Most voice-generation vendors say they prohibit impersonation or abuse, but how they enforce their terms of use varies.
“We want to make sure that everyone whose voice is used in the service is consenting on an ongoing basis,” said Jeff Harris, head of product at OpenAI. “We want to make sure that it’s not being used in a political context. So we’ve taken a very limited approach to whom we give the technology.”
OpenAI’s next step involves developing a secure “voice authentication” tool so that users can replicate only their own voice, Harris said. That could be “limiting for patients like Lexi who suddenly lose their ability to speak,” he said. “So we do think we need to build high-trust relationships, particularly with healthcare providers, to allow for more unrestricted use of the technology.”
Bogan has impressed her doctors with her focus on how the technology could help others with similar or more severe speech difficulties.
“Part of what she did throughout the process was thinking about how to adjust and change that,” Mirza said. “She was a big inspiration to us.”
While she currently has to fiddle with her phone to get the speech engine to talk, Bogan envisions an artificial intelligence speech engine that improves on older speech-recovery remedies, such as the robotic-sounding electrolarynx or a voice prosthesis, by integrating with the human body or translating words in real time.
She’s less sure what will happen as she grows older while her artificial intelligence voice continues to sound as it did when she was a teenager. Perhaps the technology could “age” her artificial intelligence voice, she said.
For now, “even though my voice isn’t fully recovered, I have things that are helping me find my voice again,” she said.