Google’s artificial intelligence lab DeepMind is bringing AI-created video content one step closer to reality – and traditional film and television production (not to mention sync licensing) one step closer to obsolescence.
In a blog post published on Monday (June 17), DeepMind said it is developing “video-to-audio” (V2A) technology to pair AI-generated music, sound effects and even dialogue with AI-generated video.
DeepMind writes: “Video generative models are advancing at an incredible pace, but many current systems only produce silent output.”
“Creating soundtracks for these silent videos is one of the next major steps in bringing generated movies to life.”
DeepMind says its technology stands out from other projects adding sound to AI-generated videos because “it can understand raw pixels.” Users can give it text prompts, but they aren’t actually necessary – the technology can figure out on its own what kind of sound is appropriate for a given video.
DeepMind says the technology can also automatically synchronize sound with images (goodbye, sound editors – it seems you’re no longer needed).
The DeepMind blog offers a number of video clips along with the text prompts used to generate their audio, including a horror-movie soundtrack (prompt: “movie, thriller, horror, music, tension, atmosphere, footsteps on concrete”), an underwater scene (prompt: “jellyfish pulsating underwater, sea life, ocean”), and a man playing guitar (see picture below):
“Preliminary results suggest that this technology will be a promising method for bringing generated movies to life,” the DeepMind blog said.
The lab said the technology was trained on audio, video and transcripts of spoken dialogue, enhanced with “AI-generated annotations with detailed descriptions of sound.”
Notably, the lab did not disclose whether the audio, video and transcripts are protected by copyright, nor whether the materials were licensed for use in AI training. It noted only that DeepMind is “committed to the responsible development and deployment of AI technology.”
Google DeepMind
Google’s approach to AI training and copyright has been difficult to parse. While the company’s YouTube unit has teamed up with major record labels to build AI music tools with the backing of artists, Google also told the U.S. Copyright Office last year that the use of copyrighted material in AI training should be considered fair use.
For now, the V2A technology doesn’t appear to be ready for prime time – that is, it hasn’t been released to the public.
“We are working to address a number of other limitations and further research is ongoing,” DeepMind said.
One area the lab says needs improvement is the generation of spoken dialogue. Current iterations of the V2A technology often result in “uncanny lip-syncing,” because “the video model doesn’t generate mouth movements that match the transcript,” DeepMind said.
In addition, audio quality degrades when the video input contains “artifacts or distortions” that the V2A technology was not trained on, DeepMind said.
Nonetheless, it’s clear that video-to-audio technology like this is the missing link in using artificial intelligence to create complete, ready-to-use audiovisual content.
Amid the ongoing AI craze, many developers are working on sound-generation technology. For example, earlier this month Stability AI released Stable Audio Open, a free, open-source model that allows users to create high-quality audio samples.
While it’s not suitable for creating full-length music tracks, it can create clips up to 47 seconds long that include sound effects, drum beats, instrumental riffs, atmospheres, and other production elements commonly used in music and sound design.
The past few months have also seen the release of AI video creation tools capable of producing strikingly realistic footage, including OpenAI’s Sora, which went viral this spring for its compelling images of people, animals and landscapes.
Other AI video generators soon appeared, all vying for the title of “Sora killer” and each hailed by some as the best yet: Luma’s Dream Machine, Runway’s Gen-3 Alpha, and more recently Kling, from Chinese video platform Kuaishou.
With photorealistic AI video generation now in users’ hands, the question of deepfakes is becoming increasingly pressing – which may partly explain why Google’s DeepMind has been reluctant to release its latest technology, which (once perfected) will be able to add realistic sound effects and vocals to AI-created videos.
DeepMind noted on its blog that it has integrated its SynthID tool into its V2A product. SynthID adds a digital watermark to AI-created content, making it identifiable as the product of an AI tool.
DeepMind also acknowledged the audiovisual creators whose work could be displaced by these new AI tools.
The blog states: “To make sure our V2A technology can have a positive impact on the creative community, we’re gathering diverse perspectives and insights from leading creators and filmmakers, and using this valuable feedback to inform our ongoing research and development.”

Music Business Worldwide