Nvidia Unveils AI Model That Modifies Voices, Generates Novel Sounds

Nvidia on Monday revealed a new AI model for generating music and audio that can modify voices and generate novel sounds, technology aimed at the producers of music, films and video games.

Nvidia’s Fugatto Modifies Sounds

The tool, called Fugatto, is capable of generating music, sounds, and speech using text and audio inputs it’s never been trained on.

According to Nvidia, it can create a music snippet based on a text prompt, add instruments to or remove them from an existing song, change the accent or emotion in a voice, and even let people produce sounds never heard before.

Unique Properties of Fugatto

What makes Fugatto different from other AI technologies is its ability to take in and modify existing audio. For example, it can take a line played on a piano and transform it into a line sung by a human voice, or take a spoken-word recording and change the accent used and the mood expressed.

“If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesizers,” said Bryan Catanzaro, vice president of applied deep learning research at Nvidia, according to Reuters.

“I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things.”

Nvidia’s new model was trained on open-source data, and the company said it is still debating whether and how to release it publicly.