AI can sing from speech samples now

Scientists have created a neural network that can generate singing from speech samples. The algorithm, developed by Chinese researchers, can synthesize a recording of a person singing from a recording of that person’s ordinary speech, or do the reverse and synthesize speech from singing. An article describing the development, training and testing of the algorithm has been published on arXiv.org.

In recent years, neural network algorithms for speech synthesis, such as WaveNet, have made it possible to build systems whose speech is difficult to distinguish from that of real people. For example, in 2018 Google demonstrated a voice assistant for making bookings that not only speaks realistically but also inserts human sounds, such as “um”, that make its speech more lifelike. As a result, the company also had to teach the algorithm to warn at the beginning of a conversation that it is not a person.

As with other neural network algorithms, the success of speech synthesis systems owes less to their architecture than to the large amount of data available for training. Building a system that synthesizes singing looks like a similar task, but it is in fact much harder because far less data is available.

Many developers working on singing generation systems have recently focused on reducing the amount of singing samples needed to train the algorithm. Now a group of Chinese researchers led by Dong Yu from Tencent has created a system that can produce realistic singing recordings from speech samples.

https://www.youtube.com/watch?v=AnazWGADtnk

The algorithm is based on Tencent’s earlier development, the DurIAN neural network, which was designed to synthesize realistic videos of a talking presenter from text. The researchers have now placed a new speech recognition unit, which creates phonemes from the audio sample, in front of DurIAN.

The authors trained the algorithm on two proprietary datasets consisting of one and a half hours of singing and 28 hours of speech. After training, they tested it with 14 volunteers, who rated the realism of the synthesized singing and its similarity to the voice of the speaker. In one of the tests, the algorithm scored 3.8 points for realism and 3.65 for similarity. The authors have published samples of the neural network’s output.

Florian Maximiliano

© 2021 TechBriefly is a Linkmedya brand.
