TechBriefly

Meet Microsoft VALL-E: Text-to-speech sibling of DALL-E

Be careful: there is a very realistic-sounding AI out there!

by Emre Çıtak
11 January 2023
in AI

Microsoft has announced VALL-E, its take on text-to-speech synthesis, in a paper published by the company. The model needs only a 3-second audio recording of a speaker to process the given input.

Microsoft has just introduced VALL-E, a novel language-model approach to text-to-speech (TTS) synthesis that uses audio codec codes as intermediate representations. It was pre-trained on 60,000 hours of English speech and shows in-context learning abilities in zero-shot scenarios.

Microsoft VALL-E is a language model approach for text-to-speech synthesis

Microsoft VALL-E can produce high-quality personalized speech from just a three-second enrolled recording of an unseen speaker, which serves as an acoustic prompt. It does so without additional structure engineering, pre-designed acoustic features, or fine-tuning, and it supports in-context learning and prompt-based zero-shot TTS. Microsoft used a large amount of semi-supervised data to build a TTS system that generalizes across speakers, which suggests that scaling up semi-supervised data has so far been underused in TTS.
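To make the idea of prompt-based zero-shot TTS more concrete, here is a minimal Python sketch under heavy simplifying assumptions. It is not Microsoft’s implementation: `zero_shot_tts` and `toy_model` are hypothetical names, the integers stand in for text tokens and audio codec codes, and the toy model ignores the text and simply favors the token after the previous one. What it shows is the prompting pattern: the transcript of the three-second prompt plus the target text form the text context, the prompt’s codec tokens act as an acoustic prefix, and the model continues that prefix. For simplicity, the sketch always picks the most likely next token.

```python
from typing import Callable, List, Sequence

# Hypothetical interface for a trained codec language model: given the text
# context and the acoustic tokens so far, return a probability distribution
# over the codec vocabulary for the next acoustic token.
NextTokenModel = Callable[[Sequence[int], Sequence[int]], List[float]]


def zero_shot_tts(
    model: NextTokenModel,
    prompt_text: List[int],      # tokens of the 3-second prompt's transcript
    target_text: List[int],      # tokens of the text to be spoken
    prompt_acoustic: List[int],  # codec tokens of the 3-second prompt
    num_new_tokens: int,
) -> List[int]:
    """Continue the acoustic prompt and return only the newly generated tokens."""
    text_context = prompt_text + target_text
    acoustic = list(prompt_acoustic)  # the prompt acts as an acoustic prefix
    for _ in range(num_new_tokens):
        probs = model(text_context, acoustic)
        # Greedy choice for simplicity: take the most likely next token.
        next_token = max(range(len(probs)), key=probs.__getitem__)
        acoustic.append(next_token)
    # Drop the prompt; what follows it is the synthesized speech in token form.
    return acoustic[len(prompt_acoustic):]


def toy_model(text_context: Sequence[int], acoustic: Sequence[int]) -> List[float]:
    """Hypothetical toy model over a 4-token codec vocabulary (ignores the text)."""
    vocab_size = 4
    favored = (acoustic[-1] + 1) % vocab_size if acoustic else 0
    probs = [0.1] * vocab_size
    probs[favored] = 0.7
    return probs


if __name__ == "__main__":
    speech_tokens = zero_shot_tts(
        toy_model,
        prompt_text=[10, 11],
        target_text=[12, 13, 14],
        prompt_acoustic=[0, 1],
        num_new_tokens=3,
    )
    print(speech_tokens)  # [2, 3, 0]
```

In a real system, a neural audio codec decoder would turn the generated codes back into a waveform. That step is omitted here.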

What can you do with Microsoft VALL-E?

According to the researchers, Microsoft VALL-E is a “neural codec language model” trained on discrete codes “derived from a pre-existing neural audio codec model.” It was trained on 60,000 hours of speech, which the researchers say is “hundreds of times greater than existing systems.” AI that can mimic human speech has been around for a while, but earlier attempts often sounded obviously robotic; the VALL-E examples are far more convincing.

Microsoft VALL-E was trained on 60 thousand hours of speech

According to the researchers, Microsoft VALL-E can “preserve the speaker’s emotion and auditory environment” of the prompt. Impressive as it is, the technology is still a long way from replacing voice actors, because finding the right tone and emotion in a performance is a different skill. Even a more advanced version of Microsoft VALL-E would likely not perform as well as a skilled professional, although businesses often prioritize cost-effectiveness over quality.

You can listen to some of the samples on Microsoft’s GitHub demo page.

Microsoft VALL-E features

Although Microsoft VALL-E is very new, it already has several notable features.

Synthesis diversity: Because Microsoft VALL-E generates discrete tokens with a sampling-based method, its output varies for the same input text. It can therefore synthesize different personalized speech samples from different random seeds.

Acoustic environment preservation: Microsoft VALL-E can generate personalized speech while preserving the acoustic environment of the speaker prompt. Compared with the baseline, VALL-E is trained on a larger dataset that covers more acoustic conditions. The audio prompts and transcriptions for these examples were sampled from the Fisher dataset.

Microsoft VALL-E can provide customized speech while maintaining the acoustic environment of the speaker prompt

Speaker’s emotion preservation: Using audio prompts taken from the Emotional Voices Database, for example, Microsoft VALL-E can create personalized speech while preserving the emotional tone of the speaker prompt. Traditional approaches train a model on a supervised emotional TTS dataset, mapping speech to a transcription and an emotion label. VALL-E, by contrast, can keep the emotion of the prompt even in a zero-shot setting.
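The synthesis diversity described above comes from sampling tokens rather than always picking the most likely one. The Python sketch below illustrates that general technique with a hypothetical toy model over a three-token codec vocabulary; `toy_next_token_probs` and `decode` are illustrative names, not part of VALL-E. Greedy decoding always returns the same tokens, while sampling with a seeded random generator returns the same output for the same seed and, typically, a different output for each different seed.

```python
import random
from typing import List, Optional


def toy_next_token_probs(prev_token: int) -> List[float]:
    """Hypothetical toy model: repeating the previous token is most likely."""
    probs = [0.25, 0.25, 0.25]
    probs[prev_token] = 0.5
    return probs


def decode(num_tokens: int, seed: Optional[int] = None, start: int = 0) -> List[int]:
    """Greedy decoding if seed is None, otherwise sampling from a seeded RNG."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(num_tokens):
        probs = toy_next_token_probs(tokens[-1])
        if seed is None:
            # Greedy: always the single most likely token.
            next_token = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Sampling: draw a token in proportion to its probability.
            next_token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_token)
    return tokens[1:]  # drop the start token


if __name__ == "__main__":
    print("greedy:", decode(12))
    for seed in (1, 2, 3):
        print(f"seed {seed}:", decode(12, seed=seed))
```

Greedy decoding is reproducible but yields only one output per input. Seeded sampling gives up that uniqueness in exchange for variety, while each individual seed still reproduces the same result.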

Microsoft VALL-E still has issues with model structure, data coverage, and synthesis robustness.

How does Microsoft VALL-E work?

Microsoft trained VALL-E’s voice synthesis skills on LibriLight, an audio library put together by Meta. Most of its 60,000 hours of English-language speech comes from LibriVox public-domain audiobooks and is spoken by more than 7,000 people. For VALL-E to produce a satisfactory result, the voice in the three-second sample must closely resemble a voice in the training data.

Speech from more than 7,000 people was used to build Microsoft VALL-E

Microsoft offers dozens of audio examples of the model in action on the VALL-E example page. The “Speaker Prompt” is the three seconds of audio that VALL-E is asked to mimic. The “Ground Truth” is a previously recorded excerpt from the same speaker that is used as a benchmark (sort of like the “control” in an experiment). The “Baseline” sample is produced by a traditional text-to-speech synthesis approach, and the “VALL-E” sample is the output of the VALL-E model.

Microsoft VALL-E is the first, but certainly not the last, major AI project of 2023. It follows Point-E, which OpenAI, a company Microsoft supports financially, released in the last weeks of 2022.

 

 

© 2021 TechBriefly is a Linkmedya brand.

  • Tech
  • Business
  • Science
  • Geek
  • How to
  • About
  • Privacy
  • Terms
  • Contact
  • FAQ
  • | Network Sites |
  • Digital Report
  • LeaderGamer
  • News Republic

Follow Us

No Result
View All Result
  • Tech
  • Business
  • Crypto
  • Science
  • Geek
  • How to
  • About
    • About TechBriefly
    • Terms and Conditions
    • Privacy Policy
    • Contact Us
    • Languages
      • 中文 (Chinese)
      • Dansk
      • Deutsch
      • Español
      • English
      • Français
      • Nederlands
      • Italiano
      • 日本语 (Japanese)
      • 한국인 (Korean)
      • Norsk
      • Polski
      • Português
      • Pусский (Russian)
      • Suomalainen
      • Svenska
  • FAQ
    • Articles