Search This Blog

Powered by Blogger.

Blog Archive

Labels

Showing posts with label VALL-E. Show all posts

Microsoft Quietly Revealed a New Kind of AI


In the tangible future, humans will be interfacing their flesh with chips. Therefore, perhaps we should not have been shocked when Microsoft's researchers appeared to have hastened a desperate future. 

It was interestingly innocent and so very scientific. The headline of the researcher’s article read “Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.” 

What do you think this may possibly mean? Is there a newer, faster method for a machine to record spoken words? 

The abstract by the researchers got off to a good start. It employs several words, expressions, and acronyms that many layman's language models would find unfamiliar. 

It explains why VALL-E is the name of the neural codec language model. This name must be intended to soothe you. What could be terrifying about a technology that resembles the adorable little robot from a sentimental movie? 

Well, this perhaps: "VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt." 

The ChatGPT revolution: Microsoft Seems to Have Big Plans for This AI Chatbot 

The researchers often wanted to develop learning capabilities, while they have to settle for just waiting for them to show up. And what emerges from the researchers’ last sentence is quite surprising. 

Microsoft's big brains (AI, for an instance) can now create longer words and maybe lengthy speeches that were not actually said by you but sound remarkably like you with just three seconds of what one is saying. 

Through this, researchers wanted to shed light on how VALL-E utilizes an audio library assembled by Meta, one of the most reputable and recognized businesses in the world. It has a memory of 7,000 people conversing for 60,000 hours and is known as LibriLight. 

Also: Use AI-powered Personalization to Block Unwanted Calls And Texts 

This as well seems another level of sophistication. Taking the example of Peacock’s “The Capture,” in which deepfakes pose as a natural tool for the government. Perhaps, one should not really be worried since Microsoft is such a nice, inoffensive company these days. 

However, the idea that someone, anyone, can easily be conned into believing that a person is saying something he actually did not (perhaps, would never) itself is alarming. Especially when the researchers claim their capabilities to replicate the “emotions and acoustic behavior” of someone’s initial three-second speech as well. 

While this will be comforting when the researchers claim to have spotted this potential for distress. They offer: "Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker." 

One may as well stress enough to find a solution to these issues. An answer to this, according to the researchers is ‘Building a detection system.’ But this also leaves a few individuals wondering: “Why must we do this, at all?” Well, quite often in technology, the answer remains “Because we can.”