Microsoft Announces Launch of Customizable AI-Powered Digital Avatar


Microsoft has announced the launch of text-to-speech avatars on its Azure AI platform, which let businesses design customized digital humans driven by natural language AI. With the new avatars, text input can drive interactive experiences and photorealistic avatar videos. Users can create custom virtual assistants, training videos, digital doubles, and more.

Introduced at the recent Microsoft Ignite conference, these avatars combine text-to-speech models with advances in computer vision. To train a personalized avatar, users upload a few pictures and video samples of themselves. The AI then reproduces the person's voice and likeness with high fidelity by closely mimicking lip movements and vocal intonation.

Producing high-quality video content generally takes a lot of time and money. Microsoft is betting that its text-to-speech avatars will make it faster and cheaper to create dynamic talking digital anchors, brand spokespeople, tutorial instructors, and other virtual characters.

The avatars are also meant to make conversations with AI assistants feel more natural. Microsoft demonstrated an avatar shopping assistant that uses Azure AI search and database capabilities to answer questions and complete transactions in real time.

Internally, the avatar creator relies on neural rendering and voice cloning. The system first analyzes the supplied text to produce a phonetic sequence. Azure's neural text-to-speech engine then predicts acoustic features and synthesizes the voice from them. Finally, the avatar animation model uses those acoustic features to generate photorealistic facial expressions and lip sync.
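
To make the three stages described above more concrete, here is a deliberately simplified Python sketch of a text, phoneme, acoustic feature, and animation pipeline. It is not Microsoft's implementation and does not call any Azure API. Every name in it (`text_to_phonemes`, `predict_acoustic_features`, `animate`, `render`, and both data classes) is hypothetical, and each stage is a toy stand-in for what would really be a trained neural model.

```python
from dataclasses import dataclass
from typing import List

# Toy assumptions: letters stand in for phonemes, and three mouth shapes
# stand in for a full facial animation model.
VOWELS = set("aeiou")
CLOSED_LIPS = set("bmp")


@dataclass
class AcousticFrame:
    phoneme: str      # the (toy) phoneme this frame belongs to
    duration_ms: int  # how long the sound is held


@dataclass
class MouthKeyframe:
    start_ms: int  # when this mouth shape begins
    shape: str     # "open", "closed", or "neutral"


def text_to_phonemes(text: str) -> List[str]:
    """Stage 1 (toy): treat each letter as one phoneme, dropping everything else."""
    return [ch for ch in text.lower() if ch.isalpha()]


def predict_acoustic_features(phonemes: List[str]) -> List[AcousticFrame]:
    """Stage 2 (toy): assign a fixed duration, longer for vowels."""
    return [AcousticFrame(p, 120 if p in VOWELS else 80) for p in phonemes]


def animate(frames: List[AcousticFrame]) -> List[MouthKeyframe]:
    """Stage 3 (toy): derive a lip-synced mouth shape from each acoustic frame."""
    keyframes: List[MouthKeyframe] = []
    elapsed_ms = 0
    for frame in frames:
        if frame.phoneme in VOWELS:
            shape = "open"
        elif frame.phoneme in CLOSED_LIPS:
            shape = "closed"
        else:
            shape = "neutral"
        keyframes.append(MouthKeyframe(elapsed_ms, shape))
        elapsed_ms += frame.duration_ms
    return keyframes


def render(text: str) -> List[MouthKeyframe]:
    """Chain the three stages in the order the article describes."""
    return animate(predict_acoustic_features(text_to_phonemes(text)))
```

The point of the sketch is the data flow: the animation stage never sees the raw text, only the timing information produced by the speech stage, which is what keeps the lips in step with the audio.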

Azure AI Speech offers this capability in two variations: prebuilt and custom text-to-speech avatars. With the prebuilt option, users can create standard video content and interactive applications with ready-made avatars in multiple languages and voices. The custom option, by contrast, produces unique avatars from customer-supplied video footage.

As of now, access to the custom avatars is restricted in order to reduce the risk of deepfakes. Prebuilt avatars, however, are available to all Azure customers through the Speech Studio tools.

Competitors of Microsoft

Microsoft is certainly not the only organization trying its hand at AI synthetic media. It faces competition from D-ID and Synthesia, which are already generating avatars at scale.

Furthermore, although innovative in its own way, Microsoft's avatar technology lacks several features its competitors offer. For instance, Microsoft's method accepts only text, whereas D-ID and Synthesia can build avatars from both text and audio input. Additionally, D-ID offers user-friendly smartphone apps for creating avatars, while Synthesia's self-service studio facilitates large-scale video production.

Currently, the lip-sync quality of Microsoft's avatars also appears to fall short of the realism achieved by Synthesia and D-ID.

Nonetheless, Microsoft has a competitive edge in speech AI research, large language model technologies, and enterprise cloud resources. With the support of Azure's processing capacity, the tech giant is wagering that its expertise in natural language systems will open up new use cases and advance avatar innovation.