AI News Bureau
The Ying text-to-video model accepts both text and image prompts to generate six-second video clips in around 30 seconds.
Written by: CDO Magazine Bureau
Updated 5:46 PM UTC, Tue July 30, 2024
Chinese AI start-up Zhipu AI unveiled its video generation model, dubbed Ying, on July 26. The model accepts both text and image prompts, generating six-second video clips in around 30 seconds.
Users can fine-tune the results with style options including 3D animation, cinematic, and oil-painting looks, as well as emotional themes such as tense, lively, and lonely.
The company announced at the launch event that the service would be immediately available to all users for unlimited use. However, the free version will have longer wait times during peak usage hours.
Ying is powered by CogVideoX, a self-developed text-to-video model similar to the diffusion transformer (DiT) architecture used by OpenAI's Sora, with improved inference speed that enables faster video generation, said Zhipu chief executive Zhang Peng.
He added that the firm got inspiration from Sora’s algorithm design. Zhang also said Zhipu is working to launch a new iteration of the video model that will generate longer videos with higher definition.