Tencent, Tsinghua & HKUST Launch Video AI

March 15th witnessed the joint unveiling of a groundbreaking video AI model named “Follow-Your-Click” by Tencent, Tsinghua University, and the Hong Kong University of Science and Technology. Given an uploaded image, a single click on a region, and a few guiding words, the model animates the chosen static area, turning the image into a video instantly.

Image-to-video technology, a core AIGC application, has broad prospects across industries including film production, augmented reality, game development, and advertising, making it one of the hottest AI technologies of 2024. Research institutions note that AI video generation is continually breaking new ground and that the AI+ trend will be unstoppable going forward.

So, which areas stand to benefit?

The Follow-Your-Click Video AI Model

On March 15th, Tencent, Tsinghua University, and the Hong Kong University of Science and Technology jointly launched a novel image-to-video model named “Follow-Your-Click.” A user can bring static areas of an image to life simply by clicking the desired region and supplying a few prompt words, seamlessly turning the image into a video.

Current video generation models usually require users to describe the desired motion in detailed prompts, which can be a cumbersome process. Moreover, existing technology often cannot confine motion to a specified part of the image: the whole scene moves rather than the targeted area, limiting precision and flexibility.

To address these issues, the joint team from Tencent’s Hunyuan large-model group, Tsinghua, and HKUST developed the practical, controllable “Follow-Your-Click” image-to-video model, offering more convenient interaction and making “one click to animate” a reality.

Tencent’s Hunyuan model team has long researched and explored multimodal technologies and holds industry-leading capabilities in video generation. The team previously supported People’s Daily in creating the original video “Beautiful China,” generating striking footage of the country’s landscapes and demonstrating strong content understanding, logical reasoning, and image generation abilities.

Tencent Hunyuan is a general-purpose large language model developed fully in-house and first unveiled publicly on September 7th last year. Positioned for practical use, Hunyuan focuses primarily on business scenarios and industry applications. In May of the previous year, Ma Huateng, Tencent’s Chairman and CEO, said that the emergence of large models represents a once-in-a-century opportunity akin to the Industrial Revolution: what matters is solid foundational work in algorithms, computing power, and data, and, above all, application in real-world scenarios.

As of December 2023, Hunyuan had been tested internally in more than 300 Tencent business and application scenarios, including Tencent Meeting, Tencent Docs, WeChat Work, Tencent Ads, and WeChat Search.

Emerging Technologies: Which Fields Will Reap the Most Benefits?

It is noteworthy that the field of AI video generation has recently seen several technological advancements globally.

On February 15th, OpenAI dropped a bombshell on global AI video generation, releasing a “text-to-video” model known as Sora and granting access to select researchers and creators. Sora not only generates videos from textual descriptions but can also animate existing images; current output runs up to about one minute, and scenes can include multiple characters, specific types of motion, and precise subject and background detail. Sora’s breakthroughs in clarity, coherence, comprehension, consistency, and duration set the internet ablaze, with related AI stocks keenly speculated in the capital markets.

Following closely on the heels of Sora, on February 26th Google’s DeepMind team released Genie, an 11-billion-parameter foundation world model. From a single image, Genie can create an interactive world with controllable actions, in which users can act step by step. Google states that Genie ushers in the era of “image/text-to-interactive world” generation and could serve as a catalyst for creating general AI agents.

Ping An Securities has observed that the release of multimodal large models like Sora and Genie by OpenAI and Google may accelerate the coming of the AGI wave.

Changjiang Securities points out that Genie defines a new paradigm for generative AI, reshaping the framework of generative interactive environments.

1) Genie could bring revolutionary changes to the video game industry: by converting a single image, photo, or sketch into a playable game, Genie generates interactive, controllable environments. Game scenes evolve dynamically with the player’s commands, producing a new frame for each action and offering gamers a novel interactive experience, and perhaps, in the future, one-click generation of a playable game world.

2) Applying Genie to robotics opens a path toward general intelligent agents: Genie has learned a unified set of action patterns, mastering how to manipulate various household objects from videos of real robot arms. It understands the arm’s movements and how to control it, and these learned actions can be applied in the real world. The generality of Genie’s method allows it to extend to any domain, letting future AI agents train in ever-expanding new worlds.

On February 28th, Alibaba’s Institute for Intelligent Computing released a brand-new generative AI model named EMO (Emote Portrait Alive). From just a portrait photo and an audio clip, EMO can animate the person in the photo to “speak” or sing in sync with the audio, with lifelike facial expressions and head movements. EMO introduces a new approach to multimodal video AI:

1) Unlike the text-driven video model Sora, EMO focuses on image + audio video generation: from a single photo and human voice audio at any speaking pace, EMO automatically generates a talking-head video with sound, rich facial expressions, and head movements.

2) Superior performance compared to similar products such as PIKA: limited by its architecture, PIKA can only create 3-second lip-sync clips confined to mouth movements synchronized with the audio. EMO, by contrast, produces videos matching the full duration of the audio while maintaining consistent character identity, offering greater flexibility and more natural, emotionally expressive results.

Changjiang Securities reports that the surge of new AI video generation trends abroad has prompted Chinese companies to catch up quickly. The flourishing AI video sector opens new possibilities, and the future AI+ trend appears undeniable; investment opportunities in commercial AI+ applications across IP, gaming, film, e-commerce, and advertising warrant attention. Guotai Junan Securities believes that as AI video models mature, the entire film production workflow stands to benefit from AI iteration, positioning the film industry to gain deeply from AI development.

Ping An Securities emphasizes that global competition in the large-model domain remains fierce and keeps pushing up the overall capabilities of large models. Iterative upgrades of large-model algorithms demand substantial computing power, which should drive growth in the AI computing power market globally and in China. On the application front, the evolving capabilities of China’s domestic large models present a broad outlook for the AIGC industry. Haitong Securities comments that with the digital economy flourishing both at home and abroad, demand for AI and intelligent computing is skyrocketing, which could also boost demand for related new chemical materials.

Source: Securities China
