Baidu CEO Li Yanhong: Wenxin Yiyan Surpasses 200M Users; Launches 3 AI Tools

Baidu CEO Li Yanhong announces surpassing 200 million users and launching 3 AI tools

LeidiNet, April 16

Baidu’s founder, chairman, and CEO Li Yanhong delivered a keynote speech titled “Everyone is a Developer” today at the Create 2024 Baidu AI Developer Conference. He pointed out, “AI is sparking a creativity revolution, where developing applications in the future will be as simple as shooting a short video. Everyone is a developer, everyone is a creator.”

Li Yanhong stated that Baidu, as a technology company, aims to give everyone the development tools they need to boost society’s creativity. These include a powerful series of foundation models and three major AI development tools, which together form a toolbox that developers can pick up and use as needed.

The Wenxin Large Model series includes flagship versions such as ERNIE 3.5 and ERNIE 4.0, as well as lightweight versions such as ERNIE Speed, Lite, and Tiny. At the event, Li Yanhong also officially launched the tool version of Wenxin Large Model 4.0.

Li Yanhong revealed that Wenxin Yiyan has passed 200 million users since its launch just over a year ago, and that the Wenxin Large Model has become China’s leading and most widely used AI foundation model.

Li Yanhong emphasized that smaller models derived from the powerful Wenxin 4.0 through dimensionality reduction and pruning perform significantly better than open-source models of the same size used directly, and at significantly lower cost. “Therefore, open-source models will become increasingly outdated.”

Baidu has also prepared three major AI development tools that are ready to use: AgentBuilder for intelligent agent development, AppBuilder for AI-native application development, and ModelBuilder for customizing models of various sizes. “These three tools represent advanced productivity.”

During the conference, Li Yanhong shared Baidu’s specific approach to developing AI-native applications based on large models, highlighting MoE, small models, and intelligent agents as three directions worth attention. “This is based on Baidu’s practice over the past year, having faced numerous challenges and paid hefty tuition fees.”

Below is the transcript of Li Yanhong’s keynote speech “Everyone is a Developer”:

Li Yanhong delivering the keynote speech "Everyone is a Developer"

As long as you can talk, you can become a developer

Hello everyone, and welcome to the Create 2024 Baidu AI Developer Conference, held for the first time in the Guangdong-Hong Kong-Macao Greater Bay Area. Today we have more than 5,000 developers and technology enthusiasts here on site. Over the past year I have talked with many entrepreneurs and developers, and I sense that everyone is experiencing “FOMO,” or Fear of Missing Out: excited, yet afraid of being left behind. Indeed, large models and generative AI will completely transform the developer community.

In the past, developers changed the world with code; in the future, natural language will become the new universal programming language. As long as you can talk, you can become a developer and use your creativity to change the world.


That day is not far off. With powerful foundation models and many low-threshold, even zero-threshold, development tools, developer productivity has already improved greatly.

For example, Comate, an intelligent coding assistant built on the Wenxin Large Model, supports more than 100 programming languages and all mainstream IDEs. It recommends code, generates code comments, finds code defects, and suggests optimizations; it can also interpret entire codebases in depth and generate new code that draws on private-domain knowledge. In just over a year of operation, Comate has been adopted by more than 10,000 companies, including Himalaya, Mitsubishi Elevator, and iSoftStone, with a code adoption rate of 46%. Comate now generates 27% of the new code Baidu adds every day.

Today, you don’t need to write code to create an AI application; you don’t need to program to create an intelligent agent. AI is sparking a creativity revolution, where developing applications in the future will be as simple as shooting a short video. Everyone is a developer, everyone is a creator.


As a technology company, Baidu’s role is to give everyone the development tools they need, so as to boost society’s overall creativity as much as possible. Specifically, we provide a powerful series of foundation models, the Wenxin Large Model series, which includes flagship versions such as ERNIE 3.5 and ERNIE 4.0 and lightweight versions such as ERNIE Speed, Lite, and Tiny.

We also provide tools for developing various applications based on large models, including AgentBuilder for intelligent agent development, AppBuilder for native AI application development, and ModelBuilder for customizing models of various sizes. These three tools represent advanced productivity.

Wenxin Yiyan Users Exceed 200 Million, Wenxin Large Model 4.0 Tool Version Released


First, let’s talk about the latest developments of Wenxin Yiyan and Wenxin Large Model:

Wenxin Yiyan was launched on March 16 last year, so today marks one year and one month. Our user base has passed 200 million, and daily API calls have exceeded 200 million. We now serve 85,000 clients, and more than 190,000 AI-native applications have been developed on the Qianfan platform.

So, what are people doing with Wenxin Yiyan?

The real-life events depicted in the video are just the tip of the iceberg. We can see that Wenxin Yiyan is changing the work and lives of many more people.

The foundation model supporting Wenxin Yiyan is the Wenxin Large Model. Over the past year, it has evolved from version 3.0 to 3.5 and now to version 4.0. In terms of understanding, generation, logic, and memory capabilities, Wenxin 4.0 has reached a leading level in the industry.

In recent months, the Wenxin Large Model has made significant further improvements in general capabilities such as code generation, interpretation, and optimization, reaching an internationally leading level.

Today, we officially release the tool version of Wenxin Large Model 4.0. Now, you can experience the code interpreter function on the tool version, which allows for processing and analysis of complex data and files through natural language interaction. You can also generate charts or files to quickly gain insights into data characteristics, analyze trends, and provide efficient and precise support for subsequent decisions.

The Wenxin Large Model has become the most advanced and widely used AI base model in China.

Moreover, compared with a year ago, the algorithmic training efficiency of the Wenxin Large Model has increased 5.1-fold, its average weekly effective training rate has reached 98.8%, and its inference performance has improved 105-fold, bringing inference costs down to 1% of the original.

In other words, a customer who used to make 10,000 calls a day can now make 1 million calls for the same cost. The media may not get excited about a 99% cost reduction, but once businesses and developers start using a model, what they care about is effectiveness and cost.
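The arithmetic behind this example is straightforward: at a fixed budget, the number of affordable calls scales inversely with the per-call cost. Let N be the number of daily calls and c the cost per call. Assuming per-call inference cost is the only thing that changes:

```latex
N_{\text{new}} = N_{\text{old}} \cdot \frac{c_{\text{old}}}{c_{\text{new}}}
              = 10{,}000 \cdot \frac{1}{0.01}
              = 1{,}000{,}000
```

Put another way, cutting cost by 99% multiplies the call volume a fixed budget can buy by 100.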

We are able to reduce the inference cost to 1% while improving performance precisely because Baidu has a full-stack layout in the four layers of chips, frameworks, models, and applications. Through end-to-end optimization, we continuously reduce costs, allowing more people to efficiently and affordably use large models for AI applications.

Undoubtedly, large models will remain a hot topic in 2024, and technological breakthroughs will keep coming. The media will keep dramatizing the news with headlines like “shocking release” and “epic update.” But I want to emphasize that large models by themselves do not directly create value; it is the AI applications built on large models that meet real market demand.

Sharing specific ideas and tools for developing AI native applications: “Stepped on countless pitfalls, paid high tuition fees”


Today, I want to share some specific ideas and tools for developing AI-native applications on top of large models. Baidu learned these lessons over the past year, after stepping into countless pitfalls and paying hefty tuition fees.

First is MoE. Future large-scale AI-native applications will essentially all be built on an MoE architecture. Here, MoE (mixture of experts) does not refer to the general academic concept. It means mixing large and small models rather than relying on a single model to solve every problem. However, deciding when to call a small model, when to call a large model, and when to call no model at all takes technical expertise tailored to each application scenario.
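A confidence-threshold cascade is one common way to make the “which model, if any” decision described here. It is not necessarily how Baidu does it. In this minimal sketch, `answer_with_cascade`, `toy_small`, and `toy_large` are hypothetical names, and the toy models simply fake a confidence score:

```python
from typing import Callable, Dict, Tuple

# A "model" here is any function returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def answer_with_cascade(
    query: str,
    cached: Dict[str, str],
    small: Model,
    large: Model,
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (answer, tier), where tier is "no_model", "small" or "large"."""
    # Tier 0: some requests need no model call at all (cache, rules, lookups).
    if query in cached:
        return cached[query], "no_model"
    # Tier 1: try the cheap, fast small model first.
    answer, confidence = small(query)
    if confidence >= threshold:
        return answer, "small"
    # Tier 2: escalate to the expensive large model only when needed.
    answer, _ = large(query)
    return answer, "large"


# Toy stand-ins for real models (hypothetical, for illustration only).
def toy_small(q: str) -> Tuple[str, float]:
    # Pretend the small model is only confident on short queries.
    return f"small:{q}", 0.9 if len(q.split()) <= 5 else 0.3


def toy_large(q: str) -> Tuple[str, float]:
    return f"large:{q}", 0.99
```

In practice, the confidence signal and the threshold must be calibrated for each scenario. That calibration is the scenario-specific expertise the speech refers to.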

Second is small models. Small models have low inference costs and fast response times. In some specific scenarios, after supervised fine-tuning (SFT), they can perform as well as large models. This is why we released three lightweight models: Speed, Lite, and Tiny. We first compress and distill a base model from a large model and then train it further on data. This approach yields much better results than training a small model from scratch or building on an open-source model, and it is faster, more efficient, and cheaper.
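The speech does not describe Baidu’s distillation recipe. For reference, the textbook soft-target distillation loss trains a small student model to match a large teacher’s temperature-softened output distribution. It can be sketched in plain Python as follows (the logits are made-up numbers):

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    # Dividing by a temperature > 1 softens (flattens) the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Made-up logits for a 3-class toy example; the loss is positive
# because the student ranks the classes opposite to the teacher.
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In typical setups this term is combined with an ordinary cross-entropy loss on the labeled data, which corresponds loosely to the “then train it with data” step.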

Third is intelligent agents. Intelligent agents are currently a hot topic, and as their capabilities improve, they will continuously spawn a large number of new applications. The mechanism of intelligent agents, including understanding, planning, reflection, and evolution, enables machines to think and act like humans, autonomously completing complex tasks, continuously learning in the environment, and achieving self-iteration and evolution. In some complex systems, we can also let different intelligent agents interact and collaborate to complete tasks with higher quality. We have developed these intelligent agent capabilities and fully opened them to developers.
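The speech describes agents only at a high level. To illustrate the planning and reflection ideas, here is a minimal plan-act-reflect loop in which a failed tool call is fed back to the planner so it can revise its plan. Everything here (`run_agent`, the toy planner, the tool names) is hypothetical and not part of any Baidu API. Self-evolution and multi-agent collaboration are not modeled:

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]                           # (tool name, argument)
Planner = Callable[[str, List[str]], List[Step]]
Tool = Callable[[str], Optional[str]]            # returns None on failure


def run_agent(goal: str, plan: Planner, tools: Dict[str, Tool],
              max_rounds: int = 3) -> List[str]:
    """Plan, act, and reflect; replan with feedback after a failed step."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        results: List[str] = []
        failed = False
        for tool_name, arg in plan(goal, feedback):            # planning
            tool = tools.get(tool_name)
            output = tool(arg) if tool is not None else None   # acting
            if output is None:                                 # reflection
                feedback.append(f"step {tool_name}({arg!r}) failed")
                failed = True
                break
            results.append(output)
        if not failed:
            return results
    return []  # give up after max_rounds


# Toy planner: its first plan uses a tool that does not exist;
# after receiving feedback, it switches to tools that do.
def toy_plan(goal: str, feedback: List[str]) -> List[Step]:
    if not feedback:
        return [("hotel_lookup", "Singapore")]
    return [("hotel_search", "Singapore"),
            ("book_ticket", "Universal Studios Singapore")]


toy_tools: Dict[str, Tool] = {
    "hotel_search": lambda city: f"top hotels in {city}",
    "book_ticket": lambda place: f"ticket booked: {place}",
}
```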

In the directions of MoE, small models, and intelligent agents, Baidu has provided “ready-to-use” tools for everyone. Here’s an introduction to three different tools: AgentBuilder for intelligent agent development, AppBuilder for native AI application development, and ModelBuilder for customizing models of various sizes.

AgentBuilder: Intelligent Agent Development Tool

Let’s start with AgentBuilder. Intelligent agents might be the closest and most mainstream way people interact with large models in the future. Leveraging powerful base models, intelligent agents can be mass-produced and applied in various scenarios.

Baidu recently upgraded its Wenxin intelligent agent platform. Currently, there are over 30,000 intelligent agents created, with over 50,000 developers and thousands of companies onboarded. Our goal is to enable everyone and every organization to become developers of intelligent agents, creating the most comprehensive intelligent agent ecosystem in China.

So, how do we achieve this goal? By providing everyone with a user-friendly intelligent agent development tool like AgentBuilder. Let’s take a look at how an intelligent agent for the “Singapore Tourism Board” is created as an example.

First, we open the Wenxin intelligent agent platform. The creation page offers two modes: zero-code and low-code. Beginners can simply choose the “zero-code mode” and create an intelligent agent using natural language with just a few sentences.

We start by naming the intelligent agent “Singapore Tourism Board” and then specify in the settings that it needs to develop travel plans, answer questions, and provide hotel and ticket booking services. These settings guide the intelligent agent and inform it about what it can do.

If only a basic intelligent agent is needed, the platform fills in the details automatically. However, we want the “Singapore Tourism Board” to be a professional intelligent agent, so it needs some advanced configuration. I can add encyclopedia entries about Singapore and official website links to the knowledge base, which is updated daily. Then I add tools such as hotel search and attraction ticket purchasing to extend its service capabilities; we have partnered with Ctrip to provide tourism services such as hotels, attractions, and ticketing. With these steps, the Singapore Tourism Board intelligent agent is ready for preview and fine-tuning.
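The setup above amounts to a small declarative configuration: a name, natural-language instructions, knowledge sources, and tools. The sketch below models that idea as a Python dataclass. The field names, tool names, and the `is_basic` rule are assumptions for illustration, not the Wenxin intelligent agent platform’s actual schema:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentConfig:
    """Hypothetical shape of an agent definition (not a real platform schema)."""
    name: str
    instructions: str                                           # natural-language settings
    knowledge_sources: List[str] = field(default_factory=list)  # refreshed periodically
    tools: List[str] = field(default_factory=list)              # callable services

    def is_basic(self) -> bool:
        # A "basic" agent relies on its instructions alone.
        return not self.knowledge_sources and not self.tools


singapore_agent = AgentConfig(
    name="Singapore Tourism Board",
    instructions="Create travel plans, answer questions, and help book hotels and tickets.",
    knowledge_sources=["encyclopedia entries about Singapore", "official website links"],
    tools=["hotel_search", "attraction_ticket_purchase"],  # hypothetical tool names
)
```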

Now, let’s open the Baidu app and search for “when is the least crowded time to visit Singapore” because everyone wants to avoid crowds when traveling. The intelligent agent synthesizes information from multiple sources and generates an answer, “least crowded from January to March.” We can also interact further with the intelligent agent, such as getting travel tips for Singapore, recommending the top three ranked hotels, or even directly booking tickets for Universal Studios Singapore, providing a one-stop solution that significantly saves users’ time.

In addition to Singapore, intelligent agents for cultural tourism in Dalian, Shenyang, and others are also available on the Wenxin intelligent agent platform, as well as various types of intelligent agents for knowledge, creation, learning, entertainment, etc., all created with AgentBuilder.

When Wenxin Yiyan was first released last year, I said it would affect every company. Its powerful natural language understanding, expression, and reasoning capabilities bring every company closer to its customers.

Today, every business and every Baidu customer can have its own dedicated intelligent agent on Baidu. The entire process requires no programming. By entering prompt-like information and following a few simple steps, you can quickly generate an intelligent agent that works as a top-notch salesperson, online 24/7.

Now, let’s see how a business intelligent agent is created.

Qide Education is a well-known education company with more than 60 branches nationwide and many overseas subsidiaries. It serves students bound for a wide range of countries and holds its customer consultations to high standards. How can it answer customer inquiries 24/7, raise its service standards, and reduce operating costs?

Qide Education uses Baidu’s AgentBuilder to create an exclusive intelligent agent.

Let’s first see how to create an intelligent agent with basic capabilities. It’s simple: on the platform, fill in the agent’s avatar, name, business scope, and welcome message, and specify the information users need to provide, such as age and education level. In just 5 minutes, with no barriers at all, an intelligent agent is ready.

Qide Education also wants this intelligent agent to be a knowledgeable and understanding study abroad consultant. It can provide professional analysis based on different situations of students, such as whether they want to go to the US or Australia, pursue a master’s or bachelor’s degree, and their scores in IELTS and TOEFL. We can create a more advanced intelligent agent by adding knowledge, roles, and tools.

In the knowledge module, upload private-domain knowledge; the platform analyzes it in real time and automatically generates conversation scripts. In the role module, add study-abroad destinations outside the business scope to the filtering rules to improve lead efficiency. In the tools module, add services such as booking in-store appointments. With these simple steps, a Qide Education intelligent agent with professional capabilities is ready.
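The filtering scheme in the role module amounts to a lead-qualification rule. Here is a minimal sketch. It assumes leads arrive as simple dictionaries and uses a hypothetical set of in-scope destinations, since the speech gives neither Qide Education’s actual business scope nor its field names:

```python
from typing import Dict, Set

# Information the agent asks users for (field names are hypothetical).
REQUIRED_FIELDS = ("age", "education_level")


def qualify_lead(lead: Dict[str, str], destinations_in_scope: Set[str]) -> bool:
    """Keep a lead only if required info is present and the destination is served."""
    if any(not lead.get(f) for f in REQUIRED_FIELDS):
        return False  # incomplete lead: ask follow-up questions instead
    return lead.get("destination") in destinations_in_scope
```

Leads filtered out this way never reach a human consultant, which is how a rule of this kind could lower the cost of each effective lead.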

Now, let’s search for “requirements for studying abroad in Australia.” The intelligent agent quickly lists seven essential requirements, including language proficiency and choice of major, along with a matching study-abroad consultation plan, answering each question in turn.

The Qide Education intelligent agent has been well received. In its first week, it was distributed 1.55 million times and interacted with users 58,000 times. Lead conversions grew linearly, the cost per effective lead fell significantly, and operational efficiency improved greatly.

Next, let me introduce an intelligent agent in the home furnishing industry.

Sophia is a home furnishing brand focused on whole-house customization. As in the previous example, it can create a basic business intelligent agent just by filling in a few simple details.

In the home furnishing industry, the offline experience matters most to consumers. Sophia therefore wanted to create a premium online sales experience that replicates the hospitality of its offline stores.

In further development, Sophia opted for digital avatars in the role module to represent salespersons. They selected suitable backgrounds and voices for these avatars and, leveraging the platform’s intelligent parsing capability, automatically generated a set of sales scripts. The result is a polished sales persona, gentle, professional in dialogue, available 24/7 to meet users’ diverse needs and deliver high-quality service experiences.

When Baidu users search for renovation-related queries, Sophia’s intelligent agent uses the capabilities of the Wenxin Large Model to answer their questions first. It then actively engages with customers to confirm specific needs such as renovation type and budget, and recommends nearby offline stores.

Since Sophia’s business intelligent agent went online, the cost per effective lead has fallen by 30%. In other words, an effective customer who previously cost 100 yuan to acquire now costs only 70 yuan.

Currently, more than 10,000 Baidu business customers have adopted business intelligent agents, spanning over 30 industries including education and training, real estate, machinery, and business services.

Through these three demos, I’ve shown how developers and businesses can use AgentBuilder to create intelligent agents tailored to different industries.

Nowadays, creating an intelligent agent takes only minutes. But here’s the problem: without traffic, distribution, visibility, and user adoption, developers and businesses won’t profit, and without profit they lack motivation. How can we address this pain point?

Our Wenxin intelligent agent platform gives developers a path to monetizing traffic. Beyond Baidu Search, other products in the Baidu ecosystem, such as Xiaodu, Maps, Tieba, and in-car systems, can integrate intelligent agents. This relieves developers of worries about traffic distribution and ensures tangible profits.

With distribution comes data feedback; with data feedback, the flywheel turns, and intelligent agents can iterate autonomously, becoming smarter with each use. The Wenxin intelligent agent platform has also launched data analysis and Q&A optimization modules for intelligent agents, with more capabilities coming soon. Through this distribution-diagnosis-profit data flywheel, the platform drives a virtuous cycle of higher quality, better traffic, and greater revenue for intelligent agents.

AppBuilder: Develop an Application in Three Steps with Natural Language

Next, let me introduce the second development tool, AppBuilder. It’s currently the most user-friendly AI native application development tool. AppBuilder comes pre-packaged with various components and frameworks necessary for developing AI native applications, significantly reducing the barriers to entry for developers.

In just three steps, developers can use natural language to develop an AI native application and seamlessly publish and integrate it into various business environments. Let’s look at a few examples:

Earlier this year, we held an AI native application development challenge. The task was to use AppBuilder to create a “Theme Park Queue Planning Assistant” to help visitors better understand queue times and design personalized routes for optimal enjoyment within a limited time.

The champion of this challenge developed the application without writing a single line of code and won a prize of 100,000 yuan provided by Baidu. While the task might not be difficult for someone proficient in coding, developing it without any code relies heavily on foundational models and the capabilities of AppBuilder.

Now, let’s see how to use AppBuilder to create this AI application.

First, let’s review the challenge. It provided assumed queue times and excitement ratings for the attractions at Universal Studios, and the objective was to maximize total excitement within a limited time.

Firstly, open the development interface of AppBuilder and name the application “Theme Park Queue Assistant.” Secondly, describe specific requirements in the role commands, including calling the code interpreter, calculating the optimal combination within a fixed time, and outputting the result. Thirdly, add the code interpreter from the tool components to aid in calculations.

Now, let’s test it. When we enter the question “I have 3 and a half hours, how can I have the most thrilling experience?” in the panel on the right, the code interpreter translates the question into code and analyzes it with data comprehension tools. After a series of calculations, it suggests that the optimal experience includes rides like “Harry Potter Forbidden Journey,” “Jurassic Adventure,” “King Kong Roller Coaster,” and “Bumblebee Spin.” With no issues found during testing, click “publish,” and voila! An application is generated without a single line of code.
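The “optimal combination within a fixed time” in this challenge is essentially a 0/1 knapsack problem. Each ride costs some minutes (queue plus ride time) and yields some excitement, and the goal is the most excitement within the time budget. The minimal dynamic-programming sketch below makes three assumptions: each ride is taken at most once, walking time is ignored, and the ride names and numbers are invented rather than taken from the challenge’s actual data:

```python
from typing import List, Tuple

Ride = Tuple[str, int, int]  # (name, minutes needed, excitement score)


def best_plan(rides: List[Ride], budget: int) -> Tuple[int, List[str]]:
    """0/1 knapsack: maximize total excitement within `budget` minutes."""
    n = len(rides)
    # dp[i][t] = best excitement using the first i rides within t minutes
    dp = [[0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, cost, value) in enumerate(rides, start=1):
        for t in range(budget + 1):
            dp[i][t] = dp[i - 1][t]  # skip ride i
            if cost <= t:            # or take it, if it fits
                dp[i][t] = max(dp[i][t], dp[i - 1][t - cost] + value)
    # Walk back through the table to recover which rides were chosen.
    chosen: List[str] = []
    t = budget
    for i in range(n, 0, -1):
        if dp[i][t] != dp[i - 1][t]:
            name, cost, _ = rides[i - 1]
            chosen.append(name)
            t -= cost
    chosen.reverse()
    return dp[n][budget], chosen


# Hypothetical data: 3.5 hours = 210 minutes.
rides = [("A", 90, 9), ("B", 60, 7), ("C", 45, 6), ("D", 30, 4), ("E", 120, 10)]
print(best_plan(rides, 210))  # → (22, ['A', 'B', 'C'])
```

The table has (number of rides + 1) × (budget + 1) cells, so a few dozen rides and a budget measured in minutes are cheap to solve exactly.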

AppBuilder has since been upgraded further. Every step of the creation process can now be optimized automatically with the “AI Optimization Configuration” feature, which improves development efficiency even more.

Let’s see another example.

Earlier this year, North China Electric Power University proposed providing intelligent personalized services for all students and faculty. Using Baidu’s AppBuilder, we jointly created a Huadian AI Assistant. Now, let me show you how we did it:

First step: Open AppBuilder, enter the creation page, and click on “AI Auto-Generate Configuration.” First, set up basic information such as application name, description, and avatar.

Second step: Describe specific requirements in the role commands using natural language, including tasks, component capabilities, requirements, and limitations.

Third step: Insert custom components like library book query, course schedule query, and student grade query to equip the campus assistant with intelligent service capabilities. Then, add an opening statement for the campus assistant, and the application is configured.

Next, in the preview interface, debug the assistant based on user inquiries. For example, ask about the registration time for the CET-4 and CET-6 exams to test the automatic invocation of various components.

As you can see, with these simple operations, the application is ready.

The assistant is currently in small-scale testing and already integrates features such as looking up regulations and courses, topping up meal cards, and borrowing books, serving the school’s many students and faculty. In the future, we will deepen our collaboration with North China Electric Power University to provide even more diverse and convenient services.

Baidu also has years of technical expertise in multimodal interaction. Through AppBuilder, we offer cross-modal capabilities that let you easily create illustrated applications, such as comics and children’s picture books, from just a text passage or a few sentences.

The process is straightforward: Open AppBuilder, navigate to “App Creation,” input the role command, select the “Add text-to-image” component, then input three recommended questions before clicking on the publish button. Once the application is ready, simply provide a rough idea of the desired character or plot, and AppBuilder will automatically generate the story and output drawings.

The new intelligent comic and picture book creation features on Baidu Wenku are built on components available in AppBuilder. Let’s look at how Baidu Wenku’s comic creation feature helps people with creative ideas produce quality work.

Let’s take the classic tale of “Zhou Chu Eliminating Three Evils” as an example.

On Baidu Wenku, enter the theme “Zhou Chu Eliminating Three Evils,” a story found in “Book of Jin” and “A New Account of the Tales of the World.” After searching Wenku, the platform generates a story based on the original text, which users can edit further. Clicking the AI toolbar on the right starts the comic adaptation of this story.

Upon entering the comic creation interface, Wenku automatically generates comic panels based on the story plot. From various styles such as light and shadow, realistic, and cartoon, select the comic style that best fits the story. Finally, choose different character images according to the roles, thus completing the comic creation process.

After generating the comic, users can browse the entire comic in Baidu Wenku’s intelligent comic editor. Additionally, Wenku supports editing, modifying, and fine-tuning each frame. For instance, by selecting the frame “Zhou Chu and the White-Browed Tiger” on the left side, clicking edit, and adding a description like “characters with clear features and bright scenes,” you can refine the comic to better suit your needs. The comic function on Baidu Wenku excels in maintaining consistency in character and scene styles.

Wenku’s intelligent comic feature significantly enhances the efficiency of comic creation, reducing the costs and barriers to entry for aspiring comic creators, enabling more individuals with ideas and creativity to realize their comic creation dreams.

Beyond comics, Baidu Wenku now also makes it easy for users to create picture books. It may surprise you that children in China read only about 10 picture books per year on average, compared with about 50 in Western countries. With AI, even parents without drawing skills can create personalized picture books for their children. Let’s take a look at these vividly illustrated picture books with sound!

Since last year, we have used AI to turn Baidu Wenku into the starting point for users’ content creation. Now, with AppBuilder’s support, the new intelligent comic and picture book features extend it into a more engaging multimodal creative space.

I’ve just presented three examples of creating AI-native applications using Baidu’s AppBuilder. You should be able to notice two clear advantages of AppBuilder:

  1. Powerful Functionality: Built on Wenxin 4.0’s advanced capabilities in understanding and following instructions, AppBuilder performs well from the start without extensive tuning, which significantly lowers the barrier to development. With enhanced retrieval techniques such as RAG (retrieval-augmented generation), our response accuracy and friendliness rates exceed 95% in typical scenarios such as knowledge-based Q&A, well ahead of similar products. AppBuilder offers a comprehensive set of 55 components, including AI capability components built on Baidu’s extensive technical experience, large model components, and exclusive business components. It also incorporates third-party APIs for mainstream scenarios such as flight inquiries and academic paper searches. We have just added support for custom components, so users can integrate their own proprietary tools and data directly. Together, these components support the efficient development of AI-native applications.

  2. Simplicity and Ease of Use: With AppBuilder, you can create and distribute applications in just three easy steps. We also provide open-source SDKs for convenient customization.
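Item 1 credits retrieval techniques such as RAG for AppBuilder’s accuracy on knowledge-based queries. The core idea is to retrieve the most relevant passages and place them in the prompt before the model answers. The sketch below is a simplified stand-in, not AppBuilder’s implementation. It ranks documents by word-overlap cosine similarity instead of embedding search, and the campus documents are made up:

```python
import math
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing words, so non-shared words add nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = _bag_of_words(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, _bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str], k: int = 1) -> str:
    """Ground the model's answer in the retrieved context."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical knowledge base for a campus assistant.
campus_docs = [
    "The library opens at 8 am on weekdays",
    "CET-4 registration opens in March",
    "Meal cards can be topped up online",
]
print(build_prompt("when does the library open", campus_docs))
```

Production systems typically swap the word-overlap score for dense embeddings and a vector index. The retrieve-then-prompt structure stays the same.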

ModelBuilder: Efficient and Cost-effective Model Production Tool

Let me introduce a tool more suitable for professional developers, which is ModelBuilder. It enables developers to customize models of any size based on their requirements and further fine-tune these models for specific contexts, thereby achieving better results. For developers, mastering model fine-tuning is crucial when working with large models.

Let’s take the example of essay correction in the education industry. As essay grading follows specific standards, and different grade levels have varying requirements and grading criteria, model fine-tuning is essential to tailor the model’s output to meet specific needs. Now, let’s see how ModelBuilder accomplishes model fine-tuning for essay corrections.

  1. Creating the Dataset: The effectiveness of fine-tuning depends heavily on data quality. In this case, the original dataset had only 180 entries, and their quality was insufficient. To address this, we used three functions: data cleaning, data annotation, and data augmentation. Data cleaning quickly removes problems such as missing data and garbled characters. For annotation, we added evaluation dimensions such as content depth and writing technique to the essays. Data augmentation generates similar but non-duplicate data to expand the dataset. After augmentation, ModelBuilder gave us 920 high-quality entries.

  2. Model Fine-tuning: Next comes the fine-tuning stage. First, select a base model; in this case we chose ERNIE Speed. Then configure the parameters using the platform’s recommended values. With nearly 1,000 data entries, we followed the recommendation and set the number of training rounds to 10, and the model started training.

  3. Model Deployment: Finally, deploy the model on the platform. With that, the entire fine-tuning process is complete.
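The data-cleaning function in the dataset step, which removes missing data and garbled characters, can be approximated with a few simple rules. Here is a minimal sketch. It assumes each entry is a dictionary with hypothetical `essay` and `feedback` fields and that garbled text shows up as the Unicode replacement character. Annotation and augmentation, which require a model, are not covered:

```python
from typing import Dict, List, Set, Tuple

REPLACEMENT_CHAR = "\ufffd"  # what many decoders emit for undecodable bytes


def clean(records: List[Dict[str, str]],
          required: Tuple[str, ...] = ("essay", "feedback")) -> List[Dict[str, str]]:
    """Drop entries with missing fields, garbled text, or exact duplicates."""
    seen: Set[Tuple[str, ...]] = set()
    cleaned: List[Dict[str, str]] = []
    for rec in records:
        values = tuple(rec.get(k, "").strip() for k in required)
        if not all(values):                               # missing data
            continue
        if any(REPLACEMENT_CHAR in v for v in values):    # garbled characters
            continue
        if values in seen:                                # exact duplicate
            continue
        seen.add(values)
        cleaned.append(dict(zip(required, values)))
    return cleaned
```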

Next, let’s test the effectiveness of the fine-tuned model using a high school essay titled “Looking Forward to Growing Up”. The theme is open-ended, with a minimum of 800 words.

We fed a student’s essay and the correction format requirements into the model before and after fine-tuning, then compared the two outputs along with a teacher’s assessment of each. The fine-tuned model’s feedback is clearly more professional, reading more like a teacher’s, and it follows the formatting requirements more closely. Work that would take a teacher considerable time can now be done in seconds, greatly speeding up feedback to students. Such fine-tuned models are already in use at several leading education companies.

Earlier, I mentioned that large-scale AI native applications are generally MoE. So, how do we combine these models to achieve the best cost-effectiveness? Below, I’ll demonstrate a case of MoE model routing using Xiaodu. Let’s first see if Xiaodu has become smarter with the support of the large model:

Conversation:

Robin: Xiaodu, Xiaodu.

Xiaodu: Here.

Robin: Any news today?

Xiaodu: (Plays 3 to 5 of today’s top news headlines.)

Robin: Xiaodu, I heard you’ve improved lately?

Xiaodu: I can not only chat with you but also serve as your AI scheduling assistant.

Robin: Then let me test you. Help me arrange my schedule. After today’s event, I’ll have dinner with Mr. Zhang. After that, there’s a management meeting at the hotel, which will probably last about an hour and a half, and I hope it can end before 10 p.m. Tomorrow morning I want to go for a run, so check the weather for me. Then there’s a meeting at 8 a.m. Please arrange my schedule.

Xiaodu: According to your requirements, Xiaodu has devised the optimal schedule for you:

  • You will have dinner with Mr. Zhang from 6:00 p.m. to 7:30 p.m., followed by the management meeting at 8:00 p.m., expected to end around 9:30 p.m.
  • Tomorrow morning, your alarm will remind you to wake up at 6:00 a.m. for your morning run. Keep the running time within an hour to ensure you won’t be late for the 8:00 a.m. meeting.
  • The weather tomorrow morning will be clear with a temperature of 25 degrees Celsius. It is recommended to wear short sleeves and apply sunscreen when going out.
  • Do you think I can be your secretary? I have more new skills to offer.

Robin: Pretty good, thumbs up to you, Xiaodu. Please sync these schedules to my phone.

In the conversation above, I asked Xiaodu several questions, ranging from a simple news request to a complex scheduling task. It seems as if I’m talking to the same Xiaodu, but behind the scenes, different models are being invoked. When my query arrives, the small model ERNIE Tiny first handles routing. Weather questions go to an SFT (supervised fine-tuning) model based on ERNIE Lite, which gives clothing advice based on the forecast. For the more complex scheduling request, the high-performance large model Wenxin 4.0 is called to work out the arrangements from tonight through tomorrow morning.

Today, every question posed to Xiaodu is dispatched to a suitable model. When an application’s API needs to be called, the ERNIE Functions model is used. For educational questions from children, Wenxin 3.5 or 4.0 is used. For the assistant’s persona, the ERNIE Character model keeps the character consistent and encourages users to keep the conversation going.
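The routing described above can be illustrated with a small dispatcher. This is a sketch under stated assumptions, not Baidu’s implementation: in the real system the small model ERNIE Tiny classifies each query, whereas here hypothetical keyword rules (`KEYWORDS`) stand in for it, and model names are plain labels rather than API calls. The route table follows the model assignments given in the talk.

```python
import re

# Intent -> model label, following the assignments described in the talk.
# Labels are plain strings; no real model API is called in this sketch.
ROUTES = {
    "api_call": "ERNIE Functions",   # calling an app's API, e.g. syncing to a phone
    "schedule": "ERNIE 4.0",         # complex multi-step planning
    "weather": "ERNIE Lite (SFT)",   # forecast plus clothing advice
    "education": "ERNIE 3.5",        # children's questions (talk: 3.5 or 4.0)
    "chat": "ERNIE Character",       # persona-consistent small talk (fallback)
}

# Hypothetical keyword rules standing in for the ERNIE Tiny router.
# Dicts keep insertion order, so more specific intents are checked first.
KEYWORDS = {
    "api_call": {"sync", "phone"},
    "schedule": {"schedule", "arrange", "meeting"},
    "weather": {"weather", "temperature", "rain"},
    "education": {"why", "explain", "teach"},
}


def classify(query: str) -> str:
    """Return the first intent whose keywords appear as whole words in the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for intent, keywords in KEYWORDS.items():
        if words & keywords:
            return intent
    return "chat"


def route(query: str) -> str:
    """Return the label of the model that should answer the query."""
    return ROUTES[classify(query)]


if __name__ == "__main__":
    for q in [
        "Arrange my schedule and check the weather.",
        "What's the weather tomorrow morning?",
        "Sync my meeting schedule to my phone.",
        "Why is the sky blue?",
        "Xiaodu, I heard you've improved lately?",
    ]:
        print(f"{route(q)} <- {q}")
```

Check order matters in this sketch: a scheduling request that also asks about the weather should still reach the large model, so `schedule` is checked before `weather`, and a request to sync a meeting schedule to a phone is really an API call, so `api_call` comes first of all.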

Through this combination of large and small models, Xiaodu has not only smoothly undergone a “brain transplant” to the new AI-native operating system DuerOS X, but has also achieved the best balance of effectiveness, speed, and cost. Compared with using only the flagship Wenxin model, this approach doubles response speed while cutting costs by 99%. The Xiaodu AI tablet robot I just spoke with is now available on major platforms, and anyone interested can order one today to try it out.

These ModelBuilder examples show that Baidu can produce models efficiently and at low cost. Starting from the powerful base model Wenxin 4.0, we can trim smaller models to fit different requirements for effectiveness, response speed, and inference cost, with support for both fine-tuning and post-pretraining. Models trimmed this way outperform models taken directly from open source: they perform better at the same size and cost less for the same performance. People used to think open source was cheaper, but with large models, open source is actually the most expensive. Therefore, open-source models will become increasingly outdated.

To help developers get started quickly, ModelBuilder offers the most comprehensive set of large models. This includes the flagship models ERNIE 3.5 and ERNIE 4.0, which are powerful and suited to general, complex scenarios; three lightweight models, ERNIE Speed, Lite, and Tiny; and two vertical-scenario models, ERNIE Character for role-playing and ERNIE Functions for calling tools and functions in dialogue and question-answering scenarios. ModelBuilder also supports mainstream third-party models from China and abroad, for a total of 77 models, making it the platform with the largest number of large models in China.

The Second “Wenxin Cup” Entrepreneurship Competition Officially Begins: Offering a Special Prize of 50 Million Yuan

For developers, in addition to providing these development tools, we also offer support in terms of funding and resources.

In May last year, Baidu launched the “Wenxin Cup” Entrepreneurship Competition to foster a thriving, vibrant large-model ecosystem and to help entrepreneurs and developers build all kinds of AI-native applications.

The first “Wenxin Cup” Entrepreneurship Competition attracted applications from nearly 1,000 entrepreneurial teams. Baidu selected 15 winning teams and provided nearly 100 million yuan in investment, along with comprehensive support in technology, team building, and resources.

Today, I am pleased to announce the official launch of the second “Wenxin Cup” Entrepreneurship Competition. This time, we are broadening project selection and adding multiple tracks, including ones for global markets and for university students. We are recruiting innovative startup teams: as long as your venture focuses on AI-native applications, you can sign up on the competition’s official website. We have also increased our support for entrepreneurs, with more investment funding and richer business resources, and for the first time we have established a “Special Prize”: outstanding projects can receive up to 50 million yuan in cash and resource support.

Chinese entrepreneurs and developers are adept at using new technologies to build applications. I believe the Wenxin large model will become the first choice of Chinese AI entrepreneurs and developers, and that more and more applications will be built on top of it. I also look forward to more entrepreneurs and developers joining us to build a thriving AI ecosystem.

The Biggest Application Scenario for Large Vision Models Is Autonomous Driving

Most of the tools and cases discussed so far are based on large language models. Looking ahead, I believe multimodal large models, which integrate text, images, speech, video, and other modalities, are an important long-term direction for foundation models and the inevitable path toward AGI (artificial general intelligence). Baidu has made long-term investments in these areas and will share its technical progress promptly.

I have a very different view: the biggest application scenario for large vision models is autonomous driving. Baidu has strong advantages here and is a global leader in autonomous driving. We train AI not only to generate video but also to understand real-world events and predict what will happen next.

Using data from more than 100 million kilometers of testing on complex urban roads in China, Baidu trained the Apollo large visual perception model. It has four basic capabilities: detection, tracking, understanding, and mapping, which make Baidu’s autonomous driving solution more intelligent, more adaptable, and safer.

Baidu Maps was also the first to apply large visual perception models to mapping. The world’s largest lane-level map dataset has now been launched in 360 cities across China. Wherever Baidu Maps navigation can go, autonomous driving can go too.

After this year’s Spring Festival, Baidu’s RoboRun robotaxi service made its first crossing of the Yangtze River, extending service from the north bank to the south bank. In parts of Wuhan, we already operate around the clock, and we plan to deploy 1,000 driverless vehicles in Wuhan by the end of the year.

This is a landmark event showing that autonomous driving is moving toward true commercialization. It is no longer a limited regional pilot but has entered a new stage of city-wide application. RoboRun’s service area in Wuhan covers more than 3,000 square kilometers and a population of 7.7 million, making it the world’s largest autonomous driving service area.

“Take it away, use it as you wish”

Just now, I introduced Baidu’s Wenxin large model series and three development tools: AgentBuilder, AppBuilder, and ModelBuilder. Together they form a toolbox you can take away and use right away.

At this moment, I stand here as a developer and an entrepreneur, just as excited as everyone else. Today, China has 1 billion internet users, powerful foundation models, abundant AI application scenarios, and the world’s most complete industrial system, and the country is vigorously encouraging and supporting the “AI+” initiative. Every individual and every company need only make full use of these tools to unleash boundless creativity and productivity.

Today, everyone can become a developer. And the future will surely be one that developers create together!


Leidi Media was founded by journalist Lei Jianping. Please credit the source when republishing.
