AIMultiple Research
We follow ethical norms & our process for objectivity.
This research is not funded by any sponsors.
NLP
Updated on Apr 3, 2025

Wu Dao 3.0 in 2025: China's Version of GPT

In July 2023, the Beijing Academy of Artificial Intelligence (BAAI) unveiled Wu Dao 3.0, the successor to their previous AI system. This new iteration takes a different approach, focusing on helping startups and smaller companies build their own AI applications without sacrificing performance.

The shift toward smaller Wu Dao models

China’s AI landscape has faced several challenges lately. Legal restrictions, high development costs, and international chip sanctions have made building massive AI models increasingly difficult. In response, researchers reimagined Wu Dao as a collection of smaller, more efficient models called Wu Dao Aquila.

This practical shift makes advanced AI more accessible to Chinese businesses. These compact models need fewer chips to run, reducing dependence on scarce hardware—a critical advantage given China’s current tech constraints.

The Chinese government has pivoted its AI strategy toward practical applications and open-source collaboration. Rather than pursuing isolated mega-projects, they’re now encouraging companies to share models, datasets, and computing resources to speed up innovation across the board.

Alibaba’s Qwen and DeepSeek models represent the most successful examples of this collaborative approach. BAAI, a nonprofit research organization, has embraced this philosophy by making Wu Dao Aquila open-source. Their goal resembles creating an AI ecosystem similar to Linux—providing a foundation that ensures long-term growth and accessibility for everyone.

Differences between Wu Dao 2.0 and 3.0

Wu Dao 2.0 was trained on vast datasets, including 4.9 TB of high-quality English and Chinese text and images.1 It used a Mixture of Experts (MoE) system, FastMoE, which distributes tasks across specialized sub-models to improve efficiency.2 3 By surpassing state-of-the-art (SOTA) results on 9 benchmarks, it positioned itself as a serious contender in the pursuit of artificial general intelligence (AGI) and human-level reasoning.

Wu Dao 3.0 builds on this foundation with a more optimized architecture. It uses a sparse model approach that activates only a subset of parameters during inference, improving computational efficiency while maintaining high performance and making it more adaptable for real-world applications.
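The sparse activation idea can be illustrated with a minimal mixture-of-experts sketch (the expert count, top-k value, and dimensions below are hypothetical, and this is not BAAI's actual FastMoE code): a router scores every expert, but only the top-k experts actually run, so most parameters stay idle for any given input.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))  # routing weights

def moe_forward(x):
    """Run only the top-k experts for input x (sparse activation)."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]  # indices of the top-k experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
    # Weighted sum over the selected experts only; the other experts never run.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top)), top

x = rng.standard_normal(DIM)
y, used = moe_forward(x)
print(f"activated experts {sorted(used.tolist())} of {NUM_EXPERTS}")
```

Only `TOP_K` of the `NUM_EXPERTS` weight matrices are touched per input, which is the computational saving the article attributes to sparse models.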

Capabilities

Based on available information, the Wu Dao 3.0 ecosystem includes several specialized tools:

AquilaChat Dialogue Models: This includes a 7-billion parameter model that BAAI claims outperforms similar open-source alternatives both in China and internationally. There’s also a larger 33-billion parameter version. The smaller model supports both English and Chinese, with Chinese materials making up about 40% of its training data.

AquilaCode Model: This text-to-code generator (still under development) can create everything from simple programs like Fibonacci sequences to more complex applications such as sorting algorithms and games.
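For scale, the kind of "simple program" mentioned above is only a few lines; a text-to-code model like AquilaCode would be prompted with a natural-language description and emit something similar (this sample is illustrative, not actual model output):

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers — an illustrative example of
    the kind of program a text-to-code model is asked to generate."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

print(fibonacci(8))  # [0, 1, 1, 2, 3, 5, 8, 13]
```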

Wu Dao Vision Series: This collection tackles computer vision challenges with several specialized tools:

  • Multimodal Emu models
  • EVA, a billion-scale visual representation model
  • A general-purpose segmentation model
  • Painter, which pioneers “in-context” visual learning
  • EVA-CLIP, reportedly the best open-source CLIP model available
  • vid2vid-zero for zero-shot video editing

The EVA foundation model stands out for using publicly available data to develop large-scale visual representation. With one billion parameters, it has set new benchmarks in image recognition, video action recognition, object detection, and various segmentation tasks without requiring extensive supervised training.
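The zero-shot recognition mechanism behind CLIP-style models such as EVA-CLIP can be sketched without the actual weights: embed the image and each class name's text prompt into a shared space, then pick the class whose text embedding is most similar. The embeddings below are random stand-ins for real encoder outputs, so only the scoring logic is genuine:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 64

# Stand-ins for a real text encoder: one embedding per class prompt.
class_names = ["cat", "dog", "airplane"]
text_emb = rng.standard_normal((len(class_names), DIM))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)  # L2-normalize

# Stand-in for an image encoder output; deliberately close to the "dog" prompt.
image_emb = text_emb[1] + 0.05 * rng.standard_normal(DIM)
image_emb /= np.linalg.norm(image_emb)

# Zero-shot classification = cosine similarity against every class prompt;
# no task-specific training happens, the class list alone defines the task.
similarities = text_emb @ image_emb
predicted = class_names[int(np.argmax(similarities))]
print(predicted)
```

The class list can be swapped at inference time, which is why such models score on benchmarks like ImageNet "zero-shot": no labeled training examples from the target dataset are needed.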

BAAI has also enhanced the FlagOpen platform they launched in early 2023. This system offers parallel training techniques, faster inference, evaluation tools, and data processing utilities—essentially providing everything needed to develop large AI models.4

When Wu Dao 2.0 first debuted at the Beijing Zhiyuan Conference, its creators showcased Chinese poems and drawings it had generated.5 Following that event, a virtual student named Zhibing Hua was built on Wu Dao's model. Because Wu Dao powers her, she can draw on its knowledge base and learning capabilities to write poems, draw, and compose music.

Although these features are not highlighted for Wu Dao 3.0, they are worth noting if you are considering Wu Dao 2.0 for your enterprise instead of Wu Dao 3.0.

Poems generated by Wu Dao 2.06

Zero-Shot learning benchmarks

  1. ImageNet: Achieves state-of-the-art zero-shot performance, surpassing OpenAI’s CLIP.
  2. UC Merced Land-Use: Records the highest zero-shot accuracy in aerial land-use classification, outperforming CLIP.

Few-Shot learning benchmark

  1. SuperGLUE (FewGLUE): Outperforms GPT-3, achieving the best few-shot learning results.

Knowledge and language understanding benchmarks

  1. LAMA Knowledge Detection: Demonstrates superior factual knowledge retrieval, surpassing AutoPrompt.
  2. LAMBADA Cloze Test: Exceeds Microsoft Turing-NLG in reading comprehension and context understanding.

Text-to-Image and Image-to-Text retrieval benchmarks

  1. MS COCO (Text-to-Image generation): Outperforms OpenAI’s DALL·E in generating images from text descriptions.
  2. MS COCO (English Image-Text retrieval): Surpasses OpenAI’s CLIP and Google ALIGN in retrieving images from captions (and vice versa).
  3. MS COCO (Multilingual Image-Text retrieval): Outperforms UC2 and M3P in multilingual image-text retrieval.
  4. Multi30K (Multilingual Image-Text retrieval): Also surpasses UC2 and M3P, confirming its strong multilingual multimodal capabilities.

Wu Dao 3.0 vs. OpenAI GPT

Here’s a comparison between Wu Dao 3.0 LLM models and various OpenAI models, according to BAAI.7 More detailed and up-to-date comparisons are not possible because Wu Dao lacks recent, consistent public benchmarks.

Long context benchmark

Long context benchmarks measure a model’s ability to process extended or multi-step context. In this benchmark, 4 tasks—VCSUM (Chinese summarization), LSHT (Chinese long-sequence handling), HotpotQA (English multi-hop reasoning), and 2WikiMQA (English multi-document question answering)—are evaluated under different training methods, and the total averages are given below.
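The averages reported for this benchmark are straightforward to reproduce: a model's overall score is the mean over all four tasks, while the English and Chinese columns average only the tasks in that language. A small sketch with made-up per-task scores (not the actual benchmark results):

```python
# Hypothetical per-task scores for one model (NOT real benchmark numbers).
scores = {
    "VCSUM": 25.0,      # Chinese summarization
    "LSHT": 20.0,       # Chinese long-sequence handling
    "HotpotQA": 45.0,   # English multi-hop reasoning
    "2WikiMQA": 40.0,   # English multi-document QA
}
english = ["HotpotQA", "2WikiMQA"]
chinese = ["VCSUM", "LSHT"]

def avg(tasks):
    """Mean score over the named tasks."""
    return sum(scores[t] for t in tasks) / len(tasks)

overall = avg(scores)  # iterating the dict averages all four tasks
print(overall, avg(english), avg(chinese))  # 32.5 42.5 22.5
```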

Last Updated at 03-06-2025
| Model | Average Score | Average Score (English Tasks) | Average Score (Chinese Tasks) |
| --- | --- | --- | --- |
| GPT-3.5-Turbo-16K | 33.6 | 44.7 | 22.6 |
| AquilaChat2-34B-16K | 32.8 | 44.1 | 21.5 |
| ChatGLM2-6B-32K | 30.8 | 39.6 | 22.0 |
| AquilaChat2-7B-16K | 29.5 | 31.7 | 27.2 |
| InternLM-7B-8K | 22.4 | 30.6 | 14.3 |
| ChatGLM2-6B | 22.1 | 26.6 | 17.6 |
| LongChat-7B-v1.5-32K | 21.7 | 26.1 | 17.4 |
| Baichuan2-7B-Chat | 21.3 | 25.9 | 16.8 |
| InternLM-20B-Chat | 16.6 | 24.3 | 8.9 |
| Qwen-14B-Chat | 16.1 | 20.8 | 11.5 |
| XGen-7B-8K | 16.0 | 21.3 | 10.8 |
| LLaMA2-7B-Chat-4K | 14.0 | 18.0 | 10.0 |
| Baichuan2-13B-Chat | 10.5 | 14.8 | 6.3 |

Long context performances of LLMs8

Reasoning performance benchmark

Reasoning benchmarks measure how effectively a model can handle different reasoning types in textual contexts. In this evaluation, 6 tasks—bAbI #16 and CLUTRR (inductive reasoning), bAbI #15 and EntailmentBank (deductive reasoning), αNLI (abductive reasoning), and E-Care (causal reasoning)—are used to provide a view of the models’ logical capabilities and the total averages are given below.

Last Updated at 03-06-2025
| Model | Average Score |
| --- | --- |
| Baichuan2-7B-Chat | 47.8 |
| Qwen-7B-Chat | 49.5 |
| Qwen-14B-Chat | 51.1 |
| Baichuan2-13B-Chat | 53.3 |
| InternLM-20B-Chat | 53.9 |
| ChatGPT | 55.6 |
| LLaMA-70B-Chat | 57.2 |
| GPT-4 | 81.1 |
| AquilaChat2-34B | 58.3 |
| AquilaChat2-34B+SFT | 65.6 |
| AquilaChat2-34B+SFT+CoT | 69.4 |

Reasoning task performances of LLMs9

If you want to use Wu Dao, you can set it up on your computer by downloading it for free.10 To learn more about the capabilities of Chinese AI enterprises, you can explore the leaderboard created by BAAI.11

What is the future of human-level thinking AI?

As large language models like Wu Dao 3.0 grow more capable, the path to artificial general intelligence (AGI) remains complex. AGI, sometimes called the singularity, refers to AI capable of human-level thinking. Large-scale projects like Wu Dao aim to push those boundaries, yet experts remain divided on when, or whether, AGI will arrive.

Approximately 90% of AI experts predict an AI singularity by 2075, though some believe modeling the human brain at this level may be impossible.

To learn more about AGI predictions and expert opinions, read our in-depth analysis: Will AI reach singularity by 2060? 995 experts’ views on AGI.

FAQs

How does Wu Dao 3.0 differ from its predecessor, and what capabilities does it offer?

Unlike the massive Wu Dao 2.0, the 3.0 version consists of smaller, specialized models under the Aquila brand. These include AquilaChat for dialogue (available in 7B and 33B parameter versions), AquilaCode for text-to-code generation, and the Wu Dao Vision series for image captioning and other visual tasks. The models are trained on Chinese and English text and aim to be more accessible and deployable for specific business applications. The Chinese government has supported the project as part of its strategy to compete in an AI field dominated by Western companies.

What role does the FlagOpen system play in Wu Dao 3.0’s development?

FlagOpen serves as Wu Dao 3.0’s underlying infrastructure, providing crucial abilities for model development at scale. Launched in January by BAAI, it offers parallel training techniques, inference acceleration, and data processing tools specifically designed for large language models. BAAI aims for FlagOpen to become the “Linux of AI” — an open-source ecosystem that powers China’s next decade of AI innovation. This system gives developers the tools to work with models like Wen Yuan and Wen Su for text generation, poetry creation, and more complex tasks.

How does Wu Dao 3.0 reflect China’s AI strategy compared to models developed in the West?

Wu Dao 3.0 represents China’s strategic shift toward practical AI applications rather than just competing on model size. Reports suggest this approach was partly necessitated by chip sanctions and resource constraints, as predicted by industry CEOs. Instead of focusing solely on parameter count, the project emphasizes efficiency and specialization across multiple domains, including text, images, code, and protein analysis. This pragmatic approach allows Chinese companies to deploy AI solutions despite hardware limitations while advancing the country’s journey toward AI leadership.

Cem has been the principal analyst at AIMultiple since 2017.
