When the US cut off China’s access to advanced chips, the Beijing Academy of Artificial Intelligence faced a choice: complain about restrictions or work around them. They picked the second option.
Wu Dao 3.0, launched in July 2023, throws out the playbook. No massive trillion-parameter models competing for headlines. Instead, BAAI now builds compact models that Chinese startups can actually run without needing a warehouse full of GPUs.
Why BAAI Changed Direction
Wu Dao 2.0 made headlines in 2021 with 1.75 trillion parameters, claiming to rival GPT-3. Two years later, BAAI quietly shelved that approach.
The reasons:
- US chip sanctions limited access to advanced GPUs
- Training costs for mega-models became prohibitive
- Chinese government policy shifted toward practical applications over prestige projects
- Market reality showed most companies need specialized tools, not general-purpose giants
The new strategy: build a collection of smaller models (called Aquila) that work together. Think microservices instead of monoliths.
What Wu Dao 3.0 Actually Is
Wu Dao 3.0 isn’t a single model. It’s an ecosystem of specialized AI tools released under the Aquila brand:
AquilaChat: Dialogue Models
Two sizes available:
- 7 billion parameters: Competes with LLaMA 7B and similar open-source models
- 33 billion parameters: Targets more complex conversations
Both trained on Chinese (40%) and English (60%) text. The smaller version runs on consumer hardware—you don’t need a data center.
BAAI claims AquilaChat 7B outperforms comparable international models, though independent benchmarks remain limited.
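To make "runs on consumer hardware" concrete, here is a minimal sketch of loading the 7B chat model with Hugging Face transformers. The repo id BAAI/AquilaChat-7B and the trust_remote_code requirement are assumptions based on BAAI's public releases; check the model card before running.

```python
# Minimal sketch: running the 7B chat model locally with Hugging Face
# transformers. The repo id "BAAI/AquilaChat-7B" and the need for
# trust_remote_code are assumptions; verify against BAAI's model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/AquilaChat-7B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
    device_map="auto",          # spread layers across available devices
    trust_remote_code=True,
)

prompt = "用一句话介绍北京。"  # "Introduce Beijing in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```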
The Origins: How Wu Dao Started
Development began in October 2020, several months after GPT-3’s release. The name Wu Dao (悟道) translates to “road to awareness” in Chinese, an ambitious name for an ambitious project.
Wu Dao 1.0 launched on January 11, 2021, with four specialized models working together. Each handled different tasks: Wen Yuan (2.6 billion parameters) focused on question answering and grammar correction. Wen Lan (1 billion parameters) generated image captions and was trained on 50 million image-text pairs. Wen Hui (11.3 billion parameters) wrote poetry, created videos, and handled complex reasoning. Wen Su, built on Google's BERT, predicted protein structures, much like AlphaFold.
Then came Wu Dao 2.0 on May 31, 2021. BAAI made headlines, claiming 1.75 trillion parameters, ten times larger than GPT-3’s 175 billion. Media called it “the biggest language AI system yet.” Commentators saw it as China’s attempt to compete directly with American AI dominance.
The Training Data Reality
Wu Dao 2.0 was trained on 4.9 terabytes of images and text: 1.2 TB of Chinese text, 1.2 TB of English text, and the rest image data. GPT-3, by contrast, drew on 45 terabytes of raw text alone. Wu Dao had ten times the parameters but less than a tenth of the training data.
The WuDao Corpora dataset for version 2.0 contained 3 TB of web text, 90 TB of graphical data (630 million text/image pairs), and 181 GB of Chinese dialogue representing 1.4 billion conversation rounds.
This mismatch between parameter count and training data hinted at something important: Wu Dao 2.0 used a different architecture called Mixture-of-Experts (MoE). Unlike GPT-3's "dense" design, where every parameter activates for every input, an MoE model activates only the experts relevant to each input. Training therefore requires far less computational power, but the headline number overstates capability: research has shown that trillion-parameter MoE models often perform comparably to dense models hundreds of times smaller.
Wu Dao 2.0 specifically used FastMoE, an open-source MoE library developed at Tsinghua University that runs on PyTorch rather than requiring Google's dedicated hardware. It was clever engineering around hardware limitations, though BAAI's marketing emphasized raw parameter counts instead.
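To illustrate the difference, here is a toy top-1 MoE layer in PyTorch. It is a sketch of the general routing idea, not FastMoE's or BAAI's actual implementation.

```python
# Toy illustration of Mixture-of-Experts routing, not FastMoE's real code.
# A gating network picks one expert per token, so only a fraction of the
# total parameters participate in any forward pass.
import torch
import torch.nn as nn

class TopOneMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Route each token to its single best expert.
        scores = self.gate(x)            # (tokens, experts)
        best = scores.argmax(dim=-1)     # (tokens,)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = best == i
            if mask.any():
                out[mask] = expert(x[mask])  # only this expert's weights run
        return out

layer = TopOneMoE(dim=64, num_experts=8)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64]); each token used 1 of 8 experts
```

Each token only ever touches one expert's feed-forward weights, which is why a sparse model's headline parameter count says little about its per-token compute.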
AquilaCode: Text-to-Code Generation
Still in development. Early versions can generate:
- Basic algorithms (Fibonacci sequences, sorting)
- Simple games
- Utility scripts
Not yet at the level of GitHub Copilot or GPT-4’s coding abilities, but improving. BAAI targets developers who need code generation in Chinese technical contexts.
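Driving such a model looks the same as driving any causal language model. The sketch below assumes a Hugging Face checkpoint named BAAI/AquilaCode-7B-nv, an assumption based on BAAI's naming; verify against their current releases.

```python
# Sketch only: prompting a code model as a causal LM. The repo id below
# is an assumption ("nv" denoting the Nvidia-trained variant); verify it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/AquilaCode-7B-nv"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "# Python\n# Write a function that returns the first n Fibonacci numbers.\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```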
Wu Dao Vision Series
A collection of computer vision models, not a single system:
EVA (1 billion parameters): Focuses on visual representation learning. Trained entirely on public datasets, it set new state-of-the-art results in:
- Image recognition
- Video action detection
- Object detection
- Segmentation tasks
Open source, unlike competitors that keep vision models proprietary.
EVA-CLIP: BAAI claims this is the best open-source CLIP alternative available. Handles image-text matching for search and retrieval.
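As a sketch of what image-text matching means in practice, here is CLIP-style retrieval via the open_clip library, through which EVA-CLIP weights have been distributed. The exact model and pretrained tags below are assumptions; check open_clip.list_pretrained() first.

```python
# CLIP-style image-text matching sketch. The model/pretrained tags are
# assumptions; confirm available ones with open_clip.list_pretrained().
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "EVA02-B-16", pretrained="merged2b_s8b_b131k"  # assumed tags
)
tokenizer = open_clip.get_tokenizer("EVA02-B-16")

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
texts = tokenizer(["a photo of a cat", "a photo of a dog", "a diagram"])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)   # unit-normalize
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(probs)  # highest score marks the best-matching caption
```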
Painter: Implements “in-context” visual learning—show it examples, and it learns new visual tasks without retraining. Similar to how GPT-3 does in-context learning for text.
vid2vid-zero: Zero-shot video editing tool. Edit videos based on text descriptions without training on specialized video-editing datasets.
Emu (multimodal models): Handles both images and text in a single model. Use cases include image captioning, visual question answering, and content generation.
FlagOpen: The Infrastructure Layer
BAAI has also enhanced the FlagOpen platform it launched in early 2023. The system offers parallel training techniques, faster inference, evaluation tools, and data-processing utilities, essentially everything needed to develop large AI models.
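For a taste of the tooling, here is a hedged sketch using FlagAI, the open-source model library released under FlagOpen. The loader arguments and predictor method mirror FlagAI's published examples but should be treated as assumptions to verify against the current repository.

```python
# Sketch of the FlagAI toolkit from the FlagOpen stack. The task name,
# model id, and predictor call follow FlagAI's published examples but
# should be treated as assumptions to verify against the current repo.
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

loader = AutoLoader(task_name="lm", model_name="aquilachat-7b")  # assumed ids
model = loader.get_model()
tokenizer = loader.get_tokenizer()

predictor = Predictor(model, tokenizer)
text = predictor.predict_generate_randomsample(
    "北京的旅游景点有哪些？",  # "What are Beijing's tourist attractions?"
    out_max_length=200,
)
print(text)
```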
When Wu Dao 2.0 first debuted at the Beijing Zhiyuan Conference, its creators displayed Chinese poems and drawings it had generated. Following that event, a virtual student named Hua Zhibing was built on Wu Dao's AI model; powered by the model, she can draw on its knowledge base and learning capabilities to write poems, paint, and compose music.
These features are not highlighted for Wu Dao 3.0, but they are worth knowing about if you plan to use Wu Dao 2.0 for your enterprise instead.
Poems generated by Wu Dao 2.0
Zero-shot learning benchmarks
- ImageNet: Achieves state-of-the-art zero-shot performance, surpassing OpenAI’s CLIP.
- UC Merced Land-Use: Records the highest zero-shot accuracy in aerial land-use classification, outperforming CLIP.
Few-shot learning benchmark
- SuperGLUE (FewGLUE): Outperforms GPT-3, achieving the best few-shot learning results.
Knowledge and language understanding benchmarks
- LAMA Knowledge Detection: Demonstrates superior factual knowledge retrieval, surpassing AutoPrompt.
- LAMBADA Cloze Test: Exceeds Microsoft Turing-NLG in reading comprehension and context understanding.
Text-to-Image and Image-to-Text retrieval benchmarks
- MS COCO (Text-to-Image generation): Outperforms OpenAI’s DALL·E in generating images from text descriptions.
- MS COCO (English Image-Text retrieval): Surpasses OpenAI’s CLIP and Google ALIGN in retrieving images from captions (and vice versa).
- MS COCO (Multilingual Image-Text retrieval): Outperforms UC2 and M3P in multilingual image-text retrieval.
- Multi30K (Multilingual Image-Text retrieval): Also surpasses UC2 and M3P, confirming its strong multilingual multimodal capabilities.
Wu Dao 3.0 vs. OpenAI GPT
Here is a comparison of Wu Dao 3.0's language models and various OpenAI models, based on BAAI's own reported figures. We cannot provide more detailed or up-to-date comparisons, since recent, consistent third-party benchmarks for Wu Dao are not available.
Long Context Performance
Testing across four tasks:
- VCSUM (Chinese summarization)
- LSHT (Chinese long-sequence handling)
- HotpotQA (English multi-hop reasoning)
- 2WikiMQA (English multi-document QA)
Long-context performance of LLMs
Reasoning performance benchmark
Testing across six tasks:
- bAbI #16 and CLUTRR (inductive reasoning)
- bAbI #15 and EntailmentBank (deductive reasoning)
- αNLI (abductive reasoning)
- E-Care (causal reasoning)
Reasoning task performance of LLMs
If you want to try Wu Dao yourself, the models can be downloaded for free and set up on your own machine.
Further Reading
- Top 5 Natural Language Platforms (NLP) Comparison
- 45 Statistics, Facts & Forecasts on Machine Learning
- 100+ AI Use Cases & Applications: In-Depth Guide