Jack Ma-Backed Ant Group Innovates AI Training with Chinese-Made Semiconductors
Cost-Saving Techniques for AI Model Training
Ant Group, backed by Jack Ma, has used Chinese-made semiconductors to develop techniques for training artificial intelligence (AI) models that could cut costs by up to 20%, according to sources familiar with the matter. The company used domestic chips, including some from affiliate Alibaba and from Huawei Technologies, to train models with the machine learning approach known as Mixture of Experts (MoE).
Competing with Global Giants
While Ant still uses Nvidia chips for some AI development, the company is increasingly turning to alternatives from Advanced Micro Devices Inc. (AMD) and local manufacturers. The shift places Ant squarely in the intensifying AI race between Chinese and US firms.
The MoE Machine Learning Approach
The MoE approach makes training more efficient by splitting a model into specialized sub-networks, or "experts," and routing each piece of input to only a few of them, analogous to a team of specialists each handling a distinct part of a project. Ant's implementation reportedly achieved results comparable to those from training on Nvidia's powerful H800 processors, which are currently restricted from export to China.
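To make the routing idea concrete, here is a minimal sketch in Python with NumPy. The dimensions, layer shapes, and top-k gating scheme are illustrative assumptions, not Ant's actual architecture; the point is simply that only a few experts run per input, so compute per token stays low even as total parameters grow.

```python
# Toy Mixture of Experts layer: a gating network scores several expert
# sub-networks and only the top-k run for each input. Sizes are arbitrary
# illustrative choices, not Ant's configuration.
import numpy as np

rng = np.random.default_rng(0)

D, H, NUM_EXPERTS, TOP_K = 8, 16, 4, 2  # toy dimensions

# Each expert is a small two-layer feed-forward network.
experts = [
    (rng.standard_normal((D, H)) * 0.1, rng.standard_normal((H, D)) * 0.1)
    for _ in range(NUM_EXPERTS)
]
gate_w = rng.standard_normal((D, NUM_EXPERTS)) * 0.1  # router weights

def moe_layer(x):
    """Route input x to its TOP_K highest-scoring experts and mix outputs."""
    scores = x @ gate_w                        # one score per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of chosen experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                               # softmax over chosen experts
    out = np.zeros_like(x)
    for weight, i in zip(w, top):
        w1, w2 = experts[i]
        out += weight * (np.maximum(x @ w1, 0.0) @ w2)  # ReLU expert block
    return out

token = rng.standard_normal(D)
print(moe_layer(token))  # only TOP_K of NUM_EXPERTS experts did any work
```

In production systems the experts are large feed-forward blocks inside a transformer and the router is trained jointly with them, but the top-k principle is the same.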
Performance Benchmark Claims
This month, Ant Group published a research paper claiming that its models at times outperformed those of Meta Platforms Inc. on certain benchmarks. Bloomberg News has not independently verified the claim, but if accurate it would mark a notable advance for Chinese AI, one that could sharply cut the cost of inference, the step in which a trained model answers queries, and of supporting AI services.
Broader Implications for AI Development
Rising investment in AI has driven growing adoption of MoE models, a trend propelled by successful implementations at Google and at the startup DeepSeek. Training MoE models, however, typically depends on high-performance chips such as Nvidia's, which can be cost-prohibitive for smaller firms.
Ant’s Cost-Reduction Strategy
In its research, Ant reported that training a model on 1 trillion tokens costs approximately 6.35 million yuan ($880,000) using high-performance hardware. With its optimized approach on lower-specification hardware, the company aims to bring that figure down to about 5.1 million yuan. Tokens are the units of text a model processes as it learns and generates responses.
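Those two figures are consistent with the roughly 20% saving cited above, as this back-of-envelope check shows (the costs come from Ant's paper; the script itself is just arithmetic):

```python
# Back-of-envelope check of the savings reported in Ant's paper.
baseline_cost = 6.35e6   # yuan to train on 1 trillion tokens, high-end chips
optimized_cost = 5.1e6   # yuan with the optimized lower-spec setup

saving = 1 - optimized_cost / baseline_cost
print(f"Cost reduction: {saving:.1%}")                 # -> 19.7%
print(f"Saved per trillion tokens: {baseline_cost - optimized_cost:,.0f} yuan")
```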
Recent Innovations in Large Language Models
Ant plans to apply breakthroughs from its large language models, Ling-Plus and Ling-Lite, to commercial offerings in sectors such as healthcare and finance. As part of that strategy, the company acquired the Chinese online medical platform Haodf.com to strengthen its AI capabilities in those areas.
Performance of Ling Models Compared to Competitors
The Ling-Lite model reportedly outperformed Meta's Llama models on certain benchmarks, while both Ling-Lite and Ling-Plus beat DeepSeek's models on Chinese-language tasks.
An Industry Expert's View
Robin Yu, chief technology officer of Shengshang Tech, emphasized practical applications over benchmark bragging rights: "If you find one point of attack to beat the world's best kung fu master, you can still say you beat them." His point: real-world effectiveness is what ultimately matters for AI models.
Open Source Approach
Ant has made the Ling models open source. Ling-Lite comprises 16.8 billion parameters, while Ling-Plus contains a far larger 290 billion, sizable by industry standards though well short of the biggest frontier models: OpenAI's GPT-4.5 is estimated to have 1.8 trillion parameters, and DeepSeek-R1 has 671 billion.
Overcoming Training Challenges
Despite the significant advancements, Ant faced challenges during the training process, particularly with stability. The company noted that minor alterations to the hardware or model structure could result in increased error rates within the models.
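The paper does not detail how these instabilities were handled, but a common generic safeguard in long training runs is to watch for loss spikes and roll back to the last good checkpoint. The sketch below is a hypothetical illustration of that pattern, not Ant's method; the function name and thresholds are invented for the example.

```python
# Hypothetical loss-spike detector: a generic safeguard for long training
# runs, NOT Ant's actual method. Names and thresholds are invented here.
def detect_spike(losses, window=50, threshold=2.0):
    """Flag when the latest loss jumps well above its recent average."""
    if len(losses) <= window:
        return False                       # not enough history yet
    recent = losses[-window - 1:-1]        # the window before the latest step
    avg = sum(recent) / len(recent)
    return losses[-1] > threshold * avg

history = [2.1, 2.0, 1.9, 1.95, 1.85, 8.7]  # toy loss curve ending in a spike
print(detect_spike(history, window=5))       # -> True: roll back / investigate
```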
Conclusion
Ant Group's strides in AI training on Chinese-made semiconductors show how domestic hardware, paired with efficient techniques like MoE, can deliver real gains in cost and capability. The effort highlights both the strength of China's AI sector and its determination to work around external constraints such as export controls on advanced chips.
Frequently Asked Questions (FAQs)
1. What is the Mixture of Experts (MoE) approach used by Ant Group?
MoE divides a model into specialized components ("experts") and routes each input to only a few of them, so different parts of the model focus on distinct areas, improving training efficiency.
2. How much can Ant Group potentially reduce AI training costs?
Ant Group estimates that it can cut AI training costs by up to 20% compared with conventional training on higher-performance chips.
3. What are the new large language models developed by Ant Group?
Ant Group has developed two large language models, Ling-Plus and Ling-Lite, aimed at improving performance in various applications, including healthcare and finance.
4. Why are Nvidia chips significant in AI training?
Nvidia chips, particularly high-performance models like the H800, are central to many AI training processes due to their processing power, but they are currently restricted from export to China.
5. What challenges did Ant Group encounter in training their models?
Ant faced stability issues during training, where small changes in hardware or model structure could lead to increased error rates in the models.