New initiative aims to develop a large language AI model for Southeast Asia through research




A new research initiative is underway to build a large language model (LLM) that better reflects the cultures and languages of Southeast Asian nations.

Dubbed the National Multimodal LLM Programme, the initiative is led by Singapore in a bid to develop an artificial intelligence (AI) model that supports the region’s diverse mix of cultures and languages.

Three government agencies have jointly launched the research program: the Infocomm Media Development Authority (IMDA), AI Singapore (AISG), and the Agency for Science, Technology and Research (A*STAR). The program is backed by SG$70 million ($52.48 million) in funding from the National Research Foundation.

“As technology evolves rapidly, there is a strategic need to develop sovereign capabilities in LLMs,” the agencies said in a joint statement. “Singapore and the region’s local and regional cultures, values, and norms differ from those of Western countries, where most large language models originate.”

They underscored the importance of developing multimodal and localized LLMs for Southeast Asia, including Singapore, that understand the context and values of the region’s diverse cultures and languages. One example of this diversity is the way Singapore’s multilingual population switches between languages within a conversation.

The research initiative will tap the high-performance computing resources of Singapore’s National Supercomputing Centre and aims to build up the country’s research and engineering capabilities in multimodal LLMs.

“This national effort underscores Singapore’s commitment to become a global AI hub,” said Ong Chen Hui, IMDA’s assistant chief executive of biztech group. “Language is an essential enabler for collaboration. By investing in talent and investing in large language AI models for regional languages, we want to foster industry collaboration across borders and drive the next wave of AI innovation in Southeast Asia.”

The initiative will build on AISG’s existing work on Southeast Asian Languages in One Network (SEA-LION), an open-source LLM that the government agency said is designed to be smaller, more flexible, and faster than LLMs on the market today. SEA-LION is currently available as two base models: a three-billion-parameter model and a seven-billion-parameter model.

Elaborating on the significance of the open-source model, AISG said: “Existing LLMs display strong bias in terms of cultural values, political beliefs, and social attitudes. This is due to the training data, especially those scraped from the internet, which often has disproportionately large WEIRD-based origins. WEIRD refers to Western, Educated, Industrialized, Rich, Democratic societies. People of non-WEIRD origin are less likely to be literate, to use the internet, and to have their output easily accessed.”

SEA-LION aims to establish LLMs that better represent “non-WEIRD” populations. Its training data comprise 981 billion language tokens, which AISG defines as fragments of words created by breaking down text during the tokenization process. These include 623 billion English tokens, 128 billion Southeast Asian-language tokens, and 91 billion Chinese tokens.
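To put those token counts in proportion, the short Python sketch below works out each category's share of the 981 billion total. It uses only the figures reported above; the function name `token_shares` and the "Not itemized" label are illustrative choices, not AISG terminology, and the remainder is simply whatever the three listed categories do not account for.

```python
# Token counts in billions, as reported in the article.
TOTAL_TOKENS_B = 981
REPORTED_B = {"English": 623, "Southeast Asian": 128, "Chinese": 91}


def token_shares(total, breakdown):
    """Return each category's percentage of the total.

    Adds a "Not itemized" entry (a hypothetical label) for tokens
    that the listed categories do not account for.
    """
    shares = {name: 100 * count / total for name, count in breakdown.items()}
    shares["Not itemized"] = 100 * (total - sum(breakdown.values())) / total
    return shares


if __name__ == "__main__":
    for name, pct in token_shares(TOTAL_TOKENS_B, REPORTED_B).items():
        print(f"{name}: {pct:.1f}%")
```

On these figures, English makes up roughly 63.5% of the training tokens and Southeast Asian languages about 13%, while around 14% of the total is not broken out in the numbers cited here.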

Efforts to build localized LLMs are part of Singapore’s latest AI strategy, which supports its ambition to be a global development hub for AI solutions by 2030. The strategy includes plans to triple the number of AI professionals in the country to 15,000 over the next three to five years and to provide an ecosystem that supports governance, testing, and benchmarking, alongside AI ethics and safety guidelines.

Noting that the world is heading into uncharted territory with recent developments in AI, Singapore’s Deputy Prime Minister Lawrence Wong said at the launch of the national AI strategy: “Up to now, AI has been mainly about pattern recognition. But in time to come, we will have AI systems with agency and with transactional abilities. We will have machines with human-like cognitive abilities and the capacity for self-awareness and independent decision-making.”

Given AI’s potential to significantly change human lives and affect societies, its responsible development and adoption should be guided more deliberately, Wong said.