From Large Language Models to World Models

Date:

On the path from weak AI to Artificial General Intelligence (AGI), large language models (LLMs) are driving landmark breakthroughs. However, LLMs still face notable limitations: they cannot predict future events, handle extremely long reasoning chains, or access information beyond their training data. This talk explores the evolutionary path from LLMs to world models. We first discuss how large models construct world knowledge through language, presenting our group's research on large language models (MOSS), speech models (SpeechGPT), and multimodal models (AnyGPT). We then introduce the concept of world models, AI systems capable of simulating and understanding environments in order to make decisions and predictions, and discuss their integration with embodied intelligence, including cutting-edge directions such as Vision-Language-Action models. The talk also covers multimodal alignment techniques (LLaVA, ChatBridge) and multimodal expansion (AnyGPT), and sketches a future roadmap from LLMs toward embodied intelligence and world models.

Our Group's LLM Research: MOSS, SpeechGPT, AnyGPT

Slides (PDF)