Talks and presentations

Towards General Long-Horizon Agents: Challenges and Innovations

December 26, 2025

Large language models are driving a paradigm revolution in artificial intelligence. From ChatGPT to Artificial General Intelligence (AGI), LLM-based agents have emerged as a highly promising technical direction. This talk first introduces the fundamental concepts of large models and agents, including the definition of AI agents, the core capabilities of LLM-based agents (perception and understanding, reasoning and planning, memory storage and retrieval, generation and action), as well as practical applications such as AI research assistants (WisPaper) and automated task execution. The talk then focuses on three major scaling challenges in building general long-horizon agents: Scaling Environments, Scaling Goals, and Scaling Interactions. Finally, we present our explorations and innovations on AgentGym, a cross-environment self-evolution framework, and AgentGym-RL, a reinforcement learning approach for long-horizon decision making.

Turbocharging LLMs for Scientific Discovery

December 07, 2025

AI is revolutionizing scientific discovery, as highlighted by the 2024 Nobel Prizes in Physics and Chemistry, both awarded for AI-related breakthroughs. This talk explores how large language models (LLMs) can turbocharge scientific research across the full discovery pipeline, from data analysis and hypothesis generation to experimental design. We begin with an overview of LLMs and their emergent abilities, then examine their growing capabilities in reading comprehension, coding, and complex reasoning. A key focus is on enhancing LLM reasoning for science, covering techniques such as Chain-of-Thought prompting, self-consistency, process supervision, and critique models. We further discuss agent-based modeling and simulation (ABMS) as a powerful paradigm for studying complex systems, and present recent advances in multimodal multi-agent systems for scientific tasks, including PhysicsMinions for physics olympiad problem solving and AtomAgents for AI-driven materials discovery.

Alignment Techniques for Large Language Models

May 09, 2025

Invited Talk, China Conference on Image and Graphics (CCIG 2025)

While large language models demonstrate remarkable capabilities, they also pose safety and ethical risks that necessitate alignment with human values. This talk provides a systematic overview of recent advances in LLM alignment techniques. We first motivate the need for alignment by examining safety and ethical challenges, along with the core alignment principles of helpfulness, honesty, and harmlessness. We then delve into human preference modeling, covering key issues such as reward model training, generalization, and online updating. Building on this foundation, we present RLHF-based alignment techniques including the PPO-MAX algorithm for stable training, Direct Preference Optimization (DPO), Linear Alignment for inference-time alignment, and multi-path feedback fusion methods. The talk also discusses post-alignment evaluation approaches for both safety/value alignment and capability alignment. Finally, we explore future directions including self-play multi-policy adversarial learning and reinforcement learning-centric reasoning models such as OpenAI's o1.

From Large Language Models to World Models

October 22, 2024

From weak AI to Artificial General Intelligence (AGI), large language models (LLMs) are driving landmark breakthroughs in AI. However, LLMs still face notable limitations, such as the inability to predict future events, handle extremely long reasoning chains, or access information beyond their training data. This talk explores the evolutionary path from LLMs to world models. We first discuss how large models construct world knowledge through language, and present our group’s research on large language models (MOSS), speech models (SpeechGPT), and multimodal models (AnyGPT). We then introduce the concept of world models — AI systems capable of simulating and understanding environments to make decisions and predictions — and discuss the integration of embodied intelligence with world models, including cutting-edge directions such as Video-Language-Action models. The talk also covers multimodal alignment techniques (LLaVA, ChatBridge) and multimodal expansion (AnyGPT), envisioning the future roadmap from LLMs toward embodied intelligence and world models.