Published in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
We propose three different mechanisms of sharing information to model text with task-specific and shared layers.
Recommended citation: Pengfei Liu, Xipeng Qiu, Xuanjing Huang: Recurrent Neural Network for Text Classification with Multi-Task Learning. IJCAI 2016: 2873-2879 http://xuanjing-huang.github.io/files/RNN.pdf
Published in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017
The paper proposes an adversarial multi-task learning framework that prevents the shared and private latent feature spaces from interfering with each other.
Recommended citation: Pengfei Liu, Xipeng Qiu, Xuanjing Huang: Adversarial Multi-task Learning for Text Classification. ACL (1) 2017: 1-10 http://xuanjing-huang.github.io/files/AMT.pdf
Published in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017
In this paper, we propose adversarial multi-criteria learning for CWS by integrating shared knowledge from multiple heterogeneous segmentation criteria.
Recommended citation: Xinchi Chen, Zhan Shi, Xipeng Qiu, Xuanjing Huang: Adversarial Multi-Criteria Learning for Chinese Word Segmentation. ACL (1) 2017: 1193-1203 http://xuanjing-huang.github.io/files/cws.pdf
Published in The Eighteenth China National Conference on Computational Linguistics, 2019
In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task.
Recommended citation: Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang: How to Fine-Tune BERT for Text Classification? CCL 2019: 194-206 http://xuanjing-huang.github.io/files/bert-ft.pdf
Published in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
In this paper, we propose FLAT: Flat-LAttice Transformer for Chinese NER, which converts the lattice structure into a flat structure consisting of spans.
Recommended citation: Xiaonan Li, Hang Yan, Xipeng Qiu, Xuanjing Huang: FLAT: Chinese NER Using Flat-Lattice Transformer. ACL 2020: 6836-6842 http://xuanjing-huang.github.io/files/FLAT.pdf
Published in SCIENCE CHINA Technological Sciences (SCTS), 2020
In this survey, we provide a comprehensive review of PTMs for NLP.
Recommended citation: Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang, Pre-trained Models for Natural Language Processing: A Survey, SCIENCE CHINA Technological Sciences (SCTS), 2020, Vol. 63(10), pp. 1872–1897 http://xuanjing-huang.github.io/files/PTM.pdf
Published in Findings of the Association for Computational Linguistics: ACL-IJCNLP, 2021
The paper proposes a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
Recommended citation: Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou: K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters. ACL/IJCNLP (Findings) 2021: 1405-1418 http://xuanjing-huang.github.io/files/K-Adapter.pdf
Published in CoRR abs/2307.04964, 2023
We dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training.
Recommended citation: Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang: Secrets of RLHF in Large Language Models Part I: PPO. CoRR abs/2307.04964 (2023) http://xuanjing-huang.github.io/files/rlhf.pdf
Published in Electronic Industry Press, 2023
With the widespread application of natural language processing and the rapid advancement of machine learning algorithms represented by deep learning, natural language processing algorithms and research tasks have advanced rapidly in recent years. Since 2003, the authors have taught natural language processing courses for undergraduates, master's students, and doctoral students at the School of Computer Science and Technology, Fudan University. This book summarizes years of teaching and research, aiming to provide readers with a more systematic and comprehensive understanding of natural language processing.
Recommended citation: Qi Zhang, Tao Gui, Xuanjing Huang: Introduction to Natural Language Processing, Electronic Industry Press, 2023 https://intro-nlp.github.io/
Published in CoRR abs/2401.06080, 2024
From a data perspective, we propose a method to measure the strength of preferences within the data, based on a voting mechanism of multiple reward models. From an algorithmic standpoint, we introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses, thereby improving model generalization.
Recommended citation: Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang: Secrets of RLHF in Large Language Models Part II: Reward Modeling. CoRR abs/2401.06080 (2024) http://xuanjing-huang.github.io/files/reward.pdf
Published in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolonged response times.
Recommended citation: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang. Searching for Best Practices in Retrieval-Augmented Generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17716–17736. Association for Computational Linguistics, 2024. https://aclanthology.org/2024.emnlp-main.981/
Published in Science China Information Sciences, 2025
In this paper, we present a comprehensive survey of LLM-based agents.
Recommended citation: Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Qi Zhang, Tao Gui: The rise and potential of large language model based agents: a survey. Sci. China Inf. Sci. 68(2) (2025) https://link.springer.com/article/10.1007/s11432-024-4222-0
Published in Electronic Industry Press, 2025
This book introduces the fundamental theories of large language models, including language modeling, distributed model training, and reinforcement learning, with practical examples using the DeepSpeed-Chat framework to implement large language models and ChatGPT-like systems.
Recommended citation: Qi Zhang, Tao Gui, Rui Zheng, Xuanjing Huang: Large Language Models: From Theory to Practice (2nd Edition), Electronic Industry Press, 2025 https://intro-llm.github.io/
Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Large language models (LLMs) have emerged as a promising foundation to build generally capable agents (LLM-based agents) that can handle multi-turn decision-making tasks across various environments. However, the community lacks a unified interactive framework that covers diverse environments for comprehensive evaluation of agents, and enables exploration and learning for their self-improvement.
Recommended citation: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Xin Guo, Dingwen Yang, Chenyang Liao, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang. AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27914–27961, Vienna, Austria. Association for Computational Linguistics, 2025. https://aclanthology.org/2025.acl-long.1355/
Published in arXiv, 2025
We propose AgentGym-RL, a reinforcement learning framework for training large language model (LLM)-based agents to tackle long-horizon decision-making tasks through multi-turn interactions.
Recommended citation: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Xin Guo, Dingwen Yang, Chenyang Liao, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang. AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning. arXiv:2509.08755, 2025. https://arxiv.org/abs/2509.08755
Published in Proceedings of the National Academy of Sciences (PNAS), 2025
People acquire concepts through rich physical and social experiences and use them to understand and navigate the world. In contrast, large language models (LLMs), trained solely through next-token prediction on text, exhibit strikingly human-like behaviors. Are these models developing concepts akin to those in humans?
Recommended citation: Ningyu Xu, Qi Zhang, Chenyang Du, Qinan Luo, Xipeng Qiu, Xuanjing Huang, Menghan Zhang. Revealing emergent human-like conceptual representations from language prediction. Proceedings of the National Academy of Sciences (PNAS), 2025. https://doi.org/10.1073/pnas.2512514122
Published in arXiv preprint, 2026
We introduce OpenNovelty, an LLM-powered agentic system for verifiable scholarly novelty assessment that addresses the critical need for automated, reliable evaluation of research novelty.
Recommended citation: Yifan Liu, Yifan Wang, Zixuan Li, Zizheng Wang, Zihan Wang, Wenxuan Wang, Yifei Wang, Yifan Song, Yifan Liu, Xuanjing Huang, Zhilin Yang, Wei Chen, Tao Gui: OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment. arXiv preprint arXiv:2601.01576 (2026) https://arxiv.org/abs/2601.01576
Published in Science China Information Sciences, 2026
The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundaries of existing methods.
Recommended citation: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Muling Wu, Yunbo Tao, Ming Zhang, Mingxu Chai, Jessica Fan, Zhiheng Xi, Rui Zheng, Yueming Wu, Ming Wen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang: What is wrong with your code generated by large language models? An extensive study. Sci. China Inf. Sci. 69, 112107 (2026) https://doi.org/10.1007/s11432-025-4632-8
Published in arXiv, 2026
Current language models (LMs) excel at reasoning over prompts using pre-trained knowledge. However, real-world tasks are far more complex and context-dependent: models must learn from task-specific context and leverage new knowledge beyond what is learned during pre-training to reason and resolve tasks. We term this capability context learning, a crucial ability that humans naturally possess but that has been largely overlooked.
Recommended citation: Shihan Dou, Ming Zhang, Zhangyue Yin, Chenhao Huang, Yujiong Shen, Junzhe Wang, Jiayi Chen, Yuchen Ni, Junjie Ye, Cheng Zhang, Huaibing Xie, Jianglu Hu, Shaolei Wang, Weichao Wang, Yanling Xiao, Yiting Liu, Zenan Xu, Zhen Guo, Pluto Zhou, Tao Gui, Zuxuan Wu, Xipeng Qiu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Di Wang, Shunyu Yao. CL-bench: A Benchmark for Context Learning. arXiv:2602.03587, 2026. https://arxiv.org/abs/2602.03587
Published in arXiv, 2026
We introduce SciAgentGym, a scalable interactive environment featuring 1,780 domain-specific tools across four natural science disciplines, supported by a robust execution infrastructure.
Recommended citation: Yujiong Shen, Yajie Yang, Zhiheng Xi, Binze Hu, Huayu Sha, Jiazheng Zhang, Qiyuan Peng, Junlin Shang, Jixuan Huang, Yutao Fan, Jingqi Tong, Shihan Dou, Ming Zhang, Lei Bai, Zhenfei Yin, Tao Gui, Xingjun Ma, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang: SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents. arXiv:2602.12984 (2026) https://arxiv.org/pdf/2602.12984
Published in arXiv, 2026
We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem.
Recommended citation: Jingqi Tong, Mingzhe Li, Hangcheng Li, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yugang Jiang, Xipeng Qiu: AI Can Learn Scientific Taste. arXiv:2603.14473 (2026) https://arxiv.org/pdf/2603.14473
Published in The ACM Web Conference 2026 (WWW 2026), 2026
We propose AgentPRM, a re-defined process reward model for LLM agent tasks that captures both the interdependence between sequential decisions and their contribution to the final goal, enabling better progress tracking and exploration-exploitation balance.
Recommended citation: Zhiheng Xi, Chenyang Liao, Guanyu Li, Yajie Yang, Wenxiang Chen, Zhihao Zhang, Bing Wang, Senjie Jin, Yuhao Zhou, Jian Guan, Wei Wu, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang. AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress. In Proceedings of the ACM Web Conference 2026 (WWW 2026). https://arxiv.org/abs/2511.08325
Published:
From weak AI to Artificial General Intelligence (AGI), large language models (LLMs) are driving landmark breakthroughs in AI. However, LLMs still face notable limitations, such as the inability to predict future events, handle extremely long reasoning chains, or access information beyond their training data. This talk explores the evolutionary path from LLMs to world models. We first discuss how large models construct world knowledge through language, and present our group's research on large language models (MOSS), speech models (SpeechGPT), and multimodal models (AnyGPT). We then introduce the concept of world models — AI systems capable of simulating and understanding environments to make decisions and predictions — and discuss the integration of embodied intelligence with world models, including cutting-edge directions such as Vision-Language-Action models. The talk also covers multimodal alignment techniques (LLaVA, ChatBridge) and multimodal expansion (AnyGPT), envisioning the future roadmap from LLMs toward embodied intelligence and world models.
Published:
While large language models demonstrate remarkable capabilities, they also pose safety and ethical risks that necessitate alignment with human values. This talk provides a systematic overview of recent advances in LLM alignment techniques. We first motivate the need for alignment by examining safety and ethical challenges, along with the core alignment principles of helpfulness, honesty, and harmlessness. We then delve into human preference modeling, covering key issues such as reward model training, generalization, and online updating. Building on this foundation, we present RLHF-based alignment techniques including the PPO-MAX algorithm for stable training, Direct Preference Optimization (DPO), Linear Alignment for inference-time alignment, and multi-path feedback fusion methods. The talk also discusses post-alignment evaluation approaches for both safety/value alignment and capability alignment. Finally, we explore future directions including Self-Play multi-policy adversarial learning and reinforcement learning-centric reasoning models such as O1.
Published:
AI is revolutionizing scientific discovery, as highlighted by recent Nobel Prizes in Physics and Chemistry awarded to AI-related breakthroughs. This talk explores how large language models (LLMs) can turbocharge scientific research across the full discovery pipeline — from data analysis and hypothesis generation to experimental design. We begin with an overview of LLMs and their emergent abilities, then examine their growing capabilities in reading comprehension, coding, and complex reasoning. A key focus is on enhancing LLM reasoning for science, covering techniques such as Chain-of-Thought prompting, self-consistency, process supervision, and critique models. We further discuss agent-based modeling and simulation (ABMS) as a powerful paradigm for studying complex systems, and present recent advances in multimodal multi-agent systems for scientific tasks, including PhysicsMinions for physics olympiad problem solving and AtomAgents for AI-driven materials discovery.
Published:
Large language models are driving a paradigm revolution in artificial intelligence. From ChatGPT to Artificial General Intelligence (AGI), LLM-based agents have emerged as a highly promising technical direction. This talk first introduces the fundamental concepts of large models and agents, including the definition of AI agents, the core capabilities of LLM-based agents (perception and understanding, reasoning and planning, memory storage and retrieval, generation and action), as well as practical applications such as AI research assistants (WisPaper) and automated task execution. The talk then focuses on three major scaling challenges in building general long-horizon agents: Scaling Environments, Scaling Goals, and Scaling Interactions. Finally, we present our explorations and innovations on AgentGym, a cross-environment self-evolution framework, and AgentGym-RL, a reinforcement learning approach for long-horizon decision making.
Undergraduate course, Fudan University, School of Computer Science, 2022