Publications

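This page lists a selection of my publications. For a more complete list, please visit my Google Scholar profile, my Semantic Scholar profile, the computer science bibliography database, or the ACL Anthology.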

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress

Published in The ACM Web Conference 2026 (WWW 2026), 2026

We propose AgentPRM, a redefined process reward model for LLM agent tasks that captures both the interdependence between sequential decisions and their contribution to the final goal, enabling better progress tracking and a better exploration-exploitation balance.

Recommended citation: Zhiheng Xi, Chenyang Liao, Guanyu Li, Yajie Yang, Wenxiang Chen, Zhihao Zhang, Bing Wang, Senjie Jin, Yuhao Zhou, Jian Guan, Wei Wu, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang. AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress. In Proceedings of the ACM Web Conference 2026 (WWW 2026). https://arxiv.org/abs/2511.08325

AI Can Learn Scientific Taste

Published in arXiv, 2026

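We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and we formulate scientific taste learning as a preference modeling and alignment problem.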

Recommended citation: Jingqi Tong, Mingzhe Li, Hangcheng Li, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yugang Jiang, Xipeng Qiu: AI Can Learn Scientific Taste. ArXiv 2603.14473 (2026) https://arxiv.org/pdf/2603.14473

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

Published in arXiv, 2026

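We propose SciAgentGym, a scalable interactive environment that contains 1,780 domain-specific tools across four natural science disciplines and is backed by a robust execution infrastructure.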

Recommended citation: Yujiong Shen, Yajie Yang, Zhiheng Xi, Binze Hu, Huayu Sha, Jiazheng Zhang, Qiyuan Peng, Junlin Shang, Jixuan Huang, Yutao Fan, Jingqi Tong, Shihan Dou, Ming Zhang, Lei Bai, Zhenfei Yin, Tao Gui, Xingjun Ma, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang: SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents. ArXiv 2602.12984 (2026) https://arxiv.org/pdf/2602.12984

CL-bench: A Benchmark for Context Learning

Published in arXiv, 2026

Current language models (LMs) excel at reasoning over prompts using pre-trained knowledge. However, real-world tasks are far more complex and context-dependent: models must learn from task-specific context and leverage new knowledge beyond what is learned during pre-training to reason and resolve tasks. We term this capability context learning, a crucial ability that humans naturally possess but that has been largely overlooked.

Recommended citation: Shihan Dou, Ming Zhang, Zhangyue Yin, Chenhao Huang, Yujiong Shen, Junzhe Wang, Jiayi Chen, Yuchen Ni, Junjie Ye, Cheng Zhang, Huaibing Xie, Jianglu Hu, Shaolei Wang, Weichao Wang, Yanling Xiao, Yiting Liu, Zenan Xu, Zhen Guo, Pluto Zhou, Tao Gui, Zuxuan Wu, Xipeng Qiu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Di Wang, Shunyu Yao. CL-bench: A Benchmark for Context Learning. arXiv:2602.03587, 2026. https://arxiv.org/abs/2602.03587

What is wrong with your code generated by large language models? An extensive study

Published in Science China Information Sciences, 2026

The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundaries of existing methods.

Recommended citation: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Muling Wu, Yunbo Tao, Ming Zhang, Mingxu Chai, Jessica Fan, Zhiheng Xi, Rui Zheng, Yueming Wu, Ming Wen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang: What is wrong with your code generated by large language models? An extensive study. Sci. China Inf. Sci. 69, 112107 (2026) https://doi.org/10.1007/s11432-025-4632-8

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Published in arXiv preprint, 2026

We introduce OpenNovelty, an LLM-powered agentic system for verifiable scholarly novelty assessment that addresses the critical need for automated, reliable evaluation of research novelty.

Recommended citation: Yifan Liu, Yifan Wang, Zixuan Li, Zizheng Wang, Zihan Wang, Wenxuan Wang, Yifei Wang, Yifan Song, Yifan Liu, Xuanjing Huang, Zhilin Yang, Wei Chen, Tao Gui: OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment. arXiv preprint arXiv:2601.01576 (2026) https://arxiv.org/abs/2601.01576

Revealing emergent human-like conceptual representations from language prediction

Published in Proceedings of the National Academy of Sciences (PNAS), 2025

People acquire concepts through rich physical and social experiences and use them to understand and navigate the world. In contrast, large language models (LLMs), trained solely through next-token prediction on text, exhibit strikingly human-like behaviors. Are these models developing concepts akin to those in humans?

Recommended citation: Ningyu Xu, Qi Zhang, Chenyang Du, Qinan Luo, Xipeng Qiu, Xuanjing Huang, Menghan Zhang. Revealing emergent human-like conceptual representations from language prediction. Proceedings of the National Academy of Sciences (PNAS), 2025. https://doi.org/10.1073/pnas.2512514122

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Published in arXiv, 2025

We propose AgentGym-RL, a reinforcement learning framework for training large language model (LLM)-based agents to tackle long-horizon decision-making tasks through multi-turn interactions.

Recommended citation: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Xin Guo, Dingwen Yang, Chenyang Liao, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang. AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning. arXiv:2509.08755, 2025. https://arxiv.org/abs/2509.08755

AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments

Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Large language models (LLMs) have emerged as a promising foundation to build generally-capable agents (LLM-based agents) that can handle multi-turn decision-making tasks across various environments. However, the community lacks a unified interactive framework that covers diverse environments for comprehensive evaluation of agents, and enables exploration and learning for their self-improvement.

Recommended citation: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Xin Guo, Dingwen Yang, Chenyang Liao, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang. AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27914–27961, Vienna, Austria. Association for Computational Linguistics, 2025. https://aclanthology.org/2025.acl-long.1355/

大规模语言模型：从理论与实践（第2版）

Published by Publishing House of Electronics Industry (电子工业出版社), 2025

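This book introduces the fundamental theory of large language models, including language models, distributed model training, and reinforcement learning, and uses the Deepspeed-Chat framework as an example to show how large language models and ChatGPT-like systems are implemented in practice.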

Recommended citation: Qi Zhang, Tao Gui, Rui Zheng, Xuanjing Huang: 大规模语言模型：从理论与实践（第2版）. Publishing House of Electronics Industry (电子工业出版社), 2025 https://intro-llm.github.io/

The Rise and Potential of Large Language Model Based Agents: A Survey

Published in Science China Information Sciences, 2025

In this paper, we perform a comprehensive survey on LLM-based agents.

Recommended citation: Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Qi Zhang, Tao Gui: The rise and potential of large language model based agents: a survey. Sci. China Inf. Sci. 68(2) (2025) https://link.springer.com/article/10.1007/s11432-024-4222-0

Searching for Best Practices in Retrieval-Augmented Generation

Published in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolonged response times.

Recommended citation: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang. Searching for Best Practices in Retrieval-Augmented Generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17716–17736. Association for Computational Linguistics, 2024. https://aclanthology.org/2024.emnlp-main.981/

Secrets of RLHF in Large Language Models Part II: Reward Modeling

Published in CoRR abs/2401.06080, 2024

From a data perspective, we propose a method to measure the strength of preferences within the data, based on a voting mechanism of multiple reward models. From an algorithmic standpoint, we introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses, thereby improving model generalization.

Recommended citation: Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang: Secrets of RLHF in Large Language Models Part II: Reward Modeling. CoRR abs/2401.06080 (2024) http://xuanjing-huang.github.io/files/reward.pdf

自然语言处理导论

Published by Publishing House of Electronics Industry (电子工业出版社), 2023

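With the wide application of natural language processing and the rapid progress of machine learning algorithms, represented by deep learning, natural language processing algorithms and research tasks have also been developing quickly in recent years. Since 2003, the authors have taught natural language processing courses for undergraduate, master's, and doctoral students at the School of Computer Science and Technology, Fudan University. This book summarizes and organizes those years of teaching and research, with the aim of giving readers a more systematic and comprehensive understanding of natural language processing.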

Recommended citation: Qi Zhang, Tao Gui, Xuanjing Huang: 自然语言处理导论. Publishing House of Electronics Industry (电子工业出版社), 2023 https://intro-nlp.github.io/

Secrets of RLHF in Large Language Models Part I: PPO

Published in CoRR abs/2307.04964, 2023

We dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training.

Recommended citation: Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang: Secrets of RLHF in Large Language Models Part I: PPO. CoRR abs/2307.04964 (2023) http://xuanjing-huang.github.io/files/rlhf.pdf

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

Published in Findings of the Association for Computational Linguistics: ACL-IJCNLP, 2021

The paper proposes a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.

Recommended citation: Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou: K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters. ACL/IJCNLP (Findings) 2021: 1405-1418 http://xuanjing-huang.github.io/files/K-Adapter.pdf

Pre-trained Models for Natural Language Processing: A Survey

Published in SCIENCE CHINA Technological Sciences (SCTS), 2020

In this survey, we provide a comprehensive review of PTMs for NLP.

Recommended citation: Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang, Pre-trained Models for Natural Language Processing: A Survey, SCIENCE CHINA Technological Sciences (SCTS), 2020, Vol. 63(10), pp. 1872–1897 http://xuanjing-huang.github.io/files/PTM.pdf

FLAT: Chinese NER Using Flat-Lattice Transformer

Published in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

In this paper, we propose FLAT: Flat-LAttice Transformer for Chinese NER, which converts the lattice structure into a flat structure consisting of spans.

Recommended citation: Xiaonan Li, Hang Yan, Xipeng Qiu, Xuanjing Huang: FLAT: Chinese NER Using Flat-Lattice Transformer. ACL 2020: 6836-6842 http://xuanjing-huang.github.io/files/FLAT.pdf

How to Fine-Tune BERT for Text Classification?

Published in The Eighteenth China National Conference on Computational Linguistics, 2019

In this paper, we conduct exhaustive experiments to investigate different methods of fine-tuning BERT on text classification tasks.

Recommended citation: Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang: How to Fine-Tune BERT for Text Classification? CCL 2019: 194-206 http://xuanjing-huang.github.io/files/bert-ft.pdf

Adversarial Multi-Criteria Learning for Chinese Word Segmentation

Published in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

In this paper, we propose adversarial multi-criteria learning for CWS by integrating shared knowledge from multiple heterogeneous segmentation criteria.

Recommended citation: Xinchi Chen, Zhan Shi, Xipeng Qiu, Xuanjing Huang: Adversarial Multi-Criteria Learning for Chinese Word Segmentation. ACL (1) 2017: 1193-1203 http://xuanjing-huang.github.io/files/cws.pdf

Adversarial Multi-task Learning for Text Classification

Published in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

The paper proposes an adversarial multi-task learning framework that keeps the shared and private latent feature spaces from interfering with each other.

Recommended citation: Pengfei Liu, Xipeng Qiu, Xuanjing Huang: Adversarial Multi-task Learning for Text Classification. ACL (1) 2017: 1-10 http://xuanjing-huang.github.io/files/AMT.pdf

Recurrent Neural Network for Text Classification with Multi-Task Learning

Published in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

We propose three different mechanisms of sharing information to model text with task-specific and shared layers.

Recommended citation: Pengfei Liu, Xipeng Qiu, Xuanjing Huang: Recurrent Neural Network for Text Classification with Multi-Task Learning. IJCAI 2016: 2873-2879 http://xuanjing-huang.github.io/files/RNN.pdf

Xuanjing Huang (黄萱菁)
