Multimodal Understanding and Generation | 2025 X Intelligence Conference & 18th China R Conference
Published: 2025-09-16
The 2025 X Intelligence Conference & 18th China R Conference will be held from October 17 to 19, 2025, at the Beijing Conference Center. The conference focuses on cutting-edge advances in large-model technology and on innovative explorations and practical applications across domains. Topics include the AI revolution from a Monte Carlo perspective, diffusion large language models, multimodal understanding and generation, the evolution of agent forms, industrial applications of agents, foundational theory of large models, AI4Science, embodied intelligence, biomedical statistics and large models, and AI-empowered health statistics.
To register for the conference, please visit the link below or scan the QR code!
Link: https://www.x-agi.cc/register.html
QR code: [image]
Below is an introduction to the Multimodal Understanding and Generation session of the 2025 X Intelligence Conference & 18th China R Conference.

Session Theme
This session, chaired by Hu Tianyang (胡天阳), focuses on the latest research advances in multimodal understanding and generation. Four invited speakers will share their perspectives on: the evolution of data and modeling methods, convergence and acceleration of diffusion models, mechanistic interpretability of language models in fine-tuning and reasoning, and large-model algorithm exploration on the Ascend platform. The talks range from frontier theoretical questions to method optimization and hands-on practice, aiming to offer attendees a panoramic view of multimodal research and to spark cross-disciplinary exchange.

Session Program
Multimodal Generation and Understanding: The Evolution of Data and Modeling Methods
Speaker bio:
Abstract:
Over the past decade, multimodal generation has evolved, and scaled, from VAEs/GANs through autoregressive (AR) models to diffusion models; in the past two years it has further become deeply coupled with large language models, producing intelligent systems that combine multimodal understanding and multimodal generation. Throughout this rapid evolution, high-quality data and scalable modeling methods have been the two key drivers of progress. Using the text-image setting as a running example, this talk first traces the historical evolution of multimodal generative models from the perspectives of data and modeling: we cover the mainstream architectures, namely autoregressive models, diffusion models, and one-step/few-step generators, compare their trade-offs, and briefly survey existing open-source datasets that are well suited to generative modeling. Second, we focus on a current research hotspot, the relationship between multimodal generation and multimodal understanding, covering recent work on unified generation-understanding models together with the speaker's own reflections, such as the inhibitory versus reinforcing interactions between generation and understanding, and scaling strategies for unified models. Finally, we offer an outlook on the future evolution of multimodal generative models and attempt to distill several important research questions and application scenarios.
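For orientation, the two dominant paradigms the talk surveys differ most visibly in their training objectives; the following are the textbook formulations (standard notation, not material from the talk itself):

```latex
% Autoregressive (AR) models: factorize the joint distribution and
% generate tokens one by one, each conditioned on all previous tokens.
p_\theta(x) = \prod_{i=1}^{N} p_\theta(x_i \mid x_{<i})

% Diffusion models (DDPM-style objective): train a network
% \epsilon_\theta to predict the noise mixed into a clean sample x_0.
\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon \sim \mathcal{N}(0,I)}
  \Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0
  + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\big) \big\|^2 \Big]
```

One-step/few-step generators (e.g., distillation- or consistency-style models) aim to collapse the many denoising iterations implied by the second objective into one or a few network evaluations, which is one axis of the trade-offs the talk compares.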
Faster Convergence and Acceleration for Diffusion-Based Generative Models
Gen Li
Speaker bio:
Gen Li is an assistant professor in the Department of Statistics and Data Science at the Chinese University of Hong Kong. His research interests include diffusion-based generative models and reinforcement learning.
Abstract:
Diffusion models, which generate new data instances by learning to reverse a Markov diffusion process from noise, have become a cornerstone in contemporary generative modeling. While their practical power has now been widely recognized, the theoretical underpinnings remain underdeveloped. Particularly, despite the recent surge of interest in accelerating sampling speed, convergence theory for these acceleration techniques remains limited. In this talk, I will first introduce an acceleration sampling scheme for stochastic samplers that provably improves the iteration complexity under minimal assumptions. The second part focuses on diffusion-based language models, whose ability to generate tokens in parallel significantly accelerates sampling relative to traditional autoregressive methods. Adopting an information-theoretic lens, we establish a sharp convergence theory for diffusion language models, thereby providing the first rigorous justification of both their efficiency and fundamental limits.
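For readers new to the setup, the Markov forward corruption process and the learned reverse (denoising) process referenced above are conventionally written as follows (standard DDPM notation, not taken from the talk):

```latex
% Forward process: over T steps, gradually replace the data x_0 with
% Gaussian noise according to a variance schedule \beta_1, ..., \beta_T.
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)

% Learned reverse process: starting from x_T ~ N(0, I), denoise step by
% step; sampling cost scales with the number of steps, which is what
% acceleration schemes aim to reduce.
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)
```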
On the Mechanism Interpretability of LLM for Fine-tuning and Reasoning
Difan Zou
Speaker bio:
Dr. Difan Zou is an assistant professor in the Department of Computer Science and the Institute of Data Science at the University of Hong Kong (HKU). He received his PhD from the Department of Computer Science at the University of California, Los Angeles (UCLA). His research interests lie broadly in machine learning, deep learning theory, graph learning, mechanistic interpretability, and interdisciplinary research between AI and other fields. His work has been published in top-tier machine learning conferences (ICML, NeurIPS, COLT, ICLR) and journals (IEEE Transactions, JMLR, Nature Communications, PNAS, etc.). He serves as an area chair/senior PC member for NeurIPS, ICML, and AAAI, and as a PC member for ICLR, COLT, etc.
Abstract:
While Reinforcement Learning (RL) and Fine-Tuning demonstrably enhance Large Language Model (LLM) capabilities, particularly in reasoning and task adaptation, the underlying mechanisms remain poorly understood. This talk integrates insights from two complementary studies to advance mechanistic interpretability. First, we dissect Reinforcement Learning with Verifiable Rewards (RLVR), revealing its core benefit lies in optimizing the selection of existing high-success-rate reasoning patterns, with theoretical convergence analyses showing distinct dynamics for strong versus weak initial models (mitigated by prior supervised fine-tuning). Second, we employ circuit analysis to interpret fine-tuning mechanisms, uncovering that circuits undergo significant edge changes rather than merely adding components, contrasting prior findings. Leveraging this, we develop a circuit-aware LoRA method, improving performance over standard LoRA by 2.46%. Furthermore, we explore combining circuits for compositional tasks. Together, these studies provide novel theoretical and empirical insights: RL enhances reasoning primarily through pattern selection, while fine-tuning fundamentally rewires circuit connections. This deeper understanding informs the design of more effective and interpretable adaptation strategies for LLMs.
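As background for the circuit-aware LoRA result mentioned above, standard LoRA freezes the pretrained weights and learns a low-rank additive update. The sketch below is the vanilla method in PyTorch-style code, not the speaker's circuit-aware variant, whose edge-selection details are specific to the talk:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained linear layer with a trainable low-rank
    update: h = W x + (alpha / r) * B A x  (standard LoRA)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained W
        # A is small-Gaussian initialized, B is zero-initialized, so the
        # adapter starts as an exact no-op on the pretrained model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```

A circuit-aware variant would, in spirit, allocate or constrain such adapters according to the edges that circuit analysis identifies as changing during fine-tuning; the 2.46% improvement refers to the speaker's method, not this sketch.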
Exploring Algorithms for Large Multimodal Understanding Models on Ascend
Hong Lanqing (洪蓝青)
Speaker bio:
Dr. Hong Lanqing is a technical expert in multimodal large models at Huawei Noah's Ark Lab. She received her PhD from the National University of Singapore. Her research focuses on multimodal large models and generative AI: probing the strengths, weaknesses, and capability boundaries of existing large models, and designing efficient next-generation models and algorithms. She has published more than 30 papers at top international AI conferences, with over 3,500 Google Scholar citations; she has served as a reviewer for NeurIPS, ICLR, and CVPR, as an Area Chair for IJCAI 2025, and as Industrial Chair for 3DV 2025.
Abstract:
This talk gives a systematic account of training large multimodal understanding models and exploring algorithms on the Ascend platform. It covers Ascend-friendly vision encoder design, discretized speech encoder design, and the data-format specifications and efficient data-processing pipelines needed for large-scale training. On the training side, the talk focuses on efficient multimodal alignment on clusters of several thousand accelerator cards, including the choice of alignment paradigm and its concrete implementation. From this systematic research and practice, we distill the key know-how for building multimodal large models on Ascend, providing a reference for efficient, stable, and scalable training of multimodal understanding models.
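As one concrete instance of the alignment-paradigm choice discussed above, a widely used recipe trains only a small projector that maps frozen vision-encoder features into the LLM's embedding space. The following is a LLaVA-style sketch with illustrative, non-Ascend-specific module names and dimensions; the talk's actual design may differ:

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Two-layer MLP mapping vision-encoder patch features into the LLM
    token-embedding space. In a typical alignment stage, the vision
    encoder and the LLM stay frozen and only this module is trained."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) ->
        # "visual tokens" (batch, num_patches, llm_dim), which are
        # concatenated with text embeddings at the LLM input.
        return self.proj(patch_features)
```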

About the Conference
The conference is organized by the Center for Applied Statistics of Renmin University of China, the School of Statistics of Renmin University of China, Capital of Statistics (统计之都), and the Artificial Intelligence Branch of the China Business Statistics Society, co-organized by the Institute of Health Big Data of Renmin University of China, and sponsored by Minghong Investment, Kuande Investment, and Will. We cordially invite you to attend and join the discussion on the frontiers of intelligent technology!
For the full program, see the conference notice: Notice of the 2025 X Intelligence Conference & 18th China R Conference
Visit the official website of the 2025 X Intelligence Conference & 18th China R Conference for more information!
Link: https://www.x-agi.cc/index.html
QR code: [image]
Contact
- WeChat official account: 统计之都 (Capital of Statistics)
- Conference email: xagi-2025@cosx.org