With the exponential growth of big data and advancements in large-scale foundation model techniques, the field of machine learning has embarked on an unprecedented golden era. This period is characterized by significant innovations across various aspects of machine learning, including data exploitation, network architecture development, loss function settings and algorithmic innovation.
Recently, the multimodal large language model (MLLM), represented by GPT-4V, has become a new rising research hotspot, which uses powerful large language models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of the MLLM, such as writing stories based on images and optical character recognition–free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence. To this end, both academia and industry have endeavored to develop MLLMs that can compete with or even outperform GPT-4V, pushing the limit of research at a surprising speed. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the basic formulation of the MLLM and delineate its related concepts, including architecture, training strategy and data, as well as evaluation. Then, we introduce research topics about how MLLMs can be extended to support more granularity, modalities, languages and scenarios. We continue with multimodal hallucination and extended techniques, including multimodal in-context learning, multimodal chain of thought and LLM-aided visual reasoning. To conclude the paper, we discuss existing challenges and point out promising research directions.
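The basic formulation mentioned above usually involves three components: a modality encoder, a connector that projects visual features into the LLM's embedding space, and the LLM backbone itself. The sketch below illustrates only this data flow; all module choices and dimensions are illustrative assumptions, not any specific model's design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from any particular MLLM).
VIS_DIM, LLM_DIM, N_PATCH, N_TOK, VOCAB = 64, 128, 9, 5, 1000

# 1) Modality encoder: maps raw image patches to visual features.
W_enc = rng.normal(size=(VIS_DIM, VIS_DIM))
# 2) Connector / projector: aligns visual features with the LLM embedding space.
W_proj = rng.normal(size=(VIS_DIM, LLM_DIM))
# 3) LLM side: a token-embedding table standing in for the language model.
embed = rng.normal(size=(VOCAB, LLM_DIM))

patches = rng.normal(size=(N_PATCH, VIS_DIM))   # image as patch features
text_ids = rng.integers(0, VOCAB, size=N_TOK)   # tokenized text prompt

visual_tokens = (patches @ W_enc) @ W_proj      # (N_PATCH, LLM_DIM)
text_tokens = embed[text_ids]                   # (N_TOK, LLM_DIM)

# The LLM consumes visual tokens prepended to the text tokens.
llm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(llm_input.shape)  # (14, 128)
```

In practice the encoder is typically a pretrained vision transformer and the connector is the main trainable alignment module, but the interface shown here (project, then concatenate into one token sequence) is the common pattern.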
In this paper, we consider the exact quantum query complexity of two fundamental symmetric functions: 1) MOD_m^n, which calculates the Hamming weight of an n-bit string modulo m; 2) EXACT_{k,l}^n, which determines whether the Hamming weight of an n-bit string is exactly k or l. Although these two symmetric functions have received considerable attention, their exact quantum query complexities have not been fully characterized. Specifically, our results are as follows: 1) We design an optimal quantum query algorithm to compute MOD_m^n exactly and thus provide a tight characterization of its exact quantum query complexity, which settles a previously open conjecture. Based on this algorithm, we demonstrate that a broad class of symmetric functions is not evasive in the quantum model, i.e., there exist quantum algorithms to compute these functions exactly when the number of queries is less than their input size. 2) By proposing a quantum algorithm that uses the minimum number of queries to compute EXACT_{k,l}^n exactly for some specific values of k and l, we give a tight characterization of its exact quantum query complexity in these scenarios.
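Classically, the two symmetric functions are easy to state in a few lines of Python; the query model itself, of course, counts accesses to individual input bits rather than whole-string evaluations, and the quantum algorithms above need fewer such accesses than this naive full read:

```python
def hamming_weight(x: str) -> int:
    """Number of 1s in a bit string (reads every bit, i.e., n queries)."""
    return x.count("1")

def mod_m(x: str, m: int) -> int:
    """MOD_m^n: the Hamming weight of the n-bit string x, modulo m."""
    return hamming_weight(x) % m

def exact_k_l(x: str, k: int, l: int) -> bool:
    """EXACT_{k,l}^n: is the Hamming weight of x exactly k or exactly l?"""
    return hamming_weight(x) in (k, l)

print(mod_m("10110", 3))         # weight 3, so 3 % 3 = 0
print(exact_k_l("10110", 2, 3))  # True: weight is 3
```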
Exploration strategy design is a challenging problem in reinforcement learning (RL), especially when the environment contains a large state space or sparse rewards. During exploration, the agent tries to discover unexplored (novel) areas or high-reward (quality) areas. Most existing methods perform exploration by only utilizing the novelty of states. The novelty and quality in the neighboring area of the current state have not been well utilized to simultaneously guide the agent's exploration. To address this problem, this paper proposes a novel RL framework, called clustered reinforcement learning (CRL), for efficient exploration. CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area (cluster) of the current state is given to the agent. CRL leverages these bonus rewards to guide the agent to perform efficient exploration. Moreover, CRL can be combined with existing exploration strategies to improve their performance, as the bonus rewards employed by these existing strategies solely capture the novelty of states. Experiments on four continuous control tasks and six hard-exploration Atari-2600 games show that our method outperforms other state-of-the-art methods, achieving the best performance.
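The cluster-based bonus idea can be sketched in a few lines: states are grouped into clusters, and the bonus for the current state combines how rarely its cluster has been visited (novelty) with the average reward observed in that cluster (quality). The grid-based "clustering", weights, and bonus form below are illustrative assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np
from collections import defaultdict

class ClusterBonus:
    """Toy cluster-based exploration bonus: novelty + quality per cluster.
    Uses a fixed grid in place of real clustering for simplicity; CRL
    itself clusters the collected states."""

    def __init__(self, cell=1.0, alpha=1.0, beta=0.5):
        self.cell = cell      # grid resolution standing in for clustering
        self.alpha = alpha    # weight on novelty
        self.beta = beta      # weight on quality
        self.counts = defaultdict(int)
        self.reward_sums = defaultdict(float)

    def _cluster(self, state):
        return tuple(np.floor(np.asarray(state) / self.cell).astype(int))

    def bonus(self, state, reward):
        c = self._cluster(state)
        self.counts[c] += 1
        self.reward_sums[c] += reward
        novelty = 1.0 / np.sqrt(self.counts[c])         # rarely visited -> high
        quality = self.reward_sums[c] / self.counts[c]  # mean reward in cluster
        return self.alpha * novelty + self.beta * quality

cb = ClusterBonus()
print(cb.bonus([0.2, 0.3], reward=1.0))       # first visit: 1.0 + 0.5*1.0 = 1.5
print(cb.bonus([0.4, 0.1], reward=0.0) < 1.5) # same cluster revisited: bonus drops
```

The shaped reward the agent would optimize is then the environment reward plus this bonus, which is why the scheme composes naturally with novelty-only exploration strategies.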
The transformer architecture [1] has been widely used for natural language processing (NLP) tasks. Inspired by its excellent performance in NLP, transformer-based models [2, 3] have set many new records in various computer vision tasks. However, most vision transformers (ViTs) suffer from large model sizes, large run-time memory consumption, and high computational costs. Therefore, there is a pressing need to develop and deploy lightweight and efficient vision transformers.
With the rapid development of deep learning, current deep models can learn a fixed number of classes with high performance. However, in our ever-changing world, data often come from an open environment: they may arrive as a stream or be available only temporarily due to privacy issues. As a result, the classification model should learn new classes incrementally instead of restarting the training process.
Call graphs facilitate various tasks in software engineering. However, for the dynamic language Python, the complex language features and external library dependencies pose enormous challenges for building the call gr...
For Unmanned Aerial Vehicles (UAVs) monitoring tasks, capturing high quality images of target objects is important for subsequent recognition. Concerning the problem, many prior works study placement/trajectory planni...
Hallucination is a big shadow hanging over the rapidly evolving multimodal large language models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to an instruction-tuning manner that requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Just as a woodpecker heals trees, it picks out and corrects hallucinations from the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while being interpretable by accessing the intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://***/BradyFU/Woodpecker.
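The five-stage pipeline lends itself to a simple post-hoc correction skeleton. The toy function below only mirrors the stage ordering described above; everything else is a hypothetical stand-in (a real system would back the stages with an LLM, an open-vocabulary detector, and a VQA model rather than a set of known image facts).

```python
def woodpecker_correct(image_facts, generated_text):
    """Toy post-remedy pipeline in five stages. `image_facts` is a set of
    object names standing in for visual knowledge validation."""
    # 1) Key concept extraction: candidate object mentions in the answer.
    concepts = [w.strip(".,").lower() for w in generated_text.split()]
    # 2) Question formulation: one existence question per concept.
    questions = {c: f"Is there a {c} in the image?" for c in concepts}
    # 3) Visual knowledge validation: answer each question against the image.
    validated = {c: (c in image_facts) for c in questions}
    # 4) Visual claim generation: structured claims about the image.
    claims = [f"{c}: {'present' if ok else 'absent'}" for c, ok in validated.items()]
    # 5) Hallucination correction: drop mentions that fail validation.
    kept = [w for w in generated_text.split() if w.strip(".,").lower() in image_facts]
    return claims, " ".join(kept)

claims, fixed = woodpecker_correct({"dog", "ball"}, "dog cat ball")
print(fixed)  # dog ball
```

Because each stage produces an inspectable intermediate output (questions, validation results, claims), the correction is interpretable in the sense the abstract describes.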
Emotion-cause pair extraction (ECPE) aims to extract all the pairs of emotions and corresponding causes in a document. It generally contains three subtasks: emotion extraction, cause extraction, and causal-relation detection between emotions and causes. Existing works adopt pipelined approaches or multi-task learning to address the ECPE task. However, the pipelined approaches easily suffer from error propagation in real-world scenarios, and multi-task learning cannot optimize all tasks globally, which may lead to suboptimal extraction results. To address these issues, we propose a novel framework, Pairwise Tagging Framework (PTF), tackling the complete emotion-cause pair extraction in one unified tagging task. Unlike prior works, PTF innovatively transforms all subtasks of ECPE, i.e., emotion extraction, cause extraction, and causal-relation detection between emotions and causes, into one unified clause-pair tagging task. With this unified tagging task, we can optimize the ECPE task globally and extract more accurate emotion-cause pairs. To validate the feasibility and effectiveness of PTF, we design an end-to-end PTF-based neural network and conduct experiments on the ECPE benchmark dataset. The experimental results show that our method significantly outperforms pipelined approaches and typical multi-task learning approaches.
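The clause-pair tagging idea can be sketched as follows: every ordered pair of clauses (i, j) gets one tag, and emotion-cause pairs are read directly off the resulting tag grid, with no separate extraction stages to propagate errors between. The two-tag vocabulary below is an illustrative assumption for the sketch, not PTF's exact tagging scheme.

```python
def tag_matrix(n_clauses, emotion_cause_pairs):
    """Build an n x n tag grid: cell (i, j) = 'EC' if clause i carries an
    emotion caused by clause j, else 'O'. A trained model would predict
    this grid; here we construct it from gold pairs for illustration."""
    tags = [["O"] * n_clauses for _ in range(n_clauses)]
    for e, c in emotion_cause_pairs:
        tags[e][c] = "EC"
    return tags

def decode_pairs(tags):
    """Read emotion-cause pairs straight off the tag grid, so emotion
    extraction, cause extraction, and relation detection happen jointly."""
    return [(i, j) for i, row in enumerate(tags)
            for j, t in enumerate(row) if t == "EC"]

grid = tag_matrix(4, [(1, 0), (3, 3)])  # e.g., clause 1's emotion caused by clause 0
print(decode_pairs(grid))  # [(1, 0), (3, 3)]
```

Note that a pair like (3, 3) covers the common case where a clause is both the emotion and its own cause.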