Search Results - Inner Mongolia University Library

Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited

International Journal of Computer Vision, 2022, Vol. 130, No. 7, pp. 1873-1874

Authors: Yang, Xue; Yan, Junchi. Affiliation: Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

A BIRGAT MODEL FOR MULTI-INTENT SPOKEN LANGUAGE UNDERSTANDING WITH HIERARCHICAL SEMANTIC FRAMES

arXiv, 2024

Authors: Xu, Hongshen; Cao, Ruisheng; Zhu, Su; Jiang, Sheng; Zhang, Hanchong; Chen, Lu; Yu, Kai. Affiliations: MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; AISpeech Co. Ltd., Suzhou, China

Previous work on spoken language understanding (SLU) mainly focuses on single-intent settings, where each input utterance contains only one user intent. This configuration significantly limits the surface form of user utterances and the capacity of output semantics. In this work, we first propose a Multi-Intent dataset collected from a realistic in-Vehicle dialogue System, called MIVS. The target semantic frame is organized in a 3-layer hierarchical structure to tackle the alignment and assignment problems in multi-intent cases. Accordingly, we devise a BiRGAT model to encode the hierarchy of ontology items, the backbone of which is a dual relational graph attention network. Coupled with the 3-way pointer-generator decoder, our method outperforms traditional sequence labeling and classification-based schemes by a large margin. An ablation study in transfer learning settings further uncovers the poor generalizability of current models in multi-intent cases. Copyright © 2024, The Authors. All rights reserved.

Keywords: Semantics

Reducing Tool Hallucination via Reliability Alignment

arXiv, 2024

Authors: Xu, Hongshen; Zhu, Zichen; Pan, Lei; Wang, Zihan; Zhu, Su; Ma, Da; Cao, Ruisheng; Chen, Lu; Yu, Kai. Affiliations: X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; AISpeech Co. Ltd., Suzhou, China

Large Language Models (LLMs) have expanded their capabilities beyond language generation to interact with external tools, enabling automation and real-world applications. However, tool hallucinations—where models either select inappropriate tools or misuse them—pose significant challenges, leading to erroneous task execution, increased computational costs, and reduced system reliability. To systematically address this issue, we define and categorize tool hallucinations into two main types: tool selection hallucination and tool usage hallucination. To evaluate and mitigate these issues, we introduce RelyToolBench, which integrates specialized test cases and novel metrics to assess hallucination-aware task success and efficiency. Finally, we propose Relign, a reliability alignment framework that expands the tool-use action space to include indecisive actions, allowing LLMs to defer tool use, seek clarification, or adjust tool selection dynamically. Through extensive experiments, we demonstrate that Relign significantly reduces tool hallucinations, improves task reliability, and enhances the efficiency of LLM tool interactions. The code and data will be publicly available. Copyright © 2024, The Authors. All rights reserved.

Keywords: Clarifiers

Making offline RL online: collaborative world models for offline visual reinforcement learning 24

Proceedings of the 38th International Conference on Neural Information Processing Systems

Authors: Qi Wang; Junming Yang; Yunbo Wang; Xin Jin; Wenjun Zeng; Xiaokang Yang. Affiliations: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China and Ningbo Institute of Digital Twin, Eastern Institute of Technology, China; School of Computer Science and Engineering, Southeast University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China; Ningbo Institute of Digital Twin, Eastern Institute of Technology, China

ISBN: (print) 9798331314385

Training offline RL models using visual inputs poses two significant challenges, i.e., the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, outperforming existing RL approaches by large margins.

Attack Named Entity Recognition by Entity Boundary Interference

arXiv, 2023

Authors: Yang, Yifei; Wu, Hongqiu; Zhao, Hai. Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China

Named Entity Recognition (NER) is a cornerstone NLP task while its robustness has been given little attention. This paper rethinks the principles of NER attacks derived from sentence classification, as they can easily violate the label consistency between the original and adversarial NER examples. This is due to the fine-grained nature of NER, as even minor word changes in the sentence can result in the emergence or mutation of any entities, resulting in invalid adversarial examples. To this end, we propose a novel one-word modification NER attack based on a key insight, NER models are always vulnerable to the boundary position of an entity to make their decision. We thus strategically insert a new boundary into the sentence and trigger the Entity Boundary Interference that the victim model makes the wrong prediction either on this boundary word or on other words in the sentence. We call this attack Virtual Boundary Attack (ViBA), which is shown to be remarkably effective when attacking both English and Chinese models with a 70%-90% attack success rate on state-of-the-art language models (e.g. RoBERTa, DeBERTa) and also significantly faster than previous methods. Copyright © 2023, The Authors. All rights reserved.

Keywords: Natural language processing systems

Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification

arXiv, 2023

Authors: Han, Bing; Chen, Zhengyang; Qian, Yanmin. Affiliation: The X-Lance Lab, Department of Computer Science and Engineering, MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China

Automatic speaker verification has made great progress using deep learning approaches with large-scale manually annotated datasets. However, it is very difficult and expensive to collect a large amount of well-labeled data for system building. Recently, self-supervised speaker verification has attracted a lot of interest because it does not depend on labeled data. In this article, we propose a novel and advanced self-supervised learning framework which can construct a very strong speaker verification system with high performance without using any labeled data. To avoid the impact of false negative pairs from contrastive-learning based self-supervised learning, we adopt the self-distillation with no labels (DINO) framework as the initial model, which can be trained without exploiting negative pairs. Then, we further introduce a cluster-aware training strategy for DINO to improve the diversity of data. In the iterative learning stage, due to a mass of unreliable labels from unsupervised clustering, the quality of pseudo labels is important for the system performance. This motivates us to propose dynamic loss-gate and label correction (DLG-LC) methods to alleviate the performance degradation caused by unreliable labels. More specifically, we model the loss distribution with a Gaussian Mixture Model (GMM) and obtain the loss-gate threshold dynamically to distinguish the reliable and unreliable labels. Besides, we adopt the model predictions to correct the unreliable labels, to better utilize the unreliable data rather than dropping them directly. Moreover, we extend DLG-LC from single-modality to multi-modality on the audio-visual dataset to further improve the performance. The experiments are performed on the commonly used Voxceleb dataset. Compared to the best-known self-supervised speaker verification system, our proposed method obtains 22.17%, 27.94% and 25.56% relative EER improvement on Vox-O, Vox-E and Vox-H test sets, even with fewer it

Keywords: Supervised learning

NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jiefeng Li; Siyuan Bian; Qi Liu; Jiasheng Tang; Fan Wang; Cewu Lu. Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University; Alibaba Group; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University

With the progress of 3D human pose and shape estimation, state-of-the-art methods can either be robust to occlusions or obtain pixel-aligned accuracy in non-occlusion cases. However, they cannot obtain robustness and mesh-image alignment at the same time. In this work, we present NIKI (Neural Inverse Kinematics with Invertible Neural Network), which models bidirectional errors to improve the robustness to occlusions and obtain pixel-aligned accuracy. NIKI can learn from both the forward and inverse processes with invertible networks. In the inverse process, the model separates the error from the plausible 3D pose manifold for a robust 3D human pose estimation. In the forward process, we enforce the zero-error boundary conditions to improve the sensitivity to reliable joint positions for better mesh-image alignment. Furthermore, NIKI emulates the analytical inverse kinematics algorithms with the twist-and-swing decomposition for better interpretability. Experiments on standard and occlusion-specific benchmarks demonstrate the effectiveness of NIKI, where we exhibit robust and well-aligned results simultaneously. Code is available at https://***/Jeff-sjtu/NIKI.

SECURECUT: FEDERATED GRADIENT BOOSTING DECISION TREES WITH EFFICIENT MACHINE UNLEARNING

arXiv, 2023

Authors: Zhang, Jian; Li, Bowen; Li, Jie; Wu, Chentao. Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China

In response to legislation mandating companies to honor the right to be forgotten by erasing user data, it has become imperative to enable data removal in Vertical Federated Learning (VFL), where multiple parties provide private features for model training. In VFL, data removal, i.e., machine unlearning, often requires removing specific features across all samples under the privacy guarantee of federated learning. To address this challenge, we propose SecureCut, a novel Gradient Boosting Decision Tree (GBDT) framework that effectively enables both instance unlearning and feature unlearning without the need for retraining from scratch. Leveraging a robust GBDT structure, we enable effective data deletion while reducing degradation of model performance. Extensive experimental results on popular datasets demonstrate that our method achieves superior model utility and forgetfulness compared to state-of-the-art methods. To the best of our knowledge, this is the first work that investigates machine unlearning in VFL scenarios. © 2023, CC BY.

Keywords: Decision trees

TOWARDS UNIVERSAL SPEECH DISCRETE TOKENS: A CASE STUDY FOR ASR AND TTS

arXiv, 2023

Authors: Yang, Yifan; Shen, Feiyu; Du, Chenpeng; Ma, Ziyang; Yu, Kai; Povey, Daniel; Chen, Xie. Affiliations: MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; Xiaomi Corporation, Beijing, China

Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language processing techniques. However, these studies, mainly single-task focused, faced challenges like overfitting and performance degradation in speech recognition tasks, often at the cost of sacrificing performance in multi-task scenarios. This study presents a comprehensive comparison and optimization of discrete tokens generated by various leading SSL models in speech recognition and synthesis tasks. We aim to explore the universality of speech discrete tokens across multiple speech tasks. Experimental results demonstrate that discrete tokens achieve comparable results against systems trained on FBank features in speech recognition tasks and outperform mel-spectrogram features in speech synthesis in subjective and objective metrics. These findings suggest that universal discrete tokens have enormous potential in various speech-related tasks. Our work is open-source and publicly available at https://***/k2-fsa/icefall. Index Terms- self-supervised learning, discrete tokens, speech recognition, text-to-speech. Copyright © 2023, The Authors. All rights reserved.

Keywords: Speech recognition

Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats

arXiv, 2024

Authors: Lan, Kunyao; Ming, Cong; Yao, Binwei; Chen, Lu; Wu, Mengyue. Affiliations: X-LANCE Lab, Dept. of Computer Science and Engineering, China; MoE Key Lab of Artificial Intelligence, AI Institute, China; Shanghai Jiao Tong University, China; University of Wisconsin-Madison, United States

Chatbots can serve as a viable tool for preliminary depression diagnosis via interactive conversations with potential patients. Nevertheless, the blend of task-oriented and chitchat in diagnosis-related dialogues necessitates professional expertise and empathy. Such unique requirements challenge traditional dialogue frameworks geared towards single optimization goals. To address this, we propose an innovative ontology definition and generation framework tailored explicitly for depression diagnosis dialogues, combining the reliability of task-oriented conversations with the appeal of empathy-related chit-chat. We further apply the framework to D4, the only existing public dialogue dataset on depression diagnosis-oriented chats. Exhaustive experimental results indicate significant improvements in task completion and emotional support generation in depression diagnosis, fostering a more comprehensive approach to task-oriented chat dialogue system development and its applications in digital mental health. Copyright © 2024, The Authors. All rights reserved.

Keywords: Diagnosis
