In this paper, we propose a novel training method for transformer encoder-decoder based image captioning, which directly generates a caption text from an input image. In general, many image-to-text paired datasets need to be prepared for robust image captioning, but such datasets cannot always be collected in practical cases. Our key idea for mitigating the data preparation cost is to utilize text-to-text paraphrasing modeling, i.e., the task of converting an input text into different expressions without changing its meaning. In fact, paraphrasing deals with a transformation task similar to image captioning, even though paraphrasing handles texts instead of images. In our proposed method, an encoder-decoder network trained via the paraphrasing task is directly leveraged for image captioning. Thus, an encoder-decoder network pre-trained on a text-to-text transformation task is transferred to an image-to-text transformation task, even though a different modality must be handled by the encoder network. Our experiments using the MS COCO caption datasets demonstrate the effectiveness of the proposed method.
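The transfer described above hinges on both tasks sharing the same encoder-to-decoder interface, so that only the modality-specific frontend changes. The sketch below illustrates that structural idea only; all dimensions, weight matrices, and the pooled one-step decoder are invented stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared encoder-output dimension (illustrative)

# Stand-ins for components pre-trained on the text-to-text paraphrasing task.
W_text_embed = rng.normal(size=(100, D))  # token id -> D-dim state
W_decoder = rng.normal(size=(D, 100))     # D-dim state -> vocabulary logits

def encode_text(token_ids):
    # Paraphrasing-task encoder frontend: embeds tokens into the shared space.
    return W_text_embed[token_ids]

# For image captioning, only the modality-specific frontend is replaced;
# the components pre-trained on paraphrasing are reused unchanged.
W_img_proj = rng.normal(size=(2048, D))   # image feature -> D-dim state

def encode_image(features):
    return features @ W_img_proj

def decode(states):
    # Reused decoder (here reduced to a single pooled prediction step).
    return int(np.argmax(states.mean(axis=0) @ W_decoder))

# Both modalities feed the same decoder through the same D-dim interface.
txt_states = encode_text(np.array([3, 14, 15]))
img_states = encode_image(rng.normal(size=(5, 2048)))
first_token = decode(img_states)
```

Because the interface dimension `D` is identical for both frontends, the decoder never needs to know which modality produced its input states.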
Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade model performance. These systems often use a single prediction model in a central server and process images sent from various environments, such as cameras distributed in cities or cars. Such single models face images corrupted in heterogeneous ways at test time. Thus, they need to adapt instantly to multiple corruptions during testing rather than being re-trained at high cost. Test-time adaptation (TTA), which aims to adapt models without accessing the training dataset, is one setting that can address this problem. Existing TTA methods indeed work well on a single corruption. However, their adaptation ability is limited when multiple types of corruption occur, which is more realistic. We hypothesize that this is because the distribution shift is more complicated and adaptation becomes more difficult under multiple corruptions. In fact, we experimentally found that a larger distribution gap remains after TTA. To address the distribution gap during testing, we propose a novel TTA method named Covariance-Aware Feature alignment (CAFe). We empirically show that CAFe outperforms prior TTA methods on image corruptions, including multiple types of corruptions.
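The general idea of covariance-aware feature alignment can be illustrated with a whitening-and-recoloring transform that matches a test batch's feature mean and covariance to source statistics. This is a minimal sketch of that family of techniques, not CAFe's actual objective or training procedure.

```python
import numpy as np

def align_features(feats, src_mean, src_cov, eps=1e-5):
    """Recolor test features so their mean/covariance match source statistics.

    Illustrative only: whitening with the test-batch covariance, then
    coloring with the source covariance. CAFe's actual formulation is
    not reproduced here.
    """
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + eps * np.eye(feats.shape[1])
    # Whitening transform for the test batch.
    w_vals, w_vecs = np.linalg.eigh(cov)
    whiten = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
    # Coloring transform built from the source covariance.
    s_vals, s_vecs = np.linalg.eigh(src_cov + eps * np.eye(feats.shape[1]))
    color = s_vecs @ np.diag(s_vals ** 0.5) @ s_vecs.T
    return (feats - mu) @ whiten @ color + src_mean

rng = np.random.default_rng(1)
src = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4)) + 2.0
tgt = rng.normal(size=(500, 4)) * 3.0 - 1.0  # "corrupted" shifted features
aligned = align_features(tgt, src.mean(axis=0), np.cov(src, rowvar=False))
```

After alignment, the first two moments of the adapted features match the source distribution, which closes the kind of distribution gap the abstract describes.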
Recurrent neural networks with a gating mechanism such as an LSTM or GRU are powerful tools to model sequential data. In the mechanism, a forget gate, which was introduced to control information flow in a hidden state...
This paper addresses a major issue in planning the trajectories of under-actuated autonomous vehicles based on neurodynamic optimization. A receding-horizon vehicle trajectory planning task is formulated as a sequential global optimization problem with weighted quadratic navigation functions and obstacle avoidance constraints based on given vehicle goals. The feasibility of the formulated optimization problem is guaranteed under derived conditions. The optimization problem is sequentially solved via collaborative neurodynamic optimization in a neurodynamics-driven trajectory planning method. Experimental results with under-actuated unmanned wheeled vehicles and autonomous surface vehicles are elaborated to substantiate the efficacy of the neurodynamics-driven trajectory planning method.
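A single neurodynamic model of the kind used in such planners can be sketched as a projection neural network that solves one constrained quadratic subproblem per horizon step. The sketch below shows that building block only; the collaborative scheme in the paper runs multiple such networks with information exchange to seek global optima, which is not reproduced here, and the toy cost and box limits are invented.

```python
import numpy as np

def neurodynamic_qp(Q, c, lo, hi, alpha=0.1, dt=0.1, steps=2000):
    """Solve min 1/2 x'Qx + c'x subject to lo <= x <= hi with a projection
    neural network: dx/dt = P(x - alpha*(Qx + c)) - x, integrated by
    forward Euler. A minimal single-network sketch."""
    x = np.zeros_like(c, dtype=float)
    for _ in range(steps):
        grad = Q @ x + c
        x_proj = np.clip(x - alpha * grad, lo, hi)  # projection onto the box
        x = x + dt * (x_proj - x)                   # Euler step of the dynamics
    return x

# Toy receding-horizon step: quadratic navigation cost with box (actuation) limits.
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -2.0])   # unconstrained optimum at (1, 1)
x_star = neurodynamic_qp(Q, c, lo=0.0, hi=0.5)
```

The dynamics settle on the constrained optimum `(0.5, 0.5)`, the projection of the unconstrained minimizer onto the feasible box.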
Denoising is one of the most fundamental and important problems in signal processing, and graph signal denoising methods have been actively studied. Several graph signal denoising methods based on mathematical programming require solving linear equations involving the graph Laplacian matrix, which creates problems with computational accuracy and running time. This study proposes a fast and accurate method of solving these linear equations for denoising, based on the fast graph Fourier transform. Moreover, the proposed method can perform denoising not only on graphs for which the fast graph Fourier transform can be performed, but also on a wide class of graphs satisfying more relaxed conditions, without loss of accuracy. Experiments demonstrate the efficiency of the proposed method and confirm that denoising can be performed up to 167.3 times faster without loss of accuracy.
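The linear systems in question typically arise from Tikhonov-style graph denoising, where the solution of `(I + γL)x = y` can be computed as a per-frequency filter in the graph Fourier domain. The sketch below shows that standard reduction; a dense eigendecomposition stands in for the fast transform, and the example graph and signal are invented.

```python
import numpy as np

def gft_denoise(y, L, gamma=1.0):
    """Tikhonov graph-signal denoising: x* = argmin ||x - y||^2 + gamma*x'Lx,
    i.e. the solution of (I + gamma*L) x = y, computed in the graph Fourier
    domain instead of by a general linear solve."""
    lam, U = np.linalg.eigh(L)            # graph Fourier basis (eigenvectors of L)
    y_hat = U.T @ y                       # forward GFT
    x_hat = y_hat / (1.0 + gamma * lam)   # per-frequency low-pass filter
    return U @ x_hat                      # inverse GFT

# 4-node path graph Laplacian
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)
y = np.array([1.0, 0.2, 0.9, 0.1])
x = gft_denoise(y, L, gamma=0.5)
```

The spectral solution agrees with solving `(I + γL)x = y` directly, which is why replacing the dense transform with a fast GFT yields the same answer faster.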
While foundation models have been exploited for various expert tasks through fine-tuning, any foundation model will become outdated due to its old knowledge or limited capability. Thus the underlying foundation model ...
This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker’s speech while identifying whether they are target or non-target speakers from multi-talker overlapped spee...
This paper proposes a knowledge distillation method for an external bidirectional language model trained by masked language modeling, to achieve high accuracy in scene text recognition. In Asian languages such as Japanese, text recognition must be performed in units of multiple words or sentences rather than individual words because words are not separated by spaces, so high-level linguistic knowledge is needed to recognize text correctly. To enhance linguistic knowledge, several methods that use an external language model have been proposed, but these methods fail to adequately consider future context because they revise the text candidates yielded by autoregressive text recognition models, which consider mainly past context. To overcome this deficiency, our key idea is to enhance a text recognition model by utilizing the knowledge of an external bidirectional language model trained by masked language modeling, which reflects not only past but also future context. To actively consider future context in text recognition, our proposed method introduces a distillation loss term that makes the output probability of the text recognition model closer to that of the bidirectional language model. Experiments on Japanese scene text recognition demonstrate the effectiveness of the proposed method.
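A distillation loss of the kind described can be sketched as the usual cross-entropy to the ground-truth labels plus a KL term pulling the recognizer's per-position output distribution toward the bidirectional language model's. This is a generic sketch; the `alpha` weighting and the exact combination are assumptions, not the paper's formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(rec_logits, blm_probs, labels, alpha=0.5, eps=1e-12):
    """Cross-entropy to ground truth plus a KL term toward the bidirectional
    LM's distribution. `alpha` is an assumed weighting hyperparameter."""
    p = softmax(rec_logits)                                   # (T, V) recognizer probs
    ce = -np.log(p[np.arange(len(labels)), labels] + eps).mean()
    kl = (blm_probs * (np.log(blm_probs + eps) - np.log(p + eps))).sum(-1).mean()
    return ce + alpha * kl

rng = np.random.default_rng(0)
T, V = 5, 10                       # sequence length, vocabulary size (toy)
logits = rng.normal(size=(T, V))   # recognizer outputs
teacher = softmax(rng.normal(size=(T, V)))  # bidirectional-LM distribution
labels = rng.integers(0, V, size=T)
loss = distill_loss(logits, teacher, labels)
```

Minimizing the KL term drives the recognizer's distribution toward the teacher's, which is how the future-context knowledge of the masked LM reaches the autoregressive recognizer.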
Training deep neural networks (DNNs) is computationally expensive, which is problematic especially when performing duplicated or similar training runs in model ensemble or fine-tuning pre-trained models, for example. ...
This paper presents a novel method for online domain adaptation (OnDA) for DEtection TRansformer (DETR)-based object detection models, called OnDA-DETR. OnDA is a domain adaptation paradigm that adapts a model trained on source domain data to perform well on the target domain in an online manner during testing, using only the unlabeled test data from the target domain. Owing to its challenging and realistic problem setting, OnDA has garnered significant attention. However, OnDA methods for DETR-based models, which have demonstrated excellent performance in object detection research, had not previously been developed. OnDA-DETR is the first OnDA method specifically designed for DETR-based models. It incorporates a self-training framework that generates pseudo-labels for the unlabeled target domain data. To effectively incorporate the self-training framework into DETR-based models, we leverage recall-aware pseudo-labeling and quality-aware training. Experimental results indicate that OnDA-DETR improves the performance of the source-trained model by about 3.0 percentage points through OnDA.
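The self-training loop described above rests on turning a detector's own outputs into training targets. A generic sketch of that step is below: detections above a deliberately low, recall-oriented threshold are kept as pseudo-labels, and each score is carried along as a quality weight for the loss. The threshold value and weighting scheme are illustrative assumptions, not OnDA-DETR's actual criteria.

```python
import numpy as np

def make_pseudo_labels(scores, boxes, thresh=0.3):
    """Keep detections above a low (recall-oriented) score threshold as
    pseudo-labels; reuse each score as a per-box quality weight so that
    less confident pseudo-labels contribute less to the training loss."""
    keep = scores >= thresh
    return boxes[keep], scores[keep]

# Toy detector outputs on one unlabeled target-domain image.
scores = np.array([0.9, 0.45, 0.2, 0.35])
boxes = np.array([[0, 0, 10, 10],
                  [5, 5, 20, 20],
                  [1, 1, 2, 2],
                  [3, 3, 9, 9]])
pl_boxes, weights = make_pseudo_labels(scores, boxes)
```

A low threshold favors recall (few missed objects become "background" targets), while the quality weights keep noisy low-score pseudo-labels from dominating the update.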