Search Results - Inner Mongolia University Library

Modeling Lead-Lag Structure in Facial Expression Synchrony for Social-Psychological Outcome Prediction from Negotiation Interaction

Acoustics, Speech, and Signal Processing Workshops (ICASSPW), IEEE International Conference on

Authors: Nobukatsu Hojo, Saki Mizuno, Satoshi Kobashikawa, Ryo Masumura (NTT Computer & Data Science Laboratories)

This study proposes introducing facial-expression synchrony features to machine learning to estimate a customer’s psychological information from online business negotiation dialogue data. It is important for synchrony features to model the lead-lag structure, that is, the information on who led the synchrony and who followed it, because the psychology of the leader and follower can differ. However, conventional synchrony models cannot incorporate such lead-lag structure information because they are based on the assumption that synchrony involves the co-occurrence of features in the same frame. To solve this problem, we propose using synchrony features extracted on the basis of windowed time-lagged cross-correlation, which cuts out a short segment from each of the input sequences and computes the cross-correlation between the segments. Since this method measures the similarity of signals across different frames, it is suitable for modeling the lead-lag structure. We conducted experiments based on an audio-visual corpus of business negotiation dialogue assessed with various psychological measurements. The results indicate that considering lead-lag information can improve the accuracy in estimating psychological information.
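
The windowed time-lagged cross-correlation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`pearson`, `wtlcc`, `peak_lags`), the window, step and lag settings, and the synthetic signals are all assumptions made for the example. Each row holds the Pearson correlation between one window of `x` and the equally long window of `y` shifted by every lag in the range; a positive peak lag suggests that `x` leads `y`.

```python
from math import sin, sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def wtlcc(x, y, window, step, max_lag):
    """For each window start s, correlate x[s:s+window] with y shifted by each lag.

    rows[w][k] corresponds to lag = k - max_lag.
    """
    rows = []
    for start in range(max_lag, len(x) - window - max_lag + 1, step):
        seg_x = x[start:start + window]
        row = []
        for lag in range(-max_lag, max_lag + 1):
            seg_y = y[start + lag:start + lag + window]
            row.append(pearson(seg_x, seg_y))
        rows.append(row)
    return rows


def peak_lags(rows, max_lag):
    """Lag with the highest correlation in each window."""
    return [max(range(len(r)), key=lambda k: r[k]) - max_lag for r in rows]


# Synthetic example (assumed data): y repeats x three frames later.
base = [sin(0.3 * t) + 0.5 * sin(0.11 * t) for t in range(63)]
x = base[3:]   # leader
y = base[:-3]  # follower: y[t] == x[t - 3]
rows = wtlcc(x, y, window=20, step=5, max_lag=5)
lags = peak_lags(rows, max_lag=5)
```

In this toy setup the peak lag stays positive in every window, which is the kind of lead-lag cue a same-frame synchrony measure would not expose.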

Next-Speaker Prediction Based on Non-Verbal Information in Multi-Party Video Conversation

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Saki Mizuno, Nobukatsu Hojo, Satoshi Kobashikawa, Ryo Masumura (NTT Computer & Data Science Laboratories)

We propose a method for next-speaker prediction, a task to predict who speaks in the next turn among multiple current listeners, in multi-party video conversation. Previous studies used non-verbal features, such as head movements and gaze behavior, for next-speaker prediction in face-to-face conversation. However, in video conversation, these non-verbal features are vague and ineffective because participants look at the screen displaying the other participants. Since non-verbal features include participant characteristics, it is necessary to use training data with rich combinations of participants to robustly predict the next speaker. Previous studies used training data with a limited number of combinations of participants because the data consist only of recorded data. Therefore, the proposed method uses 1) novel non-verbal features for next-speaker prediction in video conversation, specifically facial expressions, hand movements and speech segments, and 2) data augmentation of participant combinations in the training data. We conducted experiments to evaluate the proposed method, and the results using video-conversation data indicate its effectiveness.

Keywords: Training data; Oral communication; Signal processing; Acoustics; Behavioral sciences; Task analysis; Speech processing

Text-to-Text Pre-Training with Paraphrasing for Improving Transformer-Based Image Captioning

European Signal Processing Conference (EUSIPCO)

Authors: Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi (NTT Computer & Data Science Laboratories, NTT Corporation)

In this paper, we propose a novel training method for transformer encoder-decoder based image captioning, which directly generates a caption text from an input image. In general, many image-to-text paired datasets need to be prepared for robust image captioning, but such datasets cannot be collected in practical cases. Our key idea for mitigating the data preparation cost is to utilize text-to-text paraphrasing modeling, i.e., a task to convert an input text into different expressions without changing the meaning. In fact, paraphrasing deals with a transformation task similar to image captioning, even though paraphrasing tasks have to handle texts instead of images. In our proposed method, an encoder-decoder network trained via the paraphrasing task is directly leveraged for image captioning. Thus, an encoder-decoder network pre-trained by a text-to-text transformation task is transferred into an image-to-text transformation task even though a different modality must be handled in the encoder network. Our experiments using the MS COCO caption datasets demonstrate the effectiveness of the proposed method.

Complexity Reduction of Graph Signal Denoising Based on Fast Graph Fourier Transform

IEEE International Conference on Image Processing

Authors: Takayuki Sasaki, Yukihiro Bandoh, Masaki Kitahara (NTT Computer and Data Science Laboratories, NTT Corporation)

Denoising is one of the most fundamental and important problems in signal processing, and graph signal denoising methods have been actively studied. Several graph signal denoising methods based on mathematical programming require solving linear equations involving the Laplacian matrix, which creates problems with computational accuracy and running time. This study proposes a fast and accurate solution of linear equations for denoising based on the fast graph Fourier transform method. Moreover, the proposed method can perform denoising not only on graphs for which the fast graph Fourier transform can be performed, but also on a wide class of graphs with more relaxed conditions, without loss of accuracy. Experiments demonstrate the efficiency of the proposed method and confirm that denoising can be performed up to 167.3 times faster without loss of accuracy.
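
To illustrate why a graph Fourier transform helps with Laplacian linear systems, the sketch below solves a Tikhonov-style denoising system (I + λL)x = y on a cycle graph, where the ordinary discrete Fourier transform diagonalizes the Laplacian and its eigenvalues are 2 − 2cos(2πk/N). This is an assumed toy setting chosen for the example, not the method of the paper: the cycle graph, the objective, the names `dft`, `denoise_cycle` and `apply_system`, and the plain O(N²) transform (used here instead of a fast transform to keep the code dependency-free) are all illustrative choices.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(N^2) discrete Fourier transform (stand-in for a fast transform)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [z / n for z in out] if inverse else out


def denoise_cycle(y, lam):
    """Solve (I + lam * L) x = y on a cycle graph in the spectral domain."""
    n = len(y)
    y_hat = dft(y)
    # Each spectral coefficient is divided by 1 + lam * (Laplacian eigenvalue).
    x_hat = [
        y_hat[k] / (1 + lam * (2 - 2 * math.cos(2 * math.pi * k / n)))
        for k in range(n)
    ]
    return [z.real for z in dft(x_hat, inverse=True)]


def apply_system(x, lam):
    """Compute (I + lam * L) x directly, to check the spectral solution."""
    n = len(x)
    return [
        x[i] + lam * (2 * x[i] - x[(i - 1) % n] - x[(i + 1) % n])
        for i in range(n)
    ]


noisy = [0.0, 1.2, 0.9, 1.1, 3.0, 1.0, 0.8, 1.05]  # assumed sample signal
smoothed = denoise_cycle(noisy, lam=2.0)
residual = max(abs(a - b) for a, b in zip(apply_system(smoothed, 2.0), noisy))
```

Because the system becomes diagonal in the spectral domain, the solve reduces to one forward transform, an element-wise division, and one inverse transform.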

Toward Data Efficient Model Merging between Different Datasets without Performance Degradation

arXiv, 2023

Authors: Yamada, Masanori; Yamashita, Tomoya; Yamaguchi, Shin'ya; Chijiwa, Daiki (NTT Social Informatics Laboratories; NTT Computer and Data Science Laboratories)

Model merging is attracting attention as a novel method for creating a new model by combining the weights of different trained models. While previous studies reported that model merging works well for models trained on a single dataset with different random seeds, model merging between different datasets remains unsolved. In this paper, we attempt to reveal the difficulty in merging such models trained on different datasets and alleviate it. Our empirical analyses show that, in contrast to the single-dataset scenarios, dataset information needs to be accessed to achieve high accuracy when merging models trained on different datasets. However, the requirement to use full datasets not only incurs significant computational costs but also becomes a major limitation when integrating models developed and shared by others. To address this, we demonstrate that dataset reduction techniques, such as coreset selection and dataset condensation, effectively reduce the data requirement for model merging. In our experiments with SPLIT-CIFAR10 model merging, the accuracy is significantly improved by 31% when using the full dataset and 24% when using the sampled subset compared with not using the dataset. Copyright © 2023, The Authors. All rights reserved.

Keywords: data assimilation
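
The basic idea of combining the weights of trained models can be sketched as a parameter-wise interpolation. This shows only the simplest baseline, under the assumption that both models share the same architecture and parameter names; the paper's procedure for models trained on different datasets is not reproduced here, and `merge_weights`, the dictionary layout, and the sample values are hypothetical.

```python
def merge_weights(model_a, model_b, alpha=0.5):
    """Interpolate two models parameter by parameter.

    Each model is a dict mapping a parameter name to a flat list of floats.
    alpha = 0 returns model_a's weights, alpha = 1 returns model_b's.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name in model_a:
        w_a, w_b = model_a[name], model_b[name]
        if len(w_a) != len(w_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [(1 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]
    return merged


# Assumed toy weights for two models with identical structure.
model_a = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
model_b = {"layer.weight": [2.0, 4.0], "layer.bias": [3.0]}
merged = merge_weights(model_a, model_b, alpha=0.5)
```

Plain interpolation of this kind is the starting point that the abstract contrasts with merging across different datasets, where access to dataset information is reported to matter.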

Meta-Learning to Calibrate Gaussian Processes with Deep Kernels for Regression Uncertainty Estimation

arXiv, 2023

Authors: Iwata, Tomoharu; Kumagai, Atsutoshi (NTT Communication Science Laboratories, NTT Corporation; NTT Computer and Data Science Laboratories, NTT Corporation)

Although Gaussian processes (GPs) with deep kernels have been successfully used for meta-learning in regression tasks, their uncertainty estimation performance can be poor. We propose a meta-learning method for calibrating deep kernel GPs for improving regression uncertainty estimation performance with a limited number of training data. The proposed method meta-learns how to calibrate uncertainty using data from various tasks by minimizing the test expected calibration error, and uses the knowledge for unseen tasks. We design our model such that the adaptation and calibration for each task can be performed without iterative procedures, which enables effective meta-learning. In particular, a task-specific uncalibrated output distribution is modeled by a GP with a task-shared encoder network, and it is transformed to a calibrated one using a cumulative density function of a task-specific Gaussian mixture model (GMM). By integrating the GP and GMM into our neural network-based model, we can meta-learn model parameters in an end-to-end fashion. Our experiments demonstrate that the proposed method improves uncertainty estimation performance while keeping high regression performance compared with the existing methods using real-world datasets in few-shot settings. Copyright © 2023, The Authors. All rights reserved.

Keywords: Calibration
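
The abstract relies on the cumulative function of a Gaussian mixture model as a transformation. As a small aid, the sketch below evaluates such a function for a one-dimensional mixture. It does not show how the paper combines it with the GP output, which the abstract does not specify; the names `normal_cdf` and `gmm_cdf` and the mixture parameters are assumptions for illustration.

```python
from math import erf, sqrt


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a one-dimensional Gaussian."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def gmm_cdf(x, weights, means, stds):
    """Cumulative function of a 1-D Gaussian mixture: weighted sum of component CDFs.

    The weights are assumed to be non-negative and to sum to 1.
    """
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, stds))


# Assumed two-component mixture, symmetric around zero.
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
grid = [-4.0, -2.0, 0.0, 2.0, 4.0]
values = [gmm_cdf(x, weights, means, stds) for x in grid]
```

The result is monotonically non-decreasing and bounded between 0 and 1, which is what makes a function of this kind usable as a smooth, invertible remapping of probabilities.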

Visually Equivalent Light Field 3-D for Portable Displays

Authors: Date, Munekazu; Shimizu, Shinya; Yamamoto, Susumu (Nippon Telegraph and Telephone Corporation, NTT Computer and Data Science Laboratories, Yokosuka 239-0847, Japan; Nippon Telegraph and Telephone Corporation, NTT Human Informatics Laboratories, Tokyo 108-0023, Japan)

Highly realistic 3-D displays that can reproduce object images to look like physical objects are utilized for natural and correct remote operation in industrial scenes. We therefore developed a visually equivalent light field 3-D (VELF3D) display that can produce highly realistic, accurate images with a high resolution and a smooth, accurate motion parallax. However, the observation distance is slightly long, and users cannot reach the displayed images. To address this, we aim to develop a tablet-computer-type VELF3D display that enables users to touch the displayed objects. The display viewpoint density has been increased to achieve a shorter observation distance, while maintaining the display depth range. Because higher resolutions are required for a close observation distance and increased display viewpoints, we aimed to improve the effective resolution using a display panel with almost the same pixel pitch. To this end, we built a prototype that combines a vertical red, green, blue stripe display panel and a parallax barrier with subpixel-width slits. We confirmed the improvement in effective resolution through small-scale subjective tests. This method also helps increase the depth range of the display when it is observed from a normal distance. © 1972-2012 IEEE.

Keywords: Three dimensional displays

Deep Quantigraphic Image Enhancement via Comparametric Equations

arXiv, 2023

Authors: Wu, Xiaomeng; Sun, Yongqing; Kimura, Akisato (Communication Science Laboratories, NTT Corporation; Computer and Data Science Laboratories, NTT Corporation)

Most recent methods of deep image enhancement can be generally classified into two types: decompose-and-enhance and illumination estimation-centric. The former is usually less efficient, and the latter is constrained by a strong assumption regarding image reflectance as the desired enhancement result. To alleviate this constraint while retaining high efficiency, we propose a novel trainable module that diversifies the conversion from the low-light image and illumination map to the enhanced image. It formulates image enhancement as a comparametric equation parameterized by a camera response function and an exposure compensation ratio. By incorporating this module in an illumination estimation-centric DNN, our method improves the flexibility of deep image enhancement, limits the computational burden to illumination estimation, and allows for fully unsupervised learning adaptable to the diverse demands of different tasks. Copyright © 2023, The Authors. All rights reserved.

Keywords: Image enhancement
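
A comparametric equation relates a pixel value f(q) to the value f(kq) the same scene point would take under an exposure scaled by a ratio k. The sketch below works this out for a simple gamma-type camera response f(q) = q^γ, for which f(kq) = k^γ · f(q). The gamma response is an assumption picked for illustration only; the abstract does not state which camera response function the paper uses, and the names `response`, `inverse_response` and `brighten` are hypothetical.

```python
def response(q, gamma):
    """Assumed camera response: maps scene radiance q to pixel value q ** gamma."""
    return q ** gamma


def inverse_response(pixel, gamma):
    """Inverse of the assumed response: recovers radiance from a pixel value."""
    return pixel ** (1.0 / gamma)


def brighten(pixel, k, gamma):
    """Comparametric form for the gamma response: f(k*q) = k**gamma * f(q).

    The result is clipped to the displayable range [0, 1].
    """
    return min(1.0, (k ** gamma) * pixel)


# Check the closed form against the explicit route: pixel -> radiance -> scaled -> pixel.
pixel, k, gamma = 0.2, 2.0, 0.5
direct = brighten(pixel, k, gamma)
explicit = response(k * inverse_response(pixel, gamma), gamma)
```

Both routes give the same value, which shows how an enhancement can be expressed directly on pixel values once a response function and an exposure ratio are fixed.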

Meta-Learning of Semi-Supervised Learning from Tasks with Heterogeneous Attribute Spaces

arXiv, 2023

Authors: Iwata, Tomoharu; Kumagai, Atsutoshi (NTT Communication Science Laboratories, Japan; NTT Computer and Data Science Laboratories, Japan)

We propose a meta-learning method for semi-supervised learning that learns from multiple tasks with heterogeneous attribute spaces. The existing semi-supervised meta-learning methods assume that all tasks share the same attribute space, which prevents us from learning with a wide variety of tasks. With the proposed method, the expected test performance on tasks with a small amount of labeled data is improved with unlabeled data as well as data in various tasks, where the attribute spaces are different among tasks. The proposed method embeds labeled and unlabeled data simultaneously in a task-specific space using a neural network, and the unlabeled data's labels are estimated by adapting classification or regression models in the embedding space. For the neural network, we develop variable-feature self-attention layers, which enable us to find embeddings of data with different attribute spaces with a single neural network by considering interactions among examples, attributes, and labels. Our experiments on classification and regression datasets with heterogeneous attribute spaces demonstrate that our proposed method outperforms the existing meta-learning and semi-supervised learning methods. Copyright © 2023, The Authors. All rights reserved.

Keywords: Embeddings