Exploring an efficient and scalable architecture of fault-tolerant quantum computing (FTQC) is vital for demonstrating useful quantum computing. Here, we propose and evaluate a scalable and practical architecture with...
This study proposes introducing facial-expression synchrony features into machine learning to estimate a customer's psychological information from online business negotiation dialogue data. Synchrony features should model the lead-lag structure, that is, who led the synchrony and who followed it, because the psychology of the leader and the follower can differ. However, conventional synchrony models cannot incorporate this lead-lag information because they assume that synchrony is the co-occurrence of features in the same frame. To solve this problem, we propose using synchrony features extracted with windowed time-lagged cross-correlation, which cuts a short segment out of each input sequence and computes the cross-correlation between the segments. Because this method measures the similarity of signals across different frames, it is suitable for modeling the lead-lag structure. We conducted experiments on an audio-visual corpus of business negotiation dialogue annotated with various psychological measurements. The results indicate that considering lead-lag information can improve the accuracy of estimating psychological information.
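The windowed time-lagged cross-correlation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `windowed_lagged_xcorr`, the parameters `window`, `step`, and `max_lag`, the use of Pearson correlation, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np

def windowed_lagged_xcorr(x, y, window, step, max_lag):
    """Correlate each window of x with windows of y shifted by -max_lag..+max_lag.

    Hypothetical helper: a positive lag compares x with a LATER segment of y,
    so a peak at a positive lag suggests that x leads and y follows.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    # Keep every shifted window inside the signal boundaries.
    starts = range(max_lag, len(x) - window - max_lag + 1, step)
    corr = np.full((len(starts), len(lags)), np.nan)
    for i, s in enumerate(starts):
        xs = x[s:s + window]
        for j, lag in enumerate(lags):
            ys = y[s + lag:s + lag + window]
            # Pearson correlation is undefined for constant segments.
            if xs.std() > 0 and ys.std() > 0:
                corr[i, j] = np.corrcoef(xs, ys)[0, 1]
    return lags, corr

# Synthetic example (assumed data): the follower repeats the leader 3 frames later.
rng = np.random.default_rng(0)
leader = rng.standard_normal(200)
follower = np.roll(leader, 3)

lags, corr = windowed_lagged_xcorr(leader, follower, window=30, step=10, max_lag=5)
# Lag with the highest correlation averaged over all windows.
peak_lag = int(lags[np.nanargmax(np.nanmean(corr, axis=0))])
```

Each row of `corr` is one window and each column one lag, so the matrix (or summary statistics such as the peak lag per window) could serve as a synchrony feature that retains who led and who followed. A same-frame synchrony model corresponds to using only the `lag = 0` column.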
We propose a method for next-speaker prediction, the task of predicting which of the current listeners will speak in the next turn, in multi-party video conversation. Previous studies used non-verbal features, such as head movements and gaze behavior, for next-speaker prediction in face-to-face conversation. In video conversation, however, these non-verbal features are vague and ineffective because participants look at a screen displaying the other participants. Moreover, because non-verbal features reflect participant characteristics, robust next-speaker prediction requires training data with rich combinations of participants. Previous studies used training data with a limited number of participant combinations because the data consisted only of recorded conversations. Therefore, the proposed method uses 1) novel non-verbal features for next-speaker prediction in video conversation, specifically facial expressions, hand movements, and speech segments, and 2) data augmentation of participant combinations in the training data. We conducted experiments to evaluate the proposed method, and the results on video-conversation data indicate its effectiveness.
In this paper, we propose a novel training method for transformer encoder-decoder based image captioning, which directly generates caption text from an input image. In general, robust image captioning requires large image-to-text paired datasets, but such datasets cannot be collected in practical cases. Our key idea for mitigating the data preparation cost is to utilize text-to-text paraphrasing, i.e., the task of converting an input text into a different expression without changing its meaning. Paraphrasing involves a transformation similar to image captioning, although it handles texts instead of images. In our proposed method, an encoder-decoder network trained on the paraphrasing task is directly leveraged for image captioning. Thus, an encoder-decoder network pre-trained on a text-to-text transformation task is transferred to an image-to-text transformation task, even though the encoder network must handle a different modality. Our experiments using the MS COCO caption datasets demonstrate the effectiveness of the proposed method.
Denoising is one of the most fundamental and important problems in signal processing, and graph signal denoising methods have been actively studied. Several graph signal denoising methods based on mathematical programming require solving linear equations involving the Laplacian matrix, which creates problems with computational accuracy and running time. This study proposes a fast and accurate way of solving these linear equations for denoising, based on the fast graph Fourier transform. Moreover, the proposed method can perform denoising not only on graphs for which the fast graph Fourier transform is available, but also on a wider class of graphs satisfying more relaxed conditions, without loss of accuracy. Experiments demonstrate the efficiency of the proposed method and confirm that denoising can be performed up to 167.3 times faster without loss of accuracy.
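To make the role of the Laplacian linear system concrete, here is a minimal sketch under stated assumptions. The abstract does not give the exact formulation, so the sketch assumes a common Tikhonov-style objective, min_x ||x - y||^2 + lam * x^T L x, whose solution satisfies (I + lam * L) x = y. It uses a dense eigendecomposition as a plain graph Fourier transform, not the fast graph Fourier transform of the paper; the helper names `path_laplacian` and `gft_denoise`, the path graph, and the value of `lam` are hypothetical.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph (assumed example graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def gft_denoise(L, y, lam):
    """Solve (I + lam * L) x = y in the graph spectral domain.

    With L = U diag(eigvals) U^T, the system becomes diagonal:
    x_hat = y_hat / (1 + lam * eigvals).
    """
    eigvals, U = np.linalg.eigh(L)       # graph Fourier basis (dense, not fast)
    y_hat = U.T @ y                      # forward graph Fourier transform
    x_hat = y_hat / (1.0 + lam * eigvals)  # low-pass spectral filter
    return U @ x_hat                     # inverse graph Fourier transform

# Synthetic example (assumed data): a smooth signal on a path graph plus noise.
n, lam = 64, 5.0
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * np.arange(n) / n)
noisy = clean + 0.3 * rng.standard_normal(n)

L = path_laplacian(n)
denoised = gft_denoise(L, noisy, lam)
# Reference: solve the same linear system directly.
direct = np.linalg.solve(np.eye(n) + lam * L, noisy)
```

Once the Fourier basis is available, the linear system reduces to an elementwise division, which is why a fast transform can replace a general-purpose linear solver. The dense `eigh` call here costs O(n^3) and only illustrates the algebra.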
Model merging is attracting attention as a novel method for creating a new model by combining the weights of different trained models. While previous studies reported that model merging works well for models trained o...
Although Gaussian processes (GPs) with deep kernels have been successfully used for meta-learning in regression tasks, their uncertainty estimation performance can be poor. We propose a meta-learning method for calibrati...
Highly realistic 3-D displays that can reproduce object images to look like physical objects are utilized for natural and correct remote operation in industrial scenes. Therefore, we developed a visually equivalent li...
Most recent methods of deep image enhancement can be generally classified into two types: decompose-and-enhance and illumination estimation-centric. The former is usually less efficient, and the latter is constrained ...
We propose a meta-learning method for semi-supervised learning that learns from multiple tasks with heterogeneous attribute spaces. The existing semi-supervised meta-learning methods assume that all tasks share the sa...