A number of machine learning (ML) approaches for drug discovery are available that rely only on sequential (1D) and planar (2D) information, without effectively using 3D information for generating features of...
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, and their resolution (346 × 260) is too low for practical applications. In fact, only visible cameras are deployed in many practical systems, a...
This paper investigates the existence, enumeration and asymptotic performance of self-dual and LCD double circulant codes over Galois rings of characteristic p² and order p⁴, with p an odd prime. When p ≡ 3 (mod 4), ...
Eye localization is an essential part of human-computer interaction. The precision of eye localization determines the feasibility of interaction. In this paper, we expand the Haar features by introducing a new type of characteri...
Knowledge tracing (KT) has been widely used in online learning systems to guide students' future learning. However, most existing KT models primarily focus on extracting abundant information from the question sets ...
Converting whispered speech to normal voiced speech has been an active research topic in speech signal processing. A complete, large-scale whisper database is a major basis for this task. In this paper, we propose a multimodal whisper database in Mandarin Chinese. A total of 103 syllables and 100 sentences were carefully selected. Five male and five female participants pronounced the syllables and sentences in both whispered and normal styles, resulting in 4096 parallel speech utterances and 263,849 frames of face and lip image sequences. The beginning and ending sample points of each syllable were labeled for both the speech signal and the face video. The lip regions of interest were also extracted and are provided in the proposed database. Experiments on various speech conversion tasks with different speech databases show the effectiveness of the proposed multimodal whisper speech database.
Diffusion models were initially designed for image generation. Recent research shows that the internal signals within their backbones, named activations, can also serve as dense features for various discriminative task...