Recently, emotional speech generation and speaker cloning have garnered significant interest in text-to-speech (TTS). With the open-sourcing of codec language TTS models trained on massive datasets with large-scale pa...
Parameter-Efficient Fine-Tuning methods based on vision-language models (such as CLIP) for few-shot learning have recently received considerable attention. However, previous works only fine-tune either the image or te...
Class incremental learning (CIL) aims to mitigate catastrophic forgetting of previously learned classes when integrating new knowledge. A primary challenge contributing to forgetting is the absence of data from earlie...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Slot filling and intent detection are two highly correlated tasks in spoken language understanding (SLU). Recent SLU research explores zero-shot prompting of large language models to alleviate the data scarcity problem. However, existing prompting work ignores cross-task interaction information in SLU, which leads to sub-optimal performance. To solve this problem, we present the pioneering work of Cross-task Interactive Prompting (CroPrompt) for SLU, which enables the model to interactively exchange information across the correlated SLU tasks. We further introduce a multi-task self-consistency mechanism to mitigate the error propagation caused by intent information injection. Extensive experiments on the standard SLU benchmark show that CroPrompt consistently outperforms existing prompting approaches, and that the multi-task self-consistency mechanism effectively eases error propagation, thereby improving performance. We hope this work can inspire more research on cross-task prompting for SLU.
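The abstract does not spell out the prompt template or the voting rule, so the following is only a minimal sketch of how intent injection and a self-consistency vote over several sampled outputs could fit together. The names `build_slot_prompt` and `vote`, the prompt wording, and the "vote on intent first, then on slots among agreeing samples" rule are all assumptions made for illustration, not the CroPrompt implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One sampled model output: (intent, {slot_name: slot_value}).
Sample = Tuple[str, Dict[str, str]]


def build_slot_prompt(utterance: str, intent: str) -> str:
    """Hypothetical template that injects a predicted intent into the slot prompt."""
    return (
        f"Utterance: {utterance}\n"
        f"Predicted intent: {intent}\n"
        "List the slots as name=value pairs."
    )


def vote(samples: List[Sample]) -> Sample:
    """Assumed voting rule: majority intent, then majority slots among agreeing samples."""
    if not samples:
        raise ValueError("need at least one sampled output")
    # Step 1: pick the intent predicted most often across the samples.
    intent_counts = Counter(intent for intent, _ in samples)
    best_intent, _ = intent_counts.most_common(1)[0]
    # Step 2: keep only slot predictions made together with that intent,
    # so a wrong intent cannot drag its slot predictions into the answer.
    agreeing = [slots for intent, slots in samples if intent == best_intent]
    slot_counts = Counter(tuple(sorted(slots.items())) for slots in agreeing)
    best_slots, _ = slot_counts.most_common(1)[0]
    return best_intent, dict(best_slots)


samples = [
    ("PlayMusic", {"artist": "Adele"}),
    ("PlayMusic", {"artist": "Adele"}),
    ("GetWeather", {"city": "Paris"}),
    ("PlayMusic", {"artist": "Adel"}),
]
final_intent, final_slots = vote(samples)
```

In this toy run the single `GetWeather` sample is outvoted, and its slot prediction is discarded along with it, which is the kind of error propagation the abstract says the mechanism is meant to limit.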
In the era where Web3.0 values data security and privacy, adopting groundbreaking methods to enhance privacy in recommender systems is crucial. Recommender systems need to balance privacy and accuracy, while also havi...
An ideal artificial intelligence (AI) system should have the capability to continually learn like humans. However, when learning new knowledge, AI systems often suffer from catastrophic forgetting of old knowledge. Al...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
High-quality enrollment speech is crucial to target speaker extraction (TSE), since it provides essential cues for identifying the target speaker in the mixture. However, real applications usually permit only a short enrollment speech, e.g., a wake-up word for a mobile device, which provides limited cues. To address this issue, we propose an enrollment augmentation strategy that enriches the limited enrollment speech with massive text data through speech synthesis. The extended enrollment speech thus contains enhanced speaker timbre and phonetic content, which leads to better extraction quality. Furthermore, we propose a training data augmentation strategy to improve the model’s robustness and generalization in short enrollment speech scenarios. Experiments on Libri2Mix demonstrate that our proposed strategies bring a significant improvement in extreme scenarios where only 0.5 s or one word of enrollment speech is provided. We also release our code at https://***/HuangZikang-TJU/Aug4TSE.
We propose a novel approach for generating text-guided human-object interactions (HOIs) that achieves explicit joint-level interaction modeling in a computationally efficient manner. Previous methods represent the ent...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Recycled and recirculated books, such as ancient texts and reused textbooks, hold significant value in the secondhand goods market, with their worth largely dependent on surface preservation. However, accurately assessing surface defects is challenging because defects vary widely in shape and size and are often detected imprecisely. To address these issues, we propose DDNet, a detection model designed to enhance defect localization and classification. DDNet introduces a surface defect feature extraction module based on a deformable convolution operator (DC) and a densely connected FPN module (DFPN). The DC module dynamically adjusts the convolution grid to better align with object contours, capturing subtle shape variations and improving boundary delineation and prediction accuracy. Meanwhile, DFPN uses dense skip connections to enhance feature fusion, constructing a hierarchical structure that generates multi-resolution, high-fidelity feature maps and thus effectively detects defects of various sizes. In addition to the model, we present a comprehensive dataset specifically curated for surface defect detection in recycled and recirculated books. This dataset covers a diverse range of defect types, shapes, and sizes, making it well suited to evaluating the robustness and effectiveness of defect detection models. In extensive evaluations, DDNet achieves precise localization and classification of surface defects, recording an mAP of 46.7% on our proprietary dataset, an improvement of 14.2% over the baseline model.
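To make "dynamically adjusts the convolution grid" concrete, here is a minimal pure-Python sketch of the standard deformable convolution idea: each of the 3x3 kernel taps is shifted by a fractional offset and the input is read there by bilinear interpolation. The function names `bilinear` and `deform_conv_at`, the single-channel input, and the hand-written offsets are illustrative assumptions; in a real network the offsets are predicted by a learned layer, and the abstract does not specify how DDNet's DC module is built.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample img at a fractional (y, x); locations outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Weight falls off linearly with distance to the neighbour pixel.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * img[yi][xi]
    return value


def deform_conv_at(
    img: Grid,
    kernel: Sequence[float],
    offsets: Sequence[Tuple[float, float]],
    cy: int,
    cx: int,
) -> float:
    """3x3 deformable convolution response at one output location (cy, cx)."""
    # Regular 3x3 sampling grid, row by row, relative to the centre.
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    total = 0.0
    for k, (dy, dx) in enumerate(taps):
        oy, ox = offsets[k]  # per-tap shift (hand-written here, learned in practice)
        total += kernel[k] * bilinear(img, cy + dy + oy, cx + dx + ox)
    return total


img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
ones = [1.0] * 9
regular = deform_conv_at(img, ones, [(0.0, 0.0)] * 9, 1, 1)
shifted = deform_conv_at(img, ones, [(0.0, 0.5)] * 9, 1, 1)
```

With all offsets at zero the function reduces to an ordinary 3x3 convolution; non-zero offsets let the same kernel follow an irregular contour, which is the property the abstract credits for better boundary delineation.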
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Sandwich-like structures have shown remarkable efficacy in clothed human reconstruction. However, these approaches often generate unrealistic side geometries due to inadequate handling of lateral regions. This paper addresses this limitation by incorporating the side geometry of clothed humans as a prior. We propose ThicknessVAE, a novel two-stage method that makes two key contributions: (1) We learn a prototype from point clouds for the lateral regions of clothed humans to extract common and detailed geometric features. (2) We utilize this prototype as a prior to transform geometric features into a thickness map associated with clothed human images, enabling refined normal integration for sandwich-like reconstruction methods. By seamlessly integrating our model into the sandwich-like reconstruction pipeline, we achieve highly realistic side views. Both qualitative and quantitative experiments demonstrate that our approach is comparable to state-of-the-art methods in terms of side-view realism.