Hierarchical classification divides data into correlated sub-tasks from coarse to fine. Compared to flat classification, it is more complex and suffers from the curse of dimensionality. Existing hierarchical Fuzzy Rou...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Current weakly supervised point cloud semantic segmentation makes insufficient use of limited annotations in unimodal representation learning because point clouds are sparse and textureless. In this work, we leverage cross-modality information by transferring knowledge from image and text sources to the point cloud network. The intuition is that images contribute rich texture, color, and discriminative information, complementing point clouds to boost semantic segmentation performance. To reduce the extensive computational resources required for cross-modality fusion, we introduce Multi-Scale Deformable Knowledge Transfer, a training scheme that optimizes and extends the one-to-one mapping to flexible one-to-many relations between multi-modal data. Furthermore, we employ pre-trained image-text models to generate pseudo labels for point clouds and construct positive and negative samples for semantic contrastive regularization, facilitating the full exploitation of unlabeled data. Experimental results on SemanticKITTI and nuScenes demonstrate substantial improvements, with an average gain of 3.8% over previous weakly supervised methods and performance comparable to fully supervised approaches.
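The abstract does not give the form of the semantic contrastive regularization. As a rough illustration only, the sketch below uses a generic supervised-contrastive (InfoNCE-style) loss in which points sharing a pseudo label are positives and all other points are negatives. The function name `semantic_contrastive_loss`, the `temperature` value, and the toy features are assumptions, not the authors' implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length (assumes the norm is not zero)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def semantic_contrastive_loss(features, pseudo_labels, temperature=0.1):
    """Generic supervised-contrastive loss over pseudo-labeled point features.

    Hypothetical sketch: for each anchor point, points with the same pseudo
    label are positives; every other point acts as a negative.
    """
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other point.
        logits = {j: dot(feats[i], feats[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy features: two tight clusters in 2-D.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = semantic_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_inconsistent = semantic_contrastive_loss(toy_features, [0, 1, 0, 1])
```

With pseudo labels that agree with the feature clusters, the loss is small; with labels that cut across the clusters, it is much larger, which is the pressure a contrastive regularizer of this kind places on the point features.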
Genetic programming hyperheuristic (GPHH) has recently become a promising methodology for large-scale dynamic path planning (LDPP) since it can produce reusable heuristics rather than disposable solutions. However, in...
Few-shot graph learning tackles the challenge of categorization with limited samples by utilizing the relational information encoded on the graph. Recent studies have acquired considerable success in directing query n...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Audio generation tasks have recently attracted considerable research interest. Despite rapid advances in generating high-fidelity audio that is coarsely aligned with a text description, precise temporal controllability, which is essential for integrating audio generation into real applications, remains a challenge. In this work, we propose a temporally controlled audio generation framework, PicoAudio. It leverages data crawling, segmentation, and filtering to simulate fine-grained, temporally aligned audio-text data. Furthermore, PicoAudio integrates temporal information to guide audio generation through tailored model design. With the effective text processing capabilities of large language models, PicoAudio can take natural language input and generate audio that aligns well with the temporal description in the input. Both subjective and objective evaluations demonstrate that PicoAudio dramatically surpasses current state-of-the-art generation models in terms of timestamp and occurrence frequency controllability. Generation samples are available at the PicoAudio demo page.
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Recent advances in audio generation have enabled the creation of high-fidelity audio clips from free-form textual descriptions. However, temporal relations, a critical feature of audio content, are currently underrepresented in mainstream models, resulting in imprecise temporal controllability. Specifically, users cannot accurately control the timestamps of sound events using free-form text. One significant challenge is the absence of a high-quality, temporally aligned audio-text dataset, which is essential for training models with temporal control. The more precisely the annotations are temporally aligned, the better the models can understand the relationship between audio outputs and temporal textual prompts. Therefore, we propose a temporally aligned audio-text dataset, AudioTime. It provides text annotations rich in temporal information such as timestamps, duration, frequency, and ordering, covering almost all aspects of temporal control. Additionally, we offer a comprehensive test set and evaluation metric to assess the temporal control performance of text-to-audio generation models. Examples are available on the AudioTime demo page.
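To make the four kinds of temporal information named above concrete (timestamps, duration, frequency, ordering), the following sketch shows one way such annotations might be represented and queried. The `SoundEvent` record, the helper names, and the example clip are all hypothetical; the abstract does not describe AudioTime's actual annotation schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundEvent:
    """One annotated occurrence of a sound (hypothetical schema)."""
    label: str
    onset: float   # start time in seconds
    offset: float  # end time in seconds


def total_duration(events, label):
    """Duration: summed length in seconds of all occurrences of `label`."""
    return sum(e.offset - e.onset for e in events if e.label == label)


def occurrence_count(events, label):
    """Frequency: how many times `label` occurs in the clip."""
    return sum(1 for e in events if e.label == label)


def first_onset_order(events):
    """Ordering: labels sorted by the time each one first starts."""
    first = {}
    for e in events:
        first[e.label] = min(e.onset, first.get(e.label, e.onset))
    return sorted(first, key=first.get)


# Invented example clip: the timestamps are the onset/offset pairs themselves.
clip = [
    SoundEvent("dog bark", 0.5, 1.0),
    SoundEvent("car horn", 2.0, 3.5),
    SoundEvent("dog bark", 4.0, 4.5),
]

bark_count = occurrence_count(clip, "dog bark")
bark_seconds = total_duration(clip, "dog bark")
event_order = first_onset_order(clip)
```

A caption such as "a dog barks twice, with a car horn in between" could then be checked against `bark_count` and `event_order` when evaluating temporal control.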
In practice, the laborious nature of label annotation means that labeled data are often limited. Moreover, multi-scale data have received widespread attention due to their rich knowledge representation. H...
Multi-organ segmentation in the abdomen is a key area in medical image segmentation and is essential for accurate diagnosis and treatment of diseases. The diversity of abdominal organs, differences in size, and ambigu...
In a formal data analysis workflow, data validation is a necessary step that helps data analysts verify the quality of the data and ensure the reliability of the results. Data analysts usually need to validate the res...
Data sharing schemes based on the Internet of Medical Things (IoMT) have emerged as a more convenient way to monitor and manage individuals’ health. However, this scenario faces challenges such as privacy preservatio...