Accurate segmentation of cellular nuclei is imperative for various biological and medical applications, such as cancer diagnosis and drug discovery. Histopathology, a discipline employing microscopic examination of bo...
1 Introduction
Endoscopy plays a crucial role in the diagnosis and treatment of gastrointestinal (GI) diseases [1], as it helps to identify abnormalities, classify lesions, and determine treatment *** GI endoscopic examinations, physicians may encounter practical hindrances, such as fatigue, stress, or limited experience, which can lead to erroneous *** intelligence (AI)-assisted GI endoscopy technology has emerged to address these limitations [2].
Cross-emotion anomaly detection is an emerging and challenging research topic in the field of cognitive analysis, which aims at identifying the abnormal emotion pair whose semantic patterns are inconsistent across different ...
The application of handwriting analysis in the health field for early detection and diagnosis is limited by a lack of data, which presents a significant challenge for the implementation of deep learning-based models. ...
Accurate assessment of forest biodiversity is crucial for ecosystem management and conservation. While traditional field surveys provide high-quality assessments, they are labor-intensive and spatially limited. This s...
Despite the remarkable performance of vision language models (VLMs) such as Contrastive Language Image Pre-training (CLIP), the large size of these models is a considerable obstacle to their use in federated learning ...
In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test set from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, and this issue has received little attention. To address these issues, we propose a new zero-shot method for audio captioning. Our method is built on the contrastive language-audio pre-training (CLAP) model. During training, the model reconstructs the ground-truth caption using the CLAP text encoder. In the inference stage, the model generates text descriptions from the CLAP audio embeddings of given audio inputs. To enhance the ability of the model in transitioning from text-to-text generation to audio-to-text generation, we propose to use the mixed-augmentations-based soft prompt to learn more robust latent representations, leveraging instance replacement and embedding augmentation. Additionally, we introduce the retrieval-based acoustic-aware hard prompt to improve the cross-domain performance of the model by employing the domain-agnostic label information of sound events. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.
In the specialized domain of brain tumor segmentation, supervised segmentation approaches are hindered by the limited availability of high-quality labeled data, a condition arising from data privacy concerns, signific...
This paper deals with the problem of extracting information regarding the chemical composition of stones in the human gallbladder from in vitro and in vivo B-scan ultrasonic images. The images are subjected to the Hermite pyramid decomposition technique described in Part I (Venkatesh, Y. V., Ultrasonic Imaging, 18, 261-301, 1996). In an attempt to determine the chemical composition of the gallstones, the gradients of the decomposed images are input to an unsupervised classifier. The outputs of the classifier exhibit some interesting patterns that appear to be related to the chemical composition of the gallstones contained in these images. (C) 1996 Academic Press.