Forcing the answer of the Question Answering (QA) task to be a single text span might be restrictive since the answer can be multiple spans in the context. Moreover, we found that multi-span answers often appear with ...
Cross-emotion anomaly detection is an emerging and challenging research topic in the field of cognitive analysis, which aims at identifying the abnormal emotion pair whose semantic patterns are inconsistent across different ...
RFID technology offers an affordable and user-friendly solution for contactless identification of objects and individuals. However, the widespread adoption of RFID systems raises concerns regarding security and privac...
Manually annotating perfectly aligned labels for cross-modal retrieval (CMR) is highly labor-intensive. Collecting co-occurring data pairs from the Internet is a far more cost-effective alternative, but it inevitably introduces Partially Mismatched Pairs (PMPs), which significantly degrade retrieval performance unless they are handled explicitly. Previous efforts often use pair-wise similarity to filter out mismatched pairs; this operation is highly sensitive to mismatched or ambiguous data and thus leads to sub-optimal performance. To alleviate these concerns, we propose an efficient approach termed UCPM (Uncertainty-guided Cross-modal retrieval with Partially Mismatched pairs), which can significantly reduce the adverse impact of mismatched data pairs. Specifically, a novel Uncertainty Guided Division (UGD) strategy divides the corrupted training data into confidently matched (clean), easily identifiable mismatched (noisy), and hard-to-determine (hard) subsets; the derived uncertainty simultaneously guides the learning of informative pairs and reduces the negative impact of potentially mismatched pairs. Meanwhile, an Uncertainty Self-Correction (USC) mechanism identifies and rectifies fluctuating uncertainty during training, which further improves the stability and reliability of the estimated uncertainty. In addition, a newly designed Trusted Margin Loss (TML) enhances the discriminability between hard pairs by dynamically adjusting their soft margins, amplifying the positive contributions of matched pairs while suppressing the negative impact of mismatched pairs. Extensive experiments on three widely used benchmark datasets verify the effectiveness and reliability of UCPM compared with existing SOTA approaches and show significantly improved robustness on both synthetic and real-world PMPs. The code i
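The three-way split described in the abstract can be pictured with a small sketch. This is not the paper's UGD procedure, whose details are not given here. It only assumes that each training pair already has an estimated match probability and an uncertainty value, and the function name `divide_pairs` and the threshold `max_uncertainty` are hypothetical.

```python
from typing import List, Sequence, Tuple


def divide_pairs(
    match_prob: Sequence[float],
    uncertainty: Sequence[float],
    max_uncertainty: float = 0.3,  # hypothetical threshold, not from the paper
) -> Tuple[List[int], List[int], List[int]]:
    """Split pair indices into (clean, noisy, hard) subsets.

    Assumed rule, for illustration only: a pair whose uncertainty exceeds
    the threshold is "hard"; otherwise it is "clean" when its match
    probability is at least 0.5 and "noisy" when it is below 0.5.
    """
    if len(match_prob) != len(uncertainty):
        raise ValueError("match_prob and uncertainty must have equal length")

    clean: List[int] = []
    noisy: List[int] = []
    hard: List[int] = []
    for index, (prob, unc) in enumerate(zip(match_prob, uncertainty)):
        if unc > max_uncertainty:
            hard.append(index)
        elif prob >= 0.5:
            clean.append(index)
        else:
            noisy.append(index)
    return clean, noisy, hard


# Four toy pairs: confident match, confident mismatch, two uncertain pairs.
clean, noisy, hard = divide_pairs(
    match_prob=[0.95, 0.10, 0.55, 0.80],
    uncertainty=[0.05, 0.10, 0.45, 0.35],
)
```

In a scheme like this, only the hard subset would need special treatment such as an adaptive margin, while the clean and noisy subsets could be handled with ordinary positive and negative supervision.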
Intelligently assessing the quality of athletic performances in sports scenarios remains a fascinating challenge in computer vision. However, capturing the subtle distinctions between two similar actions in videos and mapping the resulting video representations to quality scores remain significant obstacles. To address these challenges, this work shifts the paradigm of quality score estimation from traditional relative quality score prediction to relative referee score prediction. To support this shift, a cross-feature fusion module built on a Transformer-based video representation is introduced to improve pairwise video feature learning for action quality assessment. Then, a novel contrastive action parsing decoder module generates mid-level representations that connect visual features with detailed quality scores. Both modules use cross-attention mechanisms: the former refines the pairwise video features to represent the differences between video pairs, while the latter updates the input queries corresponding to each referee's evaluation. Finally, to achieve precise quality score estimation, we introduce a coarse-to-fine decision process that integrates a score classifier and an offset regressor. Experiments on challenging diving datasets, including MTL-AQA, FineDiving, and TASD-2, show that the proposed approach is effective and feasible when compared with state-of-the-art methods.
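A coarse-to-fine decision that combines a classifier with an offset regressor can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the score range, the equal-width bins, the clamping of the offset to [0, 1], and the name `decode_score` are all hypothetical.

```python
from typing import Sequence


def decode_score(
    class_logits: Sequence[float],
    offsets: Sequence[float],
    low: float = 0.0,    # assumed minimum score
    high: float = 10.0,  # assumed maximum score
) -> float:
    """Decode one score from per-bin classifier logits and per-bin offsets.

    Coarse step: pick the bin with the highest logit.
    Fine step: move inside that bin by the predicted offset, which is
    clamped to [0, 1] and interpreted as a fraction of the bin width.
    """
    num_bins = len(class_logits)
    if num_bins == 0 or len(offsets) != num_bins:
        raise ValueError("need one offset per non-empty set of bins")

    bin_width = (high - low) / num_bins
    best_bin = max(range(num_bins), key=lambda i: class_logits[i])
    offset = min(max(offsets[best_bin], 0.0), 1.0)
    return low + (best_bin + offset) * bin_width


# Four bins over [0, 10]; bin 1 wins and its offset is a quarter of a bin.
score = decode_score(
    class_logits=[0.1, 2.0, 0.3, -1.0],
    offsets=[0.5, 0.25, 0.9, 0.0],
)
```

Splitting the prediction this way lets the classifier handle the large-scale decision while the regressor only has to resolve a small residual within one bin.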
This paper presents a novel computational optimization of the deceived non local means filter using moving average and symmetric weighting. The proposed optimization is compared with different approaches that reduce t...
ISBN: (print) 9781538695562; 9781538660843
Colorectal cancer is the third most common cause of cancer-related deaths. Early diagnosis of polyps by colonoscopy can therefore lead to successful treatment. Diagnosing polyps in colonoscopy videos is challenging because polyps vary in size and shape. In this paper, we propose a polyp segmentation method based on an encoder-decoder network. The performance of the method is enhanced by two strategies. First, in the training phase, we apply a novel database augmentation method for colonoscopy images. Second, in the test phase, we obtain an effective prediction by combining multiple models and comparing the probabilities that the network produces for each image. Evaluation on the ETIS-LariPolypDB database shows that the proposed method outperforms state-of-the-art results.
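The abstract does not say exactly how the outputs of the models are combined at test time. One common option, shown here purely as an assumption, is to average the per-pixel probability maps and threshold the mean; the function name `ensemble_mask` and the 0.5 threshold are hypothetical.

```python
from typing import List, Sequence

ProbMap = Sequence[Sequence[float]]


def ensemble_mask(prob_maps: Sequence[ProbMap], threshold: float = 0.5) -> List[List[int]]:
    """Average per-pixel polyp probabilities over models, then binarize.

    Every map must have the same height and width. A pixel is labeled 1
    when its mean probability is at least `threshold`, and 0 otherwise.
    """
    if not prob_maps:
        raise ValueError("at least one probability map is required")

    num_models = len(prob_maps)
    height = len(prob_maps[0])
    width = len(prob_maps[0][0]) if height else 0
    return [
        [
            1 if sum(m[r][c] for m in prob_maps) / num_models >= threshold else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


# Two toy 2x2 probability maps from two hypothetical models.
model_a = [[0.9, 0.2], [0.6, 0.1]]
model_b = [[0.7, 0.4], [0.2, 0.3]]
mask = ensemble_mask([model_a, model_b])
```

In this toy case the bottom-left pixel is rejected even though one model scores it at 0.6, because the mean of 0.4 falls below the threshold.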
The rapid increase in Mobile Internet of Things (IoT) devices requires novel computational frameworks. These frameworks must meet strict latency and energy efficiency requirements in Edge and Mobile Edge Computing (ME...
ISBN: (print) 9781467399623
Localizing heavily occluded human faces is a challenging problem in face detection. Previous methods mainly rely on sliding windows, deciding for each window whether it contains a human face. In this paper, we provide a novel segmentation-based perspective on heavily occluded face localization with deep convolutional neural networks (CNNs). Our model takes an image as input without complicated pre-processing. After several convolutional layers, fully connected layers, and a softmax classifier, we predict the labels of all pixels in the image, which is the key to localizing heavily occluded human faces. Finally, we search for a minimal rectangle to localize the human face. Our detector needs neither complex pre-processing nor a time-consuming sliding window. In addition, we use a single model to localize faces, which further reduces computational complexity. Experimental results show that the proposed method is very effective at localizing heavily occluded human faces.
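The final step, going from per-pixel labels to a rectangle, can be illustrated with a short sketch. It assumes a binary mask in which 1 marks face pixels and a single face per image, and it returns the smallest axis-aligned rectangle around those pixels; the abstract does not specify the paper's actual search procedure, and the name `bounding_box` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def bounding_box(mask: Sequence[Sequence[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), all inclusive, of the smallest
    axis-aligned rectangle containing every pixel labeled 1.

    Returns None when the mask contains no face pixels.
    """
    width = len(mask[0]) if mask else 0
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


# Toy 4x5 label map with a small, irregular face region.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(mask)
```

Because the rectangle is derived from the predicted labels in a single pass, no window needs to be slid across the image.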
We present an anatomically guided feature selection scheme for prediction of neurological disorders based on brain connectivity networks. Using anatomical information not only gives rise to an interpretable model, but...