Co-saliency detection within a single image is a common vision problem that has not yet been well addressed. Existing methods often use a bottom-up strategy to infer co-saliency in an image: salient regions are first detected using visual primitives such as color and shape, and are then grouped and merged into a co-saliency map. However, human vision perceives co-saliency through a complex combination of bottom-up and top-down strategies. To address this problem, this study proposes a novel end-to-end trainable network comprising a backbone net and two branch nets. The backbone net uses ground-truth masks as top-down guidance for saliency prediction, whereas the two branch nets construct triplet proposals for regional feature mapping and clustering, which drives the network to be bottom-up sensitive to co-salient regions. We construct a new dataset of 2,019 natural images, each containing co-saliency, to evaluate the proposed method. Experimental results show that the proposed method achieves state-of-the-art accuracy with a running speed of 28 fps.
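The branch nets above drive clustering of regional features via triplet proposals. A standard way to express such an objective is a margin-based triplet loss, sketched below with hypothetical region embeddings (this is a generic formulation, not the paper's actual branch-net code; the margin value and embeddings are assumptions for illustration):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: pull co-salient region features
    (anchor, positive) together and push a background region
    (negative) away by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Two co-salient region embeddings vs. a background embedding (toy data).
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.2])
loss = triplet_loss(a, p, n)   # well-separated triplet -> zero loss
```

Minimizing this loss over many proposals makes embeddings of co-salient regions cluster together, which is the bottom-up sensitivity the branch nets aim for.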
Monitoring respiration is an important component of personal health management. While recent developments in Wi-Fi sensing offer a potential tool to achieve contact-free respiration monitoring, existing proposals for Wi-Fi-based multi-person respiration sensing mainly extract each individual's respiration rate in the frequency domain using the fast Fourier transform (FFT) or multiple signal classification (MUSIC) method, leading to the following limitations: 1) they are largely ineffective in recovering the breaths of multiple persons from received mixed signals and in differentiating individual breaths, 2) they are unable to acquire the time-varying respiration pattern when the subject has a respiratory abnormality, such as apnea or a changing respiration rate, and 3) they have difficulty identifying the real number of subjects when multiple subjects share the same or similar respiration rates. To address these issues, we propose Wi-Fi-enabled MUlti-person SEnsing (WiMUSE) as a signal processing pipeline to perform respiration monitoring for multiple persons. Specifically, as a pioneering time-domain approach, WiMUSE models the mixed signals of multi-person respiration as a linear superposition of multiple waveforms, so as to form a blind source separation (BSS) problem. The effective separation of the signal sources (respiratory waveforms) further enables us to quantify the differences in the respiratory waveform patterns of multiple subjects, and thus to identify the number of subjects along with their respective respiration patterns. We implement WiMUSE on commodity Wi-Fi devices and conduct extensive experiments to demonstrate that, compared with approaches based on the FFT or MUSIC method, the 90th-percentile error of respiration rate can be reduced by more than 60%.
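The linear-superposition model behind the BSS formulation can be sketched numerically: two respiration waveforms with different rates are mixed by an (unknown) channel matrix, and unmixing recovers them. The waveform rates, sampling rate, and mixing matrix below are invented for illustration, and the unmixing uses a known matrix via pseudo-inverse, whereas real BSS (as in WiMUSE) must estimate the mixing blindly:

```python
import numpy as np

# Two respiration "sources" with different rates, 60 s at 20 Hz.
t = np.linspace(0, 60, 1200)
s1 = np.sin(2 * np.pi * 0.25 * t)          # 15 breaths/min
s2 = np.sin(2 * np.pi * 0.33 * t)          # ~20 breaths/min
S = np.vstack([s1, s2])                    # true source waveforms

A = np.array([[0.8, 0.4],                  # hypothetical channel mixing
              [0.3, 0.9]])
X = A @ S                                  # mixed signals at the receiver

# With a mixing estimate in hand, unmixing is a pseudo-inverse;
# blind methods (e.g., ICA) estimate A from X alone.
S_hat = np.linalg.pinv(A) @ X
err = np.max(np.abs(S_hat - S))            # recovery error
```

Because each recovered source is a full time-domain waveform, time-varying patterns such as apnea remain visible, unlike a single FFT-derived rate.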
With the enhancement of data collection capabilities, massive streaming data have been accumulated in numerous application scenarios. In particular, the issue of classifying data streams based on mobile sensors can be formalized as a multi-task multi-view learning problem, with a specific task comprising multiple views with shared features collected from multiple sensors. Existing incremental learning methods are often single-task single-view and cannot learn shared representations between relevant tasks and views. An adaptive multi-task multi-view incremental learning framework for data stream classification called MTMVIS is proposed to address the above challenges, utilizing the idea of multi-task multi-view learning. Specifically, the attention mechanism is first used to align sensor data across different views. In addition, MTMVIS uses adaptive Fisher regularization from the perspective of multi-task multi-view learning to overcome catastrophic forgetting in incremental learning. Experiments on two different datasets reveal that the proposed framework outperforms state-of-the-art baseline methods.
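Fisher regularization against catastrophic forgetting is commonly written as a quadratic penalty that anchors parameters important to earlier tasks. The sketch below shows the generic elastic-weight-consolidation-style form with made-up parameter and Fisher values; MTMVIS's adaptive variant is more elaborate:

```python
import numpy as np

def fisher_penalty(params, old_params, fisher, lam=1.0):
    """Quadratic penalty anchoring each weight to its value after the
    previous task, scaled by its (diagonal) Fisher importance."""
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

old = np.array([1.0, -0.5])            # weights after the previous task
fisher = np.array([10.0, 0.1])         # first weight matters far more

# Moving the important weight is penalized much more heavily
# than moving the unimportant one by the same amount.
p_move_important = fisher_penalty(np.array([1.5, -0.5]), old, fisher)
p_move_minor     = fisher_penalty(np.array([1.0,  0.0]), old, fisher)
```

During incremental training, this penalty is added to the new task's loss, so updates flow mostly through weights the old tasks did not rely on.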
Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training datasets usually follow long-tailed label distributions. Many previous studies have treated head and tail labels equally, resulting in unsatisfactory performance for identifying tail labels. To address this issue, this paper proposes a novel learning method that combines arbitrary models with two steps. The first step is the “diverse ensemble”, which encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, and can improve the generalization of tail labels. The second is the “error correction”, which takes advantage of accurate predictions on head labels by the base model and approximates its residual errors for tail labels. Thus, it enables the “diverse ensemble” to focus on optimizing the tail label performance. This overall procedure is called residual diverse ensemble (RDE). RDE is implemented via a single-hidden-layer perceptron and can scale up to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models with considerable performance gains on benchmark datasets, especially with respect to propensity-scored evaluation metrics. Moreover, RDE converges in less than 30 training epochs without increasing the computational overhead.
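The "error correction" idea, in its simplest form, is residual fitting: a second model approximates the base model's errors on tail labels and its output is added back. The toy numbers below (label scores, the 0.8 residual-recovery factor) are invented to show the mechanics, not taken from RDE:

```python
import numpy as np

# One example with four labels; the last is a "tail" label on which
# the base model is weak.
y_true = np.array([1.0, 1.0, 0.0, 1.0])
base   = np.array([0.9, 0.8, 0.1, 0.2])     # good on head, poor on tail

residual  = y_true - base                   # errors the corrector must fit
corrector = 0.8 * residual                  # imperfect residual estimate
final     = base + corrector                # corrected scores

tail_err_before = abs(y_true[3] - base[3])
tail_err_after  = abs(y_true[3] - final[3])
```

Because the corrector only needs to model the base model's residuals, the ensemble's capacity is spent where the base model fails, i.e., on tail labels.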
Since the preparation of labeled data for training semantic segmentation networks of point clouds is a time-consuming process, weakly supervised approaches have been introduced to learn from only a small fraction of data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the selection of which samples to annotate is as important as how these samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. Active learning selects points for annotation that are likely to result in improvements to the trained model, while self-training makes efficient use of the user-provided labels for learning the model. We demonstrate that our approach leads to an effective method that provides improvements in scene segmentation over previous work and baselines, while requiring only a few user annotations.
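A minimal form of the active-learning step is uncertainty sampling: rank unlabeled points by predictive entropy and send the most uncertain ones to the annotator. The sketch below uses that generic criterion with made-up softmax outputs; the paper's actual selection criterion may combine additional signals:

```python
import numpy as np

def entropy(p):
    """Per-point predictive entropy; higher means more uncertain."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def select_points(probs, k):
    """Pick the k most uncertain points as annotation candidates."""
    return np.argsort(-entropy(probs))[:k]

# Softmax outputs for 4 points over 3 classes (toy values).
probs = np.array([
    [0.98, 0.01, 0.01],   # confident -> low priority
    [0.34, 0.33, 0.33],   # nearly uniform -> highest priority
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])
picked = select_points(probs, 2)
```

The self-training half of the loop then retrains on the newly annotated points plus high-confidence pseudo-labels, and the cycle repeats.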
Brain tumor classification is crucial for personalized treatment planning. Although deep learning-based Artificial Intelligence (AI) models can automatically analyze tumor images, fine details of small tumor regions may be overlooked during global feature extraction. Therefore, we propose a brain tumor Magnetic Resonance Imaging (MRI) classification model based on a global-local parallel dual-branch structure. The global branch employs ResNet50 with Multi-Head Self-Attention (MHSA) to capture global contextual information from whole brain images, while the local branch utilizes VGG16 to extract fine-grained features from segmented brain tumor regions. The features from both branches are processed through a designed attention-enhanced feature fusion module to filter and integrate important features. Additionally, to address sample imbalance in the dataset, we introduce a category attention block to improve the recognition of minority classes. Experimental results indicate that our method achieved a classification accuracy of 98.04% and a micro-average Area Under the Curve (AUC) of 0.989 in the classification of three types of brain tumors, surpassing several existing pre-trained Convolutional Neural Network (CNN) models. Furthermore, feature interpretability analysis validated the effectiveness of the proposed model. This suggests that the method holds significant potential for brain tumor image classification.
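One standard baseline for the sample-imbalance problem mentioned above is inverse-frequency class weighting of the loss. The sketch below shows that common heuristic with hypothetical per-class image counts; it is not the paper's category attention block, which learns the re-weighting instead:

```python
import numpy as np

def class_weights(counts):
    """Inverse-frequency weights: w_c = N / (C * n_c), so minority
    classes contribute proportionally more to the training loss."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

# Hypothetical per-class counts: glioma, meningioma, pituitary.
w = class_weights([900, 300, 300])
```

These weights would multiply each sample's loss term by the weight of its class, boosting the gradient signal from minority tumor types.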
Recently, weak supervision has received growing attention in the field of salient object detection due to the convenience of annotation. However, there is a large performance gap between weakly supervised and fully supervised salient object detectors because scribble annotations can only provide very limited foreground/background information. Therefore, an intuitive idea is to infer annotations that cover more complete object and background regions for training. To this end, a label inference strategy is proposed based on the assumption that pixels with similar colours and close positions should have consistent labels. Specifically, a k-means clustering algorithm is first performed on both the colours and the coordinates of the original annotations, and the same labels are then assigned to points that have colours similar to the colour cluster centres and lie near the coordinate cluster centres. Moreover, the same annotations are set for pixels with similar colours within each kernel neighbourhood. Extensive experiments on six benchmarks demonstrate that our method can significantly improve performance and achieve state-of-the-art results.
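The core assumption (similar colour + close position ⇒ same label) can be sketched directly. Below, each annotated point acts as its own cluster centre for simplicity, whereas the paper first compresses annotations with k-means; the colour/position thresholds and the toy pixels are invented for illustration:

```python
import numpy as np

def propagate_labels(ann_xy, ann_rgb, ann_lab, pix_xy, pix_rgb,
                     col_thr=30.0, pos_thr=40.0):
    """Assign an annotation's label to each unlabeled pixel that is both
    similar in colour and close in position to it; -1 = still unlabeled."""
    labels = np.full(len(pix_xy), -1)
    for i, (xy, rgb) in enumerate(zip(pix_xy, pix_rgb)):
        d_col = np.linalg.norm(ann_rgb - rgb, axis=1)
        d_pos = np.linalg.norm(ann_xy - xy, axis=1)
        cand = np.flatnonzero((d_col < col_thr) & (d_pos < pos_thr))
        if cand.size:
            labels[i] = ann_lab[cand[np.argmin(d_col[cand] + d_pos[cand])]]
    return labels

# Two scribble annotations: a red foreground point and a blue background point.
ann_xy  = np.array([[10.0, 10.0], [100.0, 100.0]])
ann_rgb = np.array([[200.0, 30.0, 30.0], [30.0, 30.0, 200.0]])
ann_lab = np.array([1, 0])                 # 1 = foreground, 0 = background

# Three unlabeled pixels: red-ish near the scribble, blue-ish near the
# background, and a green pixel matching neither.
pix_xy  = np.array([[15.0, 12.0], [95.0, 105.0], [50.0, 50.0]])
pix_rgb = np.array([[190.0, 40.0, 35.0],
                    [40.0, 25.0, 190.0],
                    [30.0, 200.0, 30.0]])
inferred = propagate_labels(ann_xy, ann_rgb, ann_lab, pix_xy, pix_rgb)
```

Pixels matching no annotation in colour and position stay unlabeled, which keeps the inferred supervision conservative.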
Prior research in video object segmentation (VOS) predominantly relies on videos with dense annotations. However, obtaining pixel-level annotations is both costly and time-intensive. In this work, we highlight the pot...
Digital twinning enables manufacturers to create digital representations of physical entities, thus implementing virtual simulations for product development. Existing efforts in digital twinning neglect the decisive consumer feedback in product development stages, failing to close the gap between physical and digital spaces. This work mines real-world consumer feedback through social media topics, which is significant to product development. We specifically analyze the prevalent time of a product topic, giving insight into both consumer attention and the widely discussed time of a topic. The primary body of current studies regards prevalent time prediction as an accompanying task or assumes the existence of a preset distribution. However, these proposed solutions are either biased in their focused objectives and underlying patterns or weak in their capability to generalize to diverse topics. To this end, this work combines deep learning and survival analysis to predict the prevalent time of topics. We propose a specialized deep survival model consisting of two modules. The first module enriches the input covariates by incorporating latent features of the time-varying text, and the second module fully captures the temporal pattern of a topic with a recurrent network structure. Moreover, a specific loss function, different from those of regular survival models, is proposed to achieve more reasonable predictions. Extensive experiments on real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods.
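To make the survival-analysis framing concrete, a discrete-time survival model treats a topic as "surviving" (staying prevalent) each step with probability 1 − h_t and ending at the observed step with hazard h_t; training minimizes the negative log-likelihood. This is the generic textbook loss with toy hazards, not the paper's specialized loss function:

```python
import numpy as np

def survival_nll(hazards, event_step, observed=True):
    """Negative log-likelihood of a discrete-time survival model.
    The topic survives steps 0..event_step-1 with prob (1 - h_t);
    if the end is observed, it occurs at event_step with prob h_t.
    Censored samples only contribute the survival terms."""
    h = np.asarray(hazards, dtype=float)
    ll = np.sum(np.log(1.0 - h[:event_step]))
    if observed:
        ll += np.log(h[event_step])
    return -ll

# Toy per-step hazards predicted for one topic.
h = [0.1, 0.2, 0.6, 0.8]
nll_observed = survival_nll(h, 2)                  # topic faded at step 2
nll_censored = survival_nll(h, 2, observed=False)  # still prevalent at step 2
```

Handling censored topics (still prevalent when observation stops) is exactly what distinguishes survival losses from plain regression on the prevalent time.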
Emotion recognition plays a crucial role in various fields and is a key task in natural language processing (NLP). The objective is to identify and interpret emotional expressions in text. However, traditional emotion recognition approaches often struggle in few-shot cross-domain scenarios due to their limited capacity to generalize semantic features across different domains. Additionally, these methods face challenges in accurately capturing complex emotional states, particularly those that are subtle or implicit. To overcome these limitations, we introduce a novel approach called Dual-Task Contrastive Meta-Learning (DTCML). This method combines meta-learning and contrastive learning to improve emotion recognition. Meta-learning enhances the model’s ability to generalize to new emotional tasks, while instance contrastive learning further refines the model by distinguishing unique features within each category, enabling it to better differentiate complex emotional expressions. Prototype contrastive learning, in turn, helps the model address the semantic complexity of emotions across different domains, enabling it to learn fine-grained emotion expressions. By leveraging dual tasks, DTCML learns from two domains simultaneously; this encourages the model to learn more diverse and generalizable emotion features, improving its cross-domain adaptability, robustness, and generalization ability. We evaluated the performance of DTCML across four cross-domain settings, and the results show that our method outperforms the best baseline by 5.88%, 12.04%, 8.49%, and 8.40% in terms of accuracy.
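The prototype contrastive component can be sketched in its generic form: a query embedding is scored against per-class prototypes, and the loss pulls it toward its own class prototype while pushing the others away. The prototypes, query, and temperature below are invented for illustration and this is the standard formulation, not DTCML's exact loss:

```python
import numpy as np

def proto_contrastive_loss(query, prototypes, target, tau=0.5):
    """Cross-entropy over cosine similarities to class prototypes,
    scaled by temperature tau; lower loss = closer to own prototype."""
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = (p @ q) / tau                     # cosine similarity logits
    logp = sims - np.log(np.sum(np.exp(sims)))
    return -logp[target]

# Hypothetical prototypes for two emotion classes, e.g. "joy" vs "anger".
protos = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
q = np.array([0.9, 0.1])                     # embedding near the first class
loss_correct = proto_contrastive_loss(q, protos, target=0)
loss_wrong   = proto_contrastive_loss(q, protos, target=1)
```

Instance contrastive learning uses the same template but contrasts individual examples rather than prototypes, sharpening within-class distinctions.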