As a Turing test in multimedia, visual question answering (VQA) aims to answer the textual question with a given ***, the “dynamic” property of neural networks has been explored as one of the most promising ways of improving the adaptability, interpretability, and capacity of the neural network ***, despite the prevalence of dynamic convolutional neural networks, it is relatively less touched and very nontrivial to exploit dynamics in the transformers of the VQA tasks through all the stages in an end-to-end ***, due to the large computation cost of transformers, researchers are inclined to only apply transformers on the extracted high-level visual features for downstream vision and language *** this end, we introduce a question-guided dynamic layer to the transformer as it can effectively increase the model capacity and require fewer transformer layers for the VQA *** particular, we name the dynamics in the Transformer as Conditional Multi-Head Self-Attention block (cMHSA). Furthermore, our question-guided cMHSA is compatible with conditional ResNeXt block (cResNeXt). Thus a novel model mixture of conditional gating blocks (McG) is proposed for VQA, which keeps the best of the Transformer, convolutional neural network (CNN), and dynamic *** pure conditional gating CNN model and the conditional gating Transformer model can be viewed as special examples of *** quantitatively and qualitatively evaluate McG on the CLEVR and VQA-Abstract *** experiments show that McG has achieved the state-of-the-art performance on these benchmark datasets.
Large volumes of end-user-generated textual data are assembled every day, which leads to the evolution of social media in the form of reviews/feedback and brief description messages. As a consequence, end-user often s...
Heart failure is one of the primary causes of in-hospital deaths. Predicting the mortality rate of such patients is extremely important for the efficient use of health care resources. This research aims to est...
Classifying a person's emotional states is done using facial emotion recognition. The goal is to classify each face image into one of the 7 types of facial emotions: fear, disgust, surprise, sadness, neutral, happ...
This study explores a feature-engineering approach for classifying skin lesions as benign or malignant. Many other approaches regarding feature extraction can be applied: color, texture, shape, Gabor filters, Histogra...
Polycystic Ovary Syndrome (PCOS) is a widespread endocrine disorder impacting women globally. This research aims at the early prediction and detection of PCOS, which is needed to reduce long-term complications. Since it is consider...
One critical aspect of financial markets is understanding investor sentiment to facilitate effective decision-making. This study integrates traditional sentiment analysis methods, such as the Loughran-McDonald (LM) di...
Nonnegative Matrix Factorization (NMF) is one of the most popular feature learning technologies in the field of machine learning and pattern *** has been widely used and studied in the multi-view clustering tasks because of its *** study proposes a general semi-supervised multi-view nonnegative matrix factorization *** algorithm incorporates discriminative and geometric information on data to learn a better-fused representation, and adopts a feature normalizing strategy to align the different *** specific implementations of this algorithm are developed to validate the effectiveness of the proposed framework: Graph regularization based Discriminatively Constrained Multi-View Nonnegative Matrix Factorization (GDCMVNMF) and Extended Multi-View Constrained Nonnegative Matrix Factorization (ExMVCNMF). The intrinsic connection between these two specific implementations is discussed, and the optimization based on multiplicative update rules is *** on six datasets show that GDCMVNMF and ExMVCNMF outperform several representative unsupervised and semi-supervised multi-view NMF approaches.
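For readers unfamiliar with the update rules mentioned in the abstract above, the following is a minimal sketch of plain, single-view, unsupervised NMF with the classic Lee-Seung multiplicative updates for the Frobenius objective. It is not GDCMVNMF or ExMVCNMF: the graph regularization, label constraints, and multi-view fusion are omitted, and the names `nmf`, `matmul`, `transpose`, and `frobenius_error` are hypothetical helpers introduced only for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, rank, iters=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (n x m) into W (n x rank) and H (rank x m).

    Assumption: plain Lee-Seung multiplicative updates, not the paper's method.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive initialization keeps every later update nonnegative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # a rank-1 nonnegative matrix
    W, H = nmf(X, rank=1, iters=50)
    print("reconstruction error:", frobenius_error(X, W, H))
```

Because each update only multiplies the current factors by a nonnegative ratio, the factors stay nonnegative without any explicit projection step, which is the main appeal of this family of rules.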
Action recognition and localization in untrimmed videos are important for many applications and have attracted a lot of attention. Since full supervision with frame-level annotation places an overwhelming burden on manual labeling effort, learning with weak video-level supervision becomes a potential solution. In this paper, we propose a novel weakly supervised framework to recognize actions and locate the corresponding frames in untrimmed videos simultaneously. Considering that there are abundant trimmed videos publicly available and well-segmented with semantic descriptions, the instructive knowledge learned on trimmed videos can be fully leveraged to analyze untrimmed videos. We present an effective knowledge transfer strategy based on inter-class semantic relevance. We also take advantage of the self-attention mechanism to obtain a compact video representation, such that the influence of background frames can be effectively eliminated. A learning architecture is designed with twin networks for trimmed and untrimmed videos, to facilitate transferable self-attentive representation learning. Extensive experiments are conducted on three untrimmed benchmark datasets (i.e., THUMOS14, ActivityNet1.3, and MEXaction2), and the experimental results clearly corroborate the efficacy of our method. It is especially encouraging to see that the proposed weakly supervised method even achieves comparable results to some fully supervised methods.
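The idea of using attention to build a compact video representation that suppresses background frames can be illustrated with a generic attention-pooling sketch. This is an assumption-laden illustration, not the architecture from the abstract above: each frame feature is scored against a hypothetical scoring vector `w` (which would be learned in practice), the scores are normalized with a softmax, and the frames are averaged with those weights. The function names `softmax` and `attention_pool` are introduced here only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Pool per-frame feature vectors into one video-level vector.

    frames: list of feature vectors, one per frame.
    w: hypothetical scoring vector (assumed given here, learned in practice).
    Returns the pooled vector and the per-frame attention weights.
    """
    # Relevance score of each frame: dot product with the scoring vector.
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted average: low-scoring (background-like) frames contribute little.
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    frames = [[0.0, 0.0], [5.0, 0.0], [0.0, 1.0]]  # toy frame features
    pooled, weights = attention_pool(frames, w=[2.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

The per-frame weights also give a rough localization signal, since frames with high attention are the ones the pooled representation relies on.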
Using a variety of machine learning techniques, this research study suggests a unique method for classifying diseases using symptom-based analysis. To improve model transparency and comprehension, the study makes use ...