Video question answering (VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts, recent studies have revealed that current VideoQA models tend to over-rely on superficial correlations rooted in dataset bias while overlooking the key video content, thus leading to unreliable results. Effectively understanding and modeling the temporal and semantic characteristics of a given video for robust VideoQA is crucial but, to our knowledge, has not been well investigated. To fill this research gap, we propose a robust VideoQA framework that effectively models cross-modality fusion and forces the model to focus on the temporal and global content of videos when making a QA decision instead of exploiting shortcuts in the datasets. Specifically, we design a self-supervised contrastive learning objective that contrasts positive and negative pairs of multimodal input, where the fused representation of the original multimodal input is pulled closer to that of the intervened input obtained by video perturbation. We expect the fused representation to focus more on the global context of videos rather than on a few static keyframes. Moreover, we introduce an effective temporal order regularization to enforce the inherent sequential structure of videos in the video representation. We also design a Kullback-Leibler divergence-based perturbation invariance regularization of the predicted answer distribution to improve the robustness of the model against temporal content perturbation of videos. Our method is model-agnostic and is easily compatible with various VideoQA backbones. Extensive experimental results and analyses on several public datasets show the advantage of our method over state-of-the-art methods in terms of both accuracy and robustness.
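The Kullback-Leibler perturbation invariance term mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the direction of the divergence (original distribution first), the softmax over raw answer logits, and the function names below are all assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Convert raw answer logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def perturbation_invariance_penalty(logits_original, logits_perturbed):
    """Hypothetical regularizer: penalize a change in the predicted answer
    distribution when the video is perturbed (assumed form, not the
    authors' exact loss)."""
    p = softmax(logits_original)
    q = softmax(logits_perturbed)
    return kl_divergence(p, q)


# Identical predictions incur no penalty; diverging predictions are penalized.
same = perturbation_invariance_penalty([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = perturbation_invariance_penalty([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In a training loop, a term of this kind would typically be added to the main QA loss with a weighting coefficient; the weight is another quantity the abstract does not specify.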
Model performance has been significantly enhanced by channel attention. The average pooling procedure creates skewness, lowering the performance of the network architecture. In the channel attention approach, average ...
PM2.5 has a non-negligible impact on visibility and air quality as an important component of haze and can affect cloud formation and rainfall and thus change the climate, and it is an evaluation indicator of air pollution *** PM2.5 concentration prediction based on relevant historical data mining can effectively improve air pollution forecasting ability and guide air pollution prevention and *** past methods neglected the impact caused by PM2.5 flow between cities when analyzing the impact of inter-city PM2.5 concentrations, making it difficult to further improve the prediction ***, factors including geographical information such as altitude and distance and meteorological information such as wind speed and wind direction affect the flow of PM2.5 between cities, leading to the change of PM2.5 concentration in *** a PM2.5 directed flow graph is constructed in this *** and meteorological data is introduced into the graph structure to simulate the spatial PM2.5 flow transmission relationship between *** introduction of meteorological factors like wind direction depicts the unequal flow relationship of PM2.5 between *** on this, a PM2.5 concentration prediction method integrating spatial-temporal factors is proposed in this paper. A spatial feature extraction method based on weight aggregation graph attention network (WGAT) is proposed to extract the spatial correlation features of PM2.5 in the flow graph, and a multi-step PM2.5 prediction method based on attention gate control loop unit (AGRU) is *** PM2.5 concentration prediction model WGAT-AGRU with fused spatiotemporal features is constructed by combining the two methods to achieve multi-step PM2.5 concentration ***, accuracy and validity experiments are conducted on the KnowAir dataset, and the results show that the WGAT-AGRU model proposed in the paper has good performance in terms of prediction accuracy and validates the effectiveness
The emergence of 5G networks has enabled the deployment of a two-tier edge and vehicular-fog network. It comprises Multi-access Edge Computing (MEC) and Vehicular-Fogs (VFs), strategically positioned closer to Interne...
Accurate significant wave height (SWH) prediction is essential for the development and utilization of wave *** learning methods such as recurrent and convolutional neural networks have achieved good results in SWH ***, these methods do not adapt well to dynamic seasonal variations in wave *** this study, we propose a novel method, the spatiotemporal dynamic graph (STDG) neural *** method predicts the SWH of multiple nodes based on dynamic graph modeling and multi-characteristic ***, considering the dynamic seasonal variations in the wave direction over time, the network models wave dynamic spatial dependencies from long- and short-term pattern ***, to correlate multiple characteristics with SWH, the network introduces a cross-characteristic transformer to effectively fuse multiple ***, we conducted experiments on two datasets from the South China Sea and East China Sea to validate the proposed method and compared it with five prediction methods in the three *** experimental results show that the proposed method achieves the best performance at all predictive scales and has greater advantages for extreme value ***, an analysis of the dynamic graph shows that the proposed method captures the seasonal variation mechanism of the waves.
Pneumonia is an acute lung infection that has caused many fatalities globally. Radiologists often employ chest X-rays to identify pneumonia since they are presently the most effective imaging method for this ***-aided diagnosis of pneumonia using deep learning techniques is widely used due to its effectiveness and performance. In the proposed method, the Synthetic Minority Oversampling Technique (SMOTE) approach is used to eliminate the class imbalance in the X-ray dataset. To compensate for the paucity of accessible data, pre-trained transfer learning is used, and an ensemble Convolutional Neural Network (CNN) model is developed. The ensemble model consists of all possible combinations of the MobileNetV2, Visual Geometry Group (VGG16), and DenseNet169 models. MobileNetV2 and DenseNet169 performed well in the single-classifier model, with an accuracy of 94%, while the ensemble model (MobileNetV2+DenseNet169) achieved an accuracy of 96.9%. Using the data synchronous parallel model in Distributed TensorFlow, the training process accelerated performance by 98.6% and outperformed other conventional approaches.
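The SMOTE step named in this abstract can be sketched as follows. SMOTE creates each synthetic minority sample by interpolating between a minority sample and one of its nearest minority-class neighbors. The feature vectors, the value of `k`, and the function names here are hypothetical; the abstract does not say how the X-ray images were represented or which SMOTE settings were used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_neighbors(point, others, k):
    """Return the k vectors in `others` closest to `point`."""
    return sorted(others, key=lambda o: squared_distance(point, o))[:k]


def smote_sample(x, neighbor, rng):
    """Interpolate a synthetic point on the segment between x and neighbor."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


def oversample_minority(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        neighbor = rng.choice(nearest_neighbors(x, others, k))
        synthetic.append(smote_sample(x, neighbor, rng))
    return synthetic


# Toy minority class with three 2-D feature vectors (made-up values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = oversample_minority(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the region spanned by the minority class. In practice a maintained implementation, such as the one in the imbalanced-learn package, would normally be preferred over a hand-written version.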
Semantic segmentation is an important sub-task for many ***, pixel-level ground-truth labeling is costly, and there is a tendency to overfit to training data, thereby limiting the generalization *** domain adaptation can potentially address these problems by allowing systems trained on labelled datasets from the source domain (including less expensive synthetic domain) to be adapted to a novel target *** conventional approach involves automatic extraction and alignment of the representations of source and target domains *** limitation of this approach is that it tends to neglect the differences between classes: representations of certain classes can be more easily extracted and aligned between the source and target domains than others, limiting the adaptation over all ***, we address this problem by introducing a Class-Conditional Domain Adaptation (CCDA) *** incorporates a class-conditional multi-scale discriminator and class-conditional losses for both segmentation and ***, they measure the segmentation, shift the domain in a class-conditional manner, and equalize the loss over *** results demonstrate that the performance of our CCDA method matches, and in some cases surpasses, that of state-of-the-art methods.
The current study is defined by two main aims. An effective strategy for improving local search is to combine the Set Algebra-Based Heuristic Algorithm (SAHA) algorithm with the Nelder-Mead simplex method. The approac...
Dementia is a general term used to indicate any disorder related to human memory. The various memory-related problems severely affect the human brain, so the individual has difficulty doing their normal physic...
As a result of its aggressive nature and late identification at advanced stages, lung cancer is one of the leading causes of cancer-related deaths. Early diagnosis of lung cancer is a serious and difficult challenge that...