Video question answering (VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts, recent studies revealed that current VideoQA models tend to over-rely on the superficial correlations rooted in dataset bias while overlooking the key video content, thus leading to unreliable results. Effectively understanding and modeling the temporal and semantic characteristics of a given video for robust VideoQA is crucial but, to our knowledge, has not been well investigated. To fill the research gap, we propose a robust VideoQA framework that can effectively model the cross-modality fusion and encourage the model to focus on the temporal and global content of videos when making a QA decision instead of exploiting the shortcuts in datasets. Specifically, we design a self-supervised contrastive learning objective to contrast the positive and negative pairs of multimodal input, where the fused representation of the original multimodal input is enforced to be closer to that of the intervened input based on video perturbation. We expect the fused representation to focus more on the global context of videos rather than some static keyframes. Moreover, we introduce an effective temporal order regularization to enforce the inherent sequential structure of videos for video representation. We also design a Kullback-Leibler divergence-based perturbation invariance regularization of the predicted answer distribution to improve the robustness of the model against temporal content perturbation of videos. Our method is model-agnostic and is readily compatible with various VideoQA backbones. Extensive experimental results and analyses on several public datasets show the advantage of our method over the state-of-the-art methods in terms of both accuracy and robustness.
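The Kullback-Leibler perturbation invariance term mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the direction of the divergence, the function names, and the use of raw answer logits are all assumptions made here for illustration only.

```python
import math


def softmax(logits):
    """Convert raw answer logits into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def perturbation_invariance_loss(logits_original, logits_perturbed):
    """Penalty that grows as the answer distribution changes under perturbation.

    Assumption: the original-video distribution is the reference (first argument
    of the KL term); the paper may use the other direction or a symmetric form.
    """
    p = softmax(logits_original)
    q = softmax(logits_perturbed)
    return kl_divergence(p, q)
```

A model whose predicted answer distribution is unchanged by the video perturbation incurs zero penalty, while one whose prediction shifts is penalized in proportion to the shift.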
In the era of advanced machine learning techniques, the development of accurate predictive models for complex medical conditions, such as thyroid cancer, has shown remarkable *** predictive models for thyroid cancer enhance early detection, improve resource allocation, and reduce ***, the widespread adoption of these models in clinical practice demands predictive performance along with interpretability and *** paper proposes a novel association-rule based feature-integrated machine learning model which shows better classification and prediction accuracy than present *** study also focuses on the application of SHapley Additive exPlanations (SHAP) values as a powerful tool for explaining thyroid cancer prediction *** the proposed method, the association-rule based feature integration framework identifies frequently occurring attribute combinations in the *** original dataset is used in training machine learning models, and further used in generating SHAP values *** the next phase, the dataset is integrated with the dominant feature sets identified through association-rule based *** new integrated dataset is used in re-training the machine learning *** new SHAP values generated from these models help in validating the contributions of feature sets in predicting *** conventional machine learning models lack interpretability, which can hinder their integration into clinical decision-making *** this study, the SHAP values are introduced along with association-rule based feature integration as a comprehensive framework for understanding the contributions of feature sets in modelling the *** study discusses the importance of reliable predictive models for early diagnosis of thyroid cancer, and a validation framework of *** proposed model shows an accuracy of 93.48%. Performance metrics such as precision, recall, F1-score, and the area un
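The step of finding frequently occurring attribute combinations and folding them back into the dataset could look roughly like the sketch below. It uses plain support counting rather than any specific association-rule algorithm, and the function names, the `max_size` limit, and the binary-flag encoding are hypothetical choices, not details taken from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_attribute_sets(records, min_support, max_size=2):
    """Return attribute combinations whose support meets min_support.

    records: list of iterables, each holding the attribute values present
    in one sample. Support is the fraction of records containing the set.
    """
    counts = Counter()
    for record in records:
        items = sorted(set(record))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(records)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


def combo_feature(record, combo):
    """Binary flag (hypothetical encoding): 1 if the record contains every attribute in combo."""
    present = set(record)
    return int(all(attr in present for attr in combo))
```

Each dominant combination then becomes one extra column of the dataset, so a model retrained on the integrated data can assign it its own SHAP contribution.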
Rice is a major crop and staple food for more than half of the world’s population and plays a vital role in ensuring food security as well as the global economy. Pests and diseases pose a threat to the production of rice and have a substantial impact on the yield and quality of the crop. In recent times, deep learning methods have gained prominence in predicting rice leaf diseases. Despite the increasing use of these methods, there are notable limitations in existing approaches. These include a scarcity of extensive and diverse collections of leaf disease images, lower accuracy rates, higher time complexity, and challenges in real-time leaf disease detection. To address the limitations, we explicitly investigate various data augmentation approaches using different generative adversarial networks (GANs) for rice leaf disease detection. Along with the GAN models, advanced CNN-based classifiers have been applied to classify the images with the improved data augmentation. Our approach involves employing various GANs to generate high-quality synthetic images. This strategy aims to tackle the challenges posed by limited and imbalanced datasets in the identification of leaf diseases. The key benefit of incorporating GANs in leaf disease detection lies in their ability to create synthetic images, effectively augmenting the dataset’s size, enhancing diversity, and reducing the risk of overfitting. For dataset augmentation, we used three distinct GAN architectures, namely simple GAN, CycleGAN, and DCGAN. Our experiments demonstrated that models utilizing the GAN-augmented dataset generally outperformed those relying on the non-augmented dataset. Notably, the CycleGAN architecture exhibited the most favorable outcomes, with the MobileNet model achieving an accuracy of 98.54%. These findings underscore the significant potential of GAN models in improving the performance of detection models for rice leaf diseases, suggesting their promising role in future research within this doma
Customized keyword spotting needs to adapt quickly to small user *** methods primarily solve the problem under moderate noise *** work increases the level of difficulty in detecting keywords by introducing keyword ***, the current solution has been explored on large models with many parameters, making it unsuitable for deployment on small *** applying the current solution to lightweight models with minimal training data, the performance degrades compared to the baseline ***, we propose a light-weight multi-task architecture (<9.0×10^4 parameters) created from integrating the triplet attention module in the ConvMixer networks and a new auxiliary mixed labeling encoding to address the *** results of our experiment show that the proposed model outperforms similar light-weight models for keyword spotting, with accuracy gains ranging from 0.73% to 2.95% for a clean set and from 2.01% to 3.37% for a mixed set under different scales of training ***, our model shows its robustness in different low-resource language datasets while converging faster.
Advancements in Natural Language Processing and Deep Learning techniques have significantly propelled the automation of Legal Judgment Prediction, achieving remarkable progress in legal *** of the existing research works on Legal Judgment Prediction (LJP) use traditional optimization algorithms in deep learning techniques falling into local *** research article focuses on using the modified Pelican Optimization method which mimics the collective behavior of Pelicans in the exploration and exploitation phase during cooperative food ***, the selection of search agents within a boundary is done randomly, which increases the time required to achieve global *** address this, the proposed Chaotic Opposition Learning-based Pelican Optimization (COLPO) method incorporates the concept of Opposition-Based Learning combined with a chaotic cubic function, enabling deterministic selection of random numbers and reducing the number of iterations needed to reach global ***, the LJP approach in this work uses improved semantic similarity and entropy features to train a hybrid classifier combining Bi-GRU and Deep *** output scores are fused using improved score level fusion to boost prediction *** proposed COLPO method experiments with real-time Madras High Court criminal cases (Dataset 1) and the Supreme Court of India database (Dataset 2), and its performance is compared with nature-inspired algorithms such as Sparrow Search Algorithm (SSA), COOT, Spider Monkey Optimization (SMO), Pelican Optimization Algorithm (POA), as well as baseline classifier models and transformer neural *** results show that the proposed hybrid classifier with COLPO outperforms other cutting-edge LJP algorithms achieving 93.4% and 94.24% accuracy, respectively.
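The combination of Opposition-Based Learning with a chaotic cubic function can be sketched for the population initialization step. The abstract does not state the exact update rules, so this uses the commonly cited textbook forms: the cubic map x ← ρ·x·(1 − x²) with ρ = 2.595 and the opposite point lower + upper − x. The function names, the seed `x0`, and the keep-the-best selection are assumptions for illustration.

```python
def cubic_map_sequence(x0, n, rho=2.595):
    """Deterministic chaotic sequence in (0, 1) from the cubic map."""
    seq = []
    x = x0
    for _ in range(n):
        x = rho * x * (1.0 - x * x)
        seq.append(x)
    return seq


def opposite_point(x, lower, upper):
    """Opposition-based learning: reflect each coordinate within its bounds."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def chaotic_opposition_init(n_agents, lower, upper, fitness, x0=0.3):
    """Build agents from the chaotic sequence, add their opposites,
    and keep the n_agents candidates with the lowest fitness."""
    dim = len(lower)
    chaos = cubic_map_sequence(x0, n_agents * dim)
    candidates = []
    for i in range(n_agents):
        agent = [
            lower[d] + chaos[i * dim + d] * (upper[d] - lower[d])
            for d in range(dim)
        ]
        candidates.append(agent)
        candidates.append(opposite_point(agent, lower, upper))
    candidates.sort(key=fitness)
    return candidates[:n_agents]
```

Because the sequence is generated from a fixed seed rather than a pseudo-random generator, repeated runs produce the same starting population, which matches the abstract's description of deterministic selection.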
Current motion detection and evaluation technologies face challenges such as limited scalability, imprecise feedback, and lack of personalized guidance. To address these challenges, this research integrated efficient ...
Databases play a vital role in data management in many fields, such as finance, government, telecommunications, energy, electricity, transportation, *** the database management system has become a core foundational *** is an enterprise-grade open-source database, a product of deep integration of research and development from Huawei, Tsinghua University, and China Mobile in the past decade.
Accurate 3D hand pose estimation is a challenging computer vision problem primarily because of self-occlusion and viewpoint variations. Existing methods address viewpoint variations by applying data-centric transformations, such as data alignments or generating multiple views, which are prone to data sensitivity, error propagation, and prohibitive computational requirements. We improve the estimation accuracy by mitigating the impact of self-occlusion and viewpoint variations from the network side and propose MH-Net, a novel multiheaded network for accurate 3D hand pose estimation from a depth image. MH-Net comprises three key components. First, a multiscale feature extraction backbone based on an improved multiscale vision transformer (MViTv2) is proposed to extract shift-invariant global features. Second, a 3D anchorset generator is proposed to generate three disjoint sets of 3D anchors that serve two purposes: formulating hand pose estimation as an anchor-to-joint offset estimation and defining three unique viewpoints from a single depth image. Third, three identical regression heads are proposed to regress 3D joint positions based on unique viewpoints defined by their respective anchorsets. Extensive ablation studies have been conducted to investigate the impact of anchorsets, regression heads, and feature extraction backbones. Experiments on three public datasets, ICVL, MSRA, and NYU, show significant improvements over the state-of-the-art. IEEE
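Formulating pose estimation as anchor-to-joint offset estimation can be illustrated as follows. The abstract does not describe how MH-Net aggregates the anchors, so the softmax-weighted vote below is an assumption borrowed from the general style of anchor-based regressors, and the function name and argument layout are hypothetical.

```python
import math


def estimate_joint(anchors, offsets, logits):
    """Estimate one 3D joint as a weighted vote over anchors.

    anchors: list of (x, y, z) anchor positions
    offsets: predicted (dx, dy, dz) from each anchor to the joint
    logits:  one unnormalized confidence score per anchor
    """
    m = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(score - m) for score in logits]
    total = sum(weights)
    joint = [0.0, 0.0, 0.0]
    for anchor, offset, w in zip(anchors, offsets, weights):
        for axis in range(3):
            joint[axis] += (w / total) * (anchor[axis] + offset[axis])
    return joint
```

Under this reading, each of the three regression heads would run the same computation on its own anchor set, giving three estimates of the joint from three different viewpoints.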
In this paper, a robust and consistent COVID-19 emergency decision-making approach is proposed based on q-rung linear Diophantine fuzzy set (q-RLDFS), differential evolutionary (DE) optimization principles, and evidential reasoning (ER) *** proposed approach uses q-RLDFS in order to represent the evaluating values of the alternatives corresponding to the *** optimization is used to obtain the optimal weights of the attributes, and ER methodology is used to compute the aggregated q-rung linear Diophantine fuzzy values (q-RLDFVs) of each *** the score values of alternatives are computed based on the aggregated *** alternative with the maximum score value is selected as a better *** applicability of the proposed approach has been illustrated in COVID-19 emergency decision-making system and sustainable energy planning ***, we have validated the proposed approach with a numerical ***, a comparative study is provided with the existing models, where the proposed approach is found to be robust, better performing, and consistent in uncertain environments.
Although much research has been done in recognizing facial expressions, there is still a need to increase the accuracy of facial expression recognition, particularly under uncontrolled *** use of Local Directional Patterns (LDP), which has good characteristics for emotion detection, has yielded encouraging *** innovative end-to-end learnable High Response-based Local Directional Pattern (HR-LDP) network for facial emotion recognition is implemented by employing fixed convolutional filters in the proposed *** combining learnable convolutional layers with fixed-parameter HR-LDP layers made up of eight Kirsch filters and derivable simulated gate functions, this network considerably minimizes the number of network *** cost of the parameters in our fully linked layers is up to 64 times lower than that in currently used deep learning-based detection *** seven well-known databases, including JAFFE, CK+, MMI, SFEW, OULU-CASIA and MUG, the recognition rates for seven-class facial expression recognition are 99.36%, 99.2%, 97.8%, 60.4%, 91.1% and 90.1%, *** results demonstrate the advantage of the proposed work over cutting-edge techniques.
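For readers unfamiliar with the fixed filters involved, the sketch below computes a classic Local Directional Pattern code for a single 3×3 patch using the eight Kirsch masks. It shows the standard hand-crafted LDP only; the HR-LDP layer and its differentiable gate functions are not specified in the abstract and are not reproduced here. The choice of k = 3 and the bit ordering are conventional assumptions.

```python
# Border positions of a 3x3 mask, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Border values of the east-facing Kirsch mask, in RING order.
EAST = [-3, -3, 5, 5, 5, -3, -3, -3]


def kirsch_masks():
    """Generate the eight Kirsch masks by rotating the border values."""
    masks = []
    for shift in range(8):
        mask = [[0] * 3 for _ in range(3)]
        for idx, (r, c) in enumerate(RING):
            mask[r][c] = EAST[(idx - shift) % 8]
        masks.append(mask)
    return masks


def ldp_code(patch, k=3):
    """Classic LDP code: set the bits of the k strongest absolute responses."""
    responses = [
        sum(mask[r][c] * patch[r][c] for r in range(3) for c in range(3))
        for mask in kirsch_masks()
    ]
    top = sorted(range(8), key=lambda i: abs(responses[i]), reverse=True)[:k]
    return sum(1 << i for i in top)
```

Since the mask weights are fixed, a layer built from them adds no trainable parameters, which is consistent with the parameter savings the abstract reports.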