Search results - Inner Mongolia University Library

Robust video question answering via contrastive cross-modality representation learning

Science China (Information Sciences), 2024, Vol. 67, No. 10, pp. 211-226

Authors: Xun YANG, Jianming ZENG, Dan GUO, Shanshan WANG, Jianfeng DONG, Meng WANG. Affiliations: School of Information Science and Technology, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; School of Computer Science and Information Engineering, Hefei University of Technology; Institutes of Physical Science and Information Technology, Anhui University; School of Computer Science and Technology, Zhejiang Gongshang University

Video question answering (VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts, recent studies have revealed that current VideoQA models tend to over-rely on superficial correlations rooted in dataset bias while overlooking the key video content, which leads to unreliable results. Effectively understanding and modeling the temporal and semantic characteristics of a given video for robust VideoQA is crucial but, to our knowledge, has not been well investigated. To fill this research gap, we propose a robust VideoQA framework that effectively models cross-modality fusion and forces the model to focus on the temporal and global content of videos when making a QA decision, instead of exploiting shortcuts in the datasets. Specifically, we design a self-supervised contrastive learning objective to contrast the positive and negative pairs of multimodal input, where the fused representation of the original multimodal input is enforced to be closer to that of the intervened input based on video perturbation. We expect the fused representation to focus more on the global context of videos rather than on a few static keyframes. Moreover, we introduce an effective temporal order regularization to enforce the inherent sequential structure of videos for video representation. We also design a Kullback-Leibler divergence-based perturbation invariance regularization of the predicted answer distribution to improve the robustness of the model against temporal content perturbation of videos. Our method is model-agnostic and is compatible with various VideoQA backbones. Extensive experimental results and analyses on several public datasets show the advantage of our method over state-of-the-art methods in terms of both accuracy and robustness.

Keywords: video question answering; cross-modality fusion; contrastive learning; cross-media reasoning
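The two loss terms named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how positive and negative pairs are built from video perturbations, so the toy vectors, the temperature value, and the function names `info_nce` and `kl_divergence` are all assumptions. The contrastive term is written in the common InfoNCE form, and the invariance term is a plain Kullback-Leibler divergence between two answer distributions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to the positive
    than to every negative. The temperature is an assumed hyperparameter."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    probs = softmax([s / temperature for s in sims])
    return -math.log(probs[0])

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical fused representations of an original and two perturbed inputs.
original = [1.0, 0.0, 0.5]
perturbed_positive = [0.9, 0.1, 0.5]
perturbed_negative = [-1.0, 0.2, 0.0]
contrastive_loss = info_nce(original, perturbed_positive, [perturbed_negative])

# Hypothetical answer logits before and after a temporal perturbation.
p_original = softmax([2.0, 0.5, -1.0])
p_perturbed = softmax([1.8, 0.7, -1.0])
invariance_penalty = kl_divergence(p_original, p_perturbed)

print(contrastive_loss, invariance_penalty)
```

In a training loop, terms like these would typically be added to the usual answer-classification loss with weighting coefficients; the abstract does not give those weights.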

A Study on the Explainability of Thyroid Cancer Prediction: SHAP Values and Association-Rule Based Feature Integration Framework

Computers, Materials & Continua, 2024, Vol. 79, No. 5, pp. 3111-3138

Authors: Sujithra Sankar, S. Sathyalakshmi. Affiliations: Department of Computer Applications, Hindustan Institute of Technology and Science, Chennai, Tamil Nadu, India; Department of Computer Engineering, Hindustan Institute of Technology and Science, Chennai, Tamil Nadu, India

In the era of advanced machine learning techniques, the development of accurate predictive models for complex medical conditions, such as thyroid cancer, has shown remarkable *** predictive models for thyroid cancer enhance early detection, improve resource allocation, and reduce ***, the widespread adoption of these models in clinical practice demands predictive performance along with interpretability and *** paper proposes a novel association-rule based feature-integrated machine learning model which shows better classification and prediction accuracy than present *** study also focuses on the application of SHapley Additive exPlanations (SHAP) values as a powerful tool for explaining thyroid cancer prediction *** the proposed method, the association-rule based feature integration framework identifies frequently occurring attribute combinations in the *** original dataset is used in training machine learning models, and further used in generating SHAP values *** the next phase, the dataset is integrated with the dominant feature sets identified through association-rule based *** new integrated dataset is used in re-training the machine learning *** new SHAP values generated from these models help in validating the contributions of feature sets in predicting *** conventional machine learning models lack interpretability, which can hinder their integration into clinical decision-making *** this study, the SHAP values are introduced along with association-rule based feature integration as a comprehensive framework for understanding the contributions of feature sets in modelling the *** study discusses the importance of reliable predictive models for early diagnosis of thyroid cancer, and a validation framework of *** proposed model shows an accuracy of 93.48%. Performance metrics such as precision, recall, F1-score, and the area un

Keywords: Explainable AI; machine learning; clinical decision support systems; thyroid cancer; association-rule based framework; SHAP values; classification and prediction
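The feature-integration step described in the abstract above (find frequently occurring attribute combinations, then add them to the dataset before re-training) can be sketched in Python as follows. This is a simplified illustration, not the paper's pipeline: the attribute names, the toy rows, the support threshold, and the helpers `frequent_itemsets` and `integrate_features` are all hypothetical, and the SHAP computation itself is left out.

```python
from itertools import combinations

def frequent_itemsets(rows, min_support, size=2):
    """Return {itemset: support} for attribute combinations of a given size.
    Support is the fraction of rows in which every attribute is present."""
    counts = {}
    for row in rows:
        present = sorted(name for name, value in row.items() if value)
        for combo in combinations(present, size):
            counts[combo] = counts.get(combo, 0) + 1
    n = len(rows)
    return {c: k / n for c, k in counts.items() if k / n >= min_support}

def integrate_features(rows, itemsets):
    """Copy each row and append one binary column per frequent itemset."""
    integrated = []
    for row in rows:
        new_row = dict(row)
        for combo in itemsets:
            new_row["&".join(combo)] = int(all(row.get(a) for a in combo))
        integrated.append(new_row)
    return integrated

# Hypothetical binary attributes; real clinical features are not given here.
rows = [
    {"attr_a": 1, "attr_b": 1, "attr_c": 0},
    {"attr_a": 1, "attr_b": 1, "attr_c": 1},
    {"attr_a": 0, "attr_b": 1, "attr_c": 1},
    {"attr_a": 1, "attr_b": 0, "attr_c": 0},
]
itemsets = frequent_itemsets(rows, min_support=0.5)
integrated_rows = integrate_features(rows, itemsets)
print(itemsets)
print(integrated_rows[0])
```

A model trained on `integrated_rows` could then be explained with SHAP values and compared against one trained on `rows`, which is the validation idea the abstract outlines.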

Image-based rice leaf disease detection using CNN and generative adversarial network

Neural Computing and Applications, 2025, Vol. 37, No. 1, pp. 439-456

Authors: Ramadan, Syed Taha Yeasin; Islam, Md Shafiqul; Sakib, Tanjim; Sharmin, Nusrat; Rahman, Md. Mokhlesur; Rahman, Md. Mahbubur. Affiliation: Department of Computer Science and Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh

Rice is a major crop and staple food for more than half of the world's population and plays a vital role in ensuring food security as well as the global economy. Pests and diseases pose a threat to the production of rice and have a substantial impact on the yield and quality of the crop. In recent times, deep learning methods have gained prominence in predicting rice leaf diseases. Despite the increasing use of these methods, there are notable limitations in existing approaches. These include a scarcity of extensive and diverse collections of leaf disease images, lower accuracy rates, higher time complexity, and challenges in real-time leaf disease detection. To address these limitations, we explicitly investigate various data augmentation approaches using different generative adversarial networks (GANs) for rice leaf disease detection. Along with the GAN model, advanced CNN-based classifiers have been applied to classify the images with the improved data augmentation. Our approach involves employing various GANs to generate high-quality synthetic images. This strategy aims to tackle the challenges posed by limited and imbalanced datasets in the identification of leaf diseases. The key benefit of incorporating GANs in leaf disease detection lies in their ability to create synthetic images, effectively augmenting the dataset's size, enhancing diversity, and reducing the risk of overfitting. For dataset augmentation, we used three distinct GAN architectures, namely simple GAN, CycleGAN, and DCGAN. Our experiments demonstrated that models utilizing the GAN-augmented dataset generally outperformed those relying on the non-augmented dataset. Notably, the CycleGAN architecture exhibited the most favorable outcomes, with the MobileNet model achieving an accuracy of 98.54%. These findings underscore the significant potential of GAN models in improving the performance of detection models for rice leaf diseases, suggesting their promising role in future research within this doma

Keywords: Generative adversarial networks

Multi-Task ConvMixer Networks with Triplet Attention for Low-Resource Keyword Spotting

Tsinghua Science and Technology, 2025, Vol. 30, No. 2, pp. 875-893

Authors: Alexander Rogath Kivaisi, Qingjie Zhao, Yuanbing Zou. Affiliation: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Customized keyword spotting needs to adapt quickly to small user *** methods primarily solve the problem under moderate noise *** work increases the level of difficulty in detecting keywords by introducing keyword ***, the current solution has been explored on large models with many parameters, making it unsuitable for deployment on small *** applying the current solution to lightweight models with minimal training data, the performance degrades compared to the baseline ***, we propose a light-weight multi-task architecture (<9.0×10^4 parameters) created from integrating the triplet attention module in the ConvMixer networks and a new auxiliary mixed labeling encoding to address the *** results of our experiment show that the proposed model outperforms similar light-weight models for keyword spotting, with accuracy gains ranging from 0.73% to 2.95% for a clean set and from 2.01% to 3.37% for a mixed set under different scales of training ***, our model shows its robustness in different low-resource language datasets while converging faster.

Keywords: keyword spotting (KWS); multi-task learning; cross-dimension attention; low-resource; mixed speech

Fine Tuned Hybrid Deep Learning Model for Effective Judgment Prediction

Computer Modeling in Engineering & Sciences, 2025, Vol. 142, No. 3, pp. 2925-2958

Authors: G. Sukanya, J. Priyadarshini. Affiliation: School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India

Advancements in Natural Language Processing and Deep Learning techniques have significantly propelled the automation of Legal Judgment Prediction, achieving remarkable progress in legal *** of the existing research works on Legal Judgment Prediction (LJP) use traditional optimization algorithms in deep learning techniques falling into local *** research article focuses on using the modified Pelican Optimization method which mimics the collective behavior of Pelicans in the exploration and exploitation phase during cooperative food ***, the selection of search agents within a boundary is done randomly, which increases the time required to achieve global *** address this, the proposed Chaotic Opposition Learning-based Pelican Optimization (COLPO) method incorporates the concept of Opposition-Based Learning combined with a chaotic cubic function, enabling deterministic selection of random numbers and reducing the number of iterations needed to reach global ***, the LJP approach in this work uses improved semantic similarity and entropy features to train a hybrid classifier combining Bi-GRU and Deep *** output scores are fused using improved score level fusion to boost prediction *** proposed COLPO method experiments with real-time Madras High Court criminal cases (Dataset 1) and the Supreme Court of India database (Dataset 2), and its performance is compared with nature-inspired algorithms such as Sparrow Search Algorithm (SSA), COOT, Spider Monkey Optimization (SMO), Pelican Optimization Algorithm (POA), as well as baseline classifier models and transformer neural *** results show that the proposed hybrid classifier with COLPO outperforms other cutting-edge LJP algorithms achieving 93.4% and 94.24% accuracy, respectively.

Keywords: Bi-GRU; deep maxout; semantic similarity; legal judgment prediction; opposition based learning; pelican optimization
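The abstract above combines Opposition-Based Learning with a chaotic cubic function to replace purely random selection of search agents. A generic Python sketch of that idea is given below; it is not the COLPO algorithm itself, whose details are not in the abstract. The cubic map form x -> rho * x * (1 - x^2), the value rho = 2.595, the seed, the toy objective, and the helper `initialise_population` are all assumptions chosen for illustration.

```python
def cubic_map(x, rho=2.595):
    """One step of a cubic chaotic map; for this rho, values in (0, 1)
    stay in (0, 1). The constant is an assumed, commonly used setting."""
    return rho * x * (1.0 - x * x)

def chaotic_sequence(seed, length):
    """Deterministic sequence of chaotic values generated from a seed."""
    values, x = [], seed
    for _ in range(length):
        x = cubic_map(x)
        values.append(x)
    return values

def opposite(x, lower, upper):
    """Opposition-based learning: mirror x inside the interval [lower, upper]."""
    return lower + upper - x

def initialise_population(objective, lower, upper, size, seed=0.3):
    """Build chaotic candidates plus their opposites, keep the best `size`."""
    chaos = chaotic_sequence(seed, size)
    candidates = [lower + c * (upper - lower) for c in chaos]
    opposites = [opposite(x, lower, upper) for x in candidates]
    ranked = sorted(candidates + opposites, key=objective)
    return candidates, ranked[:size]

def objective(x):
    """Toy one-dimensional objective with its minimum at x = 7."""
    return (x - 7.0) ** 2

candidates, population = initialise_population(objective, 0.0, 10.0, size=6)
print(population)
```

Because the selected population is drawn from the candidates and their opposites together, its best member is never worse than the best chaotic candidate alone, which is the usual argument for opposition-based initialisation.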

Limb movement detection and analysis based on visual recognition of human posture

Discover Artificial Intelligence, 2025, Vol. 5, No. 1, pp. 1-12

Authors: Xiao, Zhiguo; Wang, Chunxiang; Ding, Tianjiao; Shen, Xiangfeng; Li, Xinyuan; Li, Dongni. Affiliations: School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100811, China; School of Computer Science Technology, Changchun University, Changchun 130022, China

Current motion detection and evaluation technologies face challenges such as limited scalability, imprecise feedback, and lack of personalized guidance. To address these challenges, this research integrated the efficient BlazePose technology with the pioneering DW_KNN* algorithm, resulting in a remarkable accuracy of 98.2% in action recognition and showcasing outstanding scalability. Furthermore, the established ACLstm time series prediction model could comprehensively analyze historical sports data and associated factors of users. On the Rehab dataset, the mean absolute error (MAE) was 1.383 for motion count and 0.508 for motion time. This innovative framework delivered precise feedback and tailored guidance for physical exercise and medical rehabilitation. © The Author(s) 2025.

Keywords: Time series

openGauss: An Open-Source Database for the Era of Artificial Intelligence

Journal of Computer Science & Technology, 2024, Vol. 39, No. 5, pp. 1005-1006

Author: Jian-Zhong Li. Affiliations: Shenzhen Institute of Advanced Technology, Shenzhen 518055, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Databases play a vital role in data management in many fields, such as finance, government, telecommunications, energy, electricity, transportation, *** the database management system has become a core foundational *** is an enterprise-grade open-source database, a product of deep integration of research and development from Huawei, Tsinghua University, and China Mobile in the past decade.

Keywords: database; Open; finance

MH-Net: Multiheaded 3D Hand Pose Estimation Network With 3D Anchorsets and Improved Multiscale Vision Transformer

IEEE Transactions on Intelligent Vehicles, 2024, Vol. 9, No. 10, pp. 1-12

Authors: Tewolde, Tekie Tsegay; Manjotho, Ali Asghar; Niu, Zhendong. Affiliation: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

Accurate 3D hand pose estimation is a challenging computer vision problem primarily because of self-occlusion and viewpoint variations. Existing methods address viewpoint variations by applying data-centric transformations, such as data alignments or generating multiple views, which are prone to data sensitivity, error propagation, and prohibitive computational requirements. We improve the estimation accuracy by mitigating the impact of self-occlusion and viewpoint variations from the network side and propose MH-Net, a novel multiheaded network for accurate 3D hand pose estimation from a depth image. MH-Net comprises three key components. First, a multiscale feature extraction backbone based on an improved multiscale vision transformer (MViTv2) is proposed to extract shift-invariant global features. Second, a 3D anchorset generator is proposed to generate three disjoint sets of 3D anchors that serve two purposes: formulating hand pose estimation as an anchor-to-joint offset estimation and defining three unique viewpoints from a single depth image. Third, three identical regression heads are proposed to regress 3D joint positions based on unique viewpoints defined by their respective anchorsets. Extensive ablation studies have been conducted to investigate the impact of anchorsets, regression heads, and feature extraction backbones. Experiments on three public datasets, ICVL, MSRA, and NYU, show significant improvements over the state-of-the-art.

Keywords: Feature extraction

COVID-19 emergency decision-making using q-rung linear diophantine fuzzy set, differential evolutionary and evidential reasoning techniques

Applied Mathematics (A Journal of Chinese Universities), 2025, Vol. 40, No. 1, pp. 182-206

Authors: G Punnam Chander, Sujit Das. Affiliation: Department of Computer Science and Engineering, National Institute of Technology, Warangal 506004, India

In this paper, a robust and consistent COVID-19 emergency decision-making approach is proposed based on q-rung linear diophantine fuzzy set (q-RLDFS), differential evolutionary (DE) optimization principles, and evidential reasoning (ER) *** proposed approach uses q-RLDFS in order to represent the evaluating values of the alternatives corresponding to the *** optimization is used to obtain the optimal weights of the attributes, and ER methodology is used to compute the aggregated q-rung linear diophantine fuzzy values (q-RLDFVs) of each *** the score values of alternatives are computed based on the aggregated *** alternative with the maximum score value is selected as a better *** applicability of the proposed approach has been illustrated in COVID-19 emergency decision-making system and sustainable energy planning ***, we have validated the proposed approach with a numerical ***, a comparative study is provided with the existing models, where the proposed approach is found to be robust to perform better and consistent in uncertain environments.

Keywords: COVID-19; q-rung linear diophantine fuzzy set; differential evolutionary; evidential reasoning; decision-making

Facial Expression Recognition with High Response-Based Local Directional Pattern (HR-LDP) Network

Computers, Materials & Continua, 2024, Vol. 78, No. 2, pp. 2067-2086

Authors: Sherly Alphonse, Harshit Verma. Affiliation: School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India

Although much research has been done on recognizing facial expressions, there is still a need to increase the accuracy of facial expression recognition, particularly under uncontrolled *** use of Local Directional Patterns (LDP), which has good characteristics for emotion detection, has yielded encouraging *** innovative end-to-end learnable High Response-based Local Directional Pattern (HR-LDP) network for facial emotion recognition is implemented by employing fixed convolutional filters in the proposed *** combining learnable convolutional layers with fixed-parameter HR-LDP layers made up of eight Kirsch filters and derivable simulated gate functions, this network considerably minimizes the number of network *** cost of the parameters in our fully connected layers is up to 64 times lower than those in currently used deep learning-based detection *** seven well-known databases, including JAFFE, CK+, MMI, SFEW, OULU-CASIA and MUG, the recognition rates for seven-class facial expression recognition are 99.36%, 99.2%, 97.8%, 60.4%, 91.1% and 90.1%, *** results demonstrate the advantage of the proposed work over cutting-edge techniques.

Keywords: Emotion classification; CNN network; HR-LDP
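The abstract above builds its fixed layers from eight Kirsch filters. As background, the Python sketch below generates the eight standard Kirsch compass masks and computes a classic Local Directional Pattern code for one 3x3 patch by setting the bits of the three strongest absolute responses. It illustrates plain LDP only; how HR-LDP selects high responses and how the gate functions work is not described in the abstract, so none of that is modelled here. The function names, k = 3, and the toy patch are assumptions.

```python
# Clockwise ring of the eight border cells of a 3x3 mask, from the top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Border weights of the east-facing Kirsch mask, listed in RING order.
BASE = [-3, -3, 5, 5, 5, -3, -3, -3]

def kirsch_masks():
    """Return the eight Kirsch masks as 3x3 lists, each a 45-degree rotation."""
    masks = []
    for shift in range(8):
        mask = [[0] * 3 for _ in range(3)]
        for i, (r, c) in enumerate(RING):
            mask[r][c] = BASE[(i - shift) % 8]
        masks.append(mask)
    return masks

def responses(patch):
    """Apply every Kirsch mask to one 3x3 patch and return the 8 responses."""
    return [
        sum(mask[r][c] * patch[r][c] for r in range(3) for c in range(3))
        for mask in kirsch_masks()
    ]

def ldp_code(patch, k=3):
    """Classic LDP: set bit i for the k directions with largest |response|.
    Ties are broken by direction index because Python's sort is stable."""
    resp = responses(patch)
    strongest = sorted(range(8), key=lambda i: -abs(resp[i]))[:k]
    return sum(1 << i for i in strongest)

# Hypothetical patch with a bright right-hand column (a vertical edge).
patch = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
edge_responses = responses(patch)
code = ldp_code(patch)
print(edge_responses, code)
```

Since the masks are fixed, a layer built this way adds no trainable weights, which is consistent with the parameter savings the abstract reports.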
