Search results - Inner Mongolia University Library

Robust video question answering via contrastive cross-modality representation learning

Science China (Information Sciences), 2024, Vol. 67, No. 10, pp. 211-226

Authors: Xun YANG, Jianming ZENG, Dan GUO, Shanshan WANG, Jianfeng DONG, Meng WANG. Affiliations: School of Information Science and Technology, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; School of Computer Science and Information Engineering, Hefei University of Technology; Institutes of Physical Science and Information Technology, Anhui University; School of Computer Science and Technology, Zhejiang Gongshang University

Video question answering (VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts, recent studies revealed that current VideoQA models mostly tend to over-rely on the superficial correlations rooted in the dataset bias while overlooking the key video content, thus leading to unreliable results. Effectively understanding and modeling the temporal and semantic characteristics of a given video for robust VideoQA is crucial but, to our knowledge, has not been well investigated. To fill the research gap, we propose a robust VideoQA framework that can effectively model the cross-modality fusion and enforce the model to focus on the temporal and global content of videos when making a QA decision instead of exploiting the shortcuts in datasets. Specifically, we design a self-supervised contrastive learning objective to contrast the positive and negative pairs of multimodal input, where the fused representation of the original multimodal input is enforced to be closer to that of the intervened input based on video perturbation. We expect the fused representation to focus more on the global context of videos rather than some static keyframes. Moreover, we introduce an effective temporal order regularization to enforce the inherent sequential structure of videos for video representation. We also design a Kullback-Leibler divergence-based perturbation invariance regularization of the predicted answer distribution to improve the robustness of the model against temporal content perturbation of videos. Our method is model-agnostic and can be easily compatible with various VideoQA backbones. Extensive experimental results and analyses on several public datasets show the advantage of our method over the state-of-the-art methods in terms of both accuracy and robustness.
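
The Kullback-Leibler perturbation invariance term mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the function names, the direction of the divergence (original distribution against perturbed distribution), and the use of raw answer logits as input are all hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of answer logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def perturbation_invariance_loss(logits_original, logits_perturbed):
    # Assumed form: penalize the change in the predicted answer
    # distribution when the video input is perturbed.
    p = softmax(logits_original)
    q = softmax(logits_perturbed)
    return kl_divergence(p, q)
```

The loss is zero when the two answer distributions agree and grows as the prediction on the perturbed video drifts away from the prediction on the original one.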

Keywords: video question answering; cross-modality fusion; contrastive learning; cross-media reasoning

Central Attention Mechanism for Convolutional Neural Networks

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 10, pp. 1642-1648

Authors: Geng, Y.X.; Wang, L.; Wang, Z.Y.; Wang, Y.G. Affiliations: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China; Automation Design Institute, Metallurgical Engineering Technology Co. Ltd., Dalian 116000, China

Model performance has been significantly enhanced by channel attention. The average pooling procedure creates skewness, lowering the performance of the network architecture. In the channel attention approach, average pooling is used to collect feature information to provide representative values. By leveraging the central limit theorem, we hypothesize that the strip-shaped average pooling operation will generate a one-dimensional tensor by considering the spatial position information of the feature map. The resulting tensor, obtained through average pooling, serves as the representative value for the features, mitigating skewness during the process. By incorporating the concept of the central limit theorem into the channel attention operation process, this study introduces a novel attention mechanism known as the "Central Attention Mechanism (CAM)." Instead of directly using average pooling to generate channel representative values, the central attention approach employs star-stripe average pooling to normalize multiple feature representative values into a single representative value. In this way, strip-shaped average pooling can be utilized to collect data and generate a one-dimensional tensor, while star-stripe average pooling can provide feature representative values based on different spatial directions. To generate channel attention for the complementary input features, the activation of the feature representation value is performed for each channel. Our attention approach is flexible and can be seamlessly incorporated into various traditional network structures. Through rigorous testing, we demonstrate the effectiveness of our attention strategy, which can be applied to a wide range of computer vision applications and outperforms previous attention techniques. © (2024), (International Association of Engineers). All rights reserved.

Keywords: Tensors

Research on PM2.5 Concentration Prediction Algorithm Based on Temporal and Spatial Features

Computers, Materials & Continua, 2023, Vol. 75, No. 6, pp. 5555-5571

Authors: Song Yu, Chen Wang. Affiliation: School of Computer Science and Engineering, Central South University, Changsha 410000, China

PM2.5 has a non-negligible impact on visibility and air quality as an important component of haze and can affect cloud formation and rainfall and thus change the climate, and it is an evaluation indicator of air pollution *** PM2.5 concentration prediction based on relevant historical data mining can effectively improve air pollution forecasting ability and guide air pollution prevention and *** past methods neglected the impact caused by PM2.5 flow between cities when analyzing the impact of inter-city PM2.5 concentrations, making it difficult to further improve the prediction ***, factors including geographical information such as altitude and distance and meteorological information such as wind speed and wind direction affect the flow of PM2.5 between cities, leading to the change of PM2.5 concentration in *** a PM2.5 directed flow graph is constructed in this *** and meteorological data is introduced into the graph structure to simulate the spatial PM2.5 flow transmission relationship between *** introduction of meteorological factors like wind direction depicts the unequal flow relationship of PM2.5 between *** on this, a PM2.5 concentration prediction method integrating spatial-temporal factors is proposed in this paper. A spatial feature extraction method based on weight aggregation graph attention network (WGAT) is proposed to extract the spatial correlation features of PM2.5 in the flow graph, and a multi-step PM2.5 prediction method based on attention gate control loop unit (AGRU) is *** PM2.5 concentration prediction model WGAT-AGRU with fused spatiotemporal features is constructed by combining the two methods to achieve multi-step PM2.5 concentration ***, accuracy and validity experiments are conducted on the KnowAir dataset, and the results show that the WGAT-AGRU model proposed in the paper has good performance in terms of prediction accuracy and validates the effectiveness

Keywords: spatiotemporal fusion; PM2.5 concentration prediction; graph neural network; recurrent neural network; attention mechanism

Multi-Objective Offloading Optimization in MEC and Vehicular-Fog Systems: A Distributed-TD3 Approach

IEEE Transactions on Intelligent Transportation Systems, 2024, Vol. 25, No. 11, pp. 16897-16909

Authors: Wakgra, Frezer Guteta; Kar, Binayak; Tadele, Seifu Birhanu; Shen, Shan-Hsiang; Khan, Asif Uddin. Affiliations: Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan; School of Computer Engineering, KIIT Deemed to be University, Odisha, Bhubaneswar 751024, India

The emergence of 5G networks has enabled the deployment of a two-tier edge and vehicular-fog network. It comprises Multi-access Edge Computing (MEC) and Vehicular-Fogs (VFs), strategically positioned closer to Internet of Things (IoT) devices, reducing propagation latency compared to cloud-based solutions and ensuring satisfactory quality of service (QoS). However, during high-traffic events like concerts or athletic contests, MEC sites may face congestion and become overloaded. Utilizing offloading techniques, we can transfer computationally intensive tasks from resource-constrained devices to those with sufficient capacity, accelerating tasks and extending device battery life. In this research, we consider offloading within a two-tier MEC and VF architecture, involving offloading from MEC to MEC and from MEC to VF. The primary objective is to minimize the average system cost, considering both latency and energy consumption. To achieve this goal, we formulate a multi-objective optimization problem aimed at minimizing latency and energy while considering given resource constraints. To facilitate decision-making for nearly optimal computational offloading, we design an equivalent reinforcement learning environment that accurately represents the network architecture and the formulated problem. To accomplish this, we propose a Distributed-TD3 (DTD3) approach, which builds on the TD3 algorithm. Extensive simulations demonstrate that our strategy achieves faster convergence and higher efficiency compared to other benchmark solutions. © 2024 IEEE.

Keywords: Quality of service

A significant wave height prediction method with ocean characteristics fusion and spatiotemporal dynamic graph modeling

Acta Oceanologica Sinica, 2024, Vol. 43, No. 12, pp. 13-33

Authors: Xiao Yin, Taoxing Wu, Jie Yu, Xiaoyu He, Lingyu Xu. Affiliations: Department of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China

Accurate significant wave height (SWH) prediction is essential for the development and utilization of wave *** learning methods such as recurrent and convolutional neural networks have achieved good results in SWH ***, these methods do not adapt well to dynamic seasonal variations in wave *** this study, we propose a novel method, the spatiotemporal dynamic graph (STDG) neural *** method predicts the SWH of multiple nodes based on dynamic graph modeling and multi-characteristic ***, considering the dynamic seasonal variations in the wave direction over time, the network models wave dynamic spatial dependencies from long- and short-term pattern ***, to correlate multiple characteristics with SWH, the network introduces a cross-characteristic transformer to effectively fuse multiple ***, we conducted experiments on two datasets from the South China Sea and East China Sea to validate the proposed method and compared it with five prediction methods in the three *** experimental results show that the proposed method achieves the best performance at all predictive scales and has greater advantages for extreme value ***, an analysis of the dynamic graph shows that the proposed method captures the seasonal variation mechanism of the waves.

Keywords: significant wave height forecasting; dynamic seasonal variation; dynamic graph neural networks

Attenuate Class Imbalance Problem for Pneumonia Diagnosis Using Ensemble Parallel Stacked Pre-Trained Models

Computers, Materials & Continua, 2023, Vol. 75, No. 4, pp. 891-909

Authors: Aswathy Ravikumar, Harini Sriraman. Affiliation: School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India

Pneumonia is an acute lung infection that has caused many fatalities globally. Radiologists often employ chest X-rays to identify pneumonia since they are presently the most effective imaging method for this ***-aided diagnosis of pneumonia using deep learning techniques is widely used due to its effectiveness and performance. In the proposed method, the Synthetic Minority Oversampling Technique (SMOTE) approach is used to eliminate the class imbalance in the X-ray dataset. To compensate for the paucity of accessible data, pre-trained transfer learning is used, and an ensemble Convolutional Neural Network (CNN) model is developed. The ensemble model consists of all possible combinations of the MobileNetV2, Visual Geometry Group (VGG16), and DenseNet169 models. MobileNetV2 and DenseNet169 performed well in the single classifier model, with an accuracy of 94%, while the ensemble model (MobileNetV2+DenseNet169) achieved an accuracy of 96.9%. Using the data synchronous parallel model in Distributed TensorFlow, the training process accelerated performance by 98.6% and outperformed other conventional approaches.
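
The SMOTE step named in the abstract can be sketched as follows. This is a simplified illustration of the general technique, not the authors' pipeline: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the use of plain feature vectors as input are all assumptions, and the abstract does not say how SMOTE was applied to the X-ray images.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    # minority: list of feature vectors (lists of floats), at least two samples.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, by squared Euclidean distance.
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        other = rng.choice(neighbours)
        # New sample lies on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

Each synthetic sample is an interpolation between two existing minority-class samples, so the minority class grows without exact duplicates being added.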

Keywords: pneumonia prediction; distributed deep learning; data parallel model; ensemble deep learning; class imbalance; skewed data

Class-conditional domain adaptation for semantic segmentation

Computational Visual Media, 2024, Vol. 10, No. 5, pp. 1013-1030

Authors: Yue Wang, Yuke Li, James H. Elder, Runmin Wu, Huchuan Lu. Affiliations: School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, China; School of Computer Science, Wuhan University, Wuhan 430072, China; Department of Electrical Engineering and Computer Science, York University, Toronto M3J 1P3, Canada; Department of Computer Science, the University of Hong Kong, Hong Kong 999077, China

Semantic segmentation is an important sub-task for many ***, pixel-level ground-truth labeling is costly, and there is a tendency to overfit to training data, thereby limiting the generalization *** domain adaptation can potentially address these problems by allowing systems trained on labelled datasets from the source domain (including less expensive synthetic domain) to be adapted to a novel target *** conventional approach involves automatic extraction and alignment of the representations of source and target domains *** limitation of this approach is that it tends to neglect the differences between classes: representations of certain classes can be more easily extracted and aligned between the source and target domains than others, limiting the adaptation over all ***, we address this problem by introducing a Class-Conditional Domain Adaptation (CCDA) *** incorporates a class-conditional multi-scale discriminator and class-conditional losses for both segmentation and ***, they measure the segmentation, shift the domain in a class-conditional manner, and equalize the loss over *** results demonstrate that the performance of our CCDA method matches, and in some cases, surpasses that of state-of-the-art methods.

Keywords: domain adaptation; generative adversarial networks; semantic segmentation; Cityscapes

Improved Set Algebra-Based Heuristic Technique for Training Multiplicative Functional Link Artificial Neural Networks for Financial Time Series Forecasting

SN Computer Science, 2024, Vol. 5, No. 5, p. 567

Authors: Behera, Sudersan; Kumar, AVS Pavan; Nayak, Sarat Chandra. Affiliations: Department of Computer Science and Engineering, School of Engineering & Technology, GIET University, Gunupur, India; Department of Computer Science and Engineering, School of Technology, GITAM University, Hyderabad, India

The current study is defined by two main aims. An effective strategy for improving local search is to combine the Set Algebra-Based Heuristic Algorithm (SAHA) with the Nelder-Mead simplex method. The resulting approach, referred to as the Improved SAHA (ISAHA), has the ability to produce superior outcomes. The multiplicative functional link artificial neural network (MFLANN) is an improved version of the functional link artificial neural network that promotes exploration by replacing the summing unit in the output layer with a multiplication unit. Moreover, the combination of ISAHA and MFLANN results in the development of ISAHA-MFLANN, a sophisticated hybrid forecasting model. The main assessment of the hybrid model rests on its ability to forecast complex and dynamic financial time series. The problems that come with traditional learning-based MFLANN techniques can be avoided by combining MFLANN's advanced approximation features with ISAHA's resilient global search features. Experimental verification using two stock market datasets and three currency exchange rates demonstrates the validity of the idea. The results show that the suggested improved SAHA hybrid learning works well at optimizing six standard benchmark functions. The ISAHA-MFLANN model also captures the volatility inherent in financial time series with statistically significant accuracy. In addition, it surpasses other models such as SAHA-MFLANN, Monarch Butterfly Optimization-MFLANN, Particle Swarm Optimization-MFLANN, Genetic Algorithm-MFLANN, Gradient Descent-MFLANN, Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Auto Regressive Integrated Moving Average (ARIMA). © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.

Keywords: Exchange rate forecasting; Improved Set Algebra-Based Heuristic Algorithm; Multiplicative functional link artificial neural network; Stock market forecasting

Machine learning algorithms for the diagnosis of Alzheimer and Parkinson disease

Journal of Medical Engineering and Technology, 2023, Vol. 47, No. 1, pp. 35-43

Authors: Nancy Noella, R.S.; Priyadarshini, J. Affiliation: School of Computer Science and Engineering, VIT University, Chennai, India

Dementia is a general term used to indicate any disorder related to human memory. The various memory-related problems severely affect the human brain, so that the individual has difficulty performing normal physical as well as mental activities. Different types of dementia exist, but the commonly seen and fatal types are Alzheimer’s disease (AD) and Parkinson’s disease (PD). In this paper, several efficient machine learning techniques are selected and their behaviour is analysed in the diagnosis of AD and PD using Positron Emission Tomography (PET). The PET image dataset used in this work consists of 1050 images covering AD, PD and healthy brain images. The images are split into two sets in the ratio of 7:3 for training and testing respectively. The machine learning classifiers used are Bagged Ensemble, ID3, Naive Bayes and Multiclass Support Vector Machine. The classification of AD and PD with reference to a healthy brain is done by comparing the input image with the trained samples in the PET image database. In this comparison, the bagged ensemble learning classifier worked better than the other classification algorithms and yielded an accuracy of 90.3%. © 2022 Informa UK Limited, trading as Taylor & Francis Group.
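
The 7:3 split of the 1050 images works out to 735 training and 315 testing images. A minimal sketch of such a split is shown here; the function name `split_dataset`, the shuffling step, and the fixed seed are assumptions, since the abstract does not say how images were assigned to each set.

```python
import random

def split_dataset(items, train_ratio=0.7, seed=0):
    # Shuffle a copy so the caller's sequence is left untouched.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # round() guards against floating-point error in the product.
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Image indices 0..1049 stand in for the 1050 PET images.
train, test = split_dataset(range(1050))
print(len(train), len(test))  # 735 315
```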

Keywords: Support vector machines

Automatic lung cancer detection using hybrid particle snake swarm optimization with optimized mask RCNN

Multimedia Tools and Applications, 2024, Vol. 83, No. 31, pp. 76807-76831

Authors: Sudha, R.; Maheswari, K. M. Uma. Affiliations: Department of Computer Science and Engineering, School of Computing, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, India; Department of Computing Technologies, School of Computing, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, India

As a result of its aggressive nature and late identification at advanced stages, lung cancer is one of the leading causes of cancer-related deaths. Early diagnosis of lung cancer is a serious and difficult challenge that is crucial to a person's survival. The first diagnosis of malignant nodules is typically made using chest radiography (X-rays) and computed tomography (CT) scans; however, the potential presence of benign nodules results in incorrect conclusions. The early phases of both benign and malignant nodules exhibit striking similarities. In this paper, a novel deep learning-based model is proposed for the precise diagnosis of malignant nodules. The proposed approach consists of two stages, namely pre-processing and lung nodule detection. Initially, the lung CT scan images are collected from the dataset. Then, to remove the noise present in the input image, we apply an adaptive median filter. Next, to enhance the image, Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied. After pre-processing, the image is given to the optimized Mask RCNN classifier to detect the malignant and benign nodules. To enhance the performance of the Mask RCNN classifier, the hyper-parameters are optimally selected using hybrid particle snake swarm optimization (PS2OA). The proposed PS2OA is a hybridization of particle swarm optimization (PSO) and snake swarm optimization (SSO). The performance of the proposed approach is analyzed based on different metrics, and its effectiveness is compared with state-of-the-art works. The proposed approach attained a maximum accuracy of 97.67%. This work is aimed at assisting radiologists in detecting and diagnosing small-size pulmonary nodules more accurately. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Diagnosis
