Search Results - Inner Mongolia University Library

Multi-lesion Segmentation of Fundus Images using Improved UNet++

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 10, pp. 1587-1595

Authors: Jiang, Haoyan; Zhao, Ji. Affiliation: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

Diabetic retinopathy (DR) is one of the complications of diabetes. Early diagnosis of retinopathy helps to avoid vision loss or blindness. The difficulty of this task lies in the significant differences in the size and shape of lesions between different DR samples and in the high proportion of small lesions. We propose a new multi-lesion segmentation method based on UNet++ to improve the segmentation accuracy of DR lesions. We chose ResNet50 as the backbone network and introduced a new hybrid residual module to replace the original residual module. At the same time, to compensate for the loss of information on small DR lesions during feature extraction, we introduce Across Feature Map Attention (AFMA) as an auxiliary branch that enhances the segmentation accuracy of small-scale lesions. Finally, in response to the difficulty of extracting DR lesions in shallow models, the model abandons the deep supervision structure of UNet++. In addition, we use a weighted mixed loss function to train the model. We conducted experiments on the public IDRID and DDR datasets, simultaneously segmenting four typical DR lesions. The results in terms of intersection over union (IoU) and Dice similarity coefficient (Dice) show that our method achieves competitive performance compared with other methods. © (2024), (International Association of Engineers). All rights reserved.

Keywords: Semantic Segmentation
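
The abstract above reports results as intersection over union (IoU) and Dice similarity coefficient. As a minimal sketch of how these two standard overlap metrics are computed for one binary lesion mask, the following may help; the function name `iou_and_dice` and the nested-list mask format are assumptions made for illustration and are not taken from the paper.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only; not the authors' code.
    """
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(a and b for a, b in zip(p, t))  # pixels set in both masks
    union = sum(a or b for a, b in zip(p, t))          # pixels set in either mask
    total = sum(p) + sum(t)                            # |pred| + |target|

    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    return intersection / union, 2 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)
print(f"IoU={iou:.3f}  Dice={dice:.3f}")
```

For a multi-lesion setting such as the four lesion types mentioned in the abstract, one would typically compute these values per lesion class and then average them; the exact averaging used in the paper is not stated here.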

PCB Surface Defect Detection based on YOLOv8n

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 12, pp. 2017-2025

Authors: You, Rui; Wang, Zhifeng. Affiliation: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

In the electronic manufacturing industry, accurate detection of PCB defects is crucial as it directly impacts product quality and reliability. The primary challenges in PCB defect detection include missed detections and false alarms, particularly concerning micro-defects. This study proposes an enhanced PCB defect detection algorithm based on YOLOv8, which incorporates the Global Attention Module (GAM), Partial Convolution Layer (PConv3), and a multi-head detection strategy. The GAM improves the model’s sensitivity to micro-defects by capturing and weighting global context information through spatial and channel attention mechanisms applied to the input feature map. The PConv3 optimizes feature extraction, minimizing false alarms due to information loss. The multi-head detection strategy identifies defects at varying scales, preserving detailed information and enhancing the detection of small-sized defects. Experimental results demonstrate that the improved algorithm achieves a 3.6% increase in average precision while meeting real-time detection requirements. © (2024), (International Association of Engineers). All rights reserved.

Keywords: Deep Learning; Global Attention Module; Multi-head Detection; PCB Defect Detection; PConv3; YOLOv8

Face Detection Based on Improved Multi-task Cascaded Convolutional Neural Networks

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 2, pp. 67-74

Authors: Jia, Siyu; Tian, Ying. Affiliations: College of Computer and Software Engineering, University of Science and Technology Liaoning, Liaoning 114051, China; School of Computer and Software Engineering, University of Science and Technology Liaoning, Liaoning 114051, China

With the development of deep learning and computer vision, face detection has achieved rapid progress. Face detection has several application domains, including identity authentication, security protection, media, and entertainment. Although multi-task cascaded convolutional neural networks (MTCNN) have high accuracy and robustness, the model has a large number of parameters and a high computational overhead, which is a disadvantage in real scenes given their complexity and the constraints of hardware facilities. Therefore, the development of an improved network model is crucial. This paper improves the MTCNN model by reducing the number of parameters and the computational overhead and by using better model parameters to locate the key points of the face. This model improves the accuracy and robustness of the face age estimation. The WiderFace and CelebA datasets are used for training. The final face detection accuracy reaches 98.7% while simultaneously reducing the number of model parameters to 70% under the same conditions. This model meets the application needs of modern society for face detection and demonstrates the efficiency and accuracy of the improved network model. © (2024), (International Association of Engineers). All Rights Reserved.

Keywords: Face recognition

Central Attention Mechanism for Convolutional Neural Networks

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 10, pp. 1642-1648

Authors: Geng, Y.X.; Wang, L.; Wang, Z.Y.; Wang, Y.G. Affiliations: School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China; School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China; Automation Design Institute, Metallurgical Engineering Technology Co. Ltd., Dalian 116000, China

Model performance has been significantly enhanced by channel attention. In the channel attention approach, average pooling is used to collect feature information and provide representative values. However, the average pooling procedure creates skewness, lowering the performance of the network architecture. By leveraging the central limit theorem, we hypothesize that the strip-shaped average pooling operation will generate a one-dimensional tensor by considering the spatial position information of the feature map. The resulting tensor, obtained through average pooling, serves as the representative value for the features, mitigating skewness during the process. By incorporating the concept of the central limit theorem into the channel attention operation process, this study introduces a novel attention mechanism known as the "Central Attention Mechanism (CAM)." Instead of directly using average pooling to generate channel representative values, the central attention approach employs star-stripe average pooling to normalize multiple feature representative values into a single representative value. In this way, strip-shaped average pooling can be utilized to collect data and generate a one-dimensional tensor, while star-stripe average pooling can provide feature representative values based on different spatial directions. To generate channel attention for the complementary input features, the activation of the feature representation value is performed for each channel. Our attention approach is flexible and can be seamlessly incorporated into various traditional network structures. Through rigorous testing, we demonstrate the effectiveness of our attention strategy, which can be applied to a wide range of computer vision applications and outperforms previous attention techniques. © (2024), (International Association of Engineers). All rights reserved.

Keywords: Tensors
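
The abstract above relies on strip-shaped average pooling, which reduces a two-dimensional feature map to a one-dimensional tensor along one spatial direction. The sketch below shows only the generic operation for a single channel, averaging along rows and along columns. It is an illustrative assumption, not the paper's Central Attention Mechanism, and the "star-stripe" variant is not reproduced because the abstract does not define it. The function name `strip_average_pool` is hypothetical.

```python
def strip_average_pool(feature_map):
    """Generic strip-shaped average pooling for one channel (illustrative only).

    feature_map: nested list of floats with shape (height, width).
    Returns (row_means, col_means):
      row_means has length `height` (each row averaged over its width),
      col_means has length `width`  (each column averaged over its height).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [
        sum(feature_map[r][c] for r in range(height)) / height
        for c in range(width)
    ]
    return row_means, col_means


fmap = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
rows, cols = strip_average_pool(fmap)
print(rows)  # one value per row
print(cols)  # one value per column
```

Unlike global average pooling, which collapses the whole map to one number, each strip output keeps position information along the remaining axis.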

Image Guidance Encoder-Decoder Model in Image Captioning and Its Application

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 9, pp. 1385-1392

Authors: Yang, Zhen; Zhou, Ziwei; Wang, Chaoyang; Xu, Liang. Affiliations: School of Applied Technology, University of Science and Technology Liaoning, Anshan, China; School of Computer and Software Engineering, University of Science and Technology Liaoning, Anshan, China; School of Computer and Software Engineering, University of Science and Technology Liaoning, Anshan, China

This paper introduces a new network model - the Image Guidance Encoder-Decoder Model (IG-ED), designed to enhance the efficiency of image captioning and improve predictive accuracy. IG-ED, a fusion of the convolutional network VGGNet-16 and the long short-term memory network (LSTM), is designed based on the encoder-decoder structure. The image captioning performance sees significant enhancements when leveraging the IG-ED network model. The network training process unfolds in a series of steps. Initially, the input image undergoes convolution via the VGGNet-16 network, producing a 512-dimensional vector. Concurrently, each word in the image's caption is encoded to generate a corresponding 512-dimensional vector consistent with the image feature dimension. These two vectors form the input for the decoding process. Subsequently, the vectors are fed into the redesigned fusion LSTM (F-LSTM) network at different time steps to gradually train the parameters of the IG-ED framework. The training process is completed by utilizing a loss function for determining convergence. Evaluation of the IG-ED model's performance is conducted using CIDEr and seven other evaluation metrics on the MSCOCO 2014 dataset. The results exhibit substantial improvements over the "Adaptive Attention Mode" network and "Neural Talk" network. Additionally, the parameter count of the IG-ED architecture is significantly reduced compared to the "Adaptive Attention Mode" network, leading to decreased computational resource requirements and enabling edge computing on the neural network. © (2024), (International Association of Engineers). All Rights Reserved.

Keywords: Long short-term memory

A Generative Model-Based Network Framework for Ecological Data Reconstruction

Computers, Materials & Continua, 2025, Vol. 82, No. 1, pp. 929-948

Authors: Shuqiao Liu; Zhao Zhang; Hongyan Zhou; Xuebo Chen. Affiliations: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan 114051, China; School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China

This study examines the effectiveness of artificial intelligence techniques in generating high-quality environmental data for species introductory site selection *** Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis data with Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) the network framework model (SAE-GAN), is proposed for environmental data *** model combines two popular generative models, GAN and VAE, to generate features conditional on categorical data embedding after SWOT *** model is capable of generating features that resemble real feature distributions and adding sample factors to more accurately track individual sample *** data is used to retain more semantic information to generate *** model was applied to species in Southern California, USA, citing SWOT analysis data to train the *** show that the model is capable of integrating data from more comprehensive analyses than traditional methods and generating high-quality reconstructed data from them, effectively solving the problem of insufficient data collection in development *** model is further validated by the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) classification assessment commonly used in the environmental data *** study provides a reliable and rich source of training data for species introduction site selection systems and makes a significant contribution to ecological and sustainable development.

Keywords: Convolutional Neural Network (CNN); VAE; GAN; TOPSIS; data reconstruction

OCRBench: on the hidden mystery of OCR in large multimodal models

Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 23-35

Authors: Yuliang LIU; Zhang LI; Mingxin HUANG; Biao YANG; Wenwen YU; Chunyuan LI; Xu-Cheng YIN; Cheng-Lin LIU; Lianwen JIN; Xiang BAI. Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; School of Electronic and Information Engineering, South China University of Technology; Microsoft Research; School of Computer & Communication Engineering, University of Science and Technology Beijing; Institute of Automation, Chinese Academy of Sciences; School of Software Engineering, Huazhong University of Science and Technology

Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.

Keywords: large multimodal model; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition

An Unbiased Fuzzy Weighted Relative Error Support Vector Machine for Reverse Prediction of Concrete Components

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 9, pp. 4574-4584

Authors: Fan, Zongwen; Gou, Jin; Weng, Shaoyuan. Affiliations: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; School of Software Engineering, Xiamen Institute of Software Technology, Xiamen 361024, China

Concrete is a vital component in modern construction, prized for its strength, durability, and versatility. Accurately determining the quantities of concrete components is crucial in civil engineering applications to optimize resources (e.g., manpower and financial resources). In this article, we propose an unbiased fuzzy-weighted relative error support vector machine (UFW-RE-SVM) for reverse prediction of concrete components. First, we add an unbiased term to the target function of UFW-RE-SVM to obtain an unbiased model. Second, we design a fuzzy-weighted operation to indicate sample importance by incorporating the fuzzy membership values into the UFW-RE-SVM. The nth root operation is introduced to address the exponential explosion issue in the fuzzy-weighted operation. Finally, considering that the UFW-RE-SVM is sensitive to its hyperparameters for multioutput prediction, the whale optimization algorithm (WOA) is used for hyperparameter optimization because of its effectiveness in optimization tasks. We design the fitness function based on the results from multiple components to balance the performance of multioutput predictions. Experimental results show that our proposed model outperforms existing works in predicting concrete components in terms of mean absolute relative error, standard deviation, and root mean square error. Further, the statistical test shows that the WOA and two other metaheuristics can significantly improve the prediction performance. This indicates that the unbiased term, fuzzy-weighted operation, and WOA are effective for improving the proposed model for reverse prediction of concrete components. With these promising results, the proposed model could provide decision-makers with a valuable tool for determining concrete component quantities based on desired concrete qualities. © 2020 IEEE.

Keywords: Forecasting
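
The abstract above evaluates predictions with mean absolute relative error and root mean square error. A minimal sketch of the usual definitions of these two metrics follows. The function names and the sample quantities are hypothetical, chosen only for illustration, and the paper's exact formulation (for example, how the errors of multiple output components are aggregated) may differ.

```python
import math


def mean_absolute_relative_error(y_true, y_pred):
    """Mean of |y - y_hat| / |y|; assumes no true value is zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the mean squared difference between truth and prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical component quantities, for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]

mare = mean_absolute_relative_error(y_true, y_pred)
rmse = root_mean_square_error(y_true, y_pred)
print(f"MARE={mare:.4f}  RMSE={rmse:.4f}")
```

The relative error scales each deviation by the true value, so a 10-unit miss on a 100-unit component counts more than the same miss on a 400-unit component, whereas RMSE treats both equally.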

On learning the right attention point for feature enhancement

Science China (Information Sciences), 2023, Vol. 66, No. 1, pp. 131-143

Authors: Liqiang LIN; Pengdi HUANG; Chi-Wing FU; Kai XU; Hao ZHANG; Hui HUANG. Affiliations: College of Computer Science and Software Engineering, Shenzhen University; Department of Computer Science and Engineering, The Chinese University of Hong Kong; School of Computer Science, National University of Defense Technology; School of Computing Science, Simon Fraser University

We present a novel attention-based mechanism to learn enhanced point features for point cloud processing tasks, e.g., classification and segmentation. Unlike prior studies, which were trained to optimize the weights of a pre-selected set of attention points, our approach learns to locate the best attention points to maximize the performance of a specific task, e.g., point cloud classification. Importantly, we advocate the use of a single attention point to facilitate semantic understanding in point feature learning. Specifically, we formulate a new and simple convolution, which combines convolutional features from an input point and its corresponding learned attention point (LAP). Our attention mechanism can be easily incorporated into state-of-the-art point cloud classification and segmentation networks. Extensive experiments on common benchmarks, such as ModelNet40, ShapeNet Part, and S3DIS, all demonstrate that our LAP-enabled networks consistently outperform the respective original networks, as well as other competitive alternatives, which employ multiple attention points, either pre-selected or learned under our LAP framework.

Keywords: point convolution; feature enhancement; attention point; deep neural network