Solar photovoltaic (PV) energy, with its clean, local, and renewable features, is an effective complement to traditional energy sources today. However, photovoltaic power systems are highly weather-dependent and therefore unstable and intermittent. Despite the negative impact of these features on solar sources, the increase in worldwide installed PV capacity has made solar energy prediction an important research topic. This study compares three encoder-decoder (ED) networks for day-ahead solar PV energy prediction: Long Short-Term Memory ED (LSTM-ED), Convolutional LSTM ED (Conv-LSTM-ED), and Convolutional Neural Network and LSTM ED (CNN-LSTM-ED). The models are tested using 1741-day-long datasets from 26 PV panels in Istanbul, Turkey, considering both the power and energy output of the panels and meteorological features. The results show that the Conv-LSTM-ED with 50 iterations is the most successful model, achieving an average R-squared (R2) score of up to 0.88. Evaluation of the effect of iteration count reveals that the Conv-LSTM-ED with 50 iterations also yields the lowest Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) values, confirming its success. In addition, the fitness and effectiveness of the models are evaluated, with the Conv-LSTM-ED achieving the lowest Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values for each iteration count. The findings of this work can help researchers build the best data-driven methods for forecasting PV solar energy based on PV and meteorological features.
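The comparison above ranks models by R-squared, RMSE, MAE, AIC, and BIC. A minimal sketch of how these scores can be computed for a forecast, assuming the least-squares (Gaussian error) form of AIC/BIC with k trainable parameters (the paper's exact variant is not specified):

```python
import numpy as np

def regression_scores(y_true, y_pred, k):
    """Standard scores for comparing forecasting models.

    k is the number of model parameters; AIC/BIC use the
    least-squares (Gaussian error) form, an assumption here.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    resid = y_true - y_pred
    rss = float(np.sum(resid ** 2))            # residual sum of squares
    rmse = float(np.sqrt(rss / n))
    mae = float(np.mean(np.abs(resid)))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - rss / ss_tot
    aic = n * np.log(rss / n) + 2 * k          # penalizes parameter count
    bic = n * np.log(rss / n) + k * np.log(n)  # stronger penalty for large n
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "AIC": aic, "BIC": bic}
```

A model with the lowest RMSE/MAE/AIC/BIC and the highest R2, as reported for the Conv-LSTM-ED, would win on every one of these criteria.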
Image caption generation is a critical research area that combines computer vision and natural language processing, with wide-ranging implications such as assisting visually impaired individuals, improving autonomous ...
Skin lesion segmentation from dermatoscopic images is essential for the diagnosis of skin cancer. However, it is still a challenging task due to the ambiguity of skin lesions, their irregular shapes, and the presence of various interfering factors. In this paper, we propose a novel Ambiguous Context Enhanced Attention Network (ACEANet) based on the classical encoder-decoder architecture, which accurately, reliably, and efficiently segments a variety of lesions. Specifically, a novel Ambiguous Context Enhanced Attention module is embedded in the skip connection to augment the ambiguous boundary information. A Dilated Gated Fusion block is employed at the end of the encoding phase, which effectively reduces the loss of spatial location information due to continuous downsampling. In addition, we propose a novel Cascading Global Context Attention to fuse feature information generated by the encoder with features generated by the decoder of the corresponding layer. To verify the effectiveness and advantages of the proposed network, we have performed comparative experiments on the ISIC2018 and PH2 datasets. Experimental results demonstrate that the proposed model has superior segmentation performance for skin lesions.
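An attention module embedded in a skip connection can be sketched generically as follows. This is the common additive attention-gate pattern, not necessarily ACEANet's exact module; the projection weights below are random stand-ins for learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gated_skip(skip, gating, w_s, w_g, w_psi):
    """Re-weight encoder skip features by an attention map driven by the
    decoder's gating signal, so boundary regions can be emphasised
    before concatenation.

    skip, gating: (H, W, C) feature maps; w_s, w_g: (C, C) and
    w_psi: (C, 1) projection matrices (learned in a real network).
    """
    # Additive attention: psi = sigmoid(relu(skip @ Ws + gating @ Wg) @ Wpsi)
    mix = np.maximum(skip @ w_s + gating @ w_g, 0.0)  # (H, W, C), ReLU
    psi = sigmoid(mix @ w_psi)                        # (H, W, 1), in (0, 1)
    return skip * psi                                 # attenuated skip features
```

The attention map `psi` never amplifies a feature, only scales it toward zero, which is what lets the network suppress uninformative skip activations.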
Accurate and reliable crop type identification from satellite images provides a foundation for crop yield predictions, which paves the way to help ensure food security. Most of the work on crop type mapping using remote sensing is restricted to developed countries with large field parcels, while little effort has been directed towards developing countries, where the task is more challenging due to the small size of field parcels, irregular field shapes, and an acute shortage of labelled datasets for training supervised machine learning models. In this research, we try to fill this gap in the literature by exploring the feasibility of semantic segmentation of agricultural fields from satellite images. We propose an encoder-decoder-based semantic segmentation architecture, CropNet, with a ResNet network as the encoder backbone. Attention modules in the decoder allow the model to focus on the more important portions of the feature maps, and feature fusion concatenates the feature maps from all the decoder nodes, yielding a more precise prediction by bringing in spatial location information from the previous layers. The architecture outperformed the state of the art by 0.51% and 1.3% on overall accuracy and macro-F1 score, respectively, after being trained on the "2019 Zindi's Farm Pin Crop Detection" dataset of Sentinel-2 images. The model achieved a field-wise overall classification accuracy of 78.06%, a macro-F1 score of 67.3%, and a pixel-wise segmentation mean Intersection over Union (mIoU) of 62.22%, an improvement of 2.56% over the state-of-the-art methods, demonstrating that our model is computationally efficient for the semantic segmentation of crop types from satellite images in the difficult scenario of smallholder farms.
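The mIoU and macro-F1 figures reported above are standard metrics derived from a confusion matrix; a minimal sketch (assuming every class appears in the ground truth, so no zero denominators arise):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    idx = n_classes * y_true.ravel() + y_pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def miou_and_macro_f1(y_true, y_pred, n_classes):
    """Pixel-wise mean IoU and macro-averaged F1 over all classes."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp          # class c missed by the prediction
    iou = tp / (tp + fp + fn)         # per-class intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn)  # per-class F1
    return float(iou.mean()), float(f1.mean())
```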
Video saliency prediction aims to resemble human visual attention by identifying the most relevant and significant elements in a video frame or sequence. This task becomes notably intricate in scenarios characterized by dynamic elements such as rapid motion, occlusions, blur, background variations, and nonrigid deformations. Therefore, the inherent complexity of human visual attention behavior during dynamic scenes necessitates the assessment of both temporal and spatial data. Existing video saliency frameworks often falter under such conditions, and relying solely on image saliency models neglects crucial temporal information in videos. This study presents a new model for Video Salient Object Detection, a Multi-level Spatiotemporal Bidirectional Network using Multi-scale Transfer Learning (MSB-Net), to address the problem of identifying significant objects in videos. The proposed MSB-Net achieves notable results for a given sequence of frames by employing multi-scale transfer learning with an encoder-decoder approach to acquire knowledge and saliency map attributes spatially and temporally. The proposed MSB-Net model has bidirectional LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) components. The VGG16 and VGG19 (Visual Geometry Group) architectures extract multi-scale features from the input video frames. Evaluation on diverse datasets, namely DAVIS-T, SegTrack-V2, ViSal, VOS-T, and DAVSOD-T, demonstrates the model's effectiveness, outperforming other competitive models on metrics such as MAE, F-measure, and S-measure.
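The MAE and F-measure used in the evaluation above can be sketched for a single saliency map; the adaptive threshold (twice the mean saliency) and beta^2 = 0.3 are conventions common in salient-object-detection benchmarks, assumed here rather than taken from the paper:

```python
import numpy as np

def saliency_scores(pred, gt, beta2=0.3):
    """MAE and F-measure for a predicted saliency map in [0, 1]
    against a binary ground-truth mask."""
    pred = pred.astype(float)
    gt = gt.astype(bool)
    mae = float(np.mean(np.abs(pred - gt)))
    # Adaptive threshold: twice the mean saliency, capped at 1.
    t = min(2.0 * pred.mean(), 1.0)
    binary = pred > t                       # note: an all-equal map binarizes empty
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-12)
    return mae, float(f)
```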
Background subtraction is a crucial stage in many visual surveillance systems. The prime objective of any such system is to detect local changes, and the system can be utilized to face many real-life challenges. Most of the existing methods have addressed the problems of moderate and fast-moving object detection. However, very little of the literature has addressed slow-moving object detection, and existing methods need further improvement to enhance detection efficacy. Hence, in this article, we identify moving objects in challenging videos through an encoder-decoder architectural design, incorporating an enhanced VGG-19 model alongside a feature pooling framework. The proposed algorithm is novel in several respects: a pre-trained VGG-19 architecture is modified and used as an encoder with a transfer learning mechanism. The proposed model learns the weights of the improved VGG-19 model by a transfer-learning mechanism, which enhances the model's efficacy. The proposed encoder is designed using a smaller number of layers to extract the crucial fine- and coarse-scale features necessary for detecting moving objects. The feature pooling framework (FPF) employed is a hybridization of a max-pooling layer, a convolutional layer, and multiple convolutional layers with distinct sampling rates to retain multi-scale and multi-dimensional features at different scales. The decoder network consists of stacked convolution layers projecting effectively from feature space to image space. The developed technique's efficacy is demonstrated against thirty-six state-of-the-art (SOTA) methods. The outcomes acquired by the developed technique are corroborated using both subjective and objective analysis, which shows superior performance against other SOTA techniques. Additionally, the proposed model demonstrates enhanced accuracy when applied to unseen configurations. Further, the proposed technique (MOD-CVS) attained adequate efficiency
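The feature pooling framework above is described as parallel convolutions at distinct sampling (dilation) rates. A pure-NumPy sketch of that multi-rate idea, with hypothetical 3x3 kernels standing in for learned filters (not the paper's exact FPF):

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """3x3 'same'-padded 2D convolution with a dilation rate, in pure NumPy.
    With rate r, the kernel taps sit at offsets {-r, 0, r}."""
    pad = rate
    xp = np.pad(x, pad, mode="constant")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[i * rate: i * rate + h,
                                     j * rate: j * rate + w]
    return out

def feature_pooling(x, kernels, rates=(1, 2, 4)):
    """Parallel dilated convolutions at several sampling rates,
    stacked channel-wise to retain multi-scale context."""
    return np.stack([dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)])
```

Larger rates enlarge the receptive field without downsampling, which is why such branches preserve spatial detail that successive pooling would lose.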
Research on automatically producing syntactically and semantically accurate captions is still an open challenge. This paper proposes an effective pretrained Augmentation-Ranking (A-R) Image Captioning model. The proposed model improves the properties of the images and produces appropriate captions. The employed novel augmentation strategy improves convolutional neural network (CNN) operation, while Ranking and Feedback Propagation improve the Long Short-Term Memory (LSTM). Our proposed model seeks to address the issues of complexity, vanishing gradients, and context during training. The proposed A-R model improves the performance of the LSTM and CNN. The image dataset for training is expanded using the augmented CNN. Through ranks, the Ranking LSTM aids in the identification of semantic captions. This blending method enhances the image captioning system. Utilizing greedy and beam search, the proposed A-R model is examined under maximum and average pooling. The outcomes are compared with cutting-edge models such as the bidirectional recurrent neural network, Google NIC, and Bi-LSTM combined with a semantic attention mechanism. The proposed model is evaluated on the Flickr8k and Flickr30k datasets using measures including BLEU, METEOR, and CIDEr. According to experimental results, the proposed model, with reduced complexity, generated captions deemed accurate, syntactically correct, and semantically correct, achieving an accuracy of 74.87%, above all baseline models.
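Beam search, used alongside greedy decoding above, can be sketched independently of the captioning model; `step_fn` below is a hypothetical stand-in for the LSTM's next-token distribution:

```python
import math

def beam_search(step_fn, start, eos, beam_width=3, max_len=10):
    """Generic beam search over a token-scoring function.

    step_fn(seq) -> {token: probability} for the next token;
    start/eos are token ids. Returns the finished sequence with the
    highest total log-probability (greedy search is beam_width=1).
    """
    beams = [([start], 0.0)]        # (sequence, cumulative log-prob)
    done = []
    for _ in range(max_len):
        nxt = []
        for seq, score in beams:
            for tok, p in step_fn(seq).items():
                cand = (seq + [tok], score + math.log(p))
                # finished hypotheses leave the beam
                (done if tok == eos else nxt).append(cand)
        beams = sorted(nxt, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    done += beams                   # keep unfinished beams as a fallback
    return max(done, key=lambda c: c[1])[0]
```

Unlike greedy decoding, the beam keeps several partial captions alive, so a token that looks weak locally can still win if it leads to a higher-probability sentence.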
In this paper, an encoder-decoder-based architecture, which segments liver tumors with a two-step training process, is proposed. Accurate liver tumor segmentation from CT images is still a major problem that impacts the diagnosis process. Heterogeneous densities, shapes, and unclear boundaries make tumor extraction challenging. First, the proposed network segments the liver, and then tumors are extracted from the liver ROIs. We have scaled down the images into different resolutions at each scale and applied normal convolutions along with dilations and residual connections to capture broad contextual information without data loss. MDICE, a combined loss function, is used to enhance the learning capability, and the 3D-IRCADb1 dataset is used for training and testing because of its tumor complexities. The segmentation quality metrics DICE and MDICE are analyzed on the 3D-IRCADb1 dataset, obtaining per-case scores of 0.98 for liver and 0.65 for tumor segmentation, respectively, an improvement over U-Net and other variants.
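The abstract names MDICE as a combined loss built around the Dice score. A hedged sketch: soft Dice blended with a binary cross-entropy term; the paper's exact MDICE formulation may differ, so treat the combination and the `alpha` weight as assumptions:

```python
import numpy as np

def soft_dice(pred, target, eps=1e-6):
    """Soft Dice coefficient for probability masks in [0, 1];
    1.0 means perfect overlap."""
    inter = np.sum(pred * target)
    return (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def combined_loss(pred, target, alpha=0.5):
    """Hypothetical combined objective: Dice loss plus binary
    cross-entropy, weighted by alpha."""
    p = np.clip(pred, 1e-7, 1 - 1e-7)  # avoid log(0)
    bce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    return alpha * (1.0 - soft_dice(pred, target)) + (1 - alpha) * bce
```

Dice-style terms are popular for tumor masks because they are insensitive to the large class imbalance between tumor and background pixels.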
In the rapidly evolving landscape of autonomous vehicle technology, the imperative to bolster safety within smart urban ecosystems has never been more critical. This endeavor requires the deployment of advanced detection systems capable of navigating the intricacies of pedestrian, near-vehicle, and lane detection challenges, with a particular focus on the nuanced requirements of curved lane navigation - a domain where traditional AI models exhibit notable deficiencies. This paper introduces BIPOOLNET, an innovative encoder-decoder neural architecture, ingeniously augmented with a feature pyramid to facilitate the precise delineation of curved lane geometries. BIPOOLNET integrates max pooling and average pooling to extract critical features and mitigate the complexity of the feature map, redefining the benchmarks for lane detection technology. Rigorous evaluation using the TuSimple dataset underscores BIPOOLNET's exemplary performance, evidenced by an unprecedented accuracy rate of 98.45%, an F1-score of 98.17%, and notably minimal false positive (1.84%) and false negative (1.09%) rates. These findings not only affirm BIPOOLNET's supremacy over extant models but also signal a paradigm shift in enhancing the safety and navigational precision of autonomous vehicles, offering a scalable, robust solution to the multifaceted challenges posed by real-world driving dynamics.
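BIPOOLNET's integration of max pooling and average pooling can be illustrated with a simple dual-pooling block; this is a sketch of the idea, not the network's exact layer:

```python
import numpy as np

def dual_pool(x, size=2):
    """Apply max pooling and average pooling over non-overlapping
    size x size windows and stack the results channel-wise, keeping
    both sharp edge responses (max) and smooth context (average)."""
    h, w = x.shape
    blocks = x[: h - h % size, : w - w % size].reshape(
        h // size, size, w // size, size)
    mx = blocks.max(axis=(1, 3))    # strongest activation per window
    avg = blocks.mean(axis=(1, 3))  # mean activation per window
    return np.stack([mx, avg])      # shape: (2, h // size, w // size)
```

Fusing the two pooled maps is one way to shrink the feature map while retaining complementary statistics, which is the stated motivation for the dual-pooling design.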
Tropical cyclones (TCs) pose a significant threat to coastal regions worldwide, demanding accurate and timely predictions of potential disaster impacts. Existing regional-scale impact prediction models, however, are largely limited by the sparsity of modeling data and the incapability of fine-resolution predictions in a computationally efficient manner, thus hindering real-time identification of potential disaster hotspots. To address these limitations, we present a data-driven image-to-image TC impact prediction model based on a deep convolutional neural network (CNN) for Zhejiang Province, China, an area of approximately 105,000 km2 consisting of 90 counties. The proposed model utilizes twelve carefully selected predictors, including hazard, environmental, and vulnerability factors, which are processed into province-scale 1 km-grid image-format data. An end-to-end encoder-decoder architecture is subsequently designed to extract impact-relevant spatial features from the multi-channel input images, then to construct a spatial impact map of identical size (i.e., ~105,000 km2) and resolution (i.e., 1 km-grid). This gridded impact map is then aggregated spatially to derive county-level impact predictions, which serve as the final layer of the CNN model and are used to evaluate the model's loss function in terms of mean squared error. This design is informed by the fact that the training data on TC impact, collected from historical events, were recorded at the county level. Validation and error analysis demonstrate the model's promising spatial accuracy and time efficiency. Furthermore, an illustration of the model's application with Typhoon Lekima in 2019 underscores its potential for integrating meteorological forecasts to achieve re
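The aggregation of the 1 km-grid impact map into county-level predictions, scored with mean squared error, can be sketched as follows; the array shapes and the -1 "outside province" convention are assumptions for illustration:

```python
import numpy as np

def county_loss(grid_pred, county_id, county_true):
    """Aggregate a gridded impact map to county totals and score them
    against county-level records with mean squared error.

    grid_pred:   (H, W) predicted per-cell impact.
    county_id:   (H, W) integer county index per cell, -1 outside the province.
    county_true: (K,) recorded county-level impacts.
    """
    ids = county_id.ravel()
    vals = grid_pred.ravel()
    mask = ids >= 0                          # drop cells outside the province
    totals = np.bincount(ids[mask], weights=vals[mask],
                         minlength=county_true.size)
    mse = float(np.mean((totals - county_true) ** 2))
    return totals, mse
```

Because the loss is taken only after spatial aggregation, the model can be trained against county-level historical records while still producing a fine-resolution grid.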