Solar photovoltaic (PV) energy, with its clean, local, and renewable features, is an effective complement to traditional energy sources today. However, photovoltaic power systems are highly weather-dependent and therefore unstable and intermittent. Despite the negative impact of these features on solar sources, the increase in worldwide installed PV capacity has made solar energy prediction an important research topic. This study compares three encoder-decoder (ED) networks for day-ahead solar PV energy prediction: Long Short-Term Memory ED (LSTM-ED), Convolutional LSTM ED (Conv-LSTM-ED), and Convolutional Neural Network and LSTM ED (CNN-LSTM-ED). The models are tested using 1741-day-long datasets from 26 PV panels in Istanbul, Turkey, considering both the power and energy output of the panels and meteorological features. The results show that the Conv-LSTM-ED with 50 iterations is the most successful model, achieving an average R-squared (R2) score of up to 0.88. Evaluation of the effect of iteration count reveals that the Conv-LSTM-ED with 50 iterations also yields the lowest Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) values, confirming its success. In addition, the fitness and effectiveness of the models are evaluated, with the Conv-LSTM-ED achieving the lowest Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values for each iteration count. The findings of this work can help researchers build the best data-driven methods for forecasting PV solar energy based on PV and meteorological features.
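The comparison above ranks models by R-squared, RMSE, MAE, AIC, and BIC. A minimal sketch of how these scores can be computed for a forecast, assuming the least-squares (Gaussian error) form of AIC/BIC with k trainable parameters (the paper's exact variant is not specified):

```python
import numpy as np

def regression_scores(y_true, y_pred, k):
    """Standard scores for comparing forecasting models.

    k is the number of model parameters; AIC/BIC use the
    least-squares (Gaussian error) form, an assumption here.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    resid = y_true - y_pred
    rss = float(np.sum(resid ** 2))            # residual sum of squares
    rmse = float(np.sqrt(rss / n))
    mae = float(np.mean(np.abs(resid)))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - rss / ss_tot
    aic = n * np.log(rss / n) + 2 * k          # penalizes parameter count
    bic = n * np.log(rss / n) + k * np.log(n)  # stronger penalty for large n
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "AIC": aic, "BIC": bic}
```

A model with the lowest RMSE/MAE/AIC/BIC and the highest R2, as reported for the Conv-LSTM-ED, would win on every one of these criteria.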
Image caption generation is a critical research area that combines computer vision and natural language processing, with wide-ranging implications such as assisting visually impaired individuals, improving autonomous ...
Skin lesion segmentation from dermatoscopic images is essential for the diagnosis of skin cancer. However, it is still a challenging task due to the ambiguity of skin lesions, their irregular shapes, and the presence of various interfering factors. In this paper, we propose a novel Ambiguous Context Enhanced Attention Network (ACEANet) based on the classical encoder-decoder architecture, which accurately, reliably, and efficiently segments a variety of lesions. Specifically, a novel Ambiguous Context Enhanced Attention module is embedded in the skip connection to augment the ambiguous boundary information. A Dilated Gated Fusion block is employed at the end of the encoding phase, which effectively reduces the loss of spatial location information due to continuous downsampling. In addition, we propose a novel Cascading Global Context Attention to fuse feature information generated by the encoder with features generated by the decoder of the corresponding layer. To verify the effectiveness and advantages of the proposed network, we have performed comparative experiments on the ISIC2018 and PH2 datasets. Experimental results demonstrate that the proposed model has superior segmentation performance for skin lesions.
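An attention module embedded in a skip connection can be sketched generically as follows. This is the common additive attention-gate pattern, not necessarily ACEANet's exact module; the projection weights below are random stand-ins for learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gated_skip(skip, gating, w_s, w_g, w_psi):
    """Re-weight encoder skip features by an attention map driven by the
    decoder's gating signal, so boundary regions can be emphasised
    before concatenation.

    skip, gating: (H, W, C) feature maps; w_s, w_g: (C, C) and
    w_psi: (C, 1) projection matrices (learned in a real network).
    """
    # Additive attention: psi = sigmoid(relu(skip @ Ws + gating @ Wg) @ Wpsi)
    mix = np.maximum(skip @ w_s + gating @ w_g, 0.0)  # (H, W, C), ReLU
    psi = sigmoid(mix @ w_psi)                        # (H, W, 1), in (0, 1)
    return skip * psi                                 # attenuated skip features
```

The attention map `psi` never amplifies a feature, only scales it toward zero, which is what lets the network suppress uninformative skip activations.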
Accurate and reliable crop type identification from satellite images provides a foundation for crop yield predictions, which paves the way to help ensure food security. Most of the work on crop type mapping using remote sensing is restricted to developed countries with large field parcels, while little effort has been directed towards developing countries, where the task is more challenging due to the small size of field parcels, irregular field shapes, and an acute shortage of labelled datasets for training supervised machine learning models. In this research, we try to fill this gap in the literature by exploring the feasibility of semantic segmentation of agricultural fields from satellite images. We propose an encoder-decoder-based semantic segmentation architecture, CropNet, with a ResNet network as the encoder backbone. Attention modules in the decoder allow the model to focus on the more important portions of the feature maps, and feature fusion concatenates the feature maps from all the decoder nodes, yielding a more precise prediction by bringing in spatial location information from the previous layers. The architecture outperformed the state of the art by 0.51% and 1.3% on overall accuracy and macro-F1 score, respectively, after being trained on the "2019 Zindi's Farm Pin Crop Detection" dataset of Sentinel-2 images. The model achieved a field-wise overall classification accuracy of 78.06%, a macro-F1 score of 67.3%, and a pixel-wise segmentation mean Intersection over Union (mIoU) of 62.22%, an improvement of 2.56% over the state-of-the-art methods, demonstrating that our model is computationally efficient for the semantic segmentation of crop types from satellite images in the difficult scenario of smallholder farms.
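The mIoU and macro-F1 figures reported above are standard metrics derived from a confusion matrix; a minimal sketch (assuming every class appears in the ground truth, so no zero denominators arise):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    idx = n_classes * y_true.ravel() + y_pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def miou_and_macro_f1(y_true, y_pred, n_classes):
    """Pixel-wise mean IoU and macro-averaged F1 over all classes."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp          # class c missed by the prediction
    iou = tp / (tp + fp + fn)         # per-class intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn)  # per-class F1
    return float(iou.mean()), float(f1.mean())
```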
Video saliency prediction aims to resemble human visual attention by identifying the most relevant and significant elements in a video frame or sequence. This task becomes notably intricate in scenarios characterized by dynamic elements such as rapid motion, occlusions, blur, background variations, and nonrigid deformations. Therefore, the inherent complexity of human visual attention behavior during dynamic scenes necessitates the assessment of both temporal and spatial data. Existing video saliency frameworks often falter under such conditions, and relying solely on image saliency models neglects crucial temporal information in videos. This study presents a new model for Video Salient Object Detection, a Multi-level Spatiotemporal Bidirectional Network using Multi-scale Transfer Learning (MSB-Net), to address the problem of identifying significant objects in videos. The proposed MSB-Net achieves notable results for a given sequence of frames by employing multi-scale transfer learning with an encoder-decoder approach to acquire knowledge and saliency map attributes spatially and temporally. The proposed MSB-Net model has bidirectional LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) components. The VGG16 and VGG19 (Visual Geometry Group) architectures extract multi-scale features from the input video frames. Evaluation on diverse datasets, namely DAVIS-T, SegTrack-V2, ViSal, VOS-T, and DAVSOD-T, demonstrates the model's effectiveness, outperforming other competitive models on metrics such as MAE, F-measure, and S-measure.
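The MAE and F-measure used in the evaluation above can be sketched for a single saliency map; the adaptive threshold (twice the mean saliency) and beta^2 = 0.3 are conventions common in salient-object-detection benchmarks, assumed here rather than taken from the paper:

```python
import numpy as np

def saliency_scores(pred, gt, beta2=0.3):
    """MAE and F-measure for a predicted saliency map in [0, 1]
    against a binary ground-truth mask."""
    pred = pred.astype(float)
    gt = gt.astype(bool)
    mae = float(np.mean(np.abs(pred - gt)))
    # Adaptive threshold: twice the mean saliency, capped at 1.
    t = min(2.0 * pred.mean(), 1.0)
    binary = pred > t                       # note: an all-equal map binarizes empty
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-12)
    return mae, float(f)
```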
Background subtraction is a crucial stage in many visual surveillance systems. The prime objective of any such system is to detect local changes, and the system can be utilized to face many real-life challenges. Most of the existing methods have addressed the problems of moderate and fast-moving object detection. However, very little of the literature has addressed slow-moving object detection, and existing methods need further improvement to enhance detection efficacy. Hence, in this article, we identify moving objects in challenging videos through an encoder-decoder architectural design, incorporating an enhanced VGG-19 model alongside a feature pooling framework. The proposed algorithm is novel in several respects: a pre-trained VGG-19 architecture is modified and used as an encoder with a transfer learning mechanism. The proposed model learns the weights of the improved VGG-19 model by a transfer-learning mechanism, which enhances the model's efficacy. The proposed encoder is designed using a smaller number of layers to extract the crucial fine- and coarse-scale features necessary for detecting moving objects. The feature pooling framework (FPF) employed is a hybridization of a max-pooling layer, a convolutional layer, and multiple convolutional layers with distinct sampling rates to retain multi-scale and multi-dimensional features at different scales. The decoder network consists of stacked convolution layers projecting effectively from feature space to image space. The developed technique's efficacy is demonstrated against thirty-six state-of-the-art (SOTA) methods. The outcomes acquired by the developed technique are corroborated using both subjective and objective analysis, which shows superior performance against other SOTA techniques. Additionally, the proposed model demonstrates enhanced accuracy when applied to unseen configurations. Further, the proposed technique (MOD-CVS) attained adequate efficiency
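The feature pooling framework above is described as parallel convolutions at distinct sampling (dilation) rates. A pure-NumPy sketch of that multi-rate idea, with hypothetical 3x3 kernels standing in for learned filters (not the paper's exact FPF):

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """3x3 'same'-padded 2D convolution with a dilation rate, in pure NumPy.
    With rate r, the kernel taps sit at offsets {-r, 0, r}."""
    pad = rate
    xp = np.pad(x, pad, mode="constant")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[i * rate: i * rate + h,
                                     j * rate: j * rate + w]
    return out

def feature_pooling(x, kernels, rates=(1, 2, 4)):
    """Parallel dilated convolutions at several sampling rates,
    stacked channel-wise to retain multi-scale context."""
    return np.stack([dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)])
```

Larger rates enlarge the receptive field without downsampling, which is why such branches preserve spatial detail that successive pooling would lose.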
Research on automatically producing syntactically and semantically accurate captions is still an open challenge. This paper proposes an effective pretrained Augmentation-Ranking (A-R) Image Captioning model. The proposed model improves the properties of the images and produces appropriate captions. The employed novel augmentation strategy improves convolutional neural network (CNN) operation, while Ranking and Feedback Propagation improve the Long Short-Term Memory (LSTM). Our proposed model seeks to address the issues of complexity, vanishing gradients, and context during training. The proposed A-R model improves the performance of the LSTM and CNN. The image dataset for training is expanded using the augmented CNN. Through ranks, the Ranking LSTM aids in the identification of semantic captions. This blending method enhances the image captioning system. Utilizing greedy and beam search, the proposed A-R model is examined under maximum and average pooling. The outcomes are compared with cutting-edge models such as the bidirectional recurrent neural network, Google NIC, and Bi-LSTM combined with a semantic attention mechanism. The proposed model is evaluated on the Flickr8k and Flickr30k datasets using measures including BLEU, METEOR, and CIDEr. According to experimental results, the proposed model, with reduced complexity, generated captions deemed accurate, syntactically correct, and semantically correct, achieving an accuracy of 74.87%, above all baseline models.
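Beam search, used alongside greedy decoding above, can be sketched independently of the captioning model; `step_fn` below is a hypothetical stand-in for the LSTM's next-token distribution:

```python
import math

def beam_search(step_fn, start, eos, beam_width=3, max_len=10):
    """Generic beam search over a token-scoring function.

    step_fn(seq) -> {token: probability} for the next token;
    start/eos are token ids. Returns the finished sequence with the
    highest total log-probability (greedy search is beam_width=1).
    """
    beams = [([start], 0.0)]        # (sequence, cumulative log-prob)
    done = []
    for _ in range(max_len):
        nxt = []
        for seq, score in beams:
            for tok, p in step_fn(seq).items():
                cand = (seq + [tok], score + math.log(p))
                # finished hypotheses leave the beam
                (done if tok == eos else nxt).append(cand)
        beams = sorted(nxt, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    done += beams                   # keep unfinished beams as a fallback
    return max(done, key=lambda c: c[1])[0]
```

Unlike greedy decoding, the beam keeps several partial captions alive, so a token that looks weak locally can still win if it leads to a higher-probability sentence.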
In this paper, an encoder-decoder-based architecture, which segments liver tumors with a two-step training process, is proposed. Accurate liver tumor segmentation from CT images is still a major problem that impacts the diagnosis process. Heterogeneous densities, shapes, and unclear boundaries make tumor extraction challenging. First, the proposed network segments the liver, and then tumors are extracted from the liver ROIs. We have scaled down the images into different resolutions at each scale and applied normal convolutions along with dilations and residual connections to capture broad contextual information without data loss. MDICE, a combined loss function, is used to enhance the learning capability, and the 3D-IRCADb1 dataset is used for training and testing because of its tumor complexities. The segmentation quality metrics DICE and MDICE are analyzed on the 3D-IRCADb1 dataset, obtaining per-case scores of 0.98 for liver and 0.65 for tumor segmentation, respectively, an improvement over U-Net and other variants.
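The abstract names MDICE as a combined loss built around the Dice score. A hedged sketch: soft Dice blended with a binary cross-entropy term; the paper's exact MDICE formulation may differ, so treat the combination and the `alpha` weight as assumptions:

```python
import numpy as np

def soft_dice(pred, target, eps=1e-6):
    """Soft Dice coefficient for probability masks in [0, 1];
    1.0 means perfect overlap."""
    inter = np.sum(pred * target)
    return (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def combined_loss(pred, target, alpha=0.5):
    """Hypothetical combined objective: Dice loss plus binary
    cross-entropy, weighted by alpha."""
    p = np.clip(pred, 1e-7, 1 - 1e-7)  # avoid log(0)
    bce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    return alpha * (1.0 - soft_dice(pred, target)) + (1 - alpha) * bce
```

Dice-style terms are popular for tumor masks because they are insensitive to the large class imbalance between tumor and background pixels.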
In the rapidly evolving landscape of autonomous vehicle technology, the imperative to bolster safety within smart urban ecosystems has never been more critical. This endeavor requires the deployment of advanced detection systems capable of navigating the intricacies of pedestrian, near-vehicle, and lane detection challenges, with a particular focus on the nuanced requirements of curved lane navigation - a domain where traditional AI models exhibit notable deficiencies. This paper introduces BIPOOLNET, an innovative encoder-decoder neural architecture, ingeniously augmented with a feature pyramid to facilitate the precise delineation of curved lane geometries. BIPOOLNET integrates max pooling and average pooling to extract critical features and mitigate the complexity of the feature map, redefining the benchmarks for lane detection technology. Rigorous evaluation using the TuSimple dataset underscores BIPOOLNET's exemplary performance, evidenced by an unprecedented accuracy rate of 98.45%, an F1-score of 98.17%, and notably minimal false positive (1.84%) and false negative (1.09%) rates. These findings not only affirm BIPOOLNET's supremacy over extant models but also signal a paradigm shift in enhancing the safety and navigational precision of autonomous vehicles, offering a scalable, robust solution to the multifaceted challenges posed by real-world driving dynamics.
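BIPOOLNET's integration of max pooling and average pooling can be illustrated with a simple dual-pooling block; this is a sketch of the idea, not the network's exact layer:

```python
import numpy as np

def dual_pool(x, size=2):
    """Apply max pooling and average pooling over non-overlapping
    size x size windows and stack the results channel-wise, keeping
    both sharp edge responses (max) and smooth context (average)."""
    h, w = x.shape
    blocks = x[: h - h % size, : w - w % size].reshape(
        h // size, size, w // size, size)
    mx = blocks.max(axis=(1, 3))    # strongest activation per window
    avg = blocks.mean(axis=(1, 3))  # mean activation per window
    return np.stack([mx, avg])      # shape: (2, h // size, w // size)
```

Fusing the two pooled maps is one way to shrink the feature map while retaining complementary statistics, which is the stated motivation for the dual-pooling design.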
Tropical cyclones (TCs) pose a significant threat to coastal regions worldwide, demanding accurate and timely predictions of potential disaster impacts. Existing regional-scale impact prediction models, however, are largely limited by the sparsity of modeling data and the incapability of fine-resolution predictions in a computationally efficient manner, thus hindering real-time identification of potential disaster hotspots. To address these limitations, we present a data-driven image-to-image TC impact prediction model based on a deep convolutional neural network (CNN) for Zhejiang Province, China, an area of approximately 105,000 km2 consisting of 90 counties. The proposed model utilizes twelve carefully selected predictors, including hazard, environmental, and vulnerability factors, which are processed into province-scale 1 km-grid image-format data. An end-to-end encoder-decoder architecture is subsequently designed to extract impact-relevant spatial features from the multi-channel input images, then to construct a spatial impact map of identical size (i.e., ~105,000 km2) and resolution (i.e., 1 km-grid). This gridded impact map is then aggregated spatially to derive county-level impact predictions, which serve as the final layer of the CNN model and are used to evaluate the model's loss function in terms of mean squared error. This design is informed by the fact that the training data on TC impact, collected from historical events, were recorded at the county level. Validation and error analysis demonstrate the model's promising spatial accuracy and time efficiency. Furthermore, an illustration of the model's application with Typhoon Lekima in 2019 underscores its potential for integrating meteorological forecasts to achieve re
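The aggregation of the 1 km-grid impact map into county-level predictions, scored with mean squared error, can be sketched as follows; the array shapes and the -1 "outside province" convention are assumptions for illustration:

```python
import numpy as np

def county_loss(grid_pred, county_id, county_true):
    """Aggregate a gridded impact map to county totals and score them
    against county-level records with mean squared error.

    grid_pred:   (H, W) predicted per-cell impact.
    county_id:   (H, W) integer county index per cell, -1 outside the province.
    county_true: (K,) recorded county-level impacts.
    """
    ids = county_id.ravel()
    vals = grid_pred.ravel()
    mask = ids >= 0                          # drop cells outside the province
    totals = np.bincount(ids[mask], weights=vals[mask],
                         minlength=county_true.size)
    mse = float(np.mean((totals - county_true) ** 2))
    return totals, mse
```

Because the loss is taken only after spatial aggregation, the model can be trained against county-level historical records while still producing a fine-resolution grid.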