This study presents a deep learning (DL)-based approach to the seismic velocity inversion problem, focusing on both noisy and noiseless training datasets of varying sizes. Our seismic velocity inversion network (SVInvNet) introduces a novel architecture that contains a multiconnection encoder-decoder structure enhanced with dense blocks. This design is tuned to effectively process time series data, which is essential for addressing the challenges of nonlinear seismic velocity inversion. For training and testing, we created diverse seismic velocity models, including multilayered, faulty, and salt dome categories. We also investigated how different kinds of ambient noise, both coherent and stochastic, and the size of the training dataset affect learning outcomes. SVInvNet is trained on datasets ranging from 750 to 6000 samples and is tested using a large benchmark dataset of 12 000 samples. Despite its fewer parameters compared to the baseline model, SVInvNet achieves superior performance with this dataset. The performance of SVInvNet was further evaluated using the OpenFWI dataset and Marmousi-derived velocity models. The comparative analysis clearly reveals the effectiveness of the proposed architecture.
Crack segmentation is of great significance in automatic pavement crack detection based on image recognition. Although recent convolutional neural network (CNN)-based segmentation methods have shown promising performance, accurate pavement crack segmentation still faces challenges such as varying crack sizes, class imbalance, and background interference. To overcome these challenges, a compact two-stage pavement crack segmentation network based on an encoder-decoder architecture (TSPCS-Net) is proposed, which includes a classification network and a segmentation network. The classification network, consisting of a feature extraction module transferred from the segmentation network and a lightweight feature fusion module, is used to quickly identify and discard the crack-free images that exist in large numbers in real pavement image datasets. The segmentation network is built on an encoder-decoder architecture for precise pixel-level segmentation of the samples classified as crack images. Specifically, to extract multi-scale crack features, a novel multi-scale encoder module is designed by combining dilated convolution with a residual structure. Then, a left-side path (LSP) is designed to alleviate the influence of class imbalance on feature extraction. Finally, an attention module in which high-dimensional features guide low-dimensional features (AM-HGL) is proposed to focus on crack-relevant features and suppress interfering information. The effectiveness of the proposed TSPCS-Net is validated on a self-made unmanned aerial vehicle pavement crack (UAVPC) dataset and two public pavement distress datasets. Extensive experiments show that the proposed method outperforms current state-of-the-art methods in both segmentation performance and efficiency, and can meet the needs of pavement crack segmentation in practical application scenarios.
With the exponential increase in heart disease cases, it is essential to construct models (algorithms) that can be used to delineate Electrocardiogram (ECG/EKG) wave components. ECG delineation is the process of attai...
This study proposes a new predictive segmentation method for liver tumor detection using computed tomography (CT) liver images. In medical imaging, the exact localization of metastatic lesions after acquisition remains a persistent problem for both diagnostic aid and treatment effectiveness. Improving the diagnostic process is therefore crucial to increasing the chance of successful management and therapeutic follow-up. The proposed procedure is a computerized approach based on an encoder-decoder structure that provides volumetric analysis of pathologic tumors. Specifically, we developed an automatic algorithm for liver tumor segmentation from metastasis CT images using the Seg-Net and U-Net architectures. We collected a dataset of 200 pathologically confirmed metastatic cancer cases. A total of 8,297 CT image slices from these cases were used to develop and optimize the proposed segmentation architecture. The model was trained and validated using 170 and 30 cases, or 85% and 15% of the CT image data, respectively. The results demonstrate the strength of the proposed approach, with segmentation performance of F1-score = 0.9573, Recall = 0.9520, IOU = 0.9654, Binary cross entropy = 0.0032, and p-value < 0.05. In comparison with state-of-the-art techniques, the proposed method yields a higher precision rate in specifying the metastatic tumor position.
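For readers unfamiliar with the indices quoted above, the following is a minimal sketch of how precision, recall, F1-score, and IOU are commonly computed for a binary segmentation mask. It is not the authors' evaluation code: the function names `confusion_counts` and `segmentation_scores` and the toy masks are hypothetical, and the abstract does not say whether scores are computed per slice, per case, or over the whole validation set.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    over two flat lists of 0/1 pixel labels of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def segmentation_scores(pred, truth):
    """Return precision, recall, F1, and IoU for the foreground class."""
    tp, fp, fn = confusion_counts(pred, truth)
    # Assumption: an empty prediction or empty ground truth scores 0.0
    # for precision/recall, and two empty masks agree perfectly.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = tp + fp + fn
    f1 = 2 * tp / (2 * tp + fp + fn) if denom else 1.0
    iou = tp / denom if denom else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy example (hypothetical data): 8 pixels, 1 = tumor, 0 = background.
truth = [0, 1, 1, 1, 0, 0, 1, 0]
pred = [0, 1, 1, 0, 0, 1, 1, 0]
scores = segmentation_scores(pred, truth)
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 0.75 and IoU is 0.6.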
This study presents an autonomous fault detection method for a wide range of common failures and defects that are visible on PV modules. In this paper, we focus especially on the detection of bird droppings, a very typical defect on PV modules. As a crucial prerequisite, a dataset of aerial images of PV strings affected by bird droppings was collected through several experimental multicopter flights in order to train an accurate fully convolutional deep network. These images are divided into three groups: training, testing, and validation. For bird-dropping segmentation, an improved encoder-decoder architecture is employed, with a modified VGG16 model as the backbone of the encoder. The encoder has a very flexible architecture that can be modified and trained for any other visual failure detection task. The extracted feature maps are then passed to a decoder network that maps the low-resolution features to full resolution for pixel-wise segmentation. In addition, an image object positioning algorithm is presented to find the exact position of detected failures in a local coordinate system. In a post-processing step, the detected damage is prioritized based on parameters such as the severity of shading and the extent of the impact on the PV module's output current. For further validation, different affected PV modules were characterized according to the output patterns of the classification step in order to accurately evaluate the effect of bird droppings and the consequent shading on the parameters of PV modules based on their severity and location. Finally, the training and testing results demonstrate that the proposed FCN is able to precisely predict, at the pixel level, the pixels covered by bird droppings on PV modules, with average accuracies of 98% and 93% for training and testing, respectively.
ISBN: 9783031063817; 9783031063800 (print)
Continuous monitoring of foot ulcer healing is needed to ensure the efficacy of a given treatment and to avoid any possibility of deterioration. Foot ulcer segmentation is an essential step in wound diagnosis. We developed a model that is similar in spirit to the well-established encoder-decoder and residual convolutional neural networks. Our model includes a residual connection along with channel and spatial attention integrated within each convolution block. A simple patch-based approach to model training, test-time augmentation, and majority voting on the obtained predictions resulted in superior performance. Our model did not leverage any readily available backbone architecture, pre-training on a similar external dataset, or any transfer learning technique. With around 5 million network parameters in total, it is significantly more lightweight than the available state-of-the-art models used for the foot ulcer segmentation task. Our experiments present results at the patch level and the image level. Applied to the publicly available Foot Ulcer Segmentation (FUSeg) Challenge dataset from MICCAI 2021, our model achieved state-of-the-art image-level performance of 88.22% in terms of Dice similarity score and ranked second on the official challenge leaderboard. We also showed an extremely simple solution that could be compared against the more advanced architectures.
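The combination of test-time augmentation, majority voting, and Dice scoring mentioned above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' pipeline: the helper names `majority_vote` and `dice` are hypothetical, the three toy predictions stand in for model outputs on augmented copies of one patch, and each prediction is assumed to have already been mapped back to the original orientation.

```python
def majority_vote(masks):
    """Pixel-wise majority vote over equally shaped 0/1 masks
    (each mask is a list of rows). A pixel is foreground when
    more than half of the masks mark it as foreground."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def dice(pred, truth):
    """Dice similarity: 2 * |A and B| / (|A| + |B|) for two 0/1 masks."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(sum(row) for row in pred) + sum(sum(row) for row in truth)
    # Assumption: two empty masks are treated as a perfect match.
    return 2 * inter / total if total else 1.0


# Hypothetical predictions for one 2x3 patch from three augmented views.
view_a = [[1, 1, 0], [0, 1, 0]]
view_b = [[1, 0, 0], [0, 1, 1]]
view_c = [[1, 1, 0], [0, 0, 0]]
fused = majority_vote([view_a, view_b, view_c])

truth = [[1, 1, 0], [0, 1, 1]]
score = dice(fused, truth)
```

Here the fused mask is `[[1, 1, 0], [0, 1, 0]]`, which overlaps the ground truth on 3 pixels out of 3 predicted and 4 true foreground pixels, giving a Dice score of 6/7. Using an odd number of views avoids ties in the vote.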
Accurate wind power predictions (WPPs) are highly significant to the safety, stability, and economic operation of power systems. Previously reported encoder-decoder architectures have demonstrated clear advantages over traditional methods in multi-step WPP tasks. However, these frameworks still suffer from insufficient information mining ability and low computing efficiency. To address these shortcomings, this study proposed three improved encoder-decoder architectures from natural language processing for multi-step WPP: the sequence-to-sequence bidirectional gated recurrent unit (SBIGRU), the attention-based sequence-to-sequence Bi-GRU (ASBIGRU), and the Transformer. Data, including numerical weather predictions and wind power, from 12 wind farms located in 12 different regions of China were used to validate the proposed models. The correlations between the datasets from multiple wind farms were analyzed using Pearson's correlation coefficient to demonstrate the feasibility of the proposed models even without considering spatial correlations. We adopted a strategy combining manual experience and machine grid searches to select the hyper-parameters needed to optimize the performance of the proposed models. The prediction accuracies and computational efficiencies of the reported and proposed models were compared experimentally. For prediction accuracy, the experimental results showed that, compared with existing models, Transformer, ASBIGRU, and SBIGRU reduced the root mean square error by 3.21%, 1.06%, and 0.88% in 16-step-ahead predictions, respectively. For computational efficiency, the training time of the existing model at a wind farm is 3.57 times that of Transformer. This confirms that the Transformer model performs better in terms of both prediction accuracy and computational efficiency. Our work illustrates the potential of Transformer for large-scale wind farm applications.
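Two quantities in the abstract above lend themselves to a short worked sketch: Pearson's correlation coefficient between the power series of two wind farms, and the percentage reduction in root mean square error relative to a baseline. The code below is a minimal illustration, not the study's analysis; the function names and all numbers are hypothetical, and the abstract does not state exactly how the reported reductions were normalized.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rmse(predicted, actual):
    """Root mean square error between predictions and observations."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error
    (assumed definition: 100 * (baseline - new) / baseline)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical normalized power series for two farms.
farm_a = [0.10, 0.20, 0.30, 0.40, 0.50]
farm_b = [0.50, 0.40, 0.30, 0.20, 0.10]
r = pearson(farm_a, farm_b)

# Hypothetical baseline and improved RMSE values.
improvement = relative_reduction(0.20, 0.19)
```

With these toy inputs the two series are perfectly anti-correlated, so `r` is -1 up to floating-point rounding, and lowering an RMSE of 0.20 to 0.19 corresponds to a 5% reduction.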
ISBN: 9781450385725 (print)
Timely and accurate detection of the initiation and expansion of cracks is of great significance for the safe operation of civil infrastructure. Image-based visual surface inspection has become an indispensable means of long-term infrastructure monitoring. However, existing crack detection methods generally suffer from interference from complex backgrounds, leading to obvious performance drops. To tackle this, this paper proposes an improved encoder-decoder architecture based on SegNet, namely crack-SegNet. The encoder network hierarchically learns visual features from the original image, and the decoder network gradually up-samples and maps the encoded features to the input size for pixel-level classification. To strengthen the feature representation of cracks against complex backgrounds, a channel attention mechanism is integrated into the encoder and a spatial attention module into the decoder. In addition, spatial pyramid pooling is attached to the last convolutional layer of the encoder to capture cracks at different scales. To better validate the proposed method, a challenging metal surface crack dataset with much more complex backgrounds was collected. Experimental results on the datasets show that the proposed crack-SegNet outperforms other state-of-the-art crack detection methods, especially against complex backgrounds.
ISBN: 9781450389266 (print)
Optical character recognition (OCR) systems are used to convert scanned documents into text. Arabic OCR is an active area of research in which high accuracy is in demand. This paper focuses on building a model that converts images containing Arabic text into the corresponding text using a deep learning approach. The model does not require any knowledge of the underlying language and is simply trained end-to-end on the KAFD dataset. It combines several standard neural components from vision and natural language processing. Features are extracted from images using convolutional neural networks (CNNs) and arranged in a grid. Each row is then encoded using a recurrent neural network (RNN). An RNN decoder with a visual attention mechanism is used to generate the output text. Our preliminary experiments show that the presented approach is effective. The overall accuracy obtained is 89.82%, although the individual results for some fonts are higher than this score.
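The visual attention step described above can be sketched in a few lines: at each decoding step, the decoder state is scored against every encoded grid position, the scores are normalized with a softmax, and the weighted sum of features becomes the context vector. The sketch below assumes plain dot-product scoring, which the abstract does not specify; the names `softmax` and `attend` and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention (an assumed scoring function).

    query:    decoder state, a list of floats of length d
    features: encoded grid positions, a list of length-d lists
    Returns the attention weights and the context vector."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Three hypothetical encoded positions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero query scores every position equally, so attention is uniform.
uniform_weights, uniform_context = attend([0.0, 0.0], features)

# A query aligned with the first feature dimension favors positions 0 and 2.
focused_weights, focused_context = attend([10.0, 0.0], features)
```

In a full model the context vector would be fed, together with the previous output character, into the RNN decoder to predict the next character.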
ISBN: 9781728170978 (print)
Object detection in motion pictures is a challenging task because of dynamic backgrounds. Deep learning architectures, especially those of the encoder-decoder type, have shown promising performance in segmenting foreground objects from the background in video sequences. In this work, a VGG-16-based encoder-decoder architecture is investigated and several modifications are proposed to improve the efficiency of the model. The modified models are evaluated on two standard databases with various scenes, CDNet 2014 and SBI2015, and achieve a highest precision of 0.99, which is competitive with current state-of-the-art schemes.