In recent years, the use of wireless sensor networks has become increasingly widespread. Because of the instability of wireless networks, packet loss occasionally occurs. To reduce the impact of packet loss on data integrity, we take advantage of the deep neural network's excellent ability to understand natural data and propose a data repair method based on a deep convolutional neural network with an encoder-decoder architecture. Compared with common interpolation algorithms and compressed sensing algorithms, this method obtains better repair results, is suitable for a wider range of applications, and does not require prior knowledge. The method employs careful preparation of the training set together with the design and optimization of the loss function to achieve faster convergence, higher repair accuracy, and better stability. To fairly compare the repair performance of different signals, the mean squared error, relative peak-to-peak average error, and relative peak-to-peak maximum error are adopted to quantitatively evaluate the repair results. Comparative experiments show that this method has better data recovery performance than traditional interpolation and compressed sensing algorithms.
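The abstract gives no implementation details, but the core idea of repairing lost samples with a convolutional encoder-decoder can be illustrated with a minimal PyTorch sketch. The layer sizes, the zero-filling of lost packets, and the masked loss below are assumptions for illustration, not the authors' actual design.

# Minimal sketch (PyTorch, hypothetical sizes): a 1D convolutional encoder-decoder
# that reconstructs a sensor sequence from a version with lost packets zeroed out;
# the loss is computed only on the missing samples.
import torch
import torch.nn as nn

class RepairNet(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

net = RepairNet()
clean = torch.randn(8, 1, 256)                   # ground-truth signal windows
mask = (torch.rand(8, 1, 256) > 0.2).float()     # 1 = received sample, 0 = lost packet
corrupted = clean * mask                         # lost samples filled with zeros
recon = net(corrupted)
# Penalize errors only where samples were lost (hypothetical loss design).
loss = ((recon - clean) ** 2 * (1 - mask)).sum() / (1 - mask).sum().clamp(min=1)
loss.backward()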
Removing hair from digital dermoscopy images is occasionally a necessary step before further analysis is applied to the images. This work considers two machine learning approaches that segment the hair pixels from dermoscopy images. Subsequently, morphological post-processing is applied to refine the segmented hair, and an image inpainting algorithm replaces the hair pixels with values based on the surrounding image structures. The first hair segmentation approach combines pixel-wise features extracted using the well-known Gaussian image pyramid with a traditional shallow multilayer perceptron (MLP-ANN) to detect hair pixels in images. The second approach uses a deep convolutional encoder-decoder (ED) network to segment hair. Both hair segmentation methods (MLP-ANN and ED) are trained with a set of 32 dermoscopy images with manually annotated hair, with the MLP-ANN dataset constructed in a pixel-wise manner. Both proposed methods underwent three different assessments. First, a set of 50 images with a priori known hair is used for hair segmentation evaluation. Second, a set of 13 different dermoscopy images with hair added by a suitably trained generative adversarial network (GAN) is used to assess the quality of hair removal that generates the hair-free image, in terms of several error metrics with respect to the original hair-free image. Finally, both proposed hair segmentation methods (MLP-ANN and ED) are applied to a set of 200 hair and hair-free images, which is used for training an image classifier to distinguish melanoma from nevi lesions, and the improvement in image classification accuracy is measured. Comparative results against several other state-of-the-art hair removal techniques are also presented. Results show that both proposed hair removal techniques outperform the best-performing state-of-the-art method under comparison in terms of several error metrics. Considering the effect of hair re...
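As one illustration of the post-processing stage described above (morphological refinement of the hair mask followed by inpainting), here is a minimal OpenCV sketch. The threshold, kernel size, and inpainting method (Telea) are assumptions, and the synthetic inputs stand in for a real dermoscopy image and a segmenter output.

# Minimal sketch (OpenCV, hypothetical parameters): refine a binary hair mask with
# morphological closing/dilation, then replace hair pixels by inpainting.
import cv2
import numpy as np

image = np.full((256, 256, 3), 180, dtype=np.uint8)        # stand-in for a dermoscopy image
hair_prob = np.random.rand(256, 256).astype(np.float32)    # stand-in for MLP-ANN or ED output

mask = (hair_prob > 0.5).astype(np.uint8) * 255            # threshold to a binary hair mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)     # bridge small gaps along hair strands
mask = cv2.dilate(mask, kernel, iterations=1)              # slightly grow the mask over hair borders

hair_free = cv2.inpaint(image, mask, 5, cv2.INPAINT_TELEA) # fill hair pixels from surrounding structure
cv2.imwrite("hair_free.png", hair_free)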
A multi-stage attack is a sophisticated intrusion strategy that has been widely used for penetrating well-protected network infrastructures. To detect such attacks, state-of-the-art research advocates the use of hidden Markov models (HMMs). However, although HMMs can model the relationships and dependencies among different alerts and stages for detection, they cannot handle well the stage dependencies buried in longer sequences of alerts. In this paper, we tackle the challenge of the stages' long-term dependency and propose a new detection solution using a sequence-to-sequence (seq2seq) model. The basic idea is to encode a sequence of alerts (i.e., the detector's observations) into a latent feature vector using a long short-term memory (LSTM) network and then decode this vector into a sequence of predicted attack stages with another LSTM. Through the encoder-decoder collaboration, we can decouple the local constraint between the observed alerts and the potential attack stages, and are thus able to use the full knowledge of all the alerts to detect stages on a sequence basis. With the LSTM, we can learn to "forget" irrelevant alerts and thereby have more opportunities to "remember" the long-term dependency between different stages for our sequence detection. To evaluate our model's effectiveness, we have conducted extensive experiments using four public datasets, all of which include simulated or reconstructed samples of real-world multi-stage attacks in controlled testbeds. Our results confirm the better detection performance of our model compared with previous HMM solutions. © 2021 Elsevier Ltd. All rights reserved.
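A minimal PyTorch sketch of the alert-to-stage seq2seq idea follows. The vocabulary sizes, embedding dimensions, and single-layer LSTMs are hypothetical choices, not the paper's configuration.

# Minimal sketch (PyTorch): encode a sequence of alert IDs with one LSTM and decode a
# sequence of predicted attack stages with another, trained with teacher forcing.
import torch
import torch.nn as nn

NUM_ALERTS, NUM_STAGES, HIDDEN = 100, 6, 128

class Seq2SeqDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.alert_emb = nn.Embedding(NUM_ALERTS, 64)
        self.stage_emb = nn.Embedding(NUM_STAGES, 64)
        self.encoder = nn.LSTM(64, HIDDEN, batch_first=True)
        self.decoder = nn.LSTM(64, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, NUM_STAGES)

    def forward(self, alerts, stages_in):
        _, state = self.encoder(self.alert_emb(alerts))      # latent summary of all alerts
        dec_out, _ = self.decoder(self.stage_emb(stages_in), state)
        return self.out(dec_out)                             # per-step stage logits

model = Seq2SeqDetector()
alerts = torch.randint(0, NUM_ALERTS, (4, 50))               # batch of alert sequences
stages = torch.randint(0, NUM_STAGES, (4, 50))               # corresponding stage labels
# Teacher forcing: the decoder sees the previous stage label (0 reused as a start token).
dec_in = torch.cat([torch.zeros(4, 1, dtype=torch.long), stages[:, :-1]], dim=1)
logits = model(alerts, dec_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, NUM_STAGES), stages.reshape(-1))
loss.backward()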
Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent Document Image Binarization Competition (DIBCO) and Handwritten Document Image Binarization Competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://***/beargolden/DP-LinkNet.
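The exact HDC and SPP configurations are not given in the abstract; as a rough illustration only, a hybrid dilated convolution block could look like the PyTorch sketch below, where the channel count and dilation rates (1, 2, 5) are assumptions chosen to enlarge the receptive field without gridding artifacts.

# Minimal sketch (PyTorch, hypothetical sizes): a hybrid dilated convolution (HDC) block
# that stacks 3x3 convolutions with increasing dilation over the encoder output.
import torch
import torch.nn as nn

class HDCBlock(nn.Module):
    def __init__(self, channels=256, rates=(1, 2, 5)):
        super().__init__()
        self.stack = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])

    def forward(self, x):
        return self.stack(x) + x        # residual connection over the dilated stack

feat = torch.randn(1, 256, 32, 32)      # encoder output feature map
print(HDCBlock()(feat).shape)           # torch.Size([1, 256, 32, 32])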
ISBN (print): 9781728160344
Neural response generation aims to generate human-like responses to human utterances using deep learning. Previous studies have shown that expressing emotion in response generation improves user performance, user engagement, and user satisfaction, and that conversational agents can then communicate with users at a human level. However, previous emotional response generation models cannot understand the subtle aspects of emotion, because they use the desired emotion of the response as a token. Moreover, it is difficult for such models to generate natural responses related to the input utterance at the content level, since the information of the input utterance can be biased toward the emotion token. To overcome these limitations, we propose an emotional response generation model that produces emotional and natural responses by using emotion feature extraction. Our model consists of two parts: an extraction part and a generation part. The extraction part extracts the emotion of the input utterance as a vector using a pre-trained LSTM-based classification model. The generation part generates an emotional and natural response to the input utterance by reflecting both the emotion vector from the extraction part and the thought vector from the encoder. We evaluate our model on the emotion-labeled dialogue dataset DailyDialog, using both quantitative and qualitative analyses: emotion classification, response generation modeling, and a comparative study. Overall, experiments show that the proposed model can generate emotional and natural responses.
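A minimal PyTorch sketch of the extraction/generation split follows. The layer sizes are hypothetical, the emotion LSTM stands in for the pre-trained classifier, and fusing the emotion vector with the encoder's thought vector through a linear layer is one plausible reading of the description above, not the paper's stated architecture.

# Minimal sketch (PyTorch): combine an emotion feature vector with the encoder's
# thought vector to initialize the response decoder.
import torch
import torch.nn as nn

VOCAB, EMO_DIM, HID = 5000, 32, 256

emb = nn.Embedding(VOCAB, 128)
emotion_lstm = nn.LSTM(128, EMO_DIM, batch_first=True)    # stands in for the pre-trained classifier
encoder = nn.LSTM(128, HID, batch_first=True)
decoder = nn.LSTM(128, HID, batch_first=True)
fuse = nn.Linear(HID + EMO_DIM, HID)
out = nn.Linear(HID, VOCAB)

utterance = torch.randint(0, VOCAB, (2, 20))              # input utterance token IDs
response_in = torch.randint(0, VOCAB, (2, 15))            # teacher-forced response tokens

_, (emo_h, _) = emotion_lstm(emb(utterance))              # emotion feature vector
_, (enc_h, enc_c) = encoder(emb(utterance))               # thought vector
h0 = torch.tanh(fuse(torch.cat([enc_h, emo_h], dim=-1)))  # emotion-aware initial decoder state
dec_out, _ = decoder(emb(response_in), (h0, enc_c))
logits = out(dec_out)                                     # next-token logits per step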
ISBN (print): 9781728163956
Encoder-decoder based methods for semi-supervised video object segmentation (Semi-VOS) have received extensive attention due to their superior performance. However, most of them rely on complex intermediate networks that generate strong specifiers to remain robust in challenging scenarios, which is quite inefficient when dealing with relatively simple scenarios. To solve this problem, we propose a real-time network, Clue Refining Network for Video Object Segmentation (CRVOS), which has no intermediate network and thus handles these scenarios efficiently. In this work, we propose a simple specifier, referred to as the Clue, which consists of the previous frame's coarse mask and coordinate information. We also propose a novel refine module that achieves better performance than commonly used refine modules by using a deconvolution layer instead of a bilinear upsampling layer. Our proposed method is the fastest among existing methods while maintaining competitive accuracy. On the DAVIS 2016 validation set, it achieves 63.5 fps and a J&F score of 81.6%.
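The refine module is not specified beyond its use of deconvolution; a minimal PyTorch sketch under that assumption is shown below, with hypothetical channel sizes and a skip connection from the encoder.

# Minimal sketch (PyTorch): a refine module that upsamples with a learnable
# transposed convolution instead of fixed bilinear interpolation.
import torch
import torch.nn as nn

class RefineModule(nn.Module):
    def __init__(self, in_ch=256, skip_ch=128, out_ch=128):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.merge = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, skip):
        x = self.up(deep)                            # learned 2x upsampling
        return self.merge(torch.cat([x, skip], dim=1))

deep = torch.randn(1, 256, 30, 30)                   # feature map from the deeper stage
skip = torch.randn(1, 128, 60, 60)                   # encoder skip feature
print(RefineModule()(deep, skip).shape)              # torch.Size([1, 128, 60, 60])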
Although autonomous driving has become applicable to industry, the widespread application of its key techniques still needs refinement. For instance, segmenting road markings quickly and accurately, in order to support pedestrian path prediction and the creation of high-definition (HD) maps, is essential for making autonomous driving more practical. Current road marking segmentation mainly relies on semantic segmentation techniques from computer vision with an encoder-decoder architecture. However, as demonstrated in this paper, the upsampling layer of convolutional neural networks with an encoder-decoder architecture plays a significant role in the efficiency and accuracy of road marking segmentation. The bilinear upsampling layer is fast due to its intrinsically simple interpolation but less accurate; on the contrary, an upsampling layer with offsets is relatively accurate but incurs more computational cost. Therefore, at least in terms of widespread application, efficiency, and accuracy, the upsampling layer of the decoder of convolutional neural networks deserves more attention in future autonomous driving research. Copyright (C) 2020 The Authors.
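The efficiency/accuracy trade-off above can be made concrete with a rough PyTorch comparison: bilinear interpolation is parameter-free, while a learnable transposed convolution producing the same output size carries weights and extra compute. The feature-map size and kernel choice below are arbitrary.

# Rough sketch (PyTorch): compare parameter count and single-pass runtime of the
# two upsampling options on one example feature map.
import time
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 64, 128, 256)                    # an example decoder feature map
deconv = nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2, padding=1)

t0 = time.perf_counter()
y_bilinear = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
t1 = time.perf_counter()
y_deconv = deconv(x)
t2 = time.perf_counter()

print("bilinear: 0 params, %.4f s" % (t1 - t0))
print("deconv:   %d params, %.4f s" % (sum(p.numel() for p in deconv.parameters()), t2 - t1))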
Concrete deck delamination often exhibits strong variations in size, shape, and temperature distribution under the influence of outdoor weather conditions. These strong variations create challenges for purely analytical solutions to infrared image segmentation of delaminated areas. Recently developed supervised deep learning approaches have demonstrated their potential for automatic segmentation of RGB images, but their effectiveness in segmenting thermal images remains under-explored. The main challenge lies in the development of task-specific models and the generation of a large set of labeled infrared images for training. To address this challenge, a customized deep learning model based on an encoder-decoder architecture is proposed to segment delaminated areas in thermal images at the pixel level. Data augmentation strategies were implemented in creating the training dataset to improve the performance of the proposed model. The trained model was deployed in a real-world project to further evaluate its applicability and robustness. The results of these experimental studies support the effectiveness of the deep learning model in segmenting concrete delamination areas in infrared images, and suggest that data augmentation is a helpful technique for addressing the small size of the training set. The field test with validation further demonstrated the generalizability of the proposed framework. Limitations of the proposed approach are also briefly discussed at the end of the paper.
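As an illustration of the data augmentation strategy mentioned above, the NumPy sketch below applies the same random flips and rotations to an infrared image and its label mask, plus a mild intensity jitter; the specific transforms and ranges are assumptions, not the paper's exact pipeline.

# Minimal sketch (NumPy): jointly augment an infrared image and its delamination mask.
import numpy as np

def augment(image, mask, rng=np.random.default_rng()):
    # Apply the same random geometric transforms to the image and its label mask.
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    k = rng.integers(0, 4)                           # rotate by a random multiple of 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    image = image * rng.uniform(0.9, 1.1)            # mild thermal-intensity jitter (image only)
    return image.copy(), mask.copy()

img = np.random.rand(240, 320).astype(np.float32)    # stand-in infrared frame
msk = (np.random.rand(240, 320) > 0.8).astype(np.uint8)
aug_img, aug_msk = augment(img, msk)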
ISBN (print): 9781510638297
Blood vessel segmentation is an important step in the automated diagnosis of ophthalmic disease from retinal fundus images. The UNet is a popular encoder-decoder architecture widely used in biomedical pixel-wise segmentation problems. In this paper, we analyze how the UNet can be used in a more computationally efficient way. Pre-trained weights are used to initialize the network, and three different architectures are compared to analyze the efficacy of the models in terms of both computational cost and performance. Three deep encoder architectures (VGG16, ResNet34, DenseNet121) are discussed and their efficiencies are compared for the blood vessel segmentation task. The ResNet34 architecture achieved the highest sensitivity of 0.849, with an accuracy of 0.961 and a specificity of 0.9843, using as few as 510,178 parameters, compared with the standard UNet's 34,525,168 parameters and sensitivity of 0.756.
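One convenient way to build a UNet with a pre-trained encoder, in the spirit of the setup above, is the segmentation_models_pytorch library; the sketch below is only an approximation of such a configuration, and the patch size and encoder choice are assumptions, not the paper's stated setup.

# Minimal sketch (segmentation_models_pytorch): a UNet whose encoder is an
# ImageNet-pretrained ResNet34, applied to a fundus image patch.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",        # swap for "vgg16" or "densenet121" to compare
    encoder_weights="imagenet",     # pre-trained initialization
    in_channels=3,
    classes=1,                      # binary vessel / background map
)

patch = torch.randn(1, 3, 512, 512)
vessel_prob = torch.sigmoid(model(patch))   # per-pixel vessel probability
print(vessel_prob.shape)                    # torch.Size([1, 1, 512, 512])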
ISBN (print): 9781728143002
Utilizing channel-wise and spatial attention mechanisms to emphasize salient parts of an input image is an effective way to improve the performance of convolutional neural networks (CNNs). There are multiple effective implementations of the attention mechanism. One is adding squeeze-and-excitation (SE) blocks to the CNN structure, which selectively emphasize the most informative channels and suppress the relatively less informative channels by exploiting channel dependence. Another is adding a convolutional block attention module (CBAM), which implements both channel-wise and spatial attention to select important pixels of the feature maps while emphasizing informative channels. In this paper, we propose an encoder-decoder architecture based on the idea of letting the channel-wise and spatial attention blocks share the same latent space representation. Instead of separating the channel-wise and spatial attention modules into two independent parts as in CBAM, we combine them into one encoder-decoder architecture with two outputs. To evaluate the performance of the proposed algorithm, we apply it to different CNN architectures and test it on image classification and semantic segmentation. By comparing the resulting structures equipped with MEDA blocks against other attention modules, we show that the proposed method achieves better performance across different test scenarios.
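The abstract does not detail the MEDA block, so the PyTorch sketch below is a loose interpretation of the shared-latent-space idea: one small encoder compresses the feature map, and two heads decode a channel attention vector and a spatial attention map from the same latent code. All names and sizes are hypothetical.

# Minimal sketch (PyTorch): channel and spatial attention derived from a shared latent code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAttention(nn.Module):
    def __init__(self, channels=64, latent=16):
        super().__init__()
        self.encode = nn.Sequential(
            nn.AdaptiveAvgPool2d(8),
            nn.Conv2d(channels, latent, 1), nn.ReLU(inplace=True),
        )
        self.to_channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(latent, channels, 1), nn.Sigmoid())
        self.to_spatial = nn.Sequential(nn.Conv2d(latent, 1, 1), nn.Sigmoid())

    def forward(self, x):
        z = self.encode(x)                              # shared latent representation
        ch = self.to_channel(z)                         # (B, C, 1, 1) channel weights
        sp = F.interpolate(self.to_spatial(z),          # (B, 1, H, W) spatial weights
                           size=x.shape[2:], mode="bilinear", align_corners=False)
        return x * ch * sp                              # re-weight the input feature map

feat = torch.randn(2, 64, 56, 56)
print(SharedAttention()(feat).shape)                    # torch.Size([2, 64, 56, 56])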