ISBN: 9783030110185; 9783030110178 (print)
All existing image steganography methods use manually crafted features to hide binary payloads in cover images. This leads to small payload capacity and image distortion. Here we propose a convolutional neural network based encoder-decoder architecture for embedding images as payload. To this end, we make the following three major contributions: (i) we propose a deep learning based generic encoder-decoder architecture for image steganography; (ii) we introduce a new loss function that ensures joint end-to-end training of the encoder-decoder networks; (iii) we perform an extensive empirical evaluation of the proposed architecture on a range of challenging publicly available datasets (MNIST, CIFAR10, PASCAL-VOC12, ImageNet, LFW) and report state-of-the-art payload capacity at high PSNR and SSIM values.
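The abstract reports distortion in terms of PSNR. As a reminder of what that number measures, here is a minimal sketch of the standard PSNR formula for 8-bit images flattened to pixel lists. The function name `psnr` and the toy pixel values are illustrative assumptions, not taken from the paper.

```python
import math

def psnr(cover, stego, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the cover image and the image carrying the payload.
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy example (hypothetical values): every pixel is off by one grey level.
cover = [0, 64, 128, 255]
stego = [1, 65, 129, 254]
print(round(psnr(cover, stego), 2))
```

A higher value means the stego image is closer to the cover image. SSIM, the other metric named in the abstract, compares local structure and is not covered by this sketch.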
ISBN: 9781728132488 (print)
Speech plays an important role in human-computer interaction. In many real applications, speech is degraded by interfering noise. Extracting target speech from background interference is a meaningful and challenging task, especially when the interference is also a human voice. This work addresses the problem of extracting a target speaker from an interfering speaker, given a short piece of anchor speech that is used to obtain the target speaker's identity. We propose an encoder-decoder neural network architecture. Specifically, the encoder transforms the anchor speech into an embedding that represents the identity of the target speaker. The decoder uses the speaker identity to extract the target speech from the mixture. To obtain an acoustically informed speaker identity, a dynamic-attention mechanism is used to build a time-varying embedding for each frame of the mixture. Systematic evaluation indicates that our approach improves the quality of speaker extraction.
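The idea of a time-varying embedding can be illustrated with plain attention: each mixture frame scores every anchor frame, and the embedding for that frame is the score-weighted average of the anchor frames. The sketch below assumes dot-product scoring and raw feature vectors. The abstract does not specify the scoring function or the learned layers, so `dynamic_embedding` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dynamic_embedding(mixture_frame, anchor_frames):
    """Attention-weighted average of the anchor frames for one mixture frame."""
    # Assumption: similarity is a plain dot product between frame vectors.
    weights = softmax([dot(mixture_frame, a) for a in anchor_frames])
    dim = len(anchor_frames[0])
    embedding = [sum(w * a[i] for w, a in zip(weights, anchor_frames))
                 for i in range(dim)]
    return embedding, weights

# Toy 2-dimensional "frames" (hypothetical values).
anchor = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixture = [[2.0, 0.0], [0.0, 2.0]]
for frame in mixture:
    emb, _ = dynamic_embedding(frame, anchor)
    print([round(x, 3) for x in emb])
```

Because the weights are recomputed per mixture frame, the two mixture frames above receive different embeddings from the same anchor speech, which is the property the abstract describes.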
ISBN: 9781538662496 (print)
We address the task of estimating depth from a single intensity image via a novel convolutional neural network (CNN) encoder-decoder architecture, which learns depth information from example pairs of color images and their corresponding depth maps. The proposed model integrates residual connections within the pooling and up-sampling layers, together with hourglass networks that operate on the encoded features and thus process them at various scales. Furthermore, the model is optimized under both a perceptual loss and the mean squared error loss. The perceptual loss considers high-level features and thus operates at a different scale of abstraction, which is complementary to the mean squared error loss. The improvements in qualitative and quantitative comparisons with state-of-the-art approaches demonstrate the effectiveness of our approach, even in the presence of noise.
ISBN: 9789897583841 (print)
Recent anomaly detection techniques focus on the use of neural networks and an encoder-decoder architecture. However, these techniques involve trade-offs in heat management, power consumption and hardware cost when implemented in an embedded environment. This paper presents two related new methods for anomaly detection in data sets gathered from an autonomous mini-vehicle with a CAN bus. The first method is, to the best of our knowledge, the first use of an encoder-decoder architecture for anomaly detection using linear genetic programming (LGP). The second method uses a self-configuring neural network created with an evolutionary algorithm paradigm, learning both an architecture and weights suitable for embedded systems. Both approaches have the following advantages: they are inexpensive in resource use and can run on almost any embedded board, thanks to the computational advantages of a linear register machine. The proposed methods are also faster by at least one order of magnitude, a figure that includes both inference and complete training.
Semantic segmentation, as a dense pixelwise classification task, is of great significance to scene understanding. Many approaches based on convolutional neural networks still suffer from two kinds of challenges: (1) insufficient semantic information results in semantic obfuscation between similar categories; (2) loss of spatial information leads to inaccurate localization of inconspicuous objects. To tackle these challenges, we design a network with an encoder-decoder architecture based on two proposed modules: a global pyramid attention module (GPAM) and a pyramid decoder module (PDM). Specifically, GPAM exploits an attention mechanism as global prior knowledge to adaptively capture discriminative features for enhancing semantic representation, and PDM employs small convolutions connected in parallel to predict adjacent position relationships for refining spatial information. A series of ablation experiments is conducted to demonstrate the effectiveness of our designs, and our network achieves a mean intersection over union score of 83.4% on the PASCAL VOC 2012 dataset and 78.5% on the Cityscapes dataset. (C) 2019 SPIE and IS&T
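The scores quoted above are mean intersection over union (mIoU) values. A minimal sketch of how that metric is commonly computed over flattened label maps follows. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions, since benchmark toolkits differ in such details.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two equally long lists of class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either map")
    return sum(ious) / len(ious)

# Toy 4-pixel label maps (hypothetical values).
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(round(mean_iou(pred, target, num_classes=2), 4))
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 0.583.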
GPS datasets in the big data regime provide rich contextual information that enables efficient implementation of advanced features such as navigation, tracking, and security in urban computing systems. Understanding the hidden patterns in large amounts of GPS data is critically important in ubiquitous computing. The quality of the GPS data is fundamental to producing high-quality results. In real-world applications, certain GPS trajectories are sparse and incomplete; this increases the complexity of inference algorithms. A few existing studies have tried to address this problem using complicated algorithms based on conventional heuristics; this requires extensive domain knowledge of the underlying applications. Our contributions in this paper are two-fold. First, we propose a deep learning based bidirectional convolutional recurrent encoder-decoder architecture to generate the missing points of GPS trajectories over an occupancy grid map. Second, we add an attention mechanism between the encoder and the decoder, which further enhances the performance of our model. We performed experiments on the widely used Microsoft GeoLife trajectory dataset, over multiple levels of grid resolution and multiple lengths of missing GPS segments. Our proposed model achieved better results in terms of average displacement error than the state-of-the-art benchmark methods.
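Average displacement error, the metric named above, is usually the mean Euclidean distance between predicted and true points. The sketch below assumes 2D coordinates in a common unit (for example grid cells or metres). The abstract does not state the unit, and the function name and sample points are illustrative.

```python
import math

def average_displacement_error(predicted, actual):
    """Mean Euclidean distance between matching 2D points of two trajectories."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.hypot(px - ax, py - ay)
                for (px, py), (ax, ay) in zip(predicted, actual))
    return total / len(predicted)

# Toy missing segment of two points (hypothetical coordinates).
predicted = [(0.0, 0.0), (3.0, 4.0)]
actual = [(0.0, 0.0), (0.0, 0.0)]
print(average_displacement_error(predicted, actual))
```

The first point is exact and the second is 5 units off, so the average error is 2.5 units.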
ISBN: 9781728107707 (print)
Convolutional neural networks (CNNs) for visual semantic segmentation have been attracting considerable attention recently because of their superior support for many significant tasks, such as autonomous driving, semantic SLAM (simultaneous localization and mapping) and remote sensing surveying and mapping. These kinds of applications generally need to be implemented on smart terminals, which means that a hardware platform with high energy efficiency and real-time performance is required. However, CNNs for semantic segmentation usually contain some symmetrical encoders and decoders, corresponding to the down-sampling process (e.g., pooling, convolution) and the up-sampling process (e.g., unpooling, deconvolution). All of these processes are compute- and storage-intensive, which limits their applicability in resource-constrained embedded systems. In this paper, an FPGA-based accelerator programmed in OpenCL is proposed. We evaluate its performance on the CamVid dataset. The global accuracy drops by only 2.04% with 8-bit quantization. Additionally, the system achieves 48.89 GOPS and 2.4x real-time performance against a CPU when running on an Arria-10 GX1150 device.
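To make the 8-bit quantization step concrete, the sketch below shows one common scheme: symmetric linear quantization with a single scale per tensor. This is an assumption for illustration only, since the abstract does not say which quantization scheme the accelerator uses. The function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]

# Toy weight vector (hypothetical values).
weights = [-1.0, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print([round(r, 4) for r in restored])
```

Each restored value differs from the original by at most half a quantization step, which is the rounding error that shows up as a small accuracy drop after quantization.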
Semantic segmentation is an extremely important task in computer vision. At present, the related methods have achieved high performance. Nevertheless, semantic segmentation still faces the challenge of localization accuracy due to DCNN invariance and the existence of objects at multiple scales. To improve segmentation accuracy, this paper proposes a U-SEM encoder-decoder network. Firstly, in the encoding stage, it down-samples through a ResNet. Secondly, in the decoding stage, in order to filter and exploit the useful features, the SE-Mobile Block is proposed and fused into the network. The SE block adopts the idea of an attention mechanism to focus on useful features and ignore redundant ones. Mobile blocks use depthwise separable convolutions to replace traditional convolutions, speeding up operations and reducing parameters. Finally, it adopts a skip structure in which feature information at different scales is merged to produce accurate and detailed segmentation. Experimental results show that the proposed network achieves good performance on multiple datasets, reaching 78.4% mIOU on PASCAL VOC 2012 and 75.7% mIOU on the Cityscapes dataset.
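The parameter saving from separable convolutions can be checked with simple arithmetic. The sketch below assumes the standard factorization into a per-channel k x k depthwise filter followed by a 1 x 1 pointwise convolution, and it ignores biases. The layer sizes are hypothetical and not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a regular convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
print(std, sep, round(std / sep, 1))
```

For this layer the separable version needs 8,768 weights instead of 73,728, roughly an eightfold reduction, which is the kind of saving the abstract refers to.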
Encoder-decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. In the current approach, a similar model with an improved mirror connection from the encoder to the decoder is proposed. Three different types of mirror connections, namely linear, parametric and convolutional, are demonstrated in the proposed work. We also use internal skips to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model further includes an ensemble module that combines outputs from models with different kernel sizes, such as 3 × 3, 5 × 5 and 7 × 7, to merge multi-scale features for efficient detection. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and Total-Text datasets, where it proved superior to other state-of-the-art encoder-decoder architectures for pixel-level classification.
ISBN: 9781728119854 (print)
Traffic flow prediction is regarded as a key research problem in intelligent transportation systems. In this paper, we propose an encoder-decoder model with a temporal attention mechanism for the multi-step forward traffic flow prediction task. The model uses LSTMs as the encoder and decoder to learn the long-term dependencies and nonlinear characteristics of multivariate traffic-flow time series data, and introduces a temporal attention mechanism for more accurate traffic flow prediction. Experiments on a real traffic flow dataset show that the proposed model has better prediction ability than classic shallow learning models and baseline deep learning models. The predicted traffic flow values match the ground truth well not only for short-step forward prediction but also for longer-step forward prediction, which validates the proposed model as a good option for the real-time and forward-looking problems of the traffic flow prediction task.