This letter focuses on image manipulation detection, which aims to recognize manipulated regions using contextual semantic information. Existing approaches usually overlook the semantic discrepancy between different levels of feature maps and fuse them directly (e.g., by addition or concatenation) for detection. In this letter, we argue that this semantic gap is the main reason for the low effectiveness of feature fusion in manipulation prediction. To address this problem, we propose a Global Semantic Consistency Network (GSCNet) for image manipulation detection, which is based on an encoder-decoder structure. Specifically, to make GSCNet capture more global texture information, which has been empirically confirmed to benefit manipulation detection, a Gram block is first deployed on each level of feature maps in the encoding stage. On top of that, a bi-directional convolutional LSTM is applied in the decoding stage so that feature maps of the same level are semantically consistent. Experimental results on NIST16 and CASIA v1.0 show that GSCNet can accurately locate the manipulated regions. Furthermore, compared with existing models, GSCNet achieves new state-of-the-art results.
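The abstract does not say how the Gram block is built. As a rough illustration only, the sketch below computes the standard Gram matrix of a feature map (channel-by-channel inner products, as used in texture and style analysis). Treating this as the core of the Gram block is an assumption, and the function name and toy data are hypothetical.

```python
def gram_matrix(feature_map):
    """Channel-by-channel inner products of a feature map.

    feature_map: list of C channels, each a flat list of H*W activations.
    Returns a C x C matrix normalized by the number of spatial positions.
    (Assumed stand-in for the paper's Gram block, which is not specified.)
    """
    n = len(feature_map[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in feature_map]
        for ci in feature_map
    ]


# Toy feature map: 3 channels over a flattened 2x2 grid (illustrative values).
features = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
G = gram_matrix(features)
```

Each entry G[i][j] summarizes how strongly channels i and j co-activate over the whole image, discarding spatial position, which is why Gram statistics are commonly read as a global texture descriptor.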
In this paper, we propose an innovative encoder-decoder structure with a convolutional long short-term memory (ED-ConvLSTM) network to forecast global total electron content (TEC) based on the International GNSS Service (IGS) TEC maps from 2005 to 2018 with a 1-hr time cadence. The ED-ConvLSTM model forecasts TEC maps 1-7 days in advance through iterations. To investigate the model's performance, we compared it with the International Reference Ionosphere (IRI2016) model in 2014 and 2018, and with the 1-day Beijing University of Aeronautics and Astronautics (BUAA) model in 2018. The results show that our 7-day ED-ConvLSTM model (the ED-ConvLSTM model that forecasts 7 days in advance) outperforms IRI2016 in 2014 and 2018, and our 5-day ED-ConvLSTM model (the ED-ConvLSTM model that forecasts 5 days in advance) outperforms the 1-day BUAA model. Furthermore, the root mean square error (RMSE) of the 1-day ED-ConvLSTM model with respect to the IGS TEC maps is 51.5% and 43% lower than that of the IRI2016 model in 2014 and 2018, respectively. The RMSE of the 1-day ED-ConvLSTM model is 20.3% lower than that of the 1-day BUAA model in 2018. In addition, our model has its highest RMSE in the Equatorial Ionospheric Anomaly (EIA) region, but can roughly predict the features and locations of the EIA. However, the model fails to forecast localized TEC enhancements and the sudden ionospheric response to geomagnetic storms. Overall, the model shows competitive performance in medium-term prediction of global TEC maps during geomagnetically quiet periods.
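The RMSE comparisons quoted above are relative reductions against a baseline model. A minimal sketch of that arithmetic follows; the TEC values are made up purely for illustration and are not taken from the paper, and the function names are hypothetical.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(rmse_model, rmse_baseline):
    """Percentage by which rmse_model is lower than rmse_baseline."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Illustrative numbers only (not from the paper): reference TEC values
# and two competing forecasts at the same grid points.
reference = [10.0, 20.0, 30.0, 40.0]
forecast_model = [11.0, 19.0, 31.0, 39.0]
forecast_baseline = [12.0, 18.0, 32.0, 38.0]

model_error = rmse(forecast_model, reference)
baseline_error = rmse(forecast_baseline, reference)
reduction = relative_reduction(model_error, baseline_error)
```

With these toy inputs the model error is half the baseline error, so the reduction is 50%. A statement such as "decreases by 51.5%" in the abstract reads the same way.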
The total electron density is a fundamental quantity in the Earth's magnetosphere and plays an important role in a number of physical processes, but its dynamic global evolution is not yet fully quantified. We present an implementation of a specific type of recurrent neural network (encoder-decoder), distinct from previous models, to construct global electron density based on multiyear data from the Van Allen Probes. The history of geomagnetic indices is first encoded into a hidden state H, which is then decoded, together with auxiliary information (satellite location), into the quantity of interest (total electron density in this study). In this process, the input of historical geomagnetic indices is disentangled from the satellite location and is processed chronologically by the encoder. As a result, the time evolution of the geomagnetic indices is explicitly embedded in the structure, and the encoded hidden state H can be viewed as a representation of the inner magnetospheric state. The magnetospheric state is then decoded to predict the evolution of global electron density. Our results show that the model can capture the dynamical evolution of total electron density, including the formation and evolution of stable and evident plume configurations that roughly agree with global observations. Our findings demonstrate the importance of applying recurrent neural networks to specify the inner magnetospheric state in a novel way, which will potentially improve our fundamental understanding of wave and particle dynamics in the Earth's magnetosphere.
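The separation described above (index history goes into the encoder, location enters only at the decoder) can be sketched with a toy scalar recurrent cell. Everything here is an assumption for illustration: the abstract gives no cell type, hidden size, or weights, so the tanh update, the linear decoder, and all numeric values are hypothetical.

```python
import math


def encode(index_history, w_in=0.5, w_rec=0.8):
    """Fold a chronological series of geomagnetic index values into a
    hidden state H with a toy scalar tanh recurrence (assumed form)."""
    h = 0.0
    for x in index_history:
        h = math.tanh(w_in * x + w_rec * h)
    return h


def decode(h, location, w_h=1.0, w_loc=(-0.3, 0.1), bias=2.0):
    """Map the hidden state plus a location tuple to an output value
    with a toy linear readout (assumed form)."""
    return w_h * h + sum(w * v for w, v in zip(w_loc, location)) + bias


# Hypothetical inputs: a short index history and two query locations.
history = [0.2, 1.5, 3.0, 1.0]
H = encode(history)                 # computed once per time step
value_a = decode(H, (4.0, 6.0))     # same H reused for each location
value_b = decode(H, (5.0, 6.0))
```

Because H depends only on the index history, one encoding can be decoded at every location to build a global map. The order of the history matters to the encoder, which is what "processed chronologically" implies.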
Authors:
Zhu, Di; Cheng, Ximeng; Zhang, Fan; Yao, Xin; Gao, Yong; Liu, Yu
Peking Univ, Sch Earth & Space Sci, Inst Remote Sensing & Geog Informat Syst, Beijing, Peoples R China
Peking Univ, Beijing Key Lab Spatial Informat Integrat & Its A, Beijing, Peoples R China
UCL, SpaceTimeLab, Dept Civil Environm & Geomat Engn, London, England
MIT, Senseable City Lab, 77 Massachusetts Ave, Cambridge, MA 02139, USA
Spatial interpolation is a traditional geostatistical operation that aims at predicting the attribute values of unobserved locations given a sample of data defined on point supports. However, the continuity and heterogeneity underlying spatial data are too complex to be approximated by classic statistical models. Deep learning models, especially the idea of conditional generative adversarial networks (CGANs), provide us with a perspective for formalizing spatial interpolation as a conditional generative task. In this article, we design a novel deep learning architecture named conditional encoder-decoder generative adversarial neural networks (CEDGANs) for spatial interpolation, therein combining the encoder-decoder structure with adversarial learning to capture deep representations of sampled spatial data and their interactions with local structural patterns. A case study on elevations in China demonstrates the ability of our model to achieve outstanding interpolation results compared to benchmark methods. Further experiments uncover the learned spatial knowledge in the model's hidden layers and test the potential to generalize our adversarial interpolation idea across domains. This work is an endeavor to investigate deep spatial knowledge using artificial intelligence. The proposed model can benefit practical scenarios and enlighten future research in various geographical applications related to spatial prediction.
We consider referring image segmentation. It is a problem at the intersection of computer vision and natural language understanding. Given an input image and a referring expression in the form of a natural language sentence, the goal is to segment the object of interest in the image referred by the linguistic query. To this end, we propose a dual convolutional LSTM (ConvLSTM) network to tackle this problem. Our model consists of an encoder network and a decoder network, where ConvLSTM is used in both encoder and decoder networks to capture spatial and sequential information. The encoder network extracts visual and linguistic features for each word in the expression sentence, and adopts an attention mechanism to focus on words that are more informative in the multimodal interaction. The decoder network integrates the features generated by the encoder network at multiple levels as its input and produces the final precise segmentation mask. Experimental results on four challenging datasets demonstrate that the proposed network achieves superior segmentation performance compared with other state-of-the-art methods.
Video summarization shortens a lengthy video into a succinct version. Its challenges mainly stem from the difficulty of discovering the inherent relations between the original video and its summary while minimizing the loss of semantic information. Supervised approaches, especially those based on deep learning, have demonstrated their effectiveness in video summarization. However, these approaches mainly focus on one of the two challenges and seldom address both simultaneously. To remedy this deficiency, we incorporate both encoder-decoder attention and a semantic preserving loss in a deep Seq2Seq framework for video summarization. Moreover, we replace the popular mean square error loss with the Huber loss to make the model more robust to outliers. Extensive experiments on two benchmark video summarization datasets demonstrate that the proposed approach consistently outperforms state-of-the-art methods.
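To see why the Huber loss is less sensitive to outliers than a squared-error loss, the sketch below compares the two on a small and a large residual. It uses the standard Huber definition; the threshold `delta=1.0` is an assumed default, since the abstract does not state the value used.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear beyond delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def half_squared_error(residual):
    """Squared-error term with the same 1/2 scaling, for comparison."""
    return 0.5 * residual * residual


small, outlier = 0.5, 10.0
huber_small = huber(small)                    # quadratic regime
huber_outlier = huber(outlier)                # linear regime
squared_outlier = half_squared_error(outlier)
```

For the small residual the two losses agree (0.125). For the outlier the Huber loss grows linearly to 9.5 while the squared term reaches 50, so a single badly predicted frame score pulls on the model far less.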
Recent single image deraining methods either use a recurrent mechanism to gradually learn the mapping between clear images and rainy images, or focus on designing various loss functions to supervise the learning process. In this letter, we propose a dually connected deraining net using pixel-wise attention, for single image rain removal. Specifically, the deraining net adopts an encoder-decoder net as a backbone, which can effectively learn a residual rain-streaks map by jointly using skip sum connection and skip concatenation connection. The dual connections enable the deraining net to promote information flow between layers, and thus can allow it to discriminate and localize the rain streaks. To preserve image details, the decoded features are weighted by the learnable pixel-wise attention for adaptively recalibrating their responses. Experimental results on synthetic datasets demonstrate that the proposed model outperforms the recent state-of-the-art deraining methods.
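The two kinds of skip connection and the pixel-wise weighting mentioned above can be sketched on toy feature maps. This is only an illustration under assumptions: how the paper combines the two connections, and how it produces the attention map, is not given in the abstract, so the fusion step and the sigmoid-gated weights below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def skip_sum(encoder_feat, decoder_feat):
    """Skip sum connection: element-wise addition, channel count unchanged."""
    return [[e + d for e, d in zip(ec, dc)]
            for ec, dc in zip(encoder_feat, decoder_feat)]


def skip_concat(encoder_feat, decoder_feat):
    """Skip concatenation connection: stack channels from both sources."""
    return encoder_feat + decoder_feat


def pixel_attention(features, attention_logits):
    """Scale every channel by one learnable weight per pixel (assumed
    here to be a sigmoid of per-pixel logits)."""
    weights = [sigmoid(a) for a in attention_logits]
    return [[v * w for v, w in zip(channel, weights)] for channel in features]


# Feature maps as lists of channels, each channel a flat list of pixels.
enc = [[1.0, 2.0, 3.0]]
dec = [[0.5, 0.5, 0.5]]
summed = skip_sum(enc, dec)            # 1 channel
stacked = skip_concat(enc, dec)        # 2 channels
fused = skip_concat(summed, stacked)   # hypothetical joint use: 3 channels
weighted = pixel_attention(fused, [0.0, 10.0, -10.0])
```

The sum keeps the tensor size fixed while the concatenation preserves both sources separately, which is one way to read "promote information flow between layers". The attention logits here nearly pass the second pixel and nearly suppress the third.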
Sparse dictionary learning generates a sparse representation for images and signals along with a generalized learned dictionary. We closely examine the constrained recurrent sparse auto-encoder (CRsAE), focusing on its encoder-decoder plus recurrent architecture, and experiment with CRsAE's position in the classical dictionary learning problem. We further extend the visualizations, experiments, and metrics to evaluate the model in the context of both VAE and dictionary learning.
Image dehazing is a very important pre-processing step for many computer vision tasks such as object recognition and tracking. However, it is a challenging problem because the physical parameters of imaging, e.g. the depth information of scene pixels and the attenuation model, are usually unknown. Based on a physical model, different methods have been proposed to recover these parameters. Existing convolutional neural network (CNN) based methods try to solve the image dehazing problem using an end-to-end network that learns a direct mapping between a hazy image and its corresponding clear image. However, the representational ability, spatial variant ability and dehazing capability of these network models are hindered by treating all spatial and channel-wise features indiscriminately. Hence, we propose an end-to-end dehazing network with a parallel spatial/channel-wise attention block for capturing more informative spatial and channel-wise features, respectively. Specifically, based on the encoder-decoder framework with a pyramid pooling operation, a novel parallel spatial/channel-wise attention block is proposed and applied to the end of the encoder to guide the decoder in reconstructing better clear images. In the spatial/channel-wise attention block, the spatial attention module and the channel-wise attention module are connected in parallel: the spatial attention module highlights important spatial positions of features, while the channel-wise module exploits inter-dependencies among the channel-wise features. Extensive experiments demonstrate that our network with a parallel spatial/channel-wise attention block achieves better accuracy and visual results than state-of-the-art methods. (C) 2020 Elsevier Ltd. All rights reserved.
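A parallel spatial/channel-wise attention block can be sketched as two branches that reweight the same input and are then merged. The version below is a deliberately simplified assumption: real modules use learned convolutions or fully connected layers to produce the weights, whereas here each weight is just a sigmoid of an average, and merging by addition is also hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(x):
    """One weight per channel, from that channel's global average."""
    weights = [sigmoid(sum(ch) / len(ch)) for ch in x]
    return [[v * w for v in ch] for ch, w in zip(x, weights)]


def spatial_attention(x):
    """One weight per spatial position, from the mean across channels."""
    n_channels = len(x)
    n_pixels = len(x[0])
    weights = [sigmoid(sum(ch[i] for ch in x) / n_channels)
               for i in range(n_pixels)]
    return [[v * w for v, w in zip(ch, weights)] for ch in x]


def parallel_attention(x):
    """Run both branches on the same input and add their outputs
    (the merge rule is an assumption, not taken from the paper)."""
    c = channel_attention(x)
    s = spatial_attention(x)
    return [[a + b for a, b in zip(cc, sc)] for cc, sc in zip(c, s)]


# Features as a list of channels, each a flat list of pixel activations.
features = [[2.0, 2.0], [0.0, 4.0]]
out = parallel_attention(features)
```

The two branches see identical input, which is what distinguishes a parallel arrangement from a sequential one where the second module receives the first module's output.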
Recently, rain streak removal from a single image has attracted much research attention as a way to alleviate the degraded performance of computer vision tasks on rainy images. In this paper, we provide a thorough review of current single-image rain removal techniques, which can be mainly categorized into three classes: early filter-based, conventional prior-based, and recent deep learning-based approaches. Furthermore, inspired by the rationale of current deep learning-based methods and the insightful characteristics underlying rain shapes, we build a specific coarse-to-fine deraining network architecture, which finely captures the rain structures and progressively removes rain streaks from the input image. The superiority of the proposed network is substantiated by experiments on synthetic and real rainy images, both visually and quantitatively, in comparison with a comprehensive set of state-of-the-art methods. In particular, the proposed network is verified to generalize better to real rainy images, implying its potential usefulness for this task.