ISBN (digital): 9781728123394
ISBN (print): 9781728123400
Currently, sketch artists are employed by the police to draw sketches of suspects based on descriptions given by eyewitnesses. These sketches can be inaccurate due to errors in the artist's drawing or in the description given by the witness. A Generative Adversarial Network (GAN) is a way of training a neural network to output images that belong to a specific class; the network is trained through an adversarial process that pits a generator against a discriminator in a minimax game. Because traditional GANs are unable to generate high-resolution images, StyleGAN is used to resolve this issue. The generated images may still need to be altered to obtain a close match, so TL-GAN is used to adjust the generated image by altering the latent-space input of the StyleGAN. TL-GAN offers users the ability to finely tune one or several facial features holistically. The main objective of the proposed work is to develop a Suspect Face Generation System, since sketches made by sketch artists are accurate only 13 out of 160 times (approximately 8%). Such a system would help society by reducing the misidentification of crime suspects and could considerably reduce the crime rate.
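To make the latent-space tuning concrete, here is a minimal sketch (not the authors' code) of the TL-GAN idea: given a pretrained StyleGAN-like generator and a linear "feature axis" learned from labeled generated faces, a latent vector is shifted along that axis to strengthen or weaken one facial attribute. The generator G, the 512-dimensional latent, and the beard_axis vector are assumptions used only for illustration.

```python
# Minimal sketch (illustrative, not the authors' code): TL-GAN-style latent editing.
import numpy as np

def edit_latent(z, feature_axis, strength):
    """Move z along a learned attribute direction to tune one facial feature."""
    axis = feature_axis / np.linalg.norm(feature_axis)  # unit direction in latent space
    return z + strength * axis                          # positive/negative strength adds/removes the attribute

# Hypothetical usage with a 512-dimensional StyleGAN-style latent vector:
z = np.random.randn(512)
beard_axis = np.random.randn(512)   # placeholder; TL-GAN learns this axis from labeled generated faces
z_edited = edit_latent(z, beard_axis, strength=2.0)
# image = G(z_edited)               # G would be the pretrained generator (not defined here)
```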
ISBN (digital): 9781728141428
ISBN (print): 9781728141435
As the usage of digital applications increases, the amount of data being generated has grown to a tremendous scale. Data is an important component in almost every domain where research and analysis are required to solve problems, and it is available in structured or unstructured formats. Therefore, to obtain data relevant to an application's purpose easily and quickly from the many data sources on the internet, an online content summarizer is desired. Summarizers make it easier for users to understand content without reading it completely. An abstractive text summarizer conveys the content by considering the important words and creates summaries in a human-readable format. The main aim is to produce summaries that do not lose the original context. Various neural network models are employed, along with other machine translation models, to generate concise summaries. This paper aims to highlight and study the existing contemporary models for abstractive text summarization and to explore areas for further research.
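As a concrete point of reference, the sketch below shows how an off-the-shelf abstractive summarizer can be run with the Hugging Face transformers library; this library and its default model are assumptions for illustration and are not part of the surveyed work.

```python
# Minimal sketch (illustrative): an off-the-shelf abstractive summarizer.
from transformers import pipeline

summarizer = pipeline("summarization")  # loads a default encoder-decoder summarization model

article = (
    "As digital applications grow, the volume of online text has exploded, "
    "and readers need concise summaries that preserve the original context..."
)
summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])  # abstractive summary in human-readable form
```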
In recent years, deep learning approaches have gained great attention due to their superior performance and the availability of high-speed computing resources. These approaches have also been extended to the real-time processing of multimedia content, exploiting its spatial and temporal structure. In this paper, we propose a deep learning-based video description framework that first extracts visual features from video frames using deep convolutional neural networks (CNNs) and then passes the derived representations to a long short-term memory-based language model. To capture accurate information about human presence, a fine-tuned multi-task CNN is presented. The proposed pipeline is end-to-end trainable and capable of learning dense visual features along with an accurate framework for generating natural language descriptions of video streams. Evaluation is done by calculating Metric for Evaluation of Translation with Explicit ORdering (METEOR) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores between system-generated and human-annotated video descriptions on a carefully designed data set. The video descriptions generated by the traditional feature-learning framework and the proposed deep learning framework are also compared through their ROUGE scores.
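A minimal sketch of the general CNN-plus-LSTM captioning pattern described above, assuming PyTorch and torchvision; the VideoCaptioner class, the ResNet-18 backbone, the mean-pooling over frames, and all dimensions are illustrative simplifications rather than the paper's exact pipeline.

```python
# Minimal sketch (assumed PyTorch/torchvision): per-frame CNN features feed an LSTM language model.
import torch
import torch.nn as nn
import torchvision.models as models

class VideoCaptioner(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)                     # frame-level feature extractor
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])    # drop the classifier head
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(feat_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frames, captions):
        # frames: (B, T, 3, H, W) video clips; captions: (B, L) token ids
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)        # (B*T, 512) per-frame features
        video_feat = feats.view(B, T, -1).mean(dim=1)            # mean-pool frames -> (B, 512)
        words = self.embed(captions)                              # (B, L, hidden)
        video = video_feat.unsqueeze(1).expand(-1, words.size(1), -1)
        h, _ = self.lstm(torch.cat([video, words], dim=-1))       # condition each step on the video feature
        return self.out(h)                                        # (B, L, vocab) next-word logits

logits = VideoCaptioner(vocab_size=1000)(torch.randn(2, 4, 3, 224, 224),
                                          torch.randint(0, 1000, (2, 6)))
```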
ISBN (print): 9781450360142
In previous work on text summarization, encoder-decoder architectures and attention mechanisms have both been widely used. Attention-based encoder-decoder approaches typically take only the sentences preceding a given sentence into account for document representation, failing to capture, in the encoder, the relationships between a sentence and the sentences that follow it in the document. We propose an attentive encoder-based summarization (AES) model to generate article summaries. AES can produce a rich document representation by considering both the global information of a document and the relationships among its sentences. A unidirectional recurrent neural network (RNN) and a bidirectional RNN are considered for constructing the encoders, giving rise to unidirectional attentive encoder-based summarization (Uni-AES) and bidirectional attentive encoder-based summarization (Bi-AES), respectively. Our experimental results show that Bi-AES outperforms Uni-AES, and we obtain substantial improvements over a relevant state-of-the-art baseline.
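The sketch below illustrates the Bi-AES idea under stated assumptions (PyTorch; precomputed sentence embeddings): a bidirectional RNN runs over the sentence sequence so each state also reflects the sentences that follow, and an attention layer pools the states into a global document representation. The class name and dimensions are hypothetical.

```python
# Minimal sketch (assumed PyTorch): bidirectional attentive document encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAttentiveEncoder(nn.Module):
    def __init__(self, sent_dim=256, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(sent_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, sent_vecs):
        # sent_vecs: (B, S, sent_dim) precomputed sentence embeddings
        states, _ = self.rnn(sent_vecs)                  # (B, S, 2*hidden); each state sees both directions
        attn = F.softmax(self.score(states), dim=1)      # attention weight per sentence
        doc_vec = (attn * states).sum(dim=1)             # global document representation
        return states, doc_vec                           # per-sentence states + attentive document vector

states, doc = BiAttentiveEncoder()(torch.randn(2, 10, 256))
```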
With the rise in popularity of artificial intelligence, the technology of verbal communication between human and machine has received increasing attention, but generating a good conversation remains a difficult task. The key factor in human-machine conversation is whether the machine can give good responses that are appropriate not only at the content level (relevant and grammatical) but also at the emotion level (consistent emotional expression). In this paper, we propose a new model based on long short-term memory, used to implement an encoder-decoder framework, and we address the emotional factor of conversation generation by changing the model's input through a series of input transformations: a sequence without an emotional category, a sequence with an emotional category for the input sentence, and a sequence with an emotional category for the output response. We compare our work with related work and find that we obtain slightly better results with respect to emotion consistency. Although our result is lower than those of related work in terms of content coherence, at the present stage of research our method can generally generate emotional responses in order to control and improve the user's emotion. Our experiments show that, through the introduction of emotional intelligence, our model can generate responses that are appropriate not only in content but also in emotion.
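A minimal sketch of the input transformations described above, in plain Python; the token format (e.g. '<happy>') and the helper name are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (illustrative): prepend an emotion-category token before the
# sequence enters an LSTM encoder-decoder.
def add_emotion_tag(tokens, emotion=None):
    """Prepend an emotion-category token (e.g. '<happy>') when one is given."""
    return ([f"<{emotion}>"] + tokens) if emotion else tokens

# The three input variants described in the abstract:
plain      = add_emotion_tag(["how", "are", "you"])                   # no emotional category
tagged_in  = add_emotion_tag(["how", "are", "you"], emotion="happy")  # category attached to the input sentence
tagged_out = add_emotion_tag(["i", "am", "fine"], emotion="happy")    # category attached to the output response
print(plain, tagged_in, tagged_out)
```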
We propose a repeated-review deep learning model for image captioning in the image evidence review process. It consists of two subnetworks: a convolutional neural network that extracts image features and a recurrent neural network that decodes those features into captions. Our model combines the advantages of the two subnetworks by recalling visual information, unlike the traditional encoder-decoder model, and then introduces a multimodal layer to fuse the image and caption effectively. The proposed model has been validated on benchmark datasets (MSCOCO, Flickr), where it performs well on BLEU-3 and BLEU-4 and, to some extent, even surpasses the best models available today (such as NIC, m-RNN, etc.).
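The sketch below shows one plausible form of such a multimodal fusion layer, assuming PyTorch; the class name, projection sizes, and tanh fusion are illustrative assumptions rather than the authors' exact design.

```python
# Minimal sketch (assumed PyTorch): fuse a CNN image feature with the RNN caption
# state at every decoding step, so visual information is "reviewed" repeatedly.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, img_dim=2048, hid_dim=512, fuse_dim=512, vocab=10000):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, fuse_dim)
        self.proj_hid = nn.Linear(hid_dim, fuse_dim)
        self.out = nn.Linear(fuse_dim, vocab)

    def forward(self, img_feat, rnn_state):
        # img_feat: (B, img_dim) CNN feature; rnn_state: (B, L, hid_dim) decoder states
        img = self.proj_img(img_feat).unsqueeze(1)           # (B, 1, fuse_dim)
        fused = torch.tanh(img + self.proj_hid(rnn_state))   # re-inject the image at every step
        return self.out(fused)                               # (B, L, vocab) word logits

logits = MultimodalFusion()(torch.randn(2, 2048), torch.randn(2, 7, 512))
```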
Class imbalance is a key issue in applying deep learning to remote sensing image classification, because a model trained on imbalanced samples has low classification accuracy for minority classes. In this study, an accurate classification approach using a multistage sampling method and deep neural networks is proposed to classify imbalanced data. We first balance the samples by multistage sampling to obtain the training sets. Then a state-of-the-art model is adopted that combines the advantages of atrous spatial pyramid pooling (ASPP) and an encoder-decoder for pixel-wise classification; these two different types of fully convolutional networks (FCNs) obtain contextual information at multiple levels in the encoder stage, and the details and spatial dimensions of targets are restored with this information during the decoder stage. For comparison, we employ four deep learning-based classification algorithms (basic FCN, FCN-8S, ASPP, and the encoder-decoder with ASPP of our approach) on multistage training sets (original, MUS1, and MUS2) of WorldView-3 images of the southeastern Qinghai-Tibet Plateau and GF-2 images of northeastern Beijing. The experiments show that, compared with the existing sets (original, MUS1, and identical) and an existing method (cost weighting), the MUS2 training set produced by multistage sampling significantly enhances the classification performance for minority classes. Our approach shows distinct advantages for imbalanced data.
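As a rough illustration of sample balancing (the paper's multistage sampling involves more than this single step), the sketch below oversamples minority classes up to the size of the largest class; the function name and the simple random-oversampling rule are assumptions for illustration.

```python
# Minimal sketch (illustrative): per-class resampling to balance a training pool
# so minority land-cover classes are not swamped by majority classes.
import random
from collections import defaultdict

def balance_samples(samples, labels, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for y, items in by_class.items():
        picks = items + [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((s, y) for s in picks)
    rng.shuffle(balanced)
    return balanced

data = balance_samples(["a", "b", "c", "d", "e"], [0, 0, 0, 1, 1])
print(data)  # classes 0 and 1 now contribute the same number of samples
```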
Dense semantic labeling is significant in high-resolution remote sensing imagery research, and it has been widely used in land-use analysis and environmental protection. With the recent success of fully convolutional networks (FCNs), various network architectures have largely improved performance. Among them, atrous spatial pyramid pooling (ASPP) and the encoder-decoder are two successful structures: the former extracts multi-scale contextual information with multiple effective fields of view, while the latter recovers spatial information to obtain sharper object boundaries. In this study, we propose a more efficient fully convolutional network that combines the advantages of both structures. Our model uses a deep residual network (ResNet) followed by ASPP as the encoder and, at the upsampling stage, combines two scales of high-level features with the corresponding low-level features as the decoder. We further develop a multi-scale loss function to enhance the learning procedure. In postprocessing, a novel superpixel-based dense conditional random field is employed to refine the predictions. We evaluate the proposed method on the Potsdam and Vaihingen datasets, and the experimental results demonstrate that our method performs better than other machine learning and deep learning methods. Compared with the state-of-the-art DeepLab_v3+, our model gains 0.4% and 0.6% improvements in overall accuracy on these two datasets, respectively.
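For reference, the sketch below shows a generic ASPP block of the kind used in such encoders, assuming PyTorch; the dilation rates and channel sizes follow common DeepLab-style defaults and are not taken from the paper.

```python
# Minimal sketch (assumed PyTorch): atrous spatial pyramid pooling with parallel
# dilated convolutions, capturing multi-scale context before the decoder upsamples.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch sees a different effective field of view; outputs share spatial size.
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

out = ASPP(in_ch=512)(torch.randn(1, 512, 32, 32))  # -> (1, 256, 32, 32)
```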