ISBN: 9781450366151 (print)
Video object segmentation aims to segment objects in a video sequence, given a user annotation that indicates the object of interest. Although Convolutional Neural Networks (CNNs) have recently been used for foreground segmentation in videos, adversarial training methods have not been used effectively for this problem, despite their extensive use for many other problems in computer vision. Earlier work relied extensively on flow features and motion trajectories to capture the temporal consistency between subsequent frames when segmenting moving objects in videos. However, we show that our proposed framework, which processes the video frames independently using a deep generative adversarial network (GAN), maintains temporal coherency across frames without any explicit trajectory-based information and provides superior results. Our main contribution lies in introducing a GAN-based framework, together with a novel cost function based on the Intersection-over-Union score for training the model, to solve the problem of foreground object segmentation in videos. The proposed method, when evaluated on the popular real-world video segmentation datasets DAVIS, SegTrack-v2 and YouTube-Objects, exhibits a substantial performance gain over recent state-of-the-art methods.
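As an illustration of what an Intersection-over-Union based cost can look like, the following is a minimal sketch of a "soft" IoU loss on flattened masks. It is not the cost function from the paper, whose exact form is not given in the abstract. The function name `soft_iou_loss`, the `eps` smoothing term, and the use of plain Python lists are all assumptions made for this example.

```python
def soft_iou_loss(pred, target, eps=1e-7):
    """Return 1 - soft IoU for a flat list of predicted foreground
    probabilities (pred) and a flat binary ground-truth mask (target).

    Hypothetical helper: a common differentiable relaxation of IoU,
    not the paper's own formulation.
    """
    # Soft intersection: probability mass that falls on true foreground pixels.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Soft union: |A| + |B| - |A and B|, computed per pixel.
    union = sum(p + t - p * t for p, t in zip(pred, target))
    # eps keeps the ratio defined when both masks are empty.
    return 1.0 - (intersection + eps) / (union + eps)


perfect = soft_iou_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
disjoint = soft_iou_loss([1.0, 0.0], [0, 1])
partial = soft_iou_loss([0.5, 0.5], [1, 0])
```

A perfect prediction gives a loss close to 0 and a fully disjoint one a loss close to 1, so minimizing this quantity pushes the predicted mask toward higher overlap with the annotation.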
ISBN: 9781450366151 (print)
Cloud computing is a highly promising paradigm in which computational resources from third parties are used for processing outsourced data. Nonetheless, the distributed architecture of this concept poses many security and privacy threats for the data owners. Shamir's secret sharing is an effective technique for distributing and processing secret images in the encrypted domain. However, it has some critical limitations, primarily due to the presence of correlated information between the image pixels. Our study addresses this problem by proposing a perfectly secure Shamir's secret sharing scheme for images. Our work builds upon the formal notion of perfect secrecy to encode the Shamir shares in such a way that the encoded shares do not reveal any additional information about the original image. Importantly, we provide both theoretical and empirical validation of our proposed approach. We have also performed several image filtering operations on the stored shares and found the resulting PSNR and NCC values to be similar in the plain and encrypted domains. Hence, our work provides a privacy-preserving and secure framework for working with images over a cloud-based architecture.
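For readers unfamiliar with the underlying primitive, here is a minimal sketch of plain (k, n) Shamir secret sharing applied to a single pixel value. It shows only the textbook scheme, not the perfectly secure encoding proposed in the paper. The field size 257, the function names `make_shares` and `recover`, and the per-pixel treatment are assumptions chosen for this example.

```python
import secrets

# Assumption: work in GF(257), the smallest prime field that holds
# every 8-bit pixel value 0..255.
PRIME = 257


def make_shares(pixel, k, n):
    """Split one pixel value into n shares so that any k recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [pixel] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = make_shares(200, k=3, n=5)
restored = recover(shares[:3])
```

Note that a share value can equal 256 in this field, which does not fit in one byte; practical image schemes need some convention for that case, and this sketch does not address it.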
The content of document images often shows a hierarchical multi-layered tree structure. Further, the algorithms for document image applications like line detection, paragraph detection, word recognition, layout anal...
ISBN: 9781728166544 (digital)
ISBN: 9781728166551 (print)
In this article, we use one of the best-known libraries, Open Computer Vision, or OpenCV for short. We use it for image processing, to carry out the operations needed to isolate and detect a specific object, which in our case is a traffic road sign. We seek the most efficient methods for detecting traffic road signs. Our objective is to demonstrate how these elements combine into optimized and powerful computer vision algorithms that are easy to use in image and video processing. Most of the techniques employed are color selection, edge detection, region-of-interest selection, and shape transformation for detection. Many applications require the recognition of traffic road signs in urban areas. Automating this task is necessary, for example, in ADAS systems, robot vision, and autonomous vehicles, which must recognize and identify traffic road signs as quickly as possible and with minimal errors in images acquired by a camera embedded on board a vehicle.
ISBN: 9781450366151 (print)
Machine learning models are known to be susceptible to small but structured changes to their inputs that can result in wrong inferences. It has been shown that such samples, called adversarial samples, can be created rather easily for standard neural network architectures. These adversarial samples pose a serious threat to deploying state-of-the-art deep neural network models in the real world. We propose a feature augmentation technique called BatchOut to learn models that are robust to such examples. The proposed approach is a generic feature augmentation technique that is not specific to any adversary and handles multiple attacks. We evaluate our algorithm on benchmark datasets and architectures to show that models trained using our method are less susceptible to adversaries created using multiple methods.
ISBN: 9783030122096; 9783030122089 (print)
8K is the pinnacle of video systems, and 8K broadcasting service will start in December 2018. However, the availability of content for 8K TV is still insufficient, a situation similar to that of HDTV in the 1990s. Upconverting analogue content to HDTV content was important to supplement the insufficient HDTV content. This upconverted content was also important for news coverage, as HDTV equipment was heavy and bulky. The current situation for 8K TV is similar: covering news with 8K TV equipment is very difficult, as this equipment is much heavier and bulkier than that required for HDTV in the 1990s. The HDTV content currently available is sufficient, and the equipment has evolved to facilitate news coverage; therefore, an HDTV-to-8K TV upconverter can be a solution to the problems described above. However, upconversion from interlaced HDTV to 8K TV enlarges the images by a factor of 32, making the upconverted images very blurry. Super resolution (SR) is a technology for solving this enlargement blur. One of the most common SR technologies is super resolution image reconstruction (SRR). However, SRR has limitations when used for an HDTV-to-8K TV upconverter. To address this issue, this paper proposes an HDTV-to-8K TV upconverter with nonlinear processing SR.
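The factor of 32 can be checked with a short calculation. The sketch below assumes a 1920 x 1080 interlaced HDTV signal, in which each field carries half of the lines, and a 7680 x 4320 8K frame. These resolutions are not stated in the abstract and are assumptions used here to reproduce the figure.

```python
# Assumed resolutions (not given in the abstract).
HD_WIDTH, HD_HEIGHT = 1920, 1080        # interlaced HDTV frame
UHD8K_WIDTH, UHD8K_HEIGHT = 7680, 4320  # 8K frame

# An interlaced field contains every other line, so half the frame height.
field_pixels = HD_WIDTH * (HD_HEIGHT // 2)
frame_pixels_8k = UHD8K_WIDTH * UHD8K_HEIGHT

# Ratio of 8K pixels to pixels actually present in one HDTV field.
enlargement = frame_pixels_8k // field_pixels
```

Under these assumptions the enlargement is 4 horizontally and 8 vertically, which gives the overall pixel-count factor of 32.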
ISBN: 9783030208769; 9783030208752 (print)
The task of generating natural images from 3D scenes has been a long-standing goal in computer graphics. On the other hand, recent developments in deep neural networks allow for trainable models that can produce natural-looking images with little or no knowledge about the scene structure. While the generated images often consist of realistic-looking local patterns, their overall structure is often inconsistent. In this work we propose a trainable, geometry-aware image generation method that leverages various types of scene information, including geometry and segmentation, to create realistic-looking natural images that match the desired scene structure. Our geometrically consistent image synthesis method is a deep neural network, called the Geometry to Image Synthesis (GIS) framework, which retains the advantages of a trainable method, e.g., differentiability and adaptiveness, but at the same time makes a step towards the generalizability, control and output quality of modern graphics rendering engines. We use the GIS framework to insert vehicles into outdoor driving scenes, as well as to generate novel views of objects from the Linemod dataset. We qualitatively show that our network is able to generalize beyond the training set to novel scene geometries, object shapes and segmentations. Furthermore, we quantitatively show that the GIS framework can be used to synthesize large amounts of training data, which proves beneficial for training instance segmentation models.
This paper presents an approach for the classification of hyperspectral imagery (HSI). In remote sensing, the HSI sensor acquires hundreds of images with very narrow but continuous spectral width in visible and near-infr...
This paper proposes an efficient method of character segmentation for handwritten text. The main challenge in character segmentation of handwritten text is the varied size of each letter in different documents, conne...
ISBN: 9783030367183; 9783030367176 (print)
Attention mechanisms alongside encoder-decoder architectures have become integral components for solving the image captioning problem. The attention mechanism recombines an encoding of the image depending on the state of the decoder, to generate the caption sequence. The decoder is predominantly recurrent in nature. In contrast, we propose a novel network possessing attention-like properties that are pervasive through its layers, by utilizing a convolutional neural network (CNN) to refine and combine representations at multiple levels of the architecture for captioning images. We also enable the model to use explicit higher-level semantic information obtained by performing panoptic segmentation on the image. The attention capability of the model is visually demonstrated, and an experimental evaluation is shown on the MS-COCO dataset. We show that the approach is more robust and efficient, and yields better performance, than state-of-the-art architectures for image captioning.
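To make concrete what "recombines an encoding of the image depending on the state of the decoder" means in the conventional setting, here is a minimal sketch of soft attention over image region features. It illustrates the standard mechanism the abstract contrasts against, not the proposed CNN-based network. The dot-product scoring, the function name `attend`, and the toy feature vectors are assumptions made for this example.

```python
import math


def attend(region_features, decoder_state):
    """Weight each image region by its similarity to the decoder state
    and return (weights, context), where context is the weighted sum.

    Hypothetical helper using dot-product scores; real captioning models
    typically learn the scoring function.
    """
    # One relevance score per region.
    scores = [sum(f * s for f, s in zip(feat, decoder_state))
              for feat in region_features]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted recombination of the region features.
    dim = len(region_features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


regions = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend(regions, decoder_state=[2.0, 0.0])
```

Here the first region aligns with the decoder state and therefore receives the larger weight; as the decoder state changes from one generated word to the next, the weights and the resulting context vector change with it.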