ISBN: 9781728180687 (print)
Single image de-raining is quite challenging due to the diversity of rain types and the inhomogeneous distribution of rainwater. By means of dedicated models and constraints, existing methods perform well for specific rain types. However, their generalization capability is highly limited. In this paper, we propose a unified de-raining model that selectively fuses the clean background of the input rain image with the well-restored regions occluded by various kinds of rain. This is achieved by our region adaptive coupled network (RACN), whose two branches integrate each other's features at different layers to jointly generate the spatially variant weight and the restored image, respectively. On the one hand, the weight branch leads the restoration branch to focus on the regions that contribute most to de-raining. On the other hand, the restoration branch guides the weight branch to keep away from regions at risk of over- or under-filtering. Extensive experiments show that our method outperforms many state-of-the-art de-raining algorithms on diverse rain types, including rain streaks, raindrops, and rain mist.
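The abstract does not give the fusion formula, so the following is only a minimal sketch of what "selectively fusing" with a spatially variant weight could look like, assuming a per-pixel convex blend. The function name `fuse`, the weight range [0, 1], and the toy values are illustrative assumptions, not the RACN implementation.

```python
def fuse(rain, restored, weight):
    """Blend two same-sized grayscale images pixel by pixel.

    Assumed convention: weight[y][x] lies in [0, 1]; 0 keeps the input
    (rainy) pixel unchanged, 1 takes the restored pixel.
    """
    return [
        [w * r + (1.0 - w) * b for b, r, w in zip(rain_row, rest_row, w_row)]
        for rain_row, rest_row, w_row in zip(rain, restored, weight)
    ]


# Toy 2x2 example: the left column is clean background (weight 0),
# the right column is occluded by rain (weight > 0).
rain = [[0.2, 0.9], [0.4, 0.8]]
restored = [[0.2, 0.3], [0.4, 0.5]]
weight = [[0.0, 1.0], [0.0, 0.5]]
fused = fuse(rain, restored, weight)
```

With this convention, pixels the weight map marks as clean background pass through untouched, which is one way to limit over-filtering in rain-free regions.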
ISBN: 9798331529543; 9798331529550 (print)
This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description and has significant potential in practical applications such as food safety detection. Recent methods that use the attention mechanism for cross-modal interaction have made excellent progress. However, current methods tend to lack explicit interaction design principles as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Guided by these principles, our FAN achieves state-of-the-art performance on the prevalent RIS benchmarks (RefCOCO, RefCOCO+, G-Ref) with a simple architecture.
ISBN: 9781728185514 (print)
Recent advances in sensor technology and the wide deployment of visual sensors have led to a new application in which images are compressed not primarily for pixel recovery for human consumption but for communication to cloud-side machine vision tasks such as classification, identification, detection, and tracking. This opens up new research dimensions for learning-based compression that directly optimizes the loss function of the vision task and therefore achieves better compression performance than recovering pixels first and then performing the vision task. In this work, we develop a learning-based compression scheme that learns a compact feature representation and appropriate bitstreams for the task of visual object detection. A Variational Auto-Encoder (VAE) framework is adopted for learning the compact representation, while a bridge network is trained to drive the detection loss function. Simulation results demonstrate that this approach achieves a new state of the art in task-driven compression efficiency compared with pixel recovery approaches, including both learning-based and handcrafted solutions.
ISBN: 9798331529543; 9798331529550 (print)
Image dehazing plays a crucial role in autonomous driving and outdoor surveillance. However, haze affects different components of an image in different ways and to different degrees, yet existing methods treat the image as a single input and overlook the need to decouple these components, which leads to mutual interference during the enhancement of each component. Consequently, issues such as insufficient color restoration or blurred edges may arise. In this paper, we introduce a novel tri-branch network for single image dehazing that independently extracts low-frequency, high-frequency, and semantic information from images using three distinct sub-networks. A carefully designed fusion network then integrates the information from these three branches to produce the final dehazed image. To facilitate the training of such a complex network, we propose a two-stage training approach. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance.
ISBN: 9781665475921 (print)
Crowd counting aims at automatically estimating the number of people in still images. It has attracted much attention due to its potential use in surveillance, intelligent transportation, and many other scenarios. Over the past decade, most researchers have focused on designing novel deep learning models for improved crowd counting performance. Such attempts include proposing advanced deep neural network architectures and using different training strategies and loss functions. Besides the capabilities of the models, crowd counting performance is also determined by the quantity and quality of the training data. While deep models are data-hungry and better performance can usually be expected with more training data, annotating images for training is time-consuming and expensive in real-world applications. In this work, we focus on the efficiency of data annotation for crowd counting. By varying the number of annotated images and the number of annotated points (one point is annotated per person's head) used for training, our experimental results demonstrate that it is more efficient to annotate a small number of points per image across a large number of images. Based on this conclusion, we present a novel adaptive scaling mechanism for data augmentation that diversifies the training images without extra annotation cost. The mechanism is shown to be effective through thorough experiments.
ISBN: 9781665475921 (print)
In recent years, many deep convolutional neural networks have been successfully applied to single image super-resolution (SISR). Even when using small convolution kernels, those methods still require a large number of parameters and heavy computation. To tackle this problem, we propose a novel framework that extracts features more efficiently. Inspired by the idea of deep separable convolution, we improve the standard residual block and propose the inverted bottleneck block (IBNB). The IBNB replaces the small convolution kernel with a large one without introducing additional computation. The proposed IBNB shows that large-kernel convolution is viable for SISR. Comprehensive experiments demonstrate that our method surpasses most methods by 0.10 to 0.32 dB in quantitative metrics with fewer parameters.
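To illustrate why a separable design can afford a larger kernel, the sketch below counts weights for a standard convolution and for a depthwise-plus-pointwise pair. The abstract does not describe the internal layout of the IBNB, so the channel count (64), the kernel sizes (3 and 7), and both helper functions are illustrative assumptions; biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard_3x3 = conv_params(64, 64, 3)                  # 9 * 64 * 64 = 36864
separable_7x7 = depthwise_separable_params(64, 64, 7)  # 49 * 64 + 64 * 64 = 7232
```

Under these assumptions, a 7x7 separable layer uses roughly a fifth of the weights of a standard 3x3 layer, which is consistent with the claim that a larger kernel need not cost more.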
ISBN: 9781728180687 (print)
A large portion of fake news quotes untampered images from other sources with ulterior motives rather than forging images. Such elaborate engraftments keep the inconsistency between images and text reports stealthy, thereby passing off the spurious as genuine. This paper proposes an architecture named News Image Steganography (NIS) to reveal this inconsistency through GAN-based image steganography. An extractive summary of a news image is generated from its source texts, and a learned steganographic algorithm encodes the summary into the image, and decodes it from the image, in a manner that approaches perceptual invisibility. Once an encoded image is quoted, its source summary can be decoded and presented as the ground truth against which to verify the quoting news. The paired encoder and decoder endow images with the capability to carry their own imperceptible summaries. According to our experiments and investigations, NIS reveals the underlying inconsistency and thereby improves the accuracy of identifying fake news that engrafts untampered images.
ISBN: 9781479961399 (print)
The demand for human face enhancement in pictures is increasing. This paper describes an effort to apply state-of-the-art signal processing technologies to the enhancement of the human face in pictures. First, several non-linear filters are examined, and it is demonstrated that the total variation regularization filter (TV filter) is by far the most effective for skin smoothing, including the removal of wrinkles, stains, moles, and freckles. The reason is analyzed in detail. Then, super-resolution technology is used to enhance the image quality of specific parts of the face, such as the eye line, pupil, eyelashes, and hair. Facial part extraction technology is also used to enhance selected face parts. Interestingly, we found that the super-resolution technology not only improves the clarity of the image but also increases the brilliance of the pupil and hair. The super-resolution technology used in this paper is based on the non-linear filtering method developed for 4K high-definition television.
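As a rough illustration of why total variation regularization suits skin smoothing, the sketch below flattens small fluctuations on a single scanline while penalizing a large edge only linearly. It is a simplified 1D stand-in, assuming gradient descent on a smoothed TV objective; the paper's actual 2D filter, its solver, and its parameters are not given in the abstract, so `tv_smooth`, `lam`, `eps`, `step`, and `iters` are all hypothetical choices.

```python
import math


def total_variation(u):
    """Sum of absolute differences between neighboring samples."""
    return sum(abs(b - a) for a, b in zip(u, u[1:]))


def tv_smooth(f, lam=0.5, eps=1e-2, step=0.04, iters=300):
    """Minimize 0.5 * sum((u - f)^2) + lam * sum(sqrt((u[i+1] - u[i])^2 + eps))
    by plain gradient descent. eps smooths the absolute value so the
    objective is differentiable everywhere.
    """
    u = list(f)
    n = len(u)
    for _ in range(iters):
        # Smoothed sign of each forward difference.
        d = [
            (u[i + 1] - u[i]) / math.sqrt((u[i + 1] - u[i]) ** 2 + eps)
            for i in range(n - 1)
        ]
        grad = []
        for i in range(n):
            g = u[i] - f[i]          # data-fidelity term
            if i > 0:
                g += lam * d[i - 1]  # from the difference u[i] - u[i-1]
            if i < n - 1:
                g -= lam * d[i]      # from the difference u[i+1] - u[i]
            grad.append(g)
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u


# A noisy step edge: small wiggles (think skin texture) around one large jump.
noisy = [0.0, 0.1, -0.1, 0.05, 1.0, 0.9, 1.1, 0.95]
smoothed = tv_smooth(noisy)
```

Comparing `total_variation(noisy)` with `total_variation(smoothed)` shows how much of the fine fluctuation has been removed.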
ISBN: 9781728185514 (print)
Compressed image quality assessment (IQA) is a crucial part of a wide range of image services such as storage and transmission. Because of differences in bit rate and compression method, compressed images usually have different levels of quality. The mainstream full-reference (FR) metrics are effective at predicting the quality of compressed images at coarse-grained levels; however, they may perform poorly when the quality differences between compressed images are subtle. To improve the Quality of Experience (QoE) and provide useful guidance for compression algorithms, we propose an FR-IQA metric for fine-grained compressed images, which estimates image quality by analyzing differences in structure and texture. Our metric is mainly validated on the fine-grained compression IQA (FGIQA) database and is also tested on other commonly used compression IQA databases. The experimental results show that our metric outperforms mainstream FR-IQA metrics on the fine-grained compression IQA database and also achieves competitive performance on the coarse-grained compression IQA databases.
ISBN: 9781538644584 (print)
The risk of solitary death is rising in Japan because an increasing number of elderly people live alone. Attempts are therefore being made to let family members watch over elderly relatives remotely. However, these systems have problems such as difficulty in confirming the status of the elderly person in real time and privacy issues. In this paper, we propose a method to detect abnormal conditions using an infrared array sensor.