ISBN: (Print) 9783030638191; 9783030638207
A challenging issue in content-based image retrieval (CBIR) is distinguishing the target object from cluttered backgrounds; doing so yields more discriminative image embeddings than when feature extraction is distracted by irrelevant objects. To handle the issue, we propose a saliency-guided model with deep image features. The model is fully based on convolutional neural networks (CNNs) and incorporates a visual saliency detection module, making saliency detection a preceding step of feature extraction. The resulting saliency maps are used to refine the original inputs, and image features suitable for ranking are then extracted from the refined inputs. The model suggests a working scheme for incorporating saliency information into existing CNN-based CBIR systems with minimal impact on them. Some works assist image retrieval with other methods such as object detection or semantic segmentation, but these are not as fine-grained as saliency detection, and some of them require additional annotations for training. In contrast, we train the saliency module in a weakly supervised, end-to-end manner and do not need saliency ground truth. Extensive experiments are conducted on standard image retrieval benchmarks, and our model shows competitive retrieval results.
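The abstract does not say how the saliency maps refine the inputs. As a minimal sketch, assuming the refinement is a per-pixel reweighting, the step could look like the following. The function name `refine_with_saliency` and the `floor` parameter are hypothetical and not taken from the paper.

```python
import numpy as np


def refine_with_saliency(image, saliency, floor=0.2):
    """Down-weight background pixels using a saliency map in [0, 1].

    Assumption: refinement is an element-wise reweighting. `floor` is a
    hypothetical parameter that keeps a fraction of the background so
    that context is attenuated rather than erased.
    """
    image = np.asarray(image, dtype=np.float64)          # (H, W, C)
    saliency = np.clip(np.asarray(saliency, dtype=np.float64), 0.0, 1.0)
    weights = floor + (1.0 - floor) * saliency           # (H, W)
    return image * weights[..., None]                    # broadcast over channels


# Toy 2x2 RGB image with one fully salient pixel and one half-salient pixel.
image = np.ones((2, 2, 3))
saliency = np.array([[1.0, 0.0], [0.5, 0.0]])
refined = refine_with_saliency(image, saliency)
```

In a full pipeline the refined array would be passed to the CNN feature extractor in place of the original image.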
ISBN: (Print) 9781665446075
Food image classification is considered one of the promising applications of visual food object recognition in the area of food image processing. Deep learning, which builds computational models from multiple layers that learn abstract representations of data, has delivered strong results in various challenging domains. Building on this success, many studies have put forward deep-learning-based food image classification models and attained better performance than conventional machine learning models. We propose a deep CNN-based food classification method for food identification that uses transfer learning and fine-tuning based on the ResNet and InceptionV3 models. The two networks are compared on our own Indian food image datasets with sixteen and three classes. InceptionV3 achieved higher accuracy than ResNet-50 when a larger number of food image classes was considered.
ISBN: (Print) 9789811365041; 9789811365034
This paper proposes a flame segmentation algorithm based on the saliency of motion and color. First, feature points are detected in the video image using the scale-invariant feature transform (SIFT) algorithm, and the optical flow field of moving objects in adjacent frames is acquired by the optical flow method. The motion saliency map is obtained from the optical flow vector difference between the target pixel and its surrounding neighborhood pixels, based on the Munsell color system. Then, the LSI flame color statistical model, based on the Lab and HSI spaces, is used to extract the color saliency map of the video images. Finally, under the Bayes framework, the motion saliency map and the color saliency map are fused in an interactive manner to obtain the final flame segmentation map. Experimental results show that the proposed algorithm can effectively segment the flame image in different scenarios.
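The abstract does not give the fusion rule. As an illustrative sketch only, the code below treats each saliency value as an independent flame probability and combines the two with a naive Bayes posterior under a uniform prior. This simplification is an assumption, not the paper's interactive fusion scheme, and the function name `bayes_fuse` and the 0.5 threshold are hypothetical.

```python
import numpy as np


def bayes_fuse(motion, color, eps=1e-8):
    """Fuse two saliency maps in [0, 1] with a naive Bayes posterior.

    Assumption: the maps are conditionally independent flame
    probabilities and the prior is uniform, so
    p(flame | m, c) = m*c / (m*c + (1-m)*(1-c)).
    """
    m = np.clip(np.asarray(motion, dtype=np.float64), 0.0, 1.0)
    c = np.clip(np.asarray(color, dtype=np.float64), 0.0, 1.0)
    flame = m * c
    background = (1.0 - m) * (1.0 - c)
    return flame / (flame + background + eps)   # eps avoids 0/0


motion = np.array([[0.9, 0.2], [0.6, 0.1]])
color = np.array([[0.9, 0.3], [0.7, 0.1]])
fused = bayes_fuse(motion, color)
mask = fused > 0.5   # hypothetical threshold for the segmentation map
```

Pixels where both cues agree are pushed toward 0 or 1, while pixels supported by only one cue stay low.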
Skin cancer is one of the life-threatening diseases, and there is mostly no chance of remission if it is diagnosed in the last stage. The three major types of skin cancers are basal cell carcinoma, squamous ...
ISBN: (Digital) 9781728131061
ISBN: (Print) 9781728131061
Real-time traffic video streaming, such as roadside surveillance and aerial video, is now widely used in traffic monitoring. However, most traditional traffic data collection methods lack mobility and can only collect macroscopic data. In this paper, an intelligent traffic monitoring system based on an open-source cooperative platform called SAGE2 was developed. Based on the integrated big-screen TV wall of SAGE2, a map-based aerial traffic video streaming management interface was designed. The image pre-processing section provides functions such as lens distortion removal, top-view projection transforms, and video stabilization; it also simulates video streaming to provide instant and long-term micro-flow data collection. Micro-traffic flow data provides high-resolution information in both time and space, which can be used to analyze the driving behavior of individuals and the public. Combined with the lane-level map, it can provide a variety of visual vehicle flow presentations, such as intersection traffic distribution, which can also be used to develop innovative applications in the future.
This paper considers the principles of constructing decision-making systems. An approach to image analysis based on a semantic model is proposed and studied. The results show an improvement in processing speed and i...
For the problem of image matching in visual indoor positioning research, the image quality is degraded by noise interference during the generation or transmission of the image, which ultimate...
ISBN: (Print) 9783030638191; 9783030638207
Humans demonstrate cognitive consistency under image transformations such as flipping and scaling. To learn from this consistency in human visual perception, researchers have found that a convolutional neural network's discriminative capacity can be further improved by forcing the network to concentrate on certain areas of the picture, in accordance with natural human visual perception. The attention heatmap, a supplementary tool that reveals the essential region the network chooses to focus on, has been developed and widely adopted for CNNs. Based on this principle of visual consistency, we propose a novel end-to-end trainable CNN architecture with multi-scale attention consistency. Specifically, our model takes an original picture and its flipped counterpart as inputs and sends them into a single standard ResNet with additional attention-enhanced modules to generate a semantically strong attention heatmap. We also compute the distance between the multi-scale attention heatmaps of the two pictures and use it as an additional loss to help the network achieve better performance. Our network shows superiority on the multi-label classification task and attains compelling results on the WIDER Attribute Dataset.
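The consistency term described above can be sketched as follows. The abstract does not name the distance, so mean squared error is assumed here, and the function name `attention_consistency_loss` is hypothetical. Heatmaps from the flipped input are flipped back before comparison so that the two maps are spatially aligned.

```python
import numpy as np


def attention_consistency_loss(maps_orig, maps_flip):
    """Sum of per-scale distances between aligned attention heatmaps.

    maps_orig, maps_flip: lists of (H, W) heatmaps, one per scale, for
    the original image and its horizontally flipped counterpart.
    Assumption: the distance is the mean squared error.
    """
    total = 0.0
    for a, b in zip(maps_orig, maps_flip):
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        b_aligned = b[:, ::-1]                 # undo the horizontal flip
        total += np.mean((a - b_aligned) ** 2)
    return float(total)


# Two scales; the second heatmap set is an exact mirror of the first,
# so a perfectly flip-consistent network would incur zero loss.
coarse = np.arange(6, dtype=float).reshape(2, 3)
fine = np.arange(24, dtype=float).reshape(4, 6)
loss_consistent = attention_consistency_loss(
    [coarse, fine], [coarse[:, ::-1], fine[:, ::-1]]
)
loss_inconsistent = attention_consistency_loss([coarse], [coarse])
```

In training, this value would be added to the classification loss with some weighting factor, which the abstract does not specify.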
ISBN: (Print) 9781728152844
In this work, the authors propose a method for improving the visual quality of color images suffering from low illumination. In this pyramid-based multiscale technique, the input image is decomposed into four levels of resolution. Starting from the coarsest resolution, each image is converted to HSV space, and the illumination is computed from the V component using a multiscale Gaussian function. At this stage, the Weber-Fechner law is used to construct two enhanced versions of the illumination component, corresponding to two different values of a parameter. These two images are fused using PCA to construct a new V image, which is subsequently super-sampled to the next higher level of resolution. The V images at the next higher level of resolution are subjected to the same treatment until the base level of the pyramid is reached. Finally, all the V component images at the base level are fused using PCA to construct the final V component, which in turn is combined with the H and S components to construct the final result. The method has been implemented and tested on a set of real 2D color images, and the results are found to be satisfactory. The experimental results have been compared with those of other methods using objective measures.
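Two steps of this pipeline, the two-parameter enhancement and the PCA fusion, can be sketched at a single pyramid level as follows. The abstract does not give the exact enhancement formula, so a normalized logarithmic response is assumed as a stand-in for the Weber-Fechner mapping. The function names and the parameter values 5 and 50 are hypothetical, and the multiscale Gaussian illumination estimate and the pyramid loop are omitted.

```python
import numpy as np


def weber_fechner_enhance(v, k):
    """Logarithmic brightness response, normalized to map [0, 1] to [0, 1].

    Assumption: the Weber-Fechner law is modeled as log(1 + k*v) / log(1 + k);
    larger k brightens dark regions more strongly.
    """
    v = np.clip(np.asarray(v, dtype=np.float64), 0.0, 1.0)
    return np.log1p(k * v) / np.log1p(k)


def pca_fuse(a, b):
    """Fuse two images with weights from the principal eigenvector."""
    data = np.stack([a.ravel(), b.ravel()])      # 2 x N observations
    cov = np.cov(data)                           # 2 x 2 covariance matrix
    _, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    principal = np.abs(eigvecs[:, -1])           # largest-eigenvalue vector
    w = principal / principal.sum()              # weights sum to 1
    return w[0] * a + w[1] * b


# Toy dark V channel with values in [0.05, 0.5].
v = np.linspace(0.05, 0.5, 16).reshape(4, 4)
e_low = weber_fechner_enhance(v, 5.0)
e_high = weber_fechner_enhance(v, 50.0)
fused_v = pca_fuse(e_low, e_high)
```

Because the fusion weights are non-negative and sum to one, the fused V image always lies between the two enhanced versions.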
The spherical domain representation of 360° video/image presents many challenges related to the storage, processing, transmission and rendering of omnidirectional videos (ODV). Models of human visual attention can b...