ISBN: (print) 9781728198354
Image segmentation is a challenging task because of complex object appearance and diverse object categories. Traditional methods segment directly from visual features and ignore the correlation between objects. We introduce a knowledge reasoning module (KRM) for external knowledge aggregation and use a graph neural network to aggregate the knowledge feature, which is concatenated with a visual feature for semantic segmentation. To this end, we use word embeddings of category names as semantic features and establish the relationships between categories. The aggregated features are enriched through iteration. In experiments, three well-known semantic segmentation methods are used as baselines. Our method outperforms the baselines on the food dataset Food-Seg103 and on Cityscapes, which demonstrates its effectiveness.
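The knowledge aggregation described above can be illustrated with a minimal NumPy sketch. It is not the authors' KRM: the row-normalized averaging rule, the `class_probs` input used to map category features to pixels, and all function names are assumptions made for illustration. The abstract only states that category word embeddings are aggregated over a graph, iteratively, and concatenated with visual features.

```python
import numpy as np

def aggregate_knowledge(embeddings, adjacency, steps=2):
    """Propagate category embeddings over a category graph.

    embeddings: (C, D) array, one word-embedding row per category (assumed given).
    adjacency:  (C, C) non-negative array of category relationships (assumed given).
    steps:      number of propagation iterations.
    """
    # Add self-loops and row-normalize so each update is a weighted average.
    a = adjacency + np.eye(adjacency.shape[0])
    a = a / a.sum(axis=1, keepdims=True)
    h = embeddings
    for _ in range(steps):
        h = a @ h  # each category mixes in its neighbours' features
    return h

def fuse(visual, class_probs, knowledge):
    """Concatenate per-pixel visual features with knowledge features.

    visual:      (H, W, F) visual feature map.
    class_probs: (H, W, C) coarse per-pixel class probabilities (hypothetical input).
    knowledge:   (C, D) aggregated category features.
    """
    per_pixel = class_probs @ knowledge  # (H, W, D) knowledge feature per pixel
    return np.concatenate([visual, per_pixel], axis=-1)
```

Each extra iteration lets a category draw on neighbours that are one hop further away in the graph, which is one plausible reading of "through iteration, the aggregated features can be enriched".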
Mural paintings are treasures of Chinese culture and hold great value. Acoustic emission technology, combined with digital signal processing and convolutional neural network methods, can non-destructively and i...
ISBN: (print) 9798350338935
Image and video coding for machines has recently attracted growing interest from both industry and the research community. One successful approach is based on end-to-end (E2E) learned compression and has shown significant gains over state-of-the-art conventional image coding methods. However, one remaining challenge for such E2E-learned image codecs for machines is to allocate bits adaptively over different regions of the image while retaining machine vision performance. In this paper, we propose a method that leverages Regions-Of-Interest (ROIs) for bitrate allocation within a Learned Image Codec (LIC) for machines. In particular, the proposed method reduces the bits allocated to the background regions of the image by reducing the variance of the elements of the latent representation that correspond to those regions. This results in more heavily quantized background areas, while keeping the quality of the ROI areas suitable for machine tasks. The proposed method achieves significant gains over the baseline LIC, with Pareto BD-rates of -15.80% and -22.43% on object detection and instance segmentation tasks, respectively. To the best of our knowledge, this is the first research paper proposing an ROI-based inference-time technique for learned image coding for machines.
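A rough NumPy sketch of the idea of lowering background variance in the latent before quantization follows. It is an illustration under assumptions, not the paper's procedure: shrinking toward the per-channel mean, the `scale` parameter, and rounding as the quantizer are all choices made here for clarity.

```python
import numpy as np

def suppress_background(latent, roi_mask, scale=0.5):
    """Shrink background latent elements toward their channel mean.

    latent:   (C, H, W) latent tensor from a hypothetical learned encoder.
    roi_mask: (H, W) boolean array, True inside regions of interest.
    scale:    factor in [0, 1]; smaller values shrink the background more.
    """
    mean = latent.mean(axis=(1, 2), keepdims=True)
    shrunk = mean + scale * (latent - mean)
    # Keep ROI elements untouched, replace background elements by shrunk values.
    return np.where(roi_mask[None, :, :], latent, shrunk)

def quantize(latent):
    """Stand-in for the codec's quantizer: round to the nearest integer."""
    return np.round(latent)
```

With a smaller `scale`, more background elements round to the same value, which an entropy coder can typically represent with fewer bits, while the ROI elements pass through unchanged.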
Access to large training datasets, together with recent advances in computing power, has spurred interest in using deep learning for image compression. Such approaches need to train and deploy separate networks for rate adaptation, which is impractical and expensive in terms of memory cost and power consumption, especially for broad bitrate ranges. To deal with this limitation, variable-rate compression methods use the Lagrange multiplier to control the rate/distortion trade-off so that the neural network does not need to be retrained for each rate. However, they do not optimize bit allocation for eye-catching foreground details and do not consider the different degrees of attention that the human eye pays to each area of the image. Other deep learning-based image compression approaches, which can outperform the above ones, rely on additional information. In this paper, we present a loss-conditional autoencoder tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy variable-rate compression. Our framework is a neural network-based scheme able to automatically optimize coding parameters with a multi-term perceptual loss function based on a semantic-important Structural SIMilarity index. To ensure rate adaptation, we suggest modulating the compression network on the bitwidth of its activations by quantizing them according to several bitwidth values. Experiments on the JPEG AI dataset show that our method achieves competitive or higher visual quality for the same compressed size when compared to conventional codecs and related work.
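Two ingredients mentioned above, the Lagrangian rate/distortion objective and bitwidth-dependent activation quantization, can be sketched as follows. This assumes one common convention for the objective (rate plus lambda times distortion) and a plain uniform quantizer on activations clipped to [0, 1]; the abstract does not specify either, and the function names are hypothetical.

```python
import numpy as np

def rd_loss(rate, distortion, lam):
    """Lagrangian rate-distortion objective, here written as R + lambda * D."""
    return rate + lam * distortion

def quantize_activations(x, bits):
    """Uniformly quantize activations to 2**bits levels.

    x:    array of activations; values are clipped to [0, 1] first (assumption).
    bits: bitwidth; a larger bitwidth gives finer levels and a higher rate.
    """
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels
```

Sweeping `bits` over several values gives a family of operating points from a single network, which is the role the abstract assigns to bitwidth modulation.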
Stochastic gradient optimization methods are broadly used to minimize non-convex smooth objective functions, for instance when training deep neural networks. However, theoretical guarantees on the asymptotic behaviour...
Image segmentation is a critical step in digital image processing applications. One of the most preferred methods for image segmentation is multilevel thresholding, in which a set of threshold values is determined to divide an image into different classes. However, the computational complexity increases when the number of required thresholds is high. Therefore, this paper introduces a modified Coronavirus Optimization algorithm for image segmentation. In the proposed algorithm, the chaotic map concept is added to the initialization step of the original algorithm to increase the diversity of solutions. A hybrid of two commonly used methods, Otsu's method and Kapur's entropy, is applied to form a new fitness function to determine the optimum threshold values. The proposed algorithm is evaluated on two datasets, comprising six benchmark images and six satellite images. Various evaluation metrics are used to measure the quality of the segmented images, such as mean square error, peak signal-to-noise ratio, Structural Similarity Index, Feature Similarity Index, and Normalized Correlation Coefficient. Additionally, the best fitness values are calculated to demonstrate the proposed method's ability to find the optimum solution. The obtained results are compared to eleven powerful and recent metaheuristics and show the superiority of the proposed algorithm on the image segmentation problem.
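The two classical criteria and a chaotic initialization can be sketched in NumPy as below. The Otsu and Kapur scores follow their standard definitions on a normalized histogram. The logistic map as the chaotic map and the weighted sum in `hybrid_fitness` are assumptions, since the abstract does not say which map or which combination rule the paper uses.

```python
import numpy as np

def class_bounds(thresholds, levels=256):
    """Turn thresholds into half-open gray-level intervals [lo, hi)."""
    t = [0] + sorted(int(v) for v in thresholds) + [levels]
    return list(zip(t[:-1], t[1:]))

def otsu_score(p, thresholds):
    """Between-class variance of a normalized histogram p."""
    g = np.arange(p.size)
    mu_total = (p * g).sum()
    score = 0.0
    for lo, hi in class_bounds(thresholds, p.size):
        w = p[lo:hi].sum()
        if w > 0:
            mu = (p[lo:hi] * g[lo:hi]).sum() / w
            score += w * (mu - mu_total) ** 2
    return score

def kapur_score(p, thresholds):
    """Sum of the Shannon entropies of the classes."""
    score = 0.0
    for lo, hi in class_bounds(thresholds, p.size):
        w = p[lo:hi].sum()
        if w > 0:
            q = p[lo:hi][p[lo:hi] > 0] / w
            score -= (q * np.log(q)).sum()
    return score

def hybrid_fitness(p, thresholds, alpha=0.5):
    """Hypothetical hybrid: weighted sum of the two criteria (an assumption)."""
    return alpha * otsu_score(p, thresholds) + (1 - alpha) * kapur_score(p, thresholds)

def logistic_map_init(n_solutions, n_thresholds, levels=256, x0=0.7, r=4.0):
    """Initial threshold sets drawn from a logistic-map sequence."""
    x = x0
    pop = np.empty((n_solutions, n_thresholds), dtype=int)
    for i in range(n_solutions):
        for j in range(n_thresholds):
            x = r * x * (1.0 - x)                # chaotic sequence in [0, 1]
            pop[i, j] = 1 + int(x * (levels - 2))  # map to a gray level in 1..levels-1
        pop[i].sort()
    return pop
```

A metaheuristic would then evaluate `hybrid_fitness` on each row of the initial population and update the rows toward higher scores. Because the two criteria live on very different numeric scales, a practical weighting would likely need normalization.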
Recently, relying solely on text-to-image (T2I) generation has gradually proven insufficient to meet the demands of image generation. As a result, people have started exploring more controllable image-generation methods based on Diffusion tech...
Convolutional Neural Networks (CNNs) have gained significant popularity in image classification tasks, yet achieving their optimal design remains a challenge due to the vast array of possible layer configurations and ...
ISBN: (print) 9781728198354
This paper proposes a two-stage 3D object detection framework, the multiscale voxel graph neural network (MSV-RGNN), which aims to fully exploit multiple-scale graph features by establishing global and local relationships between voxel features at different 3D convolutional neural network (CNN) layers. In contrast to conventional graph-based methods, our proposed multiscale-voxel-graph region-of-interest (RoI) pooling module constructs graphs across diverse voxel resolutions to obtain geometric structure information on voxel features. First, the multiscale-voxel-graph RoI pooling module samples voxel center points with voxel-wise feature vectors and 3D region proposals from the backbone network. Then, graphs are constructed at different scales and graph features are aggregated for second-stage refinement. The experimental results demonstrate the potential of using multiscale graphs across different voxel resolutions for 3D object detection, achieving results competitive with state-of-the-art methods.
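As a loose illustration of building graphs over voxel centers at several scales and aggregating features along their edges, the sketch below uses a k-nearest-neighbour graph and max pooling per scale. Both choices, and all function names, are assumptions; the abstract does not specify how the graphs are constructed or how features are aggregated.

```python
import numpy as np

def knn_graph(points, k):
    """Return (N, k) indices of each point's k nearest neighbours, excluding itself."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def aggregate(features, neighbours):
    """Max-pool each node's feature vector with its neighbours' feature vectors."""
    gathered = features[neighbours]  # (N, k, F)
    return np.maximum(features, gathered.max(axis=1))

def multiscale_features(centres_per_scale, feats_per_scale, k=2):
    """Build one graph per voxel resolution and aggregate features on each."""
    return [aggregate(f, knn_graph(c, k))
            for c, f in zip(centres_per_scale, feats_per_scale)]
```

The per-scale outputs could then be pooled or concatenated for a refinement stage; how the paper combines them is not stated in the abstract.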
In the evolving digital landscape, the proliferation of manipulated images poses a significant challenge to the authenticity and integrity of visual content. This project investigates cutting-edge image manipulation d...