Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various tasks of image restoration and enhancement. However, directly applying them to video restoration and enhancemen...
Metal artifacts in computed tomography (CT) arise from a mismatch between physics of image formation and idealized assumptions during tomographic reconstruction. These artifacts are particularly strong around metal im...
Egomotion recovery has traditionally taken local image motion as input, with published algorithms using the geometric constraint that relates 2-D local image motion (optical flow, correspondence, derivatives of the image flow) to 3-D motion and structure. Because accurate local image motion has proved very difficult to obtain, much effort has gone into developing robust techniques. A new approach to egomotion estimation is taken, based on constraints of a global nature. It is proved that local normal flow measurements form global patterns in the image plane, and that the position of these patterns is related to the three-dimensional motion parameters. By locating some of these patterns, which depend only on subsets of the motion parameters, through a simple search technique, the 3-D motion parameters can be found. The proposed algorithmic procedure is very robust, since it is not affected by small perturbations in the normal flow measurements. In fact, since only the sign of the normal flow measurement is employed, the direction of translation and the axis of rotation can be estimated with up to 100% error in the image measurements.
ISBN: (print) 9781728132945
Semantic segmentation has made great progress by adopting deep Fully Convolutional Networks (FCN). However, the performance of FCN-based models relies heavily on the amount of pixel-level annotation, which is expensive and time-consuming to produce. To address this problem, it is a good choice to learn to segment with weak supervision from bounding boxes. The critical challenge for this weakly supervised learning task is how to make full use of the class-level and region-level supervision that bounding boxes provide. In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposals generated from the bounding box supervision, we calculate the mean filling rate of each class to serve as an important prior cue, and then propose a filling rate guided adaptive loss (FR-Loss) to help the model ignore wrongly labeled pixels in the proposals. Unlike previous methods, which train models directly on fixed individual segment proposals, our method can adjust model learning with global statistical information, which helps reduce the negative impact of wrongly labeled proposals. We evaluate the proposed method on the challenging PASCAL VOC 2012 benchmark and compare it with other methods. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.
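The per-class mean filling rate mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not define the quantity precisely, so this assumes the filling rate of one object is the fraction of its bounding-box pixels that the segment proposal marks as foreground. The function names, the mask layout, and the box convention are all hypothetical, and this is not the authors' implementation of BCM or FR-Loss.

```python
from collections import defaultdict


def filling_rate(mask, box):
    """Fraction of pixels inside `box` that the proposal marks as foreground.

    mask: 2-D list of 0/1 values (hypothetical segment proposal for one object).
    box:  (x0, y0, x1, y1) with x1 and y1 exclusive (assumed convention).
    """
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        raise ValueError("box must have positive area")
    filled = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return filled / area


def mean_filling_rates(samples):
    """Average the filling rate per class over (class_name, mask, box) triples."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cls, mask, box in samples:
        totals[cls] += filling_rate(mask, box)
        counts[cls] += 1
    return {cls: totals[cls] / counts[cls] for cls in totals}


if __name__ == "__main__":
    # Toy 4x4 proposal with a 2x2 foreground blob in the middle.
    proposal = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    samples = [
        ("cat", proposal, (0, 0, 4, 4)),  # loose box around the blob
        ("cat", proposal, (1, 1, 3, 3)),  # tight box around the blob
    ]
    print(mean_filling_rates(samples))
```

A class-level statistic of this kind could then serve as a prior on how much of each box is expected to be foreground; how the loss actually uses it is not specified in the abstract.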
ISBN: (electronic) 9781728171685
ISBN: (print) 9781728171692
Scene text images contain two levels of content: visual texture and semantic information. Although previous scene text recognition methods have made great progress over the past few years, research on mining semantic information to assist text recognition has attracted less attention, and only RNN-like structures have been explored to implicitly model semantic information. However, we observe that RNN-based methods have some obvious shortcomings, such as a time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit both the help that semantic information can provide and the computational efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, in which a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. State-of-the-art results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method. In addition, SRN has a significant speed advantage over the RNN-based methods, demonstrating its value in practical use.
Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods [34] demonstrate that using global context for re-weighting feature channels c...
We present a new approach to organizing an image database by finding a semantic structure interactively, based on multi-user relevance feedback. By treating users' relevance feedback as weak classifiers and combining them, we are able to capture the categories in the users' minds and build a semantic structure in the image database. Experiments performed on an image database consisting of general-purpose images demonstrate that our system outperforms some of the other conventional methods.
In this paper, we present a novel idea of co-clustering image features and semantic concepts. We accomplish this by modelling user feedback logs and low-level features using a bipartite graph. Our experiments demonstrate that (1) incorporating semantic information achieves better image clustering and (2) feature selection in co-clustering narrows the semantic gap, thus enabling efficient image retrieval.
Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos. However, it is often built on the fixed human-joint affinity, according to human skeleton. This may reduce adaptation ...
Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well-established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision, and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. Existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing a pretext task for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88dB performance gain on the image dehazing task, while Uformer obtains 3.22dB and 0.54dB improvements on the dehazing and deraining tasks, respectively.