Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Cao, Cong; Yue, Huanjing; Liu, Xin; Yang, Jingyu. Affiliations: School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Computer Vision and Pattern Recognition Laboratory, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland

Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various tasks of image restoration and enhancement. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on the pre-trained image diffusion model. By replacing the spatial self-attention layer with the proposed short-long-range (SLR) temporal attention layer, the pre-trained image diffusion model can take advantage of the temporal correlation between frames. We further propose temporal consistency guidance, spatial-temporal noise sharing, and an early stopping sampling strategy to improve temporally consistent sampling. Our method is a plug-and-play module that can be inserted into any diffusion-based image restoration or enhancement method to further improve its performance. Experimental results demonstrate the superiority of our proposed method. Copyright © 2024, The Authors. All rights reserved.

Keywords: Restoration

Learning to avoid poor images: Towards task-aware C-arm cone-beam CT trajectories

arXiv, 2019

Authors: Zaech, Jan-Nico; Gao, Cong; Bier, Bastian; Taylor, Russell; Maier, Andreas; Navab, Nassir; Unberath, Mathias. Affiliations: Laboratory for Computational Sensing and Robotics, Johns Hopkins University; Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg; Computer Vision Laboratory, Eidgenössische Technische Hochschule Zürich

Metal artifacts in computed tomography (CT) arise from a mismatch between physics of image formation and idealized assumptions during tomographic reconstruction. These artifacts are particularly strong around metal implants, inhibiting widespread adoption of 3D cone-beam CT (CBCT) despite clear opportunity for intra-operative verification of implant positioning, e.g., in spinal fusion surgery. On synthetic and real data, we demonstrate that much of the artifact can be avoided by acquiring better data for reconstruction in a task-aware and patient-specific manner, and describe the first step towards the envisioned task-aware CBCT protocol. The traditional short-scan CBCT trajectory is planar, with little room for scene-specific adjustment. We extend this trajectory by autonomously adjusting out-of-plane angulation. This enables C-arm source trajectories that are scene-specific in that they avoid acquiring "poor images", characterized by beam hardening, photon starvation, and noise. The recommendation of ideal out-of-plane angulation is performed on-the-fly using a deep convolutional neural network that regresses a detectability-rank derived from imaging physics. Copyright © 2019, The Authors. All rights reserved.

Keywords: Trajectories

Motion constraint patterns

IEEE Workshop on Qualitative Vision

Author: C. Fermuller. Affiliations: Department for Pattern Recognition and Image Processing, Institute for Automation, Technical University of Vienna, Vienna, Austria; Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, MD, USA

The problem of egomotion recovery has been treated by using as input local image motion, with the published algorithms utilizing the geometric constraint relating 2-D local image motion (optical flow, correspondence, derivatives of the image flow) to 3-D motion and structure. Since it has proved very difficult to achieve accurate input (local image motion), a lot of effort has been devoted to the development of robust techniques. A new approach to the problem of egomotion estimation is taken, based on constraints of a global nature. It is proved that local normal flow measurements form global patterns in the image plane. The position of these patterns is related to the three-dimensional motion parameters. By locating some of these patterns, which depend only on subsets of the motion parameters, through a simple search technique, the 3-D motion parameters can be found. The proposed algorithmic procedure is very robust, since it is not affected by small perturbations in the normal flow measurements. As a matter of fact, since only the sign of the normal flow measurement is employed, the direction of translation and the axis of rotation can be estimated with up to 100% error in the image measurements.

Keywords: Motion estimation; Computer vision; Automation; Image motion analysis; Fluid flow measurement; Motion measurement; Rotation measurement; Laboratories; Educational institutions; Geometrical optics

Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition

Authors: Chunfeng Song; Yan Huang; Wanli Ouyang; Liang Wang. Affiliations: Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA); The University of Sydney, SenseTime Computer Vision Research Group

ISBN: (print) 9781728132945

Semantic segmentation has achieved huge progress via adopting deep Fully Convolutional Networks (FCN). However, the performance of FCN-based models relies heavily on the amount of pixel-level annotations, which are expensive and time-consuming. To address this problem, it is a good choice to learn to segment with weak supervision from bounding boxes. How to make full use of the class-level and region-level supervision from bounding boxes is the critical challenge for the weakly supervised learning task. In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposal generated from the bounding box supervision, we calculate the mean filling rate of each class to serve as an important prior cue. We then propose a filling rate guided adaptive loss (FR-Loss) to help the model ignore the wrongly labeled pixels in proposals. Unlike previous methods that directly train models with fixed individual segment proposals, our method can adjust the model learning with global statistical information. Thus it can help reduce the negative impact of wrongly labeled proposals. We evaluate the proposed method on the challenging PASCAL VOC 2012 benchmark and compare it with other methods. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.

Keywords: Bounding boxes; Filling rate; Masking; Semantics; Supervision; Regulated industries

Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Deli Yu; Xuan Li; Chengquan Zhang; Tao Liu; Junyu Han; Jingtuo Liu; Errui Ding. Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; Department of Computer Vision Technology (VIS), Baidu Inc.

ISBN: (digital) 9781728171685

ISBN: (print) 9781728171692

Scene text images contain two levels of content: visual texture and semantic information. Although previous scene text recognition methods have made great progress over the past few years, research on mining semantic information to assist text recognition has attracted less attention, and only RNN-like structures have been explored to implicitly model semantic information. However, we observe that RNN-based methods have some obvious shortcomings, such as a time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit the help of semantic information and the computation efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. The state-of-the-art results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method. In addition, the speed of SRN has significant advantages over the RNN-based methods, demonstrating its value in practical use.

Keywords: Semantics; Visualization; Text recognition; Cognition; Feature extraction; Decoding; Robustness

Learning to predict context-adaptive convolution for semantic segmentation

arXiv, 2020

Authors: Liu, Jianbo; He, Junjun; Ren, Jimmy S.; Qiao, Yu; Li, Hongsheng. Affiliations: CUHK-SenseTime Joint Laboratory, Chinese University of Hong Kong; Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; SenseTime Research

Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods [34] demonstrate that using global context for re-weighting feature channels can effectively improve the accuracy of semantic segmentation. However, the globally-sharing feature re-weighting vector might not be optimal for regions of different classes in the input image. In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. In CaC-Net, a set of context-adaptive convolution kernels are predicted from the global contextual information in a parameter-efficient manner. When used for convolution with the semantic feature maps, the predicted convolutional kernels can generate the spatially-varying feature weighting factors capturing both global and local contextual information. Comprehensive experimental results show that our CaC-Net achieves superior segmentation performance on three public datasets, PASCAL Context, PASCAL VOC 2012 and ADE20K. Copyright © 2020, The Authors. All rights reserved.

Keywords: Convolution

Finding a Semantic Structure Interactively in Image Databases

IEEE International Conference on Multimedia and Expo (ICME)

Authors: Manjeet Rege; Ming Dong; Farshad Fotouhi. Affiliations: Machine Vision & Pattern Recognition Laboratory, Department of Computer Science, Wayne State University, Detroit, MI, USA; Database & Multimedia Systems Group, Department of Computer Science, Wayne State University, Detroit, MI, USA

We present a new approach to organizing an image database by finding a semantic structure interactively based on multi-user relevance feedback. By treating user relevance feedback as weak classifiers and combining them, we are able to capture the categories in the users' minds and build a semantic structure in the image database. Experiments performed on an image database consisting of general purpose images demonstrate that our system outperforms some of the other conventional methods.

Keywords: Image databases; Feedback; Image retrieval; Machine learning algorithms; Support vector machines; Support vector machine classification; Information retrieval; Machine vision; Pattern recognition; Multimedia databases

Co-Clustering Image Features and Semantic Concepts

IEEE International Conference on Image Processing

Authors: Manjeet Rege; Ming Dong; Farshad Fotouhi. Affiliations: Department of Computer Science, Machine Vision & Pattern Recognition Laboratory, Wayne State University, Detroit, MI, USA; Database & Multimedia Systems Group, Wayne State University, Detroit, MI, USA

In this paper, we present a novel idea of co-clustering image features and semantic concepts. We accomplish this by modelling user feedback logs and low-level features using a bipartite graph. Our experiments demonstrate that (1) incorporating semantic information achieves better image clustering and (2) feature selection in co-clustering narrows the semantic gap, thus enabling efficient image retrieval.

Keywords: Feedback; Image retrieval; Bipartite graph; Image databases; Clustering algorithms; Machine vision; Pattern recognition; Multimedia databases; Spatial databases; Multimedia systems

Learning dynamical human-joint affinity for 3D pose estimation in videos

arXiv, 2021

Authors: Zhang, Junhao; Wang, Yali; Zhou, Zhipeng; Luan, Tianyu; Wang, Zhe; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of California, Irvine, United States; Shanghai AI Laboratory, Shanghai, China

Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos. However, it is often built on a fixed human-joint affinity, according to the human skeleton. This may reduce the adaptation capacity of GCN to tackle complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on spatial distance/temporal movement similarity between human joints in this video. Hence, they can effectively understand which joints are spatially closer and/or have consistent motion, for reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, namely Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size. Copyright © 2021, The Authors. All rights reserved.

Keywords: Convolution

DegAE: A New Pretraining Paradigm for Low-Level Vision

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yihao Liu; Jingwen He; Jinjin Gu; Xiangtao Kong; Yu Qiao; Chao Dong. Affiliations: Shanghai Artificial Intelligence Laboratory; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; The University of Sydney

Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well-established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision, and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. Existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing pretext task for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88dB performance gain on image dehaze task, while Uformer obtains 3.22dB and 0.54dB improvement on dehaze and derain tasks, respectively.
