Search Results - Inner Mongolia University Library

8th International Conference on Computing and Artificial Intelligence, ICCAI 2022

Authors: Gong, Lei; Wang, Da-Han; Wu, Yun; Ye, Hai-Li; Zhu, Chen-Yan. Affiliations: School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361024, China; Medical Diagnostic Systems Co. Ltd., Xiamen 361000, China

ISBN: (print) 9781450396110

At present, the most advanced semantic segmentation model training mainly relies on pixel-level annotation, that is, annotating the category of each pixel of an image. Such annotation is usually time-consuming and expensive, especially for special applications that require expert annotation. Weakly supervised segmentation using point-level supervision has been investigated, but it suffers from two major problems: the supervision information is quite limited, and the performance is far from that of fully supervised methods. In this paper, we propose a novel interactive image segmentation method based on weak supervision, which allows multiple rounds of feedback from easily obtained weakly supervised information and improves the efficiency with which the supervision information is used. In the downstream task (interactive image segmentation), point-level supervised information is used many times, which tightens the connection between pixels in the upstream task and improves the segmentation accuracy. First, image-level tags are used to train the classification network. Then pseudo-semantic labels are generated and fed into the interactive segmentation network for training, yielding an almost fully supervised CNN, which further improves the performance and provides operability for human-computer interaction. The proposed method achieves promising semantic segmentation results that are close to those obtained by strongly supervised segmentation methods on the PASCAL VOC 2012 dataset. © 2022 ACM.

Keywords: Semantic Segmentation

Warp that smile on your face: Optimal and smooth deformations for face recognition

International Conference on Automatic Face and Gesture Recognition

Authors: Tobias Gass; Leonid Pishchulin; Philippe Dreuw; Hermann Ney. Affiliations: Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany; Computer Vision Laboratory, ETH Zurich, Switzerland; Computer Vision and Multimodal Computing, MPI Informatics, Saarbruecken, Germany

In this work, we present novel warping algorithms for full 2D pixel-grid deformations for face recognition. Due to high variation in face appearance, face recognition is considered a very difficult task, especially if only a single reference image, for example a mug-shot, per face is available. Usually, model-based approaches with additional training data are used to cope with several types of variation occurring in facial imaging. Image warping, by contrast, yields a distance measure which is invariant with regard to several types of variation. This allows for precise recognition even when only very few reference observations are available. Due to the computational complexity of optimal 2D warping, pseudo-2D warping-based approaches in the past represented strong approximations of the original problem, and were mainly successful on data with low variability or rectified images. We propose a novel 2D warping method which is globally optimal and makes no prior assumptions on the data variability besides two-dimensional smoothness constraints, which both avoid local mirroring and gaps and significantly speed up the optimization. Furthermore, we show that occlusion handling is imperative to obtain smooth warpings in a variety of domains. We evaluate our novel algorithm on various well-known databases, such as the AR-Face and CMU-PIE databases, and provide a detailed comparison to existing warping approaches. We show that by using simple relative 2D constraints, strong local features and a kernel which is robust w.r.t. occlusions, our computationally complex approaches outperform state-of-the-art results for recognizing faces under varying expressions, occlusions and poses. Most interestingly, we achieve higher accuracy using fewer training instances per class compared to methods learning a model of the 3D shape.

Keywords: Pixel; Face; Face recognition; Optimization; Databases; Approximation methods; Hidden Markov models

Trimap generation with background for natural image matting

3rd International Conference on Optics and Machine Vision, ICOMV 2024

Authors: Fu, Qian; Liang, Yihui; Kun, Zou; Feng, Fujian; Xu, Xiang. Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan, China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang, China

ISBN: (print) 9781510680319

Image matting is a widely used image processing technique that aims at accurately separating the foreground from an image. It is a challenging and ill-posed problem that demands additional input, such as trimaps and background images, to provide prior knowledge. However, the manual annotation of trimaps requires a lot of labor, limiting the application of trimap-based methods. Some trimap-free methods, including background-based methods, explore alternatives with low labor requirements by utilizing captured background images. However, the quality of alpha mattes predicted by trimap-free methods still falls short of that of trimap-based methods. To reduce the performance gap between background-based and trimap-based methods, we present the Trimap Generation from Background Image (TG-BG) method, which can generate trimaps from the input image and a captured background image. It provides an economical solution to facilitate the application of trimap-based methods, allowing for low-cost and high-quality alpha matte predictions. TP-BG leverages a ViT backbone for feature extraction and employs the Image and Background Detail Fusion Stream (IBDFS) to capture multi-scale detail information. The introduction of a foreground impact loss encourages the network to pay more attention to the foreground in the image. We validate the trimap prediction performance of TP-BG by comparing the alpha matte quality obtained by background-based methods with that obtained by trimap-based methods integrated with TP-BG. The experimental results demonstrate that TP-BG can generate high-quality trimaps from a background image, and that trimap-based methods integrated with TP-BG outperform the state-of-the-art background-based methods in terms of four alpha matte quality metrics. © 2024 SPIE.
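
To make the input and output of trimap generation concrete, here is a minimal sketch in Python. It is not the method of the abstract above, which uses a learned ViT-based network; it is a naive per-pixel baseline that thresholds the difference between the image and the captured background. The function name `naive_trimap`, the thresholds `low` and `high`, and the 0/128/255 encoding (background / unknown / foreground) are assumptions chosen for illustration.

```python
def naive_trimap(image, background, low=10, high=60):
    """Label each pixel from |image - background| on grayscale nested lists.

    0   = confidently background (difference at most `low`)
    255 = confidently foreground (difference at least `high`)
    128 = unknown region left for the matting method to resolve
    The thresholds are hypothetical and would need tuning per camera setup.
    """
    trimap = []
    for image_row, background_row in zip(image, background):
        row = []
        for pixel, reference in zip(image_row, background_row):
            difference = abs(pixel - reference)
            if difference <= low:
                row.append(0)
            elif difference >= high:
                row.append(255)
            else:
                row.append(128)
        trimap.append(row)
    return trimap
```

A trimap-based matting method would then estimate alpha values only inside the region marked 128.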

Keywords: Costs

An image-sequence compressing algorithm based on homography transformation for unmanned aerial vehicle

International Symposium on Intelligence Information Processing and Trusted Computing

Authors: Gong, Junbin; Zheng, Chenlin; Tian, Jinwen; Wu, Dingxue. Affiliations: Institute for Pattern Recognition and Artificial Intelligence, National Key Laboratory of Science and Technology on Multi-spectral Information Processing, Huazhong University of Science and Technology, Wuhan 430074, China; College of Computer Science and Technology, Huanggang Normal University, Huanggang 438000, China

ISBN: (print) 9780769541969

To address the image compression problem for unmanned aerial vehicles, which requires a high and fixed compression ratio together with low computational complexity, a low-complexity image-sequence compression algorithm based on homography transformation is proposed. The image sequences are dynamically divided into frame groups according to data from the airborne inertial navigation system, and the intermediate frames in the same frame group are bi-directionally predicted from the first frame and the last frame using homography transformation. The homography matrix is first obtained approximately from the airborne inertial navigation system and then computed accurately by fast template matching over multiple sub-areas. Finally, the first frame and the residual images of the intermediate frames of the same frame group are merged into one large image and coded with JPEG2000 to generate fixed-size code streams. The experimental results show that the proposed algorithm offers high compression performance, low computational complexity and excellent code-size control, and has good prospects for engineering applications. © 2010 IEEE.
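
The prediction step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it predicts a frame from a single reference frame (the abstract uses both the first and the last frame), uses nearest-neighbour sampling, and the names `apply_homography`, `predict_frame` and `residual` are hypothetical.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


def predict_frame(reference, H, fill=0):
    """Predict a frame by sampling `reference` at the H-mapped positions.

    H maps coordinates of the predicted frame into the reference frame.
    Pixels that fall outside the reference keep the `fill` value.
    """
    height, width = len(reference), len(reference[0])
    predicted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = apply_homography(H, x, y)
            sx, sy = int(round(sx)), int(round(sy))
            if 0 <= sx < width and 0 <= sy < height:
                predicted[y][x] = reference[sy][sx]
    return predicted


def residual(actual, predicted):
    """Per-pixel difference that would be coded instead of the full frame."""
    return [[a - p for a, p in zip(actual_row, predicted_row)]
            for actual_row, predicted_row in zip(actual, predicted)]
```

When the homography is accurate, the residual is mostly zeros, which is what makes it cheaper to code than the intermediate frame itself.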

Keywords: Image compression

Efficient Image Super-Resolution Using Pixel Attention

Workshops held at the 16th European Conference on Computer Vision, ECCV 2020

Authors: Zhao, Hengyuan; Kong, Xiangtao; He, Jingwen; Qiao, Yu; Dong, Chao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9783030670696

This work aims at designing a lightweight convolutional neural network for image super-resolution (SR). With simplicity in mind, we construct a concise and effective network with a newly proposed pixel attention scheme. Pixel attention (PA) is similar to channel attention and spatial attention in formulation. The difference is that PA produces 3D attention maps instead of a 1D attention vector or a 2D map. This attention scheme introduces fewer additional parameters but generates better SR results. On the basis of PA, we propose two building blocks for the main branch and the reconstruction branch, respectively. The first one, the SC-PA block, has the same structure as the Self-Calibrated convolution but with our PA layer. This block is much more efficient than conventional residual/dense blocks, owing to its two-branch architecture and attention scheme. The second one, the U-PA block, combines nearest-neighbor upsampling, convolution and PA layers. It improves the final reconstruction quality with little parameter cost. Our final model, PAN, achieves performance similar to the lightweight networks SRResNet and CARN, but with only 272K parameters (17.92% of SRResNet and 17.09% of CARN). The effectiveness of each proposed component is also validated by an ablation study. The code is available at https://***/zhaohengyuan1/PAN. © 2020, Springer Nature Switzerland AG.
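
The idea of a 3D attention map can be illustrated with a small Python sketch. It assumes that the map is produced by a 1 × 1 convolution followed by a sigmoid and is multiplied element-wise with the input; the abstract does not spell out this construction, so treat the function `pixel_attention` and its `weight` and `bias` parameters as illustrative assumptions.

```python
import math


def pixel_attention(x, weight, bias):
    """Apply a pixel-attention-style gate to a C x H x W feature map (nested lists).

    A 1x1 convolution is a linear map across channels applied at each pixel,
    so `weight` is C x C and `bias` has length C. The sigmoid of the result
    gives one gate value per channel and per pixel, that is, a C x H x W map,
    unlike a length-C channel vector or a single H x W spatial map.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            pixel = [x[k][i][j] for k in range(channels)]
            for o in range(channels):
                logit = bias[o] + sum(weight[o][k] * pixel[k] for k in range(channels))
                gate = 1.0 / (1.0 + math.exp(-logit))
                out[o][i][j] = x[o][i][j] * gate
    return out
```

In this sketch the layer adds only C × C weights and C biases. Separately, the percentages quoted in the abstract imply roughly 272K / 0.1792 ≈ 1518K parameters for SRResNet and 272K / 0.1709 ≈ 1592K for CARN.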

Keywords: Deep neural networks

Conditional Sequential Modulation for Efficient Global Image Retouching

16th European Conference on Computer Vision, ECCV 2020

Authors: He, Jingwen; Liu, Yihao; Qiao, Yu; Dong, Chao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9783030586003

Photo retouching aims at enhancing the aesthetic visual quality of images that suffer from photographic defects such as over/under exposure, poor contrast and inharmonious saturation. In practice, photo retouching can be accomplished by a series of image processing operations. In this paper, we investigate some commonly used retouching operations and show mathematically that these pixel-independent operations can be approximated or formulated by multi-layer perceptrons (MLPs). Based on this analysis, we propose an extremely lightweight framework - Conditional Sequential Retouching Network (CSRNet) - for efficient global image retouching. CSRNet consists of a base network and a condition network. The base network acts like an MLP that processes each pixel independently, and the condition network extracts the global features of the input image to generate a condition vector. To realize retouching operations, we modulate the intermediate features using Global Feature Modulation (GFM), whose parameters are transformed from the condition vector. Benefiting from the use of 1 × 1 convolutions, CSRNet contains fewer than 37k trainable parameters, which is orders of magnitude smaller than existing learning-based methods. Extensive experiments show that our method achieves state-of-the-art performance on the benchmark MIT-Adobe FiveK dataset quantitatively and qualitatively. Code is available at https://***/hejingwenhejingwen/CSRNet. © 2020, Springer Nature Switzerland AG.
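
The interplay between a global condition and a per-pixel operation can be sketched in Python. The sketch assumes that GFM applies one scale and one shift per channel to every pixel, which the abstract does not state explicitly. The condition network is replaced by a plain per-channel mean, and `exposure_params` is a hand-written, hypothetical stand-in for the learned mapping from condition vector to modulation parameters.

```python
def condition_vector(image):
    """Stand-in for the condition network: one global statistic (the mean) per channel."""
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0])) for channel in image]


def exposure_params(condition, target=0.5):
    """Hypothetical mapping from the condition vector to (scale, shift).

    A trained model would learn this mapping; here it simply shifts each
    channel so that its mean lands on `target`.
    """
    scale = [1.0 for _ in condition]
    shift = [target - mean for mean in condition]
    return scale, shift


def gfm(features, scale, shift):
    """Apply the same scale and shift to every pixel of each channel (C x H x W lists)."""
    return [[[scale[c] * value + shift[c] for value in row] for row in channel]
            for c, channel in enumerate(features)]
```

Because the modulation treats every pixel identically, it behaves like a global adjustment such as a brightness change, while its strength depends on the whole image through the condition vector.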

Keywords: Pixels

Adaptive Pyramid Context Network for Semantic Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition

Authors: Junjun He; Zhongying Deng; Lei Zhou; Yali Wang; Yu Qiao. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

ISBN: (print) 9781728132945

Recent studies have shown that context features can significantly improve the performance of deep semantic segmentation networks. Current context-based segmentation methods differ from each other in how they construct context features, and they perform differently in practice. This paper first introduces three desirable properties of context features for the segmentation task. Specifically, we find that Global-guided Local Affinity (GLA) can play a vital role in constructing effective context features, while this property has been largely ignored in previous works. Based on this analysis, this paper proposes the Adaptive Pyramid Context Network (APCNet) for semantic segmentation. APCNet adaptively constructs multi-scale contextual representations with multiple well-designed Adaptive Context Modules (ACMs). Specifically, each ACM leverages a global image representation as guidance to estimate the local affinity coefficients for each sub-region, and then calculates a context vector with these affinities. We empirically evaluate our APCNet on three semantic segmentation and scene parsing datasets: PASCAL VOC 2012, Pascal-Context, and ADE20K. Experimental results show that APCNet achieves state-of-the-art performance on all three benchmarks, and obtains a new record of 84.2% on the PASCAL VOC 2012 test set without MS COCO pre-training or any post-processing.

Keywords: Semantics subregion image representation Pascal TEST SETS

Face-sketch learning with human sketch-drawing order enforcement

Science China (Information Sciences), 2020, Vol. 63, No. 11, pp. 298-311

Authors: Liang CHANG; Lihua JIN; Lifen WENG; Wentao CHAO; Xuguang WANG; Xiaoming DENG; Qiulei DONG. Affiliations: School of Artificial Intelligence, Beijing Normal University; Department of Design Art, Xiamen University of Technology; Department of Automation, North China Electric Power University; Beijing Key Laboratory of Human Computer Interactions, Institute of Software, Chinese Academy of Sciences; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences

Dear editor, Although face-sketch synthesis generates a sketch from a given face photo automatically [1], it is an open research problem in computer vision [2–4]. Recently, several deep neural network (DNN) methods for...

Keywords: face sketch synthesis; deep neural network; order enforcement; image synthesis; generative adversarial network

A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images

2nd International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2020

Authors: Nandanwar, Lokesh; Shivakumara, Palaiahnakote; Manna, Suvojit; Pal, Umapada; Lu, Tong; Blumenstein, Michael. Affiliations: Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Department of Computer Science and Engineering, Jalpaiguri Government Engineering College, Jalpaiguri, India; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; University of Technology Sydney, Ultimo, Australia

ISBN: (print) 9783030598297

Achieving a better recognition rate for text in video action images is challenging due to multi-type texts with unpredictable backgrounds. We propose a new method for the classification of captions (which are edited text) and scene texts (which are part of an image) in video images of the Yoga, Concert, Teleshopping, Craft, and Recipe classes. The proposed method introduces a new fusion criterion based on DCT and Fourier coefficients to extract features that represent good clarity and visibility of captions to separate them from scene texts. The variances for coefficients of corresponding pixels of DCT and Fourier images are computed to derive the respective weights. The weights and coefficients are further used to generate a fused image. Furthermore, the proposed method estimates sparsity in the Canny edge image of each fused image to derive rules for classifying caption and scene texts. Lastly, the proposed method is evaluated on images of the five above-mentioned action image classes to validate the derived rules. Comparative studies with state-of-the-art methods on standard databases show that the proposed method outperforms the existing methods in terms of classification. The recognition experiments before and after classification show that the recognition rate improves significantly after classification. © 2020, Springer Nature Switzerland AG.

Keywords: Classification (of information)

Robust stereo on multiple resolutions

International Conference on Pattern Recognition

Authors: C. Menard; A. Leonardis. Affiliations: Department for Pattern Recognition and Image Processing, Technical University of Vienna, Vienna, Austria; Computer and Information Science, Computer Vision Laboratory, University of Ljubljana, Ljubljana, Slovenia

Stereo computation is one of the vision problems where the presence of outliers cannot be neglected. Most standard algorithms make unrealistic assumptions about noise distributions, which leads to erroneous results that cannot be corrected in subsequent postprocessing stages. In this paper we present a modification of the standard area-based correlation approach so that it can tolerate a significant number of outliers. The approach exhibits a robust behavior not only in the presence of mismatches but also in the case of depth discontinuities. The confidence measure of the correlation and the number of outliers provide two complementary sources of information which, when implemented in a multiresolution framework, result in a robust and efficient method. We present the results of this approach on a number of synthetic and real images.
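
The notion of an area-based match that tolerates outliers can be sketched in Python on a single scanline. This is a generic illustration, not the estimator of the abstract above: it uses absolute differences with a hard inlier threshold instead of correlation, works at one resolution only, and the names `robust_match` and `best_disparity` and all default parameters are assumptions.

```python
def robust_match(left_patch, right_patch, threshold=20):
    """Compare two patches and return (mean inlier difference, outlier count).

    Pixels whose absolute difference exceeds `threshold` are treated as
    outliers and excluded from the cost, so a few corrupted or occluded
    pixels do not dominate the match.
    """
    differences = [abs(a - b) for a, b in zip(left_patch, right_patch)]
    inliers = [d for d in differences if d <= threshold]
    outliers = len(differences) - len(inliers)
    cost = sum(inliers) / len(inliers) if inliers else float("inf")
    return cost, outliers


def best_disparity(left, right, x, half=2, max_disp=4, threshold=20, max_outliers=2):
    """Search disparities 0..max_disp for the window centred at `x` in `left`.

    Returns (disparity, cost, outliers) for the lowest-cost candidate whose
    outlier count is acceptable, or None if no candidate qualifies.
    Assumes x - half >= 0 and x + half < len(left).
    """
    window = left[x - half:x + half + 1]
    best = None
    for disparity in range(max_disp + 1):
        start = x - disparity - half
        if start < 0:
            break
        candidate = right[start:start + 2 * half + 1]
        cost, outliers = robust_match(window, candidate, threshold)
        if outliers > max_outliers:
            continue
        if best is None or cost < best[1]:
            best = (disparity, cost, outliers)
    return best
```

Returning the outlier count alongside the cost mirrors the abstract's point that the two provide complementary information about how far a match can be trusted.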

Keywords: Stereo vision; computer vision; Noise robustness; Information resources; Correlation; pattern recognition; Image processing; Layout; Cameras; Laboratories
