ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Model architecture refinement is a challenging task in deep learning research fields such as remote photoplethysmography (rPPG). One architectural consideration, the depth of the model, can have significant consequences for the resulting performance. In rPPG models that are overprovisioned with more layers than necessary, redundancies exist whose removal can yield faster training and a reduced computational load at inference time. With too few layers, models may exhibit sub-optimal error rates. We apply Centered Kernel Alignment (CKA) to an array of rPPG architectures of differing depths, demonstrating that shallower models do not learn the same representations as deeper models, and that beyond a certain depth, added layers are redundant and contribute no significant functionality. An empirical study confirms how the architectural deficiencies discovered using CKA impact performance, and we show how CKA can be used as a diagnostic to refine rPPG architectures.
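For reference, the linear form of CKA reduces to a ratio of Frobenius norms of the column-centered activation matrices. A minimal sketch, following the standard linear-CKA formulation with illustrative shapes rather than the paper's exact pipeline:

```python
# Linear CKA between two layers' activations (Kornblith et al., 2019).
import numpy as np

def linear_cka(x, y):
    """x, y: activation matrices of shape (n_samples, n_features)."""
    # Center each feature column so the implied Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    numerator = np.linalg.norm(y.T @ x, ord="fro") ** 2
    denominator = (np.linalg.norm(x.T @ x, ord="fro")
                   * np.linalg.norm(y.T @ y, ord="fro"))
    return numerator / denominator

# Example: compare a layer of a shallow model with one of a deep model,
# evaluated on the same 256 inputs (feature widths may differ).
acts_shallow = np.random.randn(256, 64)
acts_deep = np.random.randn(256, 128)
print(linear_cka(acts_shallow, acts_deep))  # near 1.0 = similar representations
```

A near-uniform block of high CKA values between adjacent deep layers is the redundancy signature such a diagnostic looks for.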
The lack of interpretability of the vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called vision DiffMask, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method by introducing a faithfulness task and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open-source our implementation here.
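As a rough illustration of this gating objective, the sketch below trains a mask predictor to keep the prediction distribution intact (a KL term) while keeping as little of the input as possible (a sparsity term). The tiny CNNs, the sigmoid relaxation of the gates, and predicting gates directly from the input are our own simplifications; the paper itself uses hidden-layer activations and a ViT backbone.

```python
# Hypothetical sketch of a DiffMask-style gating objective; the networks
# below are stand-ins, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
gate_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 1, 1))       # one gate logit per pixel
for p in model.parameters():
    p.requires_grad_(False)                        # post-hoc: classifier frozen

def gating_loss(images, lambda_sparsity=1.0):
    p_full = F.softmax(model(images), dim=-1)      # reference distribution
    gates = torch.sigmoid(gate_net(images) / 0.5)  # soft 0/1 mask, (B,1,H,W)
    log_p_masked = F.log_softmax(model(images * gates), dim=-1)
    kl = F.kl_div(log_p_masked, p_full, reduction="batchmean")  # faithfulness
    return kl + lambda_sparsity * gates.mean()     # plus mask minimality

loss = gating_loss(torch.randn(4, 3, 32, 32))
loss.backward()                                    # updates only gate_net
```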
Depth prediction is at the core of several computer vision applications, such as autonomous driving and robotics. It is often formulated as a regression task in which depth values are estimated through network layers. Unfortunately, the distribution of values in depth maps is seldom explored. This paper therefore proposes a novel framework combining contrastive learning and depth prediction, allowing us to pay more attention to the depth distribution and consequently improve the overall estimation process. To this end, we propose a window-based contrastive learning module, which partitions the feature maps into non-overlapping windows and constructs a contrastive loss within each one. Forming and sorting positive and negative pairs, then enlarging the gap between the two in the representation space, constrains the depth distribution to fit the characteristics of the depth map. Experiments on the KITTI and NYU datasets demonstrate the effectiveness of our framework.
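To make the windowing concrete, a sketch of one plausible form of such a loss follows; the window size, depth-similarity threshold, and margin shape are our own assumptions rather than the paper's exact formulation.

```python
# Hypothetical window-based contrastive term: pixels within a window form
# positive pairs when their ground-truth depths are close, negatives otherwise.
import torch
import torch.nn.functional as F

def window_contrastive_loss(feats, depth, win=4, thresh=0.5, margin=1.0):
    """feats: (B, C, H, W) features; depth: (B, 1, H, W) ground truth.
    H and W are assumed divisible by win."""
    B, C, H, W = feats.shape
    # Partition into non-overlapping win x win windows -> (B*nWin, win*win, C).
    f = F.unfold(feats, kernel_size=win, stride=win)       # (B, C*win*win, nWin)
    f = f.view(B, C, win * win, -1).permute(0, 3, 2, 1).reshape(-1, win * win, C)
    d = F.unfold(depth, kernel_size=win, stride=win)
    d = d.view(B, win * win, -1).permute(0, 2, 1).reshape(-1, win * win)
    f = F.normalize(f, dim=-1)
    sim = f @ f.transpose(1, 2)                            # pairwise cosine sim
    ddiff = (d.unsqueeze(2) - d.unsqueeze(1)).abs()        # pairwise depth gaps
    pos = (ddiff < thresh).float()                         # positive-pair mask
    # Pull positives toward similarity 1; push negatives' similarity down.
    return (pos * (1 - sim) + (1 - pos) * F.relu(sim - (1 - margin))).mean()

print(window_contrastive_loss(torch.randn(2, 32, 16, 16),
                              torch.rand(2, 1, 16, 16) * 10))
```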
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Estimating the uncertainty of a neural network is crucial for providing transparency and trustworthiness. In this paper, we focus on uncertainty estimation for digital pathology prediction models. To leverage the large amount of unlabeled data in digital pathology, we propose a novel learning method that can fully exploit such data. The proposed method achieves superior performance compared with various baselines, including the celebrated Monte-Carlo Dropout. Close-up inspection of uncertain regions reveals insights into the model and improves the trustworthiness of such models.
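For concreteness, the Monte-Carlo Dropout baseline mentioned above keeps dropout active at test time and reads the spread over repeated stochastic forward passes as uncertainty; a minimal sketch with a stand-in classifier (sizes illustrative):

```python
# MC Dropout: T stochastic passes, predictive entropy as the uncertainty score.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 2))

def mc_dropout_predict(x, T=20):
    model.train()                     # keep dropout stochastic at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    mean = probs.mean(dim=0)          # averaged predictive distribution
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy              # high entropy = uncertain input

mean, unc = mc_dropout_predict(torch.randn(8, 128))
```

Tiles with high entropy correspond to the uncertain regions one would inspect close up.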
ISBN (print): 9781665448994
Recently, deep convolutional neural networks (DCNNs) that leverage the adversarial training framework for image restoration and enhancement have significantly improved the sharpness of processed images. Surprisingly, although these DCNNs produce visually crisper images than other methods, they may receive lower scores when popular quality measures are used to evaluate them. It is therefore necessary to develop a quantitative metric that reflects their performance and is well aligned with the perceived quality of an image. Popular quantitative metrics such as the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), and the Perceptual Index (PI) are not well correlated with the mean opinion score (MOS) of an image, especially for neural networks trained with adversarial loss functions. This paper proposes a convolutional neural network based on an extension of the traditional Siamese architecture, called the Siamese-Difference neural network. We equip this architecture with spatial and channel-wise attention mechanisms to increase its performance. Finally, we train our model with an auxiliary loss function: a surrogate for the ranking loss that increases Spearman's rank correlation coefficient while remaining differentiable with respect to the network parameters. Our method achieved superior performance in the NTIRE 2021 Perceptual Image Quality Assessment Challenge. The implementation of our proposed method is publicly available.
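One common way to make such a ranking objective differentiable, sketched below under our own assumptions rather than as the paper's exact formulation, is to replace hard ranks with soft ranks built from pairwise sigmoid comparisons and maximize the Pearson correlation of those ranks:

```python
# Differentiable surrogate for Spearman's rank correlation (SRCC).
import torch

def soft_rank(scores, tau=0.1):
    """Differentiable approximation of each score's rank within the batch."""
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)   # pairwise differences
    return torch.sigmoid(diff / tau).sum(dim=1)        # soft count of items below

def spearman_surrogate_loss(pred_scores, mos):
    r_pred = soft_rank(pred_scores)
    r_true = soft_rank(mos)              # targets carry no gradient
    r_pred = r_pred - r_pred.mean()
    r_true = r_true - r_true.mean()
    corr = (r_pred * r_true).sum() / (r_pred.norm() * r_true.norm() + 1e-8)
    return 1.0 - corr                    # minimizing this raises SRCC

loss = spearman_surrogate_loss(torch.randn(16, requires_grad=True),
                               torch.rand(16))
loss.backward()
```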
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Large benchmarking datasets, such as ImageNet, COCO, Cityscapes, or ScanNet, have enormously promoted research in computer vision. For the domain of crack segmentation, no such large and well-maintained benchmark exists. Crack segmentation is characterized by the decentralized creation of datasets, almost all of which have their own reason to exist: each covers a different aspect of the surprisingly complex landscape of materials, acquisition conditions, and appearances linked to crack segmentation. The OmniCrack30k dataset is the first large-scale, systematic, and thorough approach to providing a sustainable basis for tracking methodical progress in the field of crack segmentation. It contains 30k samples from over 20 datasets, summing to 9 billion pixels in total. Featuring materials as diverse as asphalt, ceramic, concrete, masonry, and steel, it paves the road towards universal crack segmentation, a currently under-explored topic. Experiments indicate the effectiveness of transfer learning for crack segmentation: nnU-Net achieves a mean clIoU4px of 64%, outperforming all other approaches by at least 10 percentage points.
The ChaLearn large-scale gesture recognition challenge has run twice, in two workshops held in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and the International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world. The challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This article describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on them. We discuss the challenges of collecting large-scale ground-truth annotations for gesture recognition and provide a detailed analysis of current methods for large-scale isolated and continuous gesture recognition. In addition to the recognition rate and mean Jaccard index (MJI) used as evaluation metrics in previous challenges, we introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) method that determines video division points based on skeleton points. Experiments show that the proposed Bi-LSTM outperforms state-of-the-art methods with an absolute CSR improvement of 8.1% (from 0.8917 to 0.9639).
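A minimal sketch in the spirit of that segmenter, with feature and hidden sizes as our own illustrative choices (e.g., 25 joints with 3 coordinates each):

```python
# Bi-LSTM over per-frame skeleton features, predicting per-frame division points.
import torch
import torch.nn as nn

class BiLSTMSegmenter(nn.Module):
    def __init__(self, skel_dim=75, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(skel_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # per-frame boundary logit

    def forward(self, skeletons):              # (B, T, skel_dim)
        h, _ = self.lstm(skeletons)            # (B, T, 2*hidden)
        return self.head(h).squeeze(-1)        # (B, T)

seg = BiLSTMSegmenter()
logits = seg(torch.randn(2, 100, 75))          # two 100-frame clips
boundaries = torch.sigmoid(logits) > 0.5       # frames flagged as gesture cuts
```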
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detections in multi-object tracking and detection. Recent advances in multi-view detection and 3D object recognition have significantly improved performance by strategically projecting all views onto the ground plane and conducting detection from a Bird's Eye View (BEV). In this paper, we compare modern lifting methods, both parameter-free and parameterized, for multi-view aggregation. Additionally, we present an architecture that aggregates features across multiple time steps to learn robust detection, and combines appearance- and motion-based cues for tracking. Most current tracking approaches focus on either pedestrians or vehicles; in our work, we combine both branches and add new challenges to multi-view detection with cross-scene setups. Our method generalizes to three public datasets across two domains, (1) pedestrian: Wildtrack and MultiviewX, and (2) roadside perception: Synthehicle, achieving state-of-the-art performance in detection and tracking. https://***/tteepe/TrackTacular
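As a sketch of the parameter-free flavor of such lifting (the per-camera sampling grids, assumed precomputed from calibration homographies, map each BEV cell to its image location; names and sizes are illustrative):

```python
# Homography-based lifting: warp per-camera features onto a BEV grid and average.
import torch
import torch.nn.functional as F

def lift_to_bev(feats, bev_grids):
    """feats: list of (C, H, W) per-camera feature maps;
    bev_grids: list of (Hb, Wb, 2) grids with xy coords in [-1, 1]."""
    bev = 0
    for f, grid in zip(feats, bev_grids):
        warped = F.grid_sample(f.unsqueeze(0), grid.unsqueeze(0),
                               align_corners=False)   # (1, C, Hb, Wb)
        bev = bev + warped
    return bev / len(feats)                           # average over views

# Two toy cameras; random grids stand in for calibrated homography lookups.
feats = [torch.randn(64, 90, 160) for _ in range(2)]
grids = [torch.rand(120, 120, 2) * 2 - 1 for _ in range(2)]
print(lift_to_bev(feats, grids).shape)                # torch.Size([1, 64, 120, 120])
```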
ISBN (print): 9781665448994
Deep convolutional neural networks and self-attention mechanisms are widely used for the single image super-resolution (SISR) task. Nevertheless, we observe that deeper networks are harder to train and that the self-attention mechanism is computationally expensive. Residual learning is widely recognized as a common approach to improving network performance in deep learning, but most existing methods do not make full use of the learning ability of deep CNNs, hindering their representational power. To tackle these problems, we introduce a deep network, the expectation-maximization attention cross residual network (EACRN), for the super-resolution task. In particular, we propose a cross-residual-in-cross-residual (CRICR) structure that builds very deep networks from multiple cross residual groups (CRGs) with global residual skip connections. Each CRG consists of several cross residual blocks with short cross skip connections. The CRICR structure lets the network focus on capturing high-frequency patterns by allowing rich low-frequency information to bypass the main path through these skip connections. In addition, we introduce convolution kernels of various sizes to adaptively capture image patterns at different scales, allowing the resulting features to exchange information and yield more effective representations. The Expectation-Maximization Attention (EMA) module we adopt is robust to variance in the input and is also memory- and computation-friendly. Extensive experiments demonstrate that EACRN obtains superior performance and visual quality relative to state-of-the-art algorithms.
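To illustrate the residual-in-residual layout with long and short skips (a generic sketch omitting the EMA module and EACRN's exact cross connections):

```python
# Residual blocks nested in residual groups, with a global skip connection
# that lets low-frequency content bypass the trunk.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)        # short skip inside each block

class ResGroup(nn.Module):
    def __init__(self, ch, n_blocks=4):
        super().__init__()
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])

    def forward(self, x):
        return x + self.blocks(x)      # skip around the whole group

class Trunk(nn.Module):
    def __init__(self, ch=64, n_groups=3):
        super().__init__()
        self.groups = nn.Sequential(*[ResGroup(ch) for _ in range(n_groups)])

    def forward(self, x):
        return x + self.groups(x)      # global skip: low frequencies bypass

print(Trunk()(torch.randn(1, 64, 48, 48)).shape)
```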
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In the field of medical imaging, AD plays a crucial role in identifying anomalies that may indicate rare diseases or conditions. However, despite its importance, there is currently no universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this domain. To address this gap, we present a comprehensive evaluation benchmark for assessing AD methods on medical images. The benchmark consists of six reorganized datasets from five medical domains (i.e., brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology), three key evaluation metrics, and a total of fifteen state-of-the-art AD algorithms. This standardized, well-curated medical benchmark, together with a well-structured codebase, enables researchers to easily compare and evaluate different AD methods, ultimately leading to the development of more effective and robust AD algorithms for medical imaging. More information on BMAD is available in our GitHub repository: https://***/DorisBao/BMAD