ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Model architecture refinement is a challenging task in deep learning research fields such as remote photoplethysmography (rPPG). One architectural consideration, the depth of the model, can have significant consequences for the resulting performance. In rPPG models that are overprovisioned with more layers than necessary, redundancies exist whose removal can yield faster training and a reduced computational load at inference time. With too few layers, models may exhibit sub-optimal error rates. We apply Centered Kernel Alignment (CKA) to an array of rPPG architectures of differing depths, demonstrating that shallower models do not learn the same representations as deeper models, and that beyond a certain depth, added layers are redundant and contribute no significant functionality. An empirical study confirms how the architectural deficiencies discovered using CKA impact performance, and we show how CKA can be used as a diagnostic to refine rPPG architectures.
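For reference, the linear form of CKA reduces to a ratio of Frobenius norms of the column-centered activation matrices. A minimal sketch, following the standard linear-CKA formulation with illustrative shapes rather than the paper's exact pipeline:

```python
# Linear CKA between two layers' activations (Kornblith et al., 2019).
import numpy as np

def linear_cka(x, y):
    """x, y: activation matrices of shape (n_samples, n_features)."""
    # Center each feature column so the implied Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    numerator = np.linalg.norm(y.T @ x, ord="fro") ** 2
    denominator = (np.linalg.norm(x.T @ x, ord="fro")
                   * np.linalg.norm(y.T @ y, ord="fro"))
    return numerator / denominator

# Example: compare a layer of a shallow model with one of a deep model,
# evaluated on the same 256 inputs (feature widths may differ).
acts_shallow = np.random.randn(256, 64)
acts_deep = np.random.randn(256, 128)
print(linear_cka(acts_shallow, acts_deep))  # near 1.0 = similar representations
```

A near-uniform block of high CKA values between adjacent deep layers is the redundancy signature such a diagnostic looks for.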
The lack of interpretability of the vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called vision DiffMask, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method by introducing a faithfulness task and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open-source our implementation here.
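As a rough illustration of this gating objective, the sketch below trains a mask predictor to keep the prediction distribution intact (a KL term) while keeping as little of the input as possible (a sparsity term). The tiny CNNs, the sigmoid relaxation of the gates, and predicting gates directly from the input are our own simplifications; the paper itself uses hidden-layer activations and a ViT backbone.

```python
# Hypothetical sketch of a DiffMask-style gating objective; the networks
# below are stand-ins, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
gate_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 1, 1))       # one gate logit per pixel
for p in model.parameters():
    p.requires_grad_(False)                        # post-hoc: classifier frozen

def gating_loss(images, lambda_sparsity=1.0):
    p_full = F.softmax(model(images), dim=-1)      # reference distribution
    gates = torch.sigmoid(gate_net(images) / 0.5)  # soft 0/1 mask, (B,1,H,W)
    log_p_masked = F.log_softmax(model(images * gates), dim=-1)
    kl = F.kl_div(log_p_masked, p_full, reduction="batchmean")  # faithfulness
    return kl + lambda_sparsity * gates.mean()     # plus mask minimality

loss = gating_loss(torch.randn(4, 3, 32, 32))
loss.backward()                                    # updates only gate_net
```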
Depth prediction is at the core of several computer vision applications, such as autonomous driving and robotics. It is often formulated as a regression task in which depth values are estimated through network layers. Unfortunately, the distribution of values in depth maps is seldom explored. This paper therefore proposes a novel framework combining contrastive learning and depth prediction, allowing us to pay more attention to the depth distribution and consequently improve the overall estimation process. To this end, we propose a window-based contrastive learning module, which partitions the feature maps into non-overlapping windows and constructs a contrastive loss within each one. Forming and sorting positive and negative pairs, then enlarging the gap between the two in the representation space, constrains the depth distribution to fit the characteristics of the depth map. Experiments on the KITTI and NYU datasets demonstrate the effectiveness of our framework.
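To make the windowing concrete, a sketch of one plausible form of such a loss follows; the window size, depth-similarity threshold, and margin shape are our own assumptions rather than the paper's exact formulation.

```python
# Hypothetical window-based contrastive term: pixels within a window form
# positive pairs when their ground-truth depths are close, negatives otherwise.
import torch
import torch.nn.functional as F

def window_contrastive_loss(feats, depth, win=4, thresh=0.5, margin=1.0):
    """feats: (B, C, H, W) features; depth: (B, 1, H, W) ground truth.
    H and W are assumed divisible by win."""
    B, C, H, W = feats.shape
    # Partition into non-overlapping win x win windows -> (B*nWin, win*win, C).
    f = F.unfold(feats, kernel_size=win, stride=win)       # (B, C*win*win, nWin)
    f = f.view(B, C, win * win, -1).permute(0, 3, 2, 1).reshape(-1, win * win, C)
    d = F.unfold(depth, kernel_size=win, stride=win)
    d = d.view(B, win * win, -1).permute(0, 2, 1).reshape(-1, win * win)
    f = F.normalize(f, dim=-1)
    sim = f @ f.transpose(1, 2)                            # pairwise cosine sim
    ddiff = (d.unsqueeze(2) - d.unsqueeze(1)).abs()        # pairwise depth gaps
    pos = (ddiff < thresh).float()                         # positive-pair mask
    # Pull positives toward similarity 1; push negatives' similarity down.
    return (pos * (1 - sim) + (1 - pos) * F.relu(sim - (1 - margin))).mean()

print(window_contrastive_loss(torch.randn(2, 32, 16, 16),
                              torch.rand(2, 1, 16, 16) * 10))
```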
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Estimating the uncertainty of a neural network is crucial for providing transparency and trustworthiness. In this paper, we focus on uncertainty estimation for digital pathology prediction models. To leverage the large amount of unlabeled data in digital pathology, we propose a novel learning method that can fully exploit such data. The proposed method achieves superior performance compared with various baselines, including the celebrated Monte-Carlo Dropout. Close-up inspection of uncertain regions reveals insights into the model and improves the trustworthiness of such models.
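For concreteness, the Monte-Carlo Dropout baseline mentioned above keeps dropout active at test time and reads the spread over repeated stochastic forward passes as uncertainty; a minimal sketch with a stand-in classifier (sizes illustrative):

```python
# MC Dropout: T stochastic passes, predictive entropy as the uncertainty score.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 2))

def mc_dropout_predict(x, T=20):
    model.train()                     # keep dropout stochastic at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    mean = probs.mean(dim=0)          # averaged predictive distribution
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy              # high entropy = uncertain input

mean, unc = mc_dropout_predict(torch.randn(8, 128))
```

Tiles with high entropy correspond to the uncertain regions one would inspect close up.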
ISBN (print): 9781665448994
Recently, deep convolutional neural networks (DCNNs) that leverage the adversarial training framework for image restoration and enhancement have significantly improved the sharpness of processed images. Surprisingly, although these DCNNs produce visually crisper images than other methods, they may receive lower scores when popular quality measures are used to evaluate them. It is therefore necessary to develop a quantitative metric that reflects their performance and is well aligned with the perceived quality of an image. Popular quantitative metrics such as the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), and the Perceptual Index (PI) are not well correlated with the mean opinion score (MOS) of an image, especially for neural networks trained with adversarial loss functions. This paper proposes a convolutional neural network based on an extension of the traditional Siamese architecture, called the Siamese-Difference neural network. We equip this architecture with spatial and channel-wise attention mechanisms to increase its performance. Finally, we train our model with an auxiliary loss function: a surrogate for the ranking loss that increases Spearman's rank correlation coefficient while remaining differentiable with respect to the network parameters. Our method achieved superior performance in the NTIRE 2021 Perceptual Image Quality Assessment Challenge. The implementation of our proposed method is publicly available.
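One common way to make such a ranking objective differentiable, sketched below under our own assumptions rather than as the paper's exact formulation, is to replace hard ranks with soft ranks built from pairwise sigmoid comparisons and maximize the Pearson correlation of those ranks:

```python
# Differentiable surrogate for Spearman's rank correlation (SRCC).
import torch

def soft_rank(scores, tau=0.1):
    """Differentiable approximation of each score's rank within the batch."""
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)   # pairwise differences
    return torch.sigmoid(diff / tau).sum(dim=1)        # soft count of items below

def spearman_surrogate_loss(pred_scores, mos):
    r_pred = soft_rank(pred_scores)
    r_true = soft_rank(mos)              # targets carry no gradient
    r_pred = r_pred - r_pred.mean()
    r_true = r_true - r_true.mean()
    corr = (r_pred * r_true).sum() / (r_pred.norm() * r_true.norm() + 1e-8)
    return 1.0 - corr                    # minimizing this raises SRCC

loss = spearman_surrogate_loss(torch.randn(16, requires_grad=True),
                               torch.rand(16))
loss.backward()
```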
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Large benchmarking datasets, such as ImageNet, COCO, Cityscapes, or ScanNet, have enormously promoted research in computer vision. For the domain of crack segmentation, no such large and well-maintained benchmark exists. Crack segmentation is characterized by the decentralized creation of datasets, almost all of which have their own reason to exist: each covers a different aspect of the surprisingly complex landscape of materials, acquisition conditions, and appearances linked to crack segmentation. The OmniCrack30k dataset is the first large-scale, systematic, and thorough approach to providing a sustainable basis for tracking methodical progress in the field of crack segmentation. It contains 30k samples from over 20 datasets, summing to 9 billion pixels in total. Featuring materials as diverse as asphalt, ceramic, concrete, masonry, and steel, it paves the road towards universal crack segmentation, a currently under-explored topic. Experiments indicate the effectiveness of transfer learning for crack segmentation: nnU-Net achieves a mean clIoU4px of 64%, outperforming all other approaches by at least 10 percentage points.
The ChaLearn large-scale gesture recognition challenge has run twice, in two workshops held in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and the International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world. The challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This article describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on them. We discuss the challenges of collecting large-scale ground-truth annotations for gesture recognition and provide a detailed analysis of current methods for large-scale isolated and continuous gesture recognition. In addition to the recognition rate and mean Jaccard index (MJI) used as evaluation metrics in previous challenges, we introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) method that determines video division points based on skeleton points. Experiments show that the proposed Bi-LSTM outperforms state-of-the-art methods with an absolute CSR improvement of 8.1% (from 0.8917 to 0.9639).
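A minimal sketch in the spirit of that segmenter, with feature and hidden sizes as our own illustrative choices (e.g., 25 joints with 3 coordinates each):

```python
# Bi-LSTM over per-frame skeleton features, predicting per-frame division points.
import torch
import torch.nn as nn

class BiLSTMSegmenter(nn.Module):
    def __init__(self, skel_dim=75, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(skel_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # per-frame boundary logit

    def forward(self, skeletons):              # (B, T, skel_dim)
        h, _ = self.lstm(skeletons)            # (B, T, 2*hidden)
        return self.head(h).squeeze(-1)        # (B, T)

seg = BiLSTMSegmenter()
logits = seg(torch.randn(2, 100, 75))          # two 100-frame clips
boundaries = torch.sigmoid(logits) > 0.5       # frames flagged as gesture cuts
```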
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detections in multi-object tracking and detection. Recent advances in multi-view detection and 3D object recognition have significantly improved performance by strategically projecting all views onto the ground plane and conducting detection from a Bird's Eye View (BEV). In this paper, we compare modern lifting methods, both parameter-free and parameterized, for multi-view aggregation. Additionally, we present an architecture that aggregates features across multiple time steps to learn robust detection, and combines appearance- and motion-based cues for tracking. Most current tracking approaches focus on either pedestrians or vehicles; in our work, we combine both branches and add new challenges to multi-view detection with cross-scene setups. Our method generalizes to three public datasets across two domains, (1) pedestrian: Wildtrack and MultiviewX, and (2) roadside perception: Synthehicle, achieving state-of-the-art performance in detection and tracking. https://***/tteepe/TrackTacular
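As a sketch of the parameter-free flavor of such lifting (the per-camera sampling grids, assumed precomputed from calibration homographies, map each BEV cell to its image location; names and sizes are illustrative):

```python
# Homography-based lifting: warp per-camera features onto a BEV grid and average.
import torch
import torch.nn.functional as F

def lift_to_bev(feats, bev_grids):
    """feats: list of (C, H, W) per-camera feature maps;
    bev_grids: list of (Hb, Wb, 2) grids with xy coords in [-1, 1]."""
    bev = 0
    for f, grid in zip(feats, bev_grids):
        warped = F.grid_sample(f.unsqueeze(0), grid.unsqueeze(0),
                               align_corners=False)   # (1, C, Hb, Wb)
        bev = bev + warped
    return bev / len(feats)                           # average over views

# Two toy cameras; random grids stand in for calibrated homography lookups.
feats = [torch.randn(64, 90, 160) for _ in range(2)]
grids = [torch.rand(120, 120, 2) * 2 - 1 for _ in range(2)]
print(lift_to_bev(feats, grids).shape)                # torch.Size([1, 64, 120, 120])
```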
ISBN (print): 9781665448994
Deep convolutional neural networks and self-attention mechanisms are widely used for the single image super-resolution (SISR) task. Nevertheless, we observe that deeper networks are harder to train and that the self-attention mechanism is computationally expensive. Residual learning is widely recognized as a common approach to improving network performance in deep learning, but most existing methods do not make full use of the learning ability of deep CNNs, hindering their representational power. To tackle these problems, we introduce a deep network, the expectation-maximization attention cross residual network (EACRN), for the super-resolution task. In particular, we propose a cross-residual-in-cross-residual (CRICR) structure that builds very deep networks from multiple cross residual groups (CRGs) with global residual skip connections. Each CRG consists of several cross residual blocks with short cross skip connections. The CRICR structure lets the network focus on capturing high-frequency patterns by allowing rich low-frequency information to bypass the main path through these skip connections. In addition, we introduce convolution kernels of various sizes to adaptively capture image patterns at different scales, allowing the resulting features to exchange information and yield more effective representations. The Expectation-Maximization Attention (EMA) module we adopt is robust to variance in the input and is also memory- and computation-friendly. Extensive experiments demonstrate that EACRN obtains superior performance and visual quality relative to state-of-the-art algorithms.
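To illustrate the residual-in-residual layout with long and short skips (a generic sketch omitting the EMA module and EACRN's exact cross connections):

```python
# Residual blocks nested in residual groups, with a global skip connection
# that lets low-frequency content bypass the trunk.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)        # short skip inside each block

class ResGroup(nn.Module):
    def __init__(self, ch, n_blocks=4):
        super().__init__()
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])

    def forward(self, x):
        return x + self.blocks(x)      # skip around the whole group

class Trunk(nn.Module):
    def __init__(self, ch=64, n_groups=3):
        super().__init__()
        self.groups = nn.Sequential(*[ResGroup(ch) for _ in range(n_groups)])

    def forward(self, x):
        return x + self.groups(x)      # global skip: low frequencies bypass

print(Trunk()(torch.randn(1, 64, 48, 48)).shape)
```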
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In the field of medical imaging, AD plays a crucial role in identifying anomalies that may indicate rare diseases or conditions. However, despite its importance, there is currently no universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this domain. To address this gap, we present a comprehensive evaluation benchmark for assessing AD methods on medical images. The benchmark consists of six reorganized datasets from five medical domains (i.e., brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology), three key evaluation metrics, and a total of fifteen state-of-the-art AD algorithms. This standardized, well-curated medical benchmark, together with a well-structured codebase, enables researchers to easily compare and evaluate different AD methods, ultimately leading to the development of more effective and robust AD algorithms for medical imaging. More information on BMAD is available in our GitHub repository: https://***/DorisBao/BMAD