Context. Supervised deep learning was recently introduced in high-contrast imaging (HCI) through the SODINN algorithm, a convolutional neural network designed for exoplanet detection in angular differential imaging (ADI) datasets. The benchmarking of HCI algorithms within the Exoplanet Imaging Data Challenge (EIDC) showed that (i) SODINN can produce a high number of false positives in the final detection maps, and (ii) algorithms that process images in a more local manner perform better. Aims. This work aims to improve the SODINN detection performance by introducing new local processing approaches and adapting its learning process accordingly. Methods. We propose NA-SODINN, a new deep-learning binary classifier based on a convolutional neural network (CNN) that better captures image noise correlations in ADI-processed frames by identifying noise regimes. The identification of these noise regimes relies on a novel technique, named PCA-pmaps, which estimates the distance from the star beyond which background noise starts to dominate over residual speckle noise. NA-SODINN is also fed with local discriminators, such as signal-to-noise ratio (S/N) curves, which complement the spatio-temporal feature maps during the model's training. Results. Our new approach was tested against its predecessor, as well as against two SODINN-based hybrid models and a more standard annular-PCA approach, through local receiver operating characteristic (ROC) analysis of ADI sequences from the VLT/SPHERE and Keck/NIRC2 instruments. Results show that NA-SODINN enhances SODINN in both sensitivity and specificity, especially in the speckle-dominated noise regime. NA-SODINN is also benchmarked against the complete set of detection algorithms submitted to the EIDC, where its final detection score matches or outperforms those of the most powerful detection algorithms. Conclusions. Through this supervised machine-learning case, this study illustrates and reinforces the importance …
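The abstract names signal-to-noise ratio (S/N) curves as local discriminators fed to the classifier. The sketch below computes a toy per-radius S/N profile on a single residual frame; it is a generic illustration only, not the NA-SODINN pipeline and not the aperture-based S/N used in HCI packages such as VIP. The function name `snr_curve`, the 1-pixel ring binning, and the pixel-level statistics are all assumptions.

```python
import numpy as np

def snr_curve(frame, center=None):
    """Toy per-radius S/N on a residual frame: brightest pixel in each
    1-px-wide annulus versus the annulus mean and standard deviation.
    Pixel-level illustration only, NOT the aperture-based HCI S/N."""
    cy, cx = center if center is not None else (frame.shape[0] // 2, frame.shape[1] // 2)
    yy, xx = np.indices(frame.shape)
    r = np.rint(np.hypot(yy - cy, xx - cx)).astype(int)
    curve = []
    for rad in range(1, int(r.max()) + 1):
        ring = frame[r == rad]
        if ring.size < 4 or ring.std(ddof=1) == 0:
            continue
        curve.append((rad, (ring.max() - ring.mean()) / ring.std(ddof=1)))
    return curve
```

On a noise frame with an injected point source, the curve peaks at the source's separation; in the paper's setting such local curves accompany the spatio-temporal feature maps during training.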
ISBN (print): 9789464593617; 9798331519773
The "Residual-to-Residual DNN series for high-dynamic range imaging" (R2D2) approach was recently introduced for radio-interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of deep neural networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. In this work, we investigate the robustness of the R2D2 image estimation process by studying the uncertainty associated with its series of learned models. Adopting an ensemble averaging approach, multiple series can be trained, arising from different random DNN initializations of the training process at each iteration. The resulting multiple R2D2 instances can also be leveraged to generate "R2D2 samples", from which the empirical mean and standard deviation endow the algorithm with a joint estimation and uncertainty quantification functionality. Focusing on RI imaging, and adopting a telescope-specific approach, multiple R2D2 instances were trained to encompass the most general observation setting of the Very Large Array (VLA). Simulations and real-data experiments confirm that: (i) R2D2's image estimation capability is superior to that of the state-of-the-art algorithms; (ii) its ultra-fast reconstruction capability (arising from series with only a few DNNs) makes the computation of multiple reconstruction samples and of uncertainty maps practical even at large image dimensions; (iii) it is characterized by a very low model uncertainty.
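The ensemble-averaging idea — pixel-wise empirical mean as the joint image estimate and standard deviation as an uncertainty map over reconstructions from independently trained instances — can be sketched in a few lines. This is generic NumPy, not the authors' code; `ensemble_stats` is a hypothetical helper.

```python
import numpy as np

def ensemble_stats(samples):
    """Pixel-wise empirical mean (joint image estimate) and sample standard
    deviation (uncertainty map) over reconstructions from independently
    trained model instances."""
    stack = np.stack([np.asarray(s, dtype=float) for s in samples])
    return stack.mean(axis=0), stack.std(axis=0, ddof=1)
```

Each sample would be the output of one trained R2D2 series run on the same data; the standard-deviation map then localizes where the learned series disagree.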
ISBN (print): 9798350349405; 9798350349399
The need for automated systems to aid law enforcement during densely packed events arises from the inherent danger of large crowds, evidenced by historical instances of stampedes and crushes. Existing methods range from basic crowd-statistics extraction to detailed anomaly detection and behavior classification, but often focus on single, pre-segmented scenes. Our work addresses classifying crowd behaviors in environments where multiple behaviors coexist within a single scene, defined as a multi-class crowd motion characterization challenge. We use a microscopic approach for scenes captured by drones at varying altitudes, without prior manipulation. This approach combines graph-based representations of individuals with flow images, facilitating the classification of diverse crowd behaviors in unsegmented scenes. Tested on a public dataset, our method shows promising results in analyzing complex crowd dynamics.
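As a rough illustration of a graph-based representation of individuals, the sketch below builds a symmetric k-nearest-neighbour adjacency over detected image coordinates. The paper's exact graph construction, features, and connectivity rule are not specified here, so `knn_adjacency` and its parameters are assumptions.

```python
import numpy as np

def knn_adjacency(points, k=3):
    """Symmetric k-nearest-neighbour adjacency over individuals' image
    coordinates (an N x 2 array). Generic construction for illustration."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-loops
    nn = np.argsort(d, axis=1)[:, :k]      # k closest neighbours per node
    A = np.zeros(d.shape, dtype=bool)
    A[np.repeat(np.arange(len(points)), k), nn.ravel()] = True
    return A | A.T                         # make the graph undirected
```

The resulting adjacency could then be paired with per-node motion features (e.g. from flow images) before feeding a graph classifier.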
ISBN (print): 9798350350463; 9798350350456
Perceptual quality metrics derived from deep features have led to a boost in modelling the Human Visual System (HVS) to perceive the quality of visual content. In this work, we study the effectiveness of fine-tuning three standard convolutional neural networks (CNNs), viz. ResNet50, VGG16, and MobileNetV2, to predict the quality of stereoscopic images in the no-reference setting. This work also aims to understand the impact of using disparity maps for quality prediction. Interestingly, our experiments demonstrate that disparity maps do not significantly contribute to improving perceptual quality estimation in the deep learning framework. To the best of our knowledge, this is the first study that explores the impact of disparity along with the chosen models for stereoscopic image quality assessment. We present a detailed study of our experiments with various architectural configurations on the LIVE Phase I and II datasets. Further, our results demonstrate the innate capability of deep features for quality prediction. Finally, the simple fine-tuning of the models results in solutions that compete with state-of-the-art patch-based stereoscopic image quality assessment methods.
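Quality-assessment models like these are conventionally scored with the Spearman rank-order correlation coefficient (SROCC) between predicted quality and mean opinion scores. A minimal tie-free implementation follows; it is a generic evaluation sketch, not the authors' code.

```python
import numpy as np

def srocc(pred, mos):
    """Spearman rank-order correlation between predicted quality scores and
    mean opinion scores (MOS). Tie-free sketch: ranks via double argsort."""
    rp = np.argsort(np.argsort(pred)).astype(float)
    rm = np.argsort(np.argsort(mos)).astype(float)
    rp -= rp.mean()
    rm -= rm.mean()
    return float(rp @ rm / np.sqrt((rp @ rp) * (rm @ rm)))
```

A value near 1 indicates the fine-tuned model ranks images by quality the same way human observers do; with ties, a rank-averaging variant (e.g. `scipy.stats.spearmanr`) should be used instead.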
As artificial intelligence (AI) technology advances, Internet of Things (IoT) devices, such as mobile phones and augmented reality devices, are increasingly becoming crucial enablers of user-device interactions. Among the various methods of interaction, hand pose recognition and analysis is a crucial method to understand the intentions of users and perform precise functions. However, to perform such functions, a substantial amount of computation and resources is required, making it challenging to implement them on small form-factor devices with low power consumption. For this reason, improving energy efficiency is a crucial objective in real-time hand pose estimation (HPE) applied to low-power platforms with limited resources. In this article, we introduce an FPGA-based energy-efficient real-time HPE system with an integrated image signal processor (ISP). The proposed system uses several low-power design techniques, including a systolic array with dynamic on/off control per processing element (PE), to minimize power consumption and save energy when not in use. In addition, we improve area efficiency by reducing the buffer size in the systolic array using a half-size shift buffer stack. Furthermore, the use of parallel and pipelined structures improves operational efficiency, reducing both operational time and power consumption. The evaluation results on a KU115 FPGA board show that the system achieves an error of 7.78 mm and can process 52 fps, demonstrating its capability for real-time HPE. Moreover, the system achieves high energy efficiency, up to 61.74 GOPs/W, making it suitable for energy-efficient and accurate HPE in low-power environments.
ISBN (print): 9798350349405; 9798350349399
Explainable AI (XAI) has revolutionized the field of deep learning by empowering users to have more trust in neural network models. The field of XAI allows users to probe the inner workings of these algorithms to elucidate their decision-making processes. The rise in popularity of XAI has led to the advent of different strategies to produce explanations, all of which only occasionally agree. Thus, several objective evaluation metrics have been devised to decide which of these methods gives the best explanation for specific scenarios. The goal of the paper is twofold: (i) we employ the notions of necessity and sufficiency from the causal literature to come up with a novel explanatory technique called SHifted Adversaries using Pixel Elimination (SHAPE), which satisfies all the theoretical and mathematical criteria of being a valid explanation; (ii) we show that SHAPE is, in fact, an adversarial explanation that fools the causal metrics employed to measure the robustness and reliability of popular importance-based visual XAI methods. Our analysis shows that SHAPE outperforms popular explanatory techniques like GradCAM and GradCAM++ in these tests and is comparable to RISE, raising questions about the sanity of these metrics and the need for human involvement for an overall better evaluation.
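Causal metrics of the kind discussed here include deletion-style tests: zero out the pixels a saliency map ranks most important and track how fast the model score falls. The toy sketch below illustrates the mechanism; `deletion_scores` is a hypothetical helper, not the paper's exact metric.

```python
import numpy as np

def deletion_scores(image, saliency, model, steps=4):
    """Deletion-style causal check: remove the highest-ranked pixels a chunk
    at a time and record the model score after each step. A faithful
    saliency map should make the score drop quickly."""
    order = np.argsort(saliency.ravel())[::-1]   # most important first
    x = image.astype(float).ravel().copy()
    scores = [model(x.reshape(image.shape))]
    chunk = max(1, order.size // steps)
    for s in range(steps):
        x[order[s * chunk:(s + 1) * chunk]] = 0.0
        scores.append(model(x.reshape(image.shape)))
    return np.array(scores)
```

The area under this curve is one of the robustness scores that, per the abstract, an adversarial explanation such as SHAPE can game.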
ISBN (print): 9798350344868; 9798350344851
The steganalysis of JPEG images is a crucial area of research. Deep-learning-based steganalysis methods have achieved superior detection performance. All current methods for JPEG steganalysis rely on residual networks. Although the incorporation of residual connections has enhanced detection performance, it has also led to a notable increase in computational complexity. Furthermore, most of these methods are not complete end-to-end models: in their approaches, traditional hand-crafted filters are employed for image preprocessing. To avoid relying on residual connections and prior knowledge, we propose an end-to-end VGG-style ConvNet. During training, the model uses a multi-branch architecture, which is transformed into a plain VGG-style ConvNet through structural reparameterization at inference. We conduct extensive experiments on the ALASKA Kaggle dataset and the ALASKA II dataset, demonstrating that the proposed method achieves results in the JPEG domain comparable to state-of-the-art CNN-based steganalyzers such as UCNet and EfficientNet, with clearly better convergence and lower model complexity.
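Structural reparameterization in the RepVGG sense folds parallel training-time branches into a single inference-time kernel. The single-channel sketch below merges a 1x1-conv branch and an identity branch into one 3x3 kernel; it is an illustration of the general technique (batch-norm folding and multi-channel bookkeeping omitted), not the proposed network.

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode cross-correlation of a single-channel image with a kernel."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def merge_branches(k3, k1):
    """Fold a parallel 1x1-conv branch and an identity branch into the 3x3
    kernel: the 1x1 kernel zero-padded to 3x3 and the identity (a centred
    delta kernel) both add onto the centre tap."""
    merged = k3.copy()
    merged[1, 1] += k1[0, 0] + 1.0
    return merged
```

After merging, inference runs a single plain convolution per layer, which is why the deployed model has lower complexity than the multi-branch training graph.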
This study aims to explore deep learning-based image target recognition methods to improve the performance of target detection and classification in the field of computer vision. The experiments use satellite-acquired...
This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches to improve the network training: (i) progressive quantization; (ii) stochastic precision; and (iii) joint knowledge distillation. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we propose to first optimize a network with quantized weights and subsequently quantize activations. This is in contrast to the traditional methods, which optimize them simultaneously. Furthermore, we propose a second progressive quantization scheme which gradually decreases the bitwidth from high precision to low precision during training. Second, to alleviate the excessive training burden due to the multi-round training stages, we further propose a one-stage stochastic precision strategy to randomly sample and quantize sub-networks while keeping other parts in full precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints to guide the low-precision model training and significantly improves the performance of the low-precision network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods.
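A uniform quantizer plus a decreasing bitwidth schedule conveys the flavour of the second progressive-quantization scheme. This is a generic sketch under assumed conventions (symmetric [-1, 1] range, per-tensor scaling); the paper's exact quantizer, clipping range, and schedule are not given here.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Uniform symmetric quantizer on [-1, 1] with 2**(bits-1) - 1 positive
    levels. In training, round() is non-differentiable, so its gradient
    would be passed through unchanged (straight-through estimator)."""
    levels = 2 ** (bits - 1) - 1
    return np.round(np.clip(w, -1.0, 1.0) * levels) / levels

# gradually decreasing bitwidth schedule: high precision -> low precision,
# each stage initialized from the previous one's weights
schedule = [8, 6, 4, 2]
```

Each stage of the schedule re-quantizes the network at a coarser grid, so the optimizer only ever has to bridge a small precision gap at a time.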
ISBN (print): 9781728198354
In this paper, we propose a method to refine depth maps estimated by Multi-View Stereo (MVS) with Neural Radiance Field (NeRF) optimization, in order to estimate depth maps from multi-view images with high accuracy. MVS estimates the depths on object surfaces with high accuracy, and NeRF estimates the depths at object boundaries with high accuracy. The key ideas of the proposed method are (i) to combine MVS and NeRF to exploit the advantages of both in depth map estimation, (ii) to require no training process, so that no training dataset or ground truth is needed, and (iii) to use NeRF for depth map refinement. Through a set of experiments using the Redwood-3dscan dataset, we demonstrate the effectiveness of the proposed method compared to conventional depth map estimation methods.