ISBN: 9781713899921 (print)
Does progress on ImageNet transfer to real-world datasets? We investigate this question by evaluating ImageNet pre-trained models with varying accuracy (57%-83%) on six practical image classification datasets. In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collected for comparing models. On multiple datasets, models with higher ImageNet accuracy do not consistently yield performance improvements. For certain tasks, interventions such as data augmentation improve performance even when architectures do not. We hope that future benchmarks will include more diverse datasets to encourage a more comprehensive approach to improving learning algorithms.
Accurate recognition of intra-pulse modulation patterns is essential for enhancing radar system performance. Traditional recognition algorithms are typically designed under ideal conditions and handcrafted features, ...
ISBN: 9798350365474 (print)
Face morphing attacks pose severe threats to Face Recognition Systems (FRS), which are operated in border control and passport issuance use cases. Correspondingly, morphing attack detection (MAD) algorithms are needed to defend against such attacks. MAD approaches must be robust enough to handle unknown attacks in an open-set scenario where attacks can originate from various morphing generation algorithms, post-processing steps, and a diversity of printers and scanners. The problem of generalization is further pronounced when the detection has to be made on a single suspected image. In this paper, we propose a generalized single-image-based MAD (S-MAD) algorithm by learning the encoding from the Vision Transformer (ViT) architecture. Compared to CNN-based architectures, the ViT model has the advantage of integrating local and global information and hence can be suitable for detecting morphing traces widely distributed across the face region. Extensive experiments are carried out on face morphing datasets generated using the publicly available FRGC face datasets. Several state-of-the-art (SOTA) MAD algorithms, including representative ones that have been publicly evaluated, have been selected and benchmarked against our ViT-based approach. The results demonstrate the improved detection performance of the proposed S-MAD method under the inter-dataset testing protocol (when different data is used for training and testing) and comparable performance under the intra-dataset testing protocol (when the same data is used for training and testing).
This paper presents a novel image encryption algorithm that leverages the chaotic properties of the Chen system, the cryptographic strength of OpenSSL, and the mathematical robustness of the Fibonacci Q-Matrix. The pr...
Interactive information fault diagnosis technology is a new type of fault diagnosis technology which is integrated by information fusion, artificial intelligence, computer science and other disciplines. It can extract...
Image processing pipelines are ubiquitous and we rely on them either directly, by filtering or adjusting an image post-capture, or indirectly, as image signal processing (ISP) pipelines on broadly deployed camera systems. Used by artists, photographers, system engineers, and for downstream vision tasks, traditional image processing pipelines feature complex algorithmic branches developed over decades. Recently, image-to-image networks have made great strides in image processing, style transfer, and semantic understanding. The differentiable nature of these networks allows them to fit a large corpus of data; however, they do not allow for the intuitive, fine-grained controls that photographers find in modern photo-finishing tools. This work closes that gap and presents an approach to making complex photo-finishing pipelines differentiable, allowing legacy algorithms to be trained akin to neural networks using first-order optimization methods. By concatenating tailored network proxy models of individual processing steps (e.g. white-balance, tone-mapping, color tuning), we can model a non-differentiable reference image finishing pipeline more faithfully than existing proxy image-to-image network models. We validate the method for several diverse applications, including photo and video style transfer, slider regression for commercial camera ISPs, photography-driven neural demosaicking, and adversarial photo-editing.
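The idea of chaining differentiable stand-ins for individual processing steps and fitting their controls with a first-order method can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two steps are simple closed-form functions (a scalar gain for white balance, a gamma curve for tone mapping) rather than network proxies, the names `white_balance`, `tone_map`, `pipeline`, and `fit` are hypothetical, and the "reference" output is synthesized from known parameter values.

```python
import math

def white_balance(x, gain):
    # Hypothetical stand-in for a white-balance step: one scalar gain.
    return gain * x

def tone_map(x, gamma):
    # Hypothetical stand-in for a tone-mapping step: a gamma curve (needs x > 0).
    return x ** gamma

def pipeline(x, gain, gamma):
    # Steps are concatenated, so gradients flow through the whole chain.
    return tone_map(white_balance(x, gain), gamma)

def loss_and_grads(xs, targets, gain, gamma):
    # Mean squared error and its gradients with respect to both controls.
    n = len(xs)
    loss = d_gain = d_gamma = 0.0
    for x, t in zip(xs, targets):
        u = white_balance(x, gain)
        y = tone_map(u, gamma)
        err = y - t
        loss += err * err / n
        # Chain rule: dy/dgain = gamma * u**(gamma - 1) * x
        d_gain += 2 * err * gamma * u ** (gamma - 1) * x / n
        # Chain rule: dy/dgamma = u**gamma * ln(u) = y * ln(u)
        d_gamma += 2 * err * y * math.log(u) / n
    return loss, d_gain, d_gamma

def fit(xs, targets, gain=1.0, gamma=1.0, lr=0.1, steps=2000):
    # Plain gradient descent on the two control values.
    for _ in range(steps):
        _, d_gain, d_gamma = loss_and_grads(xs, targets, gain, gamma)
        gain -= lr * d_gain
        gamma -= lr * d_gamma
    return gain, gamma

# Synthetic "reference pipeline" output, generated with assumed settings.
xs = [0.1 * i for i in range(1, 11)]
targets = [pipeline(x, 1.5, 0.8) for x in xs]

initial_loss, _, _ = loss_and_grads(xs, targets, 1.0, 1.0)
gain, gamma = fit(xs, targets)
final_loss, _, _ = loss_and_grads(xs, targets, gain, gamma)
print(f"loss {initial_loss:.5f} -> {final_loss:.5f}, gain={gain:.3f}, gamma={gamma:.3f}")
```

Because every step exposes a gradient, the same loop could in principle regress slider values against any reference output; the learning rate and step count here are arbitrary choices for the toy problem.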
The rapid expansion of urban areas has intensified the challenge of finding parking spaces for drivers. Intelligent parking systems emerge as a crucial solution by providing real-time detection of available spaces. Wh...
ISBN: 9798350388787; 9798350388770 (print)
Color Filter Arrays (CFA) are essential components of digital cameras and image sensors to capture the color information needed to produce full-color images from only a single image sensor per pixel. Many methods and algorithms have been proposed to recover the missing color information of CFAs. In this work, we use a simplified version of the Threshold-based Variable Number of Gradients algorithm proposed by Chang et al. to estimate the full-color information from Bayer images. We also show that the slight modification to the algorithm does not affect image quality while making it more compatible with hardware. We propose an efficient implementation of the algorithm that reduces the number of calculations per pixel at the cost of increased memory resources. Our implementation targets an image processing pipeline in an FPGA platform which is short on LUT and FF resources but has DSPs and BRAMs to spare. We buffer the absolute differences and average color components to be shared and re-used between neighboring pixels, on two levels: within the same row, and between different rows. The latter strategy reduces the number of absolute differences calculated every cycle from 32 to 4 and average color components from 32 to 6. However, the memory requirements are increased from storing 4 image rows to 18 image rows. We implement the solution on an FPGA using high-level synthesis (HLS) and optimize it to further reduce resources.
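The trade of memory for computation described above can be illustrated with a toy one-row sketch. This is not the Threshold-based Variable Number of Gradients algorithm and not the paper's implementation: the window of three neighboring differences, the function names, and the sample row are assumptions chosen only to show how buffering absolute differences lets neighboring pixels share them instead of recomputing them.

```python
def window_sums_naive(row, taps=3):
    """Sum `taps` neighboring absolute differences per pixel, recomputing each one."""
    diffs_computed = 0
    out = []
    # d(i) = |row[i + 2] - row[i]|: same-color samples in a Bayer row are two pixels apart.
    for x in range(len(row) - 2 - (taps - 1)):
        total = 0
        for k in range(taps):
            total += abs(row[x + k + 2] - row[x + k])
            diffs_computed += 1
        out.append(total)
    return out, diffs_computed

def window_sums_buffered(row, taps=3):
    """Same result, but each absolute difference is computed once and kept in a buffer."""
    diffs = [abs(row[i + 2] - row[i]) for i in range(len(row) - 2)]
    diffs_computed = len(diffs)
    # Neighboring windows overlap, so they read shared entries from the buffer.
    out = [sum(diffs[x:x + taps]) for x in range(len(diffs) - taps + 1)]
    return out, diffs_computed

# Hypothetical pixel values for one sensor row.
row = [12, 40, 15, 44, 20, 39, 18, 41]
naive_out, naive_count = window_sums_naive(row)
buffered_out, buffered_count = window_sums_buffered(row)
print(naive_out == buffered_out, naive_count, buffered_count)
```

In this toy the buffered variant stores every difference of the row and evaluates each exactly once, while the naive variant evaluates `taps` differences per output pixel. Extending the same reuse across rows would require buffering more rows, which mirrors the memory increase the abstract reports.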
ISBN: 9783031702587; 9783031702594 (print)
Industry 4.0, the digitalization of manufacturing, promises to lead to lowered cost, efficient processes and even discovery of new business models. However, many enterprises have huge investments in legacy machines which are not 'smart'. In this study, we thus designed a cost-efficient solution to retrofit a legacy conveyor belt-based cutlery washing machine with a commodity web camera. We then applied computer vision (using both traditional image processing and deep learning techniques) to infer the speed and utilization of the machine. We detail the algorithms that we designed for computing both speed and utilization. With the existing operational constraints of our client, frequent re-training of the deep learning model for object detection is not feasible. Thus, we compared the generalizability of the two techniques across 'unseen' cutlery and found traditional image processing to be generalizable across 'unseen' images. Our proposed final solution uses traditional image processing for computation of utilization but a hybrid of traditional image processing and a deep learning model for speed computation, as it is more reliable. Our client has implemented our proposed solution for one conveyor belt-based cutlery washing machine and plans to scale this to multiple conveyor belt-based cutlery washing machines.
Improved fuzzy c-means (FCM) clustering algorithms have been widely used for image recognition and localization. However, in industrial assembly systems, the unsatisfactory pixel merging and segmentation results betwe...