Computer vision research uses self-driving systems, robot surveillance, and science interpretation. A plethora of applications, including robotics, self-driving systems, video surveillance, and scene interpretation, h...
ISBN: 9798350343557 (print)
In recent years, deep learning has been applied successfully to medical imaging owing to its ability to learn from highly complex, multidimensional data. However, deep learning models are known to be vulnerable to adversarial machine learning attacks, which add a small, imperceptible perturbation to a legitimate input image and cause the model to produce incorrect results. In this study, the Fast Gradient Sign Method (FGSM) is applied to a Vision Transformer, a basic pre-trained transformer-based model used for binary classification of a publicly available skin lesion dataset, and the robustness of the model is analyzed. Adversarial training is then used to improve the model's robustness against adversarial attacks. The experimental results show that classification accuracy drops from 90.1% to 27.38% even for a small perturbation, and that adversarial training increases the model's robustness, yielding an accuracy of 96.61%.
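The FGSM step described in this abstract can be illustrated with a minimal sketch. It is not the paper's setup: a tiny logistic-regression classifier stands in for the Vision Transformer so that the input gradient can be written by hand, and the weights, input, and `eps` value are made-up illustration values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM against a logistic-regression stand-in model.

    x: input features scaled to [0, 1]; y: true label (0.0 or 1.0).
    For binary cross-entropy, the loss gradient w.r.t. x is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    # Move every feature by eps in the direction that increases the loss.
    x_adv = x + eps * np.sign(grad_x)
    # Keep the perturbed input inside the valid pixel range.
    return np.clip(x_adv, 0.0, 1.0)

# Hypothetical toy values, chosen only for illustration.
w = np.array([2.0, -3.0, 1.0])
b = 0.0
x = np.array([0.6, 0.2, 0.5])
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.1)
print("clean confidence:", sigmoid(x @ w + b))
print("adversarial confidence:", sigmoid(x_adv @ w + b))
```

Adversarial training, in its usual form, would add such `x_adv` samples (with their true labels) to each training batch; the abstract does not state which variant the authors used.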
Facial recognition is a widely used process that aims to detect and verify an individual's identity. This technique is employed in various applications, such as image and video analysis, surveillance, and security...
In various fields such as medical imaging, object detection, and video surveillance, multi-view natural language query systems utilize image data to provide a more comprehensive perspective, allowing users to intuitiv...
ISBN: 9784885523434 (print)
This paper focuses on enhancing the captions generated by image captioning systems. We propose an approach that improves caption generation by choosing the output most closely related to the image rather than the most likely output produced by the model. Our model revises the beam search output of the language generator from a visual-context perspective. We employ a visual semantic measure at both the word and the sentence level to match the proper caption to the related information in the image. This approach can be applied to any captioning system as a post-processing method.
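A rough sketch of the post-processing idea follows: re-rank beam search candidates by their similarity to the visual context instead of taking the most likely one. Plain cosine similarity stands in for the paper's visual semantic measure, which the abstract does not spell out, and the captions, log-probabilities, and embedding vectors are hypothetical.

```python
import numpy as np

def cosine(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_likely(candidates):
    """Baseline: the beam with the highest model log-probability."""
    return max(candidates, key=lambda c: c["log_prob"])["caption"]

def most_related(candidates, visual_vec):
    """Re-ranking: the beam whose embedding is closest to the visual context."""
    return max(candidates, key=lambda c: cosine(c["embedding"], visual_vec))["caption"]

# Hypothetical beam search output with made-up sentence embeddings.
beams = [
    {"caption": "a cat on a sofa", "log_prob": -0.8, "embedding": [0.1, 0.9]},
    {"caption": "a dog on grass", "log_prob": -1.2, "embedding": [0.9, 0.1]},
]
# Hypothetical embedding of the visual context (e.g. detected object labels).
visual_context = [1.0, 0.2]

print(most_likely(beams))
print(most_related(beams, visual_context))
```

Because the function only needs the candidate list and a visual embedding, it can sit behind any captioning model, which matches the abstract's claim that the method works as a post-processing step.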
ISBN: 9798350318920; 9798350318937 (print)
In today's ever-changing world, the ability of machine learning models to learn continually from new data without forgetting previous knowledge is of utmost importance. In few-shot class-incremental learning (FSCIL), where models have limited access to new instances, this task becomes even more challenging. Current methods use prototypes as a replacement for classifiers, and the cosine similarity of instances to these prototypes is used for prediction. However, we have identified that the embedding space created by the ReLU activation function is incomplete and crowded for future classes. To address this issue, we propose the Expanding Hyperspherical Space (EHS) method for FSCIL. In EHS, we use an odd-symmetric activation function to ensure the completeness and symmetry of the embedding space. Additionally, we specify a region for base classes and reserve space for unseen future classes, which increases the distance between class distributions. Pseudo instances are also used to enable the model to anticipate possible upcoming samples. During inference, we rectify the confidence scores to prevent bias towards base classes. Experiments on benchmark datasets such as CIFAR100 and miniImageNet demonstrate that the proposed method achieves state-of-the-art performance.
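The two ingredients named in this abstract, prototype prediction by cosine similarity and the restriction that ReLU places on the embedding space, can be sketched as follows. This is not the EHS method itself: `tanh` is used as one example of an odd-symmetric activation (the abstract does not say which one the authors chose), and all data are random or made up.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T

def class_prototypes(emb, labels):
    """A prototype is the mean embedding of each class."""
    classes = np.unique(labels)
    protos = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def predict(emb, classes, protos):
    """Assign each embedding to the class with the most similar prototype."""
    return classes[np.argmax(cosine_matrix(emb, protos), axis=1)]

# ReLU outputs are non-negative, so every pairwise cosine is >= 0: the
# embeddings share one orthant of the hypersphere. An odd-symmetric
# activation such as tanh can also produce negative cosines.
rng = np.random.default_rng(0)
pre_activation = rng.normal(size=(200, 16))
relu_cos = cosine_matrix(relu(pre_activation), relu(pre_activation))
tanh_cos = cosine_matrix(np.tanh(pre_activation), np.tanh(pre_activation))
print("min cosine with ReLU:", relu_cos.min())
print("min cosine with tanh:", tanh_cos.min())

# Prototype classification on hypothetical two-class embeddings.
train = np.tanh(np.array([[2.0, 0.1], [1.5, -0.1], [-2.0, 0.1], [-1.5, -0.2]]))
labels = np.array([0, 0, 1, 1])
classes, protos = class_prototypes(train, labels)
queries = np.tanh(np.array([[1.0, 0.0], [-1.0, 0.05]]))
print(predict(queries, classes, protos))
```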
ISBN: 9798350377040; 9798350377033 (print)
With the continuous progress of image processing and machine vision technology, the demand for efficient, real-time processing is becoming increasingly prominent, especially in high-noise image processing. In this study, an adaptive Gaussian filtering algorithm is proposed and implemented on an FPGA, with the aim of improving the computational efficiency and real-time performance of the image processing system. Compared with a traditional fixed-weight filter, this algorithm dynamically adjusts the filtering parameters according to the noise environment, effectively balancing noise suppression and image detail retention. We implemented the algorithm in the Verilog hardware description language and verified it on the PYNQ-Z2 FPGA platform. The experimental results show that the adaptive algorithm outperforms the fixed-weight filtering method, especially in noise suppression and detail preservation. Meanwhile, the FPGA implementation reduces filtering latency and optimizes resource consumption, making it well suited for real-time applications. This study demonstrates the promise of FPGA-based adaptive filtering for medical imaging, remote sensing, and intelligent surveillance, which have stringent requirements for high-performance, high-efficiency processing, and it provides new hardware solutions for real-time, high-quality image processing in constrained environments.
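The adaptive idea can be sketched in software, although the paper's implementation is in Verilog on an FPGA. The abstract does not give the adaptation rule, so everything specific below is an assumption made for illustration: the noise estimator (median absolute deviation of horizontal pixel differences), the linear mapping from noise level to `sigma`, and the constants `lo`, `hi`, and `scale`.

```python
import numpy as np

def gaussian_kernel(sigma, radius=2):
    """Normalized (2*radius+1)^2 Gaussian kernel."""
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def estimate_noise(img):
    """Robust noise estimate (assumed rule): MAD of horizontal differences."""
    d = np.diff(img.astype(float), axis=1)
    mad = np.median(np.abs(d - np.median(d)))
    # 1.4826 converts MAD to a std; a difference of two pixels has sqrt(2) the noise.
    return 1.4826 * mad / np.sqrt(2.0)

def choose_sigma(noise_std, lo=0.5, hi=2.0, scale=20.0):
    """Assumed mapping: more noise gives a wider kernel, clamped to [lo, hi]."""
    return float(np.clip(lo + (hi - lo) * noise_std / scale, lo, hi))

def filter2d(img, k):
    """Sliding-window filter with edge replication (kernel is symmetric)."""
    r = k.shape[0] // 2
    h, w = img.shape
    padded = np.pad(img.astype(float), r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def adaptive_gaussian(img):
    sigma = choose_sigma(estimate_noise(img))
    return filter2d(img, gaussian_kernel(sigma)), sigma

# Synthetic test image: a horizontal ramp, with and without added noise.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 255.0, 32), (32, 1))
noisy = clean + rng.normal(0.0, 15.0, clean.shape)

_, sigma_clean = adaptive_gaussian(clean)
denoised, sigma_noisy = adaptive_gaussian(noisy)
print("sigma for clean image:", sigma_clean)
print("sigma for noisy image:", sigma_noisy)
```

A fixed-weight filter would use a single `sigma` for both images; here the clean ramp receives the narrowest kernel, which limits detail loss, while the noisy one is smoothed more strongly.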
ISBN: 9798400716553 (print)
The speckle field is currently one of the most information-rich light fields and is related to many physical characteristics; it can be used to provide high-resolution surface topography information or applied to image reconstruction, image enhancement, and other fields. However, most studies of speckle image recovery consider a single monochromatic wavelength and therefore ignore a large amount of real object information. In this paper, the amplitude information of colored speckle recovery in optical imaging is studied, as might be seen if red, green, and blue lasers illuminate a rough surface with different reflectivity at these three wavelengths. We derive the expression for the color speckle distribution and design an imaging system with a pupil stop, normal plane-wave incidence on the diffuser, and a camera to observe the colored speckle image. The simulation experiment is analyzed in two respects: phase shift and average speckle size. The results show that more characteristic information is recovered from the colored speckle image.
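The imaging geometry described here (a rough surface under plane-wave illumination, a pupil stop, and a camera) can be simulated with a standard Fourier-optics sketch. This is not the authors' derivation: the surface roughness, grid size, wavelengths, and pupil radius are assumed values, reflectivity differences between wavelengths are ignored, and lag-1 pixel correlation is used as a simple proxy for average speckle size.

```python
import numpy as np

def speckle_intensity(height, wavelength, ref_radius, ref_wavelength):
    """Coherent image of a rough reflecting surface through a circular pupil.

    height: surface height map in meters (same surface for every wavelength).
    The pupil cutoff in spatial-frequency samples scales as 1 / wavelength.
    """
    n = height.shape[0]
    # Reflection at normal incidence: round-trip phase 4*pi*h/lambda.
    field = np.exp(1j * 4.0 * np.pi * height / wavelength)
    radius = ref_radius * ref_wavelength / wavelength
    f = np.fft.fftfreq(n) * n  # integer frequency indices
    fy, fx = np.meshgrid(f, f, indexing="ij")
    pupil = (fx ** 2 + fy ** 2) <= radius ** 2
    # The pupil stop acts as a low-pass filter on the object field.
    image_field = np.fft.ifft2(np.fft.fft2(field) * pupil)
    return np.abs(image_field) ** 2

def lag1_correlation(intensity):
    """Correlation between horizontally adjacent pixels (speckle-size proxy)."""
    a = intensity[:, :-1].ravel()
    b = intensity[:, 1:].ravel()
    return float(np.corrcoef(a, b)[0, 1])

def contrast(intensity):
    return float(intensity.std() / intensity.mean())

# Assumed values: 2 um rms roughness, 256 x 256 grid, typical laser lines.
rng = np.random.default_rng(0)
surface = rng.normal(0.0, 2e-6, (256, 256))
wavelengths = {"red": 650e-9, "green": 532e-9, "blue": 450e-9}

channels = {
    name: speckle_intensity(surface, lam, ref_radius=40.0, ref_wavelength=450e-9)
    for name, lam in wavelengths.items()
}
for name, img in channels.items():
    print(name, "contrast:", contrast(img), "lag-1 corr:", lag1_correlation(img))
```

For a fixed aperture, speckle grain size grows in proportion to wavelength, so the red channel should show the highest adjacent-pixel correlation and the blue channel the lowest; the three channels therefore carry different information about the same surface.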
ISBN: 9798350351439; 9798350351422 (print)
In this study, we investigate the Deep Image Prior (DIP) for enhancing image smoothing, a crucial component in numerous computer vision and graphics applications. Although deep learning has demonstrated remarkable achievements in these domains, it often falls short in flexibility and controllability compared with traditional methods, which are more adaptable but typically exhibit subpar performance. Notably, some end-to-end deep learning models offer control over edge preservation, yet their performance remains marginally suboptimal. To address this shortcoming, we introduce a network architecture that diverges from the traditional U-Net model, featuring a Laplacian pyramid as the encoder and a deep decoder as the decoding component, integrated with a bilateral filter loss to improve DIP. This design helps the network rapidly assimilate essential low-frequency information. Our approach excels at retaining texture details, significantly improving image smoothing and related tasks beyond the capabilities of standard DIP methods. Moreover, our technique outperforms the leading unsupervised method, pyramid texture filtering, in texture filtering tasks and other applications.
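For readers unfamiliar with the encoder component mentioned here, the following is a simplified Laplacian pyramid sketch. It is not the authors' network: 2x2 block averaging and nearest-neighbor upsampling replace the usual Gaussian blur to keep the code short, and the image side length is assumed to be divisible by `2 ** levels`.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by nearest-neighbor repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, low_frequency_residual]."""
    pyramid, current = [], img.astype(float)
    for _ in range(levels):
        smaller = downsample(current)
        # Detail band: what is lost when going to the coarser level.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by adding detail bands back, coarse to fine."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, (16, 16))
pyr = laplacian_pyramid(image, levels=3)
print([band.shape for band in pyr])
print("max reconstruction error:", np.abs(reconstruct(pyr) - image).max())
```

The last pyramid entry holds the low-frequency content in a small array, which suggests why such an encoder could help a network pick up low-frequency information quickly, as the abstract reports.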
Infrared-visible image fusion combines complementary information from both modalities, enhancing scene perception in applications such as surveillance and autonomous driving. However, existing deep learning-based meth...