Scene captioning consists of accurately describing the visual information using text, leveraging the capabilities of computer vision and natural language processing. However, current image captioning methods are trained on high-resolution images that may contain private information about individuals within the scene, such as facial attributes or sensitive data. This raises concerns about whether machines require high-resolution images and how we can protect the private information of the users. In this work, we aim to protect privacy in the scene captioning task by addressing the issue directly from the optics before image acquisition. Specifically, motivated by the emerging trend of integrating optics design with algorithms, we introduce a learned refractive lens into the camera to ensure privacy. Our optimized lens obscures sensitive visual attributes, such as faces, ethnicity, gender, and more, in the acquired image while extracting relevant features, enabling descriptions even from highly distorted images. By optimizing the refractive lens and a deep network architecture for image captioning end-to-end, we achieve description generation directly from our distorted images. We validate our approach with extensive simulations and hardware experiments. Our results show that we achieve a better trade-off between privacy and utility when compared to conventional non-privacy-preserving methods on the COCO dataset. For instance, our approach successfully conceals private information within the scene while achieving a BLEU-4 score of 27.0 on the COCO test set.
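A minimal PyTorch sketch of the general idea, not the authors' implementation: a learnable point-spread-function layer stands in for the optimized refractive lens and is placed in front of an arbitrary captioning network so that both can be trained end-to-end on distorted images. The kernel size, initialization, and the placeholder captioner are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnablePSF(nn.Module):
    """Learnable, normalized blur kernel applied per colour channel,
    standing in for the camera's refractive optics."""
    def __init__(self, kernel_size=15):
        super().__init__()
        self.logits = nn.Parameter(0.01 * torch.randn(kernel_size, kernel_size))
        self.kernel_size = kernel_size

    def forward(self, x):                       # x: (B, 3, H, W)
        k = torch.softmax(self.logits.flatten(), dim=0)
        k = k.view(1, 1, self.kernel_size, self.kernel_size).expand(x.shape[1], 1, -1, -1)
        return F.conv2d(x, k, padding=self.kernel_size // 2, groups=x.shape[1])

# The distorted image feeds any standard captioning model, e.g.:
#   caption_logits = captioner(LearnablePSF()(image))
# Lens parameters and the captioner are optimized jointly with the usual
# cross-entropy captioning loss, so descriptions are learned from distorted inputs.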
Blind image watermarking is regarded as a vital technology for providing copyright protection of digital images. Due to the rapid growth of deep neural networks, deep learning-based watermarking methods have been widely studied. However, most existing methods adopt simple embedding and extraction structures and cannot fully utilize the image features. In this paper, we propose a novel Single-Encoder-Dual-Decoder (SEDD) watermarking architecture to achieve high imperceptibility and strong robustness. Specifically, the single encoder uses a normalizing flow to realize watermark embedding, which effectively fuses the watermark and the cover image. For watermark extraction, we introduce a parallel dual decoder to improve imperceptibility and extraction ability. Extensive experiments demonstrate that the SEDD architecture obtains better watermark robustness and imperceptibility. Our method achieves a bit error rate of less than 0.1% under most attacks, such as JPEG compression, Gaussian blur, and cropping. Moreover, the proposed method also remains robust under combined attacks and social-platform processing.
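A rough structural sketch of a single-encoder, dual-decoder layout in PyTorch. It is not the paper's SEDD design: the normalizing-flow embedding is replaced here by a plain residual convolutional embedder, and all layer sizes and the 64-bit message length are illustrative assumptions.

import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Fuses a bit string into the cover image as a residual perturbation."""
    def __init__(self, bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + bits, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, img, msg):                 # img: (B,3,H,W), msg: (B,bits), float
        m = msg[:, :, None, None].expand(-1, -1, img.shape[2], img.shape[3])
        return img + self.net(torch.cat([img, m], dim=1))

class Decoder(nn.Module):
    """Predicts the embedded bits from a (possibly attacked) watermarked image."""
    def __init__(self, bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, bits))

    def forward(self, x):
        return self.net(x)

embedder, decoder_a, decoder_b = Embedder(), Decoder(), Decoder()
# watermarked = embedder(cover, message)
# logits = (decoder_a(watermarked) + decoder_b(watermarked)) / 2   # parallel decoders
# bits_hat = torch.sigmoid(logits) > 0.5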
ISBN (print): 9798350386257; 9798350386240
This study explores the integration of memristor crossbars as filter arrays for image processing applications that exploit various filtering techniques. Memristor crossbar arrays offer a promising platform for parallel processing and efficient implementation of filtering operations due to their dense and scalable architecture. By configuring each column of the crossbar array to act as a filter, it becomes possible to perform multiple filtering operations simultaneously on input images. This research investigates the feasibility and performance of using memristor crossbar arrays as filter arrays with different filter structures and random dropouts in image processing. The analysis focuses on the potential of memristor-based reconfigurable filter arrays to advance the field of image processing.
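The column-as-filter idea maps directly onto a matrix product: flattened image patches play the role of the crossbar's input signals, and each column of the conductance matrix holds one filter kernel, so several filters are applied in a single pass. The NumPy sketch below is only a software analogy; the kernel choices and image size are arbitrary.

import numpy as np

def im2col(img, k=3):
    """Collect all k-by-k patches of a 2-D image as rows of a matrix."""
    h, w = img.shape
    patches = [img[i:i + k, j:j + k].ravel()
               for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.array(patches)                  # (num_patches, k*k)

# One filter per crossbar column: here a box blur and a Laplacian.
box = np.full(9, 1.0 / 9.0)
laplacian = np.array([0, 1, 0, 1, -4, 1, 0, 1, 0], dtype=float)
G = np.stack([box, laplacian], axis=1)        # conductance matrix, (9, num_filters)

img = np.random.rand(8, 8)
outputs = im2col(img) @ G                     # all filters computed simultaneously
filtered = outputs.reshape(6, 6, -1)          # (H-k+1, W-k+1, num_filters)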
Qilou (arcade building) is a particular type of Chinese historical architecture that combines western and eastern building elements and plays a significant role in the history of modern Chinese architecture. However, the recognition and classification of qilou mainly rely on manual inspection, hindering the cultural dissemination and protection of qilou relics. In this paper, we present a new framework that adopts multiple image processing algorithms and a deep learning network to automate qilou classification. First, the qilou image dataset is enhanced using the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. Then, an improved Faster R-CNN with ResNet50 (Faster R-CNN-R) is deployed for qilou image recognition. A total of 760 images captured in Guangzhou were used for training, validation, and accuracy checks of the proposed framework and several contrastive networks under the same conditions. The proposed framework outperforms Faster R-CNN with VGG16 (Faster R-CNN-V) and FCOS: the accuracies of the framework embedded with Faster R-CNN-R, Faster R-CNN-V, and FCOS are 80.12%, 65.17%, and 66.35%, respectively. Based on digital images captured under different lighting conditions, the proposed framework can classify nine different types of qilou with high robustness.
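A brief sketch of the two named stages using standard OpenCV and torchvision components; the file name, CLAHE parameters, and pretrained COCO weights are placeholders, since the paper fine-tunes the detector on its own 760-image qilou dataset.

import cv2
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# 1) CLAHE contrast enhancement applied to the luminance channel.
bgr = cv2.imread("qilou.jpg")
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

# 2) Faster R-CNN with a ResNet-50 FPN backbone (Faster R-CNN-R analogue).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
rgb = enhanced[:, :, ::-1].copy()                      # BGR -> RGB
tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
with torch.no_grad():
    detections = model([tensor])[0]                    # boxes, labels, scores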
ISBN (print): 9798350372113; 9798350372106
This paper explores the utilization of MATLAB for digital signal processing (DSP) techniques in image processing tasks, focusing on image deblurring, face detection, and facial feature enhancement. Blind deconvolution methods are employed to address image blurriness, while face detection is facilitated using cascaded object detectors. Enhancements to detected facial features involve histogram equalization, smoothing filters, skin tone adjustment, and contrast enhancement techniques, followed by seamless integration using resizing methods. MATLAB serves as a robust platform for implementing and analyzing DSP algorithms, providing insights into practical solutions for common challenges in digital image processing.
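The paper's pipeline is in MATLAB; the sketch below shows roughly analogous steps in OpenCV (cascaded face detection, histogram equalization, smoothing). The blind deconvolution stage is omitted, and the file name and detector parameters are assumptions.

import cv2

img = cv2.imread("portrait.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Cascaded object detector for faces (Haar cascade shipped with OpenCV).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Enhance each detected face: histogram equalization plus mild smoothing,
# then write the region back into the image.
for (x, y, w, h) in faces:
    roi = cv2.equalizeHist(gray[y:y + h, x:x + w])
    gray[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (3, 3), 0)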
This paper proposes ReAdapt, a reconfigurable datapath architecture for scaling the energy-quality trade-off of adaptive filtering at runtime. ReAdapt can dynamically select among four adaptive filtering algorithms of graded complexity during runtime by reconfiguring the processing flow in its datapath and by blocking the switching activity of unused modules with data-gating (thus reducing CMOS dynamic power). ReAdapt scales the energy-quality trade-off by choosing among four levels of filter algorithm complexity: 1) least mean square (LMS); 2) partial update normalized LMS (PU-NLMS); 3) set-membership normalized LMS (SM-NLMS); and 4) normalized LMS (NLMS). The ReAdapt architecture reuses common modules of each adaptive filter, resulting in a compact VLSI hardware implementation. Its operation is demonstrated in a case study on interference mitigation for electroencephalogram (EEG) signal processing. The hardware synthesis results show a 6.80-fold increase in throughput and at least a 2.84-fold reduction in energy per operation compared with state-of-the-art adaptive filters. This paper also investigates the benefits of dynamically reconfiguring the four ReAdapt operating modes at runtime for different signal-to-noise ratio (SNR) levels of the processed signals. We also demonstrate that dynamically reconfiguring the ReAdapt operating modes at runtime yields an energy-quality trade-off that is advantageous over the conventional single static mode.
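For reference, software versions of two of the four filter modes that ReAdapt can select (LMS and NLMS); the filter order and step sizes below are illustrative values, not taken from the paper.

import numpy as np

def lms(x, d, order=8, mu=0.01):
    """Least mean squares: w <- w + mu * e[n] * x_n."""
    w = np.zeros(order)
    y, e = np.zeros(len(x)), np.zeros(len(x))
    for n in range(order, len(x)):
        xn = x[n - order:n][::-1]              # most recent samples first
        y[n] = w @ xn
        e[n] = d[n] - y[n]
        w = w + mu * e[n] * xn
    return y, e, w

def nlms(x, d, order=8, mu=0.5, eps=1e-8):
    """Normalized LMS: step size scaled by the instantaneous input energy."""
    w = np.zeros(order)
    y, e = np.zeros(len(x)), np.zeros(len(x))
    for n in range(order, len(x)):
        xn = x[n - order:n][::-1]
        y[n] = w @ xn
        e[n] = d[n] - y[n]
        w = w + (mu / (eps + xn @ xn)) * e[n] * xn
    return y, e, w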
ISBN (print): 9783031686528; 9783031686535
Image processing is a vigorous area of study that utilizes various algorithms to manipulate, analyze, and enhance digital images, and image denoising is one of its crucial applications. Image noise is inevitable due to various sources, including low-light conditions, high ISO settings, and transmission artifacts, necessitating denoising techniques that significantly improve visual image quality. This is particularly important in fields such as computer vision, medical imaging, and remote sensing. Denoising not only facilitates image analysis by retaining important details, but also improves the performance of compression algorithms and downstream detection tasks. In this project, we propose an in-depth study of image denoising, focusing on the use of convolutional neural networks (CNNs). Gaussian noise is treated at different levels (low, sigma = 15; medium, sigma = 25; and high, sigma = 50). A full comparative analysis is carried out on three main CNN architectures (DnCNN, RIDNet, and IRCNN), illustrating the quantitative and qualitative experimental results obtained with these different approaches. These approaches have shown impressive performance in image processing tasks, including image denoising, owing to techniques such as regularization, batch normalization, and residual learning.
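A minimal sketch of the experimental setup described above: synthetic Gaussian noise at the three sigma levels and a small DnCNN-style residual denoiser in PyTorch. Depth and width are deliberately reduced, so this is not the published DnCNN/RIDNet/IRCNN architecture.

import torch
import torch.nn as nn

def add_gaussian_noise(img, sigma):
    """img in [0, 1]; sigma given on the 0-255 scale, as in the study."""
    return (img + torch.randn_like(img) * sigma / 255.0).clamp(0, 1)

class TinyDnCNN(nn.Module):
    """Predicts the noise map; the clean estimate is input minus prediction."""
    def __init__(self, depth=5, channels=32):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.BatchNorm2d(channels), nn.ReLU()]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return x - self.net(x)             # residual learning

clean = torch.rand(1, 1, 64, 64)
for sigma in (15, 25, 50):                 # low / medium / high levels in the study
    noisy = add_gaussian_noise(clean, sigma)
    denoised = TinyDnCNN()(noisy)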
Fast and resource-efficient inference in artificial neural networks (ANNs) is of utmost importance and drives many new developments in hardware architectures, e.g., systolic arrays, and algorithmic optimizations such as pruning. In this paper, we present a novel method for lowering the computation effort of ANN inference using ideas from information theory. Weight matrices are sliced into submatrices of logarithmic aspect ratios, and these slices are then factorized. This reduces the number of required computations without compromising fully parallel processing. We create a new hardware architecture for this dedicated purpose and provide a tool to map these sliced and factorized matrices efficiently onto reconfigurable hardware. Compared with state-of-the-art FPGA implementations, our method lowers hardware resources, measured in look-up tables (LUTs), by a factor of three to six. The method does not rely on any particular property of the ANN's weight matrices: it works for the general task of multiplying an input vector by a constant matrix and is therefore also suitable for digital signal processing beyond ANNs.
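A structural NumPy sketch of the slicing step only: splitting a weight matrix into column slices and summing the partial products reproduces the full matrix-vector product, which is what allows each slice to be factorized independently. The slice width is an arbitrary choice here, and the information-theoretic factorization of each slice is not reproduced.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))          # weight matrix
x = rng.standard_normal(256)                # input vector

slice_cols = 16                             # illustrative slice width
slices = [W[:, i:i + slice_cols] for i in range(0, W.shape[1], slice_cols)]
x_parts = [x[i:i + slice_cols] for i in range(0, len(x), slice_cols)]

y_sliced = sum(S @ xp for S, xp in zip(slices, x_parts))   # sum of partial products
assert np.allclose(y_sliced, W @ x)                        # identical to the full product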
Computer Vision (CV) leverages artificial intelligence to analyse digital images, offering insights for a wide range of applications. While CV software often relies on open-source libraries such as OpenCV, it is probably more common for this software to use custom code. Creating particular solutions stems from the very nature of the specific CV problems being addressed, but despite these particularities there are common links at the core that are either not addressed by generic CV libraries or require significant customisation for specific applications. Understanding the nature of the real problems faced by a digital image analysis use case can contribute as much as solving a generic CV problem, and this is the aim of this paper. This article addresses the problem of migrating part of Multiscan Vision System, a complex CV workflow used in a real-world industrial use case, to CUDA. The primary challenge lies in minimising the overhead of data transfers between the host and the GPU (graphics processing unit), or even within the device's memory itself. While the achieved speed-up may not rival that of applications better suited to the GPU architecture (in particular, massively data-parallel applications), the algorithms and data distribution proposed in this study effectively offload a substantial portion of the workflow to the GPU under low (integer) arithmetic intensity and real-time constraints. This frees the CPU to handle other workflow components and increases the capability to incorporate more cameras, significantly boosting productivity and economic performance.
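A hedged illustration of the transfer-minimisation principle discussed above, written with CuPy as a stand-in for hand-written CUDA kernels: data is copied to the GPU once, several steps run entirely in device memory, and only the final reduction returns to the host. The image size and operations are arbitrary and unrelated to the Multiscan Vision System workflow.

import numpy as np
import cupy as cp

host_img = np.random.rand(2048, 2048).astype(np.float32)

dev_img = cp.asarray(host_img)                      # single host-to-device transfer
dev_img = cp.clip(dev_img * 1.2 - 0.1, 0.0, 1.0)    # stays on the GPU
mask = (dev_img > 0.5).astype(cp.float32)           # thresholding, still on the GPU
row_sums = mask.sum(axis=1)                         # reduction on the GPU

result = cp.asnumpy(row_sums)                       # single device-to-host transfer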
This paper provides a comparative evaluation of edge and line detection methods in digital image processing. The analysis evaluates the performance of several edge and line detection algorithms in terms of edge/line quality, accuracy, computational complexity, and robustness to noise, on both synthetic and real-world images. Specifically, it discusses the performance of four edge detection methods: Roberts, Prewitt, Sobel, and Canny; and three line detection techniques: the Hough transform, Linear Estimation, and the Probabilistic Hough transform. Compared with the classical algorithms, the Probabilistic Hough transform is found to have the best accuracy and robustness to noise. Furthermore, a comparison of computational complexities shows that the Hough transform has the lowest complexity and computational time, while the Linear Estimation algorithm has the highest. The primary outcome of this study is that edge/line quality, accuracy, computational complexity, and robustness depend strongly on the type of input images and the parameters selected.
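A short OpenCV sketch of three of the detectors compared above (Sobel, Canny, and the probabilistic Hough transform); the thresholds and file name are illustrative, not the parameters used in the study.

import cv2
import numpy as np

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Sobel gradient magnitude.
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel_mag = np.hypot(sobel_x, sobel_y)

# Canny edge map.
edges = cv2.Canny(gray, 100, 200)

# Probabilistic Hough transform on the Canny edge map.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=5)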