Applications are growing for ultracompact millimeter-scale cameras. For color images, these sensors commonly utilize a Bayer mask, which can perceptibly degrade image resolution and quality, especially for low pixel-count submillimeter sensors. To alleviate this, we built a time-multiplexed RGB LED illumination system synchronized to the rolling shutter of a monochrome camera. The sequential images are processed and displayed as near real-time color video. Experimental comparison with an identical sensor fitted with a Bayer color mask showed significant improvement in the MTF curves and in perceived image clarity. Trade-offs with respect to system complexity and color motion artifacts are discussed.
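As a rough illustration of the time-multiplexed idea, the sketch below stacks three monochrome frames, each captured under red, green, or blue illumination, into one color frame. The function name `merge_sequential_frames` and the per-channel `gains` parameter are hypothetical and not taken from the paper; shutter synchronization and motion compensation are not modeled.

```python
def merge_sequential_frames(red, green, blue, gains=(1.0, 1.0, 1.0)):
    """Combine three monochrome frames into one RGB frame.

    Each frame is a list of rows of 0-255 intensities, captured while only
    the red, green, or blue LED was lit. `gains` is an assumed per-channel
    scale factor (for example, for white balance).
    """
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("frames must have the same height")
    rgb = []
    for r_row, g_row, b_row in zip(red, green, blue):
        if not (len(r_row) == len(g_row) == len(b_row)):
            raise ValueError("frames must have the same width")
        rgb.append([
            # Scale each channel, then clip to the valid 8-bit range.
            tuple(min(255, max(0, round(v * g))) for v, g in zip(px, gains))
            for px in zip(r_row, g_row, b_row)
        ])
    return rgb
```

Because every pixel receives a measured value in all three channels, no demosaicing interpolation is needed; the cost is that the three channels are sampled at different times, which is the source of the color motion artifacts mentioned above.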
Cutting-edge medical image analysis, driven by quantum-based techniques, offers automated information extraction from images, revolutionizing health care. Traditional methods are being outpaced by the demand for advanced real-time digital image processing. This article introduces an innovative approach to medical image edge detection based on entropy. In recent years, various quantum representation models have emerged, addressing the complex nature of medical images characterized by dark backgrounds and low contrast. To enhance image quality, the article introduces the novel enhanced quantum representation model, which leverages the colour operations of Caraiman's quantum image representation model to improve the greyscale values of individual pixels. However, the article acknowledges that quantum noise remains a challenge in image processing due to statistical fluctuations in medical imaging. To combat this, the article introduces a neural network-based hybrid filter comprising neural edge enhancers and bilateral filters. The neural filter acts as a fusion operator, effectively eliminating quantum noise from the output image. Another challenge addressed in this work is the time complexity of edge detection. The article presents a novel methodology for edge extraction based on Hill entropy for medical images, which involves segmenting the image into objects and backgrounds using a threshold value. This method aims to reduce computation time while producing high-quality edge detection. The proposed algorithm is implemented in MATLAB and evaluated on various images. The results demonstrate the algorithm's effectiveness, with a notably higher peak signal-to-noise ratio of 41.5312%, a lower mean square error of 0.0214%, and an improved contrast-to-noise ratio of 42.59%. These outcomes underscore the algorithm's superior performance in edge detection for medical images, with an accuracy of 97.5%, higher than that of traditional methods.
Image tamper localization is an important research topic in computer vision that aims to identify and localize human-modified regions in images. In this paper, we propose a new image tampering localization network named MAPS-Net. It combines the advantages of efficient multi-scale attention, shift operation, and progressive subtraction, which not only improves sensitivity and generalization to novel tampering behaviors but also significantly reduces computation time. MAPS-Net consists of an upper and a lower branch: the fake edge-enhancing branch and the interfering factors-weakening branch. The fake edge-enhancing branch uses an efficient multi-scale edge residual module to enhance the expressiveness of the features, while the interfering factors-weakening branch uses progressive subtraction to weaken the interference of image content fluctuations when capturing general tampering behaviors. Finally, the features of both branches are fused with a position attention mechanism via a shift operation to capture the spatial relationships between different views. Experiments conducted on several publicly available datasets show that MAPS-Net outperforms existing mainstream models in both image tampering detection and localization, especially in localization in real scenes. Code is available at: https://***/dklive1999/MAPS-Net.
In this study, we propose a novel concept of a software-based fingertip velocimeter using high-frame-rate (HFR) video processing that can simultaneously estimate when and where an operator taps with a finger by detecting the high-frequency component that develops when the fingertip actively contacts something. Our software-based fingertip velocimeter can precisely estimate the velocities of multiple fingers through HFR video processing in real time. Digital image correlation (DIC), operating at every frame for sub-pixel-precision velocity estimation, is hybridized with convolutional neural network (CNN)-based object detection, operating at intervals of dozens of frames, to robustly update the fingertip regions of interest (ROIs) during the frame-by-frame DIC operation. We developed a real-time multifinger tapping detection system that can execute the DIC operation on 720x540 resolution images at 500 frames/s with CNN-based fingertip detection at 30 frames/s. Several experimental results for finger tapping detection, including virtual keyboard interaction with ten-finger keyboard input, demonstrate the effectiveness of our fingertip velocimeter as a finger tapping interface that can simultaneously estimate the tapping positions and moments of multiple fingers when tapping is performed ten or more times per second.
ISBN: (print) 9798350349405; 9798350349399
Referring Video Object Segmentation (R-VOS) is a challenging task that involves segmenting objects in a video based on linguistic descriptions. In this paper, we introduce a novel multi-granularity referring video object segmentation framework, termed LUMINATE, which takes a streamlined approach to cross-modal fusion. The enhanced interaction between visual and textual modalities begins with cross-attention between the vision encoder's query and the text encoder's key-value pairs, and vice versa. The results are then concatenated with the respective queries of the vision and text encoders, fostering a comprehensive understanding of semantic relationships. The combined features are fed into the Transformer encoder for further refinement and integration into the segmentation pipeline. Extensive experiments on benchmark datasets, including Ref-DAVIS, demonstrate that the proposed LUMINATE approach achieves better results than state-of-the-art methods in terms of the Jaccard and F-measure evaluation metrics. Furthermore, our multi-object R-VOS variant achieves a threefold speed improvement while maintaining satisfactory segmentation performance. The proposed approach advances the capabilities of R-VOS models, paving the way for improved multimodal reasoning and real-world applications.
To address the difficulty of recognising smoking and phone-call behaviours against the complex background of construction sites, a method for recognising human elbow flexion behaviour based on posture estimation is proposed. AlphaPose is retrained on the required upper-body key points to achieve human object localization and key point detection. Then, a mathematical model for human elbow flexion behaviour discrimination (HEFBD model) is proposed based on the human key points; it also locates the region of interest for small object detection and reduces the interference of the complex background. A super-resolution image reconstruction method is used to pre-process blurred images. In addition, YOLOv5s is improved by adding a small object detection layer and integrating a convolutional block attention module to improve detection performance. The detection precision of this method is improved by 5.6%, and the false detection rate caused by complex backgrounds is reduced by 13%; the method outperforms other state-of-the-art detection methods and meets the requirement of real-time performance.
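The abstract does not give the HEFBD model's equations. As a minimal sketch of how elbow flexion might be judged from pose key points, the example below computes the angle at the elbow between the shoulder and wrist and compares it with a threshold. The function names and the 60-degree threshold are assumptions for illustration, not the authors' model.

```python
import math


def elbow_angle(shoulder, elbow, wrist):
    """Return the angle in degrees at the elbow, from (x, y) key points."""
    ux, uy = shoulder[0] - elbow[0], shoulder[1] - elbow[1]
    vx, vy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0 or nv == 0:
        raise ValueError("key points must not coincide with the elbow")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (ux * vx + uy * vy) / (nu * nv)))
    return math.degrees(math.acos(cos_theta))


def is_elbow_flexed(shoulder, elbow, wrist, max_angle_deg=60.0):
    """Hypothetical rule: the arm counts as flexed below `max_angle_deg`."""
    return elbow_angle(shoulder, elbow, wrist) < max_angle_deg
```

A straight arm gives an angle near 180 degrees, while a hand raised to the face gives a small angle, so a rule of this kind could select the frames in which the hand region is worth passing to the small object detector.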
This study presents the design of a two-dimensional (2D) discrete cosine transform (DCT) architecture for use with the high-efficiency video coding (HEVC) intra-prediction method in image compression. Because the transform step in HEVC requires a large amount of calculation and therefore consumes considerable power, a novel DCT architecture for HEVC is proposed to reduce this calculation complexity and power consumption. The architecture tolerates small calculation errors in the transform steps, since these errors become negligible in the quantization step. For this purpose, approximate 5:3 compressor circuits with different error rates are designed and used in place of addition/subtraction in the DCT architecture. The architecture supports 4 x 4, 8 x 8, 16 x 16 and 32 x 32 transform blocks. It is implemented on an FPGA and evaluated experimentally. The experiments examine hardware performance parameters and show that using approximate compressors reduces power consumption and physical area. The efficiency of the proposed architecture is investigated through image compression and video coding tests.
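For reference, a 2D DCT is commonly computed in separable form, Y = C X C^T, as a row pass followed by a column pass. The sketch below uses the exact floating-point orthonormal DCT-II; it does not model HEVC's integer transform matrices or the paper's approximate 5:3 compressors, and the helper names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Separable 2D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))
```

In hardware, each matrix product reduces to multiply-accumulate chains, which is where the adders and subtractors that the abstract replaces with approximate compressors would sit.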
ISBN: (print) 9798331540050; 9798331540043
Moving object detection plays a significant role in video surveillance. However, existing moving object detection methods often rely on software implementations, which results in low real-time performance and high power consumption. The core detection algorithm in this paper is a frame difference method enhanced by morphological filtering. Additionally, we propose an architecture that integrates an FPGA (Field Programmable Gate Array) and an ARM (Advanced RISC Machine) processor, fully leveraging the parallel computing advantages of the FPGA and the high processing efficiency of the ARM. The system uses a ZYNQ7000 SoC, coupled with an OV7725 camera for image capture and DDR3 SDRAM for data caching, to address the challenges of high-speed data processing and low power consumption. Experimental results show that the system meets the requirements for high real-time performance and low power consumption, with a frame rate of 85.9375 frames per second and a total power consumption of 1.101 W.
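A software sketch of frame differencing followed by morphological filtering is given below. The abstract does not say which morphological operation is used; this example assumes a 3 x 3 opening (erosion, then dilation), and the threshold of 25 and all function names are illustrative. The paper's version runs in FPGA logic rather than in Python.

```python
def frame_difference(prev, curr, threshold=25):
    """Binary mask: 1 where the grey level changed by more than `threshold`."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(p_row, c_row)]
            for p_row, c_row in zip(prev, curr)]


def _neighborhood(mask, y, x):
    """Values in the 3 x 3 window around (y, x), clipped at the image border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def erode(mask):
    """Keep a pixel only if every pixel in its window is set."""
    return [[1 if all(_neighborhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its window is set."""
    return [[1 if any(_neighborhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def detect_motion(prev, curr, threshold=25):
    """Frame difference, then opening to remove isolated noise pixels."""
    return dilate(erode(frame_difference(prev, curr, threshold)))
```

The opening removes single-pixel responses caused by sensor noise while restoring the shape of larger moving regions. Each output pixel depends only on a small fixed window, which is the kind of operation that maps well onto parallel FPGA pipelines.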
ISBN: (print) 9798331529543; 9798331529550
In this paper, we introduce a novel approach to better utilize the intra block copy (IBC) prediction tool in encoding lenslet light field video (LFV) captured using plenoptic 2.0 cameras. Although the IBC tool has been recognized as promising for encoding LFV content, it was originally designed for conventional video, which limits its performance and suggests that slight modifications could make it better suited to the properties of LFV content. Based on the observation that the microlens array (MLA) structure of plenoptic cameras inherently produces a large amount of repetitive image patterns, this paper proposes several techniques that enhance the IBC coding tool itself to encode LFV content more efficiently. Our experimental results demonstrate that the proposed method significantly enhances IBC coding performance for LFV content while also reducing encoding time.
This paper presents a throughput comparison between the context-based adaptive binary arithmetic coding (CABAC) decoding processes adopted in three recent video codecs: advanced video coding (AVC), high efficiency video coding (HEVC), and versatile video coding (VVC). To highlight the performance of and the modifications in the three CABAC versions, the three main stages of CABAC decoding, namely Context Selection and Modeling (CSM), Binary Arithmetic Decoding (BAD) and De-binarization (DBZ), are designed, described in VHDL and implemented on a Field Programmable Gate Array (FPGA) device. Firstly, the most efficient CSM is obtained for VVC CABAC, with a maximum frequency of 183.8 MHz and a low power consumption of 0.346 mW. Secondly, the BAD in RM is modified only in the latest video standard, VVC. The most efficient design of BAD RM is given by the AVC and HEVC versions of CABAC, with a maximum frequency of 261.75 MHz. Thirdly, the BAD in BM and TM is the same in all three CABAC versions, with maximum frequencies of 439.657 MHz and 798.861 MHz, respectively. Fourthly, the de-binarization codes are also the same in all three CABAC versions. A high frequency of 789.26 MHz is obtained in DBZ, but its resource cost and power consumption are greater than those of the CSM and BAD stages. Finally, a high throughput of 178.13 bins/s is achieved by our proposed design of the VVC CABAC decoder.