In today's digital world, technology revolves largely around massive volumes of multimedia content. Existing techniques are inadequate for capturing the movement of human upper body parts in real time, so a system that tracks these movements in image and video content is urgently required. In this work, a novel technique based on the Viola-Jones algorithm is proposed to detect upper body parts such as the eyes and face. Moreover, a face tracking algorithm is employed to track the movement of the upper body parts in real time. The proposed model improves accuracy by 13%, 18%, and 10% for face, eye, and body on the Grimace dataset, the Face94 dataset, and the Milborrow University of Cape Town (MUCT) dataset, respectively. It is also tested directly in MATLAB on new images and camera video. It takes less than 25 milliseconds per frame and is hence applicable to real-time applications.
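The speed of Viola-Jones-style detectors rests on the integral image, which lets any rectangular pixel sum be read with four table lookups. The sketch below illustrates that general idea only; it is not the novel technique of the abstract, and the function names and the toy 4x4 image are assumptions made for illustration.

```python
def integral_image(img):
    """Return a summed-area table padded with one leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_vertical_feature(ii, top, left, height, width):
    """Haar-like feature: upper half minus lower half (height should be even)."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return upper - lower


# Toy image (hypothetical data): brightness increases from top to bottom.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
table = integral_image(image)
```

Because each feature costs a fixed number of lookups regardless of its size, a detector can evaluate many features per window, which is one reason per-frame times in the millisecond range are plausible for this family of methods.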
This paper presents an incremental stereo algorithm designed to calculate a real-time disparity image. The algorithm is designed for stereo video sequences and uses previous information to reduce computation time and improve disparity image quality. It is based on the semi-global matching stereo algorithm but modified to reuse previous calculation information. Storing and reusing this information not only reduces computation time but improves accuracy in a cost filtering scheme. Some tests are presented to compare the computation time and results of the algorithm, which show that it can achieve better results in terms of quality and time than standard algorithms for some scenarios.
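For orientation, the standard semi-global matching recurrence aggregates matching costs along a path, charging a small penalty for a disparity change of one level and a larger penalty for bigger jumps. The sketch below shows a single left-to-right path on one scanline. It is the textbook recurrence, not the incremental variant of the abstract; the penalty names `p1` and `p2` and the toy cost values are assumptions.

```python
def aggregate_left_to_right(cost, p1, p2):
    """Aggregate costs along one path; cost[x][d] is the cost at pixel x, disparity d."""
    n_disp = len(cost[0])
    agg = [list(cost[0])]  # the first pixel has no predecessor on this path
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]  # same disparity, or a large jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # one level down
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)  # one level up
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg


def winner_take_all(costs):
    """Pick the lowest-cost disparity at every pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in costs]


# Toy costs (hypothetical): pixel 2 is noisy and slightly prefers disparity 1.
raw_cost = [[0, 9, 9], [0, 9, 9], [5, 4, 9], [0, 9, 9]]
aggregated = aggregate_left_to_right(raw_cost, p1=2, p2=5)
```

In this toy case the raw winner-take-all result contains an isolated disparity of 1 at the noisy pixel, while the aggregated costs yield a uniform disparity of 0. A full implementation sums several path directions, and an incremental scheme would additionally carry information over from earlier frames.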
The development of communication networks has made information security more important than ever for both transmission and storage. Since the majority of networks carry images, image security is becoming a difficult challenge. To provide real-time image encryption and decryption, this study proposes an FPGA implementation of a video cryptosystem that has been well optimized using high-level synthesis (HLS). The MATLAB HDL Coder and Vivado tools from Xilinx are used in the design, implementation, and validation of the algorithm on the Xilinx Zynq FPGA platform. The hardware architecture is well suited to low resource consumption and pipelined processing, and the proposed hardware approach can be widely used in real-time applications involving secret image encryption and decryption. The implementation of the encryption-decryption system is both very efficient and area-optimized. It was realized with a unique HLS design technique based on application-specific bit widths for intermediate data nodes. For HLS, MATLAB HDL Coder was used to generate the register transfer level (RTL) design. Using Vivado, the RTL design was implemented on the Xilinx ZedBoard, and its operation was tested in real time with an input video stream. The results are faster and more area-efficient (fewer gates are used on the target FPGA) than those of earlier solutions for the same target board.
The spatial and spectral information contained in hyperspectral images (HSIs) makes them widely used in many fields. However, the sharp increase in HSI data puts enormous pressure on data storage and real-time transmission. Research shows that hyperspectral compressive sensing (HCS) breaks through the bottleneck of the Nyquist sampling theorem and can relieve this pressure. Existing HCS methods try to design advanced compression sampling matrices or reconstruction algorithms, but cannot connect the two through a unified framework. To further improve image reconstruction quality, a novel codec space-spectrum joint dense residual network (CDS2-DResN) is proposed. The CDS2-DResN is divided into a block compression sampling part and a reconstruction part. For block compression sampling, a coded convolutional layer (CCL) is used to compress and sample the HSI. For reconstruction from the measurements, a deconvolution layer first produces an initial reconstruction of the HSI, and a space-spectrum joint network then refines it. Moreover, the CCL and the reconstruction network are optimized within a unified framework, which simplifies the pre-processing and post-processing of HCS. Extensive experiments show that CDS2-DResN achieves excellent HCS reconstruction at measurement rates of 0.25, 0.10, 0.04, and 0.01.
Online detection of action start is a significant and challenging task that requires prompt identification of action start positions and corresponding categories within streaming videos. This task presents challenges due to data imbalance, similarity in boundary content, and real-time detection requirements. Here, a novel time-Attentive Fusion Network (TAF-Net) is introduced to address the requirements of improved action detection accuracy and operational efficiency. The time-attentive fusion module is proposed, which consists of long-term memory attention and the fusion feature learning mechanism, to improve spatial-temporal feature learning. The temporal memory attention mechanism captures more effective temporal dependencies by employing weighted linear attention. The fusion feature learning mechanism incorporates current-moment action information with historical data, thus enhancing the representation. The proposed method exhibits linear complexity and parallelism, enabling rapid training and inference. It is evaluated on two challenging datasets: THUMOS'14 and ActivityNet v1.3. The experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art methods in terms of both detection accuracy and inference speed.
ISBN: 9781728198354 (print)
Video super-resolution (VSR) upscales a low-resolution video to a higher resolution. Most applications require compression of the super-resolved video due to limited internet bandwidth and storage capacity. However, most studies on VSR techniques have focused only on improving image quality, ignoring the impact of the compression process on visual quality. Consequently, even a VSR output with good visual quality risks a significant loss of quality when served online or stored as a file. To address this problem, we propose an encoding-aware VSR framework. In the framework, we create a differentiable virtual codec to estimate the bit rate and use it in the loss function, which optimizes the super-resolved videos by considering the rate-distortion trade-off and thereby prevents visual quality degradation. According to the results, our real-time VSR model for ×4 upscaling, trained with 1,191K parameters, yields a maximum gain of 13.2% over state-of-the-art VSR models based on the Bjontegaard delta rate.
The Fast Fourier Transform (FFT) is widely used in image and video processing applications to convert image or video frames into the transform domain, which helps extract accurate features of the frame for various real-time applications. In this paper, an efficient non-separable 8-point decimation-in-time FFT (DIT-FFT) architecture is proposed and implemented on a Spartan-6 (xc6slx45-3csg324) FPGA. The proposed architecture consists of Data Format Conversion, Addition, Subtraction, Multiplier Equivalent, and D-FF blocks. The non-separable equations of the 8-point DIT-FFT are derived from the corresponding butterfly diagram and implemented using basic logic gates, and hardware utilisation is optimised with the help of the complex conjugate property. The constant multiplications in the non-separable DIT-FFT equations are implemented with the adders and shifters of the Multiplier Equivalent block, which further optimises the overall hardware utilisation. Moreover, the Q-format is used to increase the data accuracy of the architecture. The comparison results show that the proposed architecture outperforms existing architectures in several respects.
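As a software reference point, the butterfly structure of an 8-point radix-2 DIT-FFT can be sketched as below. This is the generic floating-point algorithm, not the non-separable fixed-point hardware architecture of the abstract; the function names and the test signal are assumptions made for illustration.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index` (input ordering for DIT)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft8_dit(samples):
    """8-point radix-2 decimation-in-time FFT built from three butterfly stages."""
    n = 8
    if len(samples) != n:
        raise ValueError("expected exactly 8 samples")
    data = [complex(samples[bit_reverse(i, 3)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                twiddle = cmath.exp(-2j * cmath.pi * k / size)
                even = data[start + k]
                odd = data[start + k + half] * twiddle
                data[start + k] = even + odd          # upper butterfly output
                data[start + k + half] = even - odd   # lower butterfly output
        size *= 2
    return data


def dft(samples):
    """Direct O(n^2) DFT, used here only as a reference for checking."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = fft8_dit(signal)
```

For a real-valued input, output bin 8-k is the complex conjugate of bin k, so only part of the spectrum needs to be computed explicitly. The twiddle factors of an 8-point transform are limited to ±1, ±j, and (±1 ± j)/√2, so the only non-trivial constant is 1/√2, which is what makes a shift-and-add replacement for multipliers feasible.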
ISBN: 9798350349405; 9798350349399 (print)
Tracheal intubation is a critical medical procedure involving the insertion of a tube into the trachea to maintain an open airway. While essential, this procedure carries significant risks, such as incorrect tube placement. Advances in visually guided intubation methods, like video laryngoscopy, have enhanced safety by enabling precise tracheal segmentation from endoscopic images. Our study introduces an innovative image enhancement technique for video endoscopy that significantly improves tracheal visibility and segmentation accuracy. This novel approach not only facilitates safer and more accurate intubation but also minimizes patient discomfort and procedural risks. Tested against the UoS Dataset and real patient data from thyroidectomy procedures, our method demonstrated superior performance, achieving a segmentation accuracy of 97%, a precision of 94%, and a recall of 99%. Our tailored method is computationally efficient, making it suitable for implementation on edge devices like Arduino, thereby enhancing intubation safety and efficiency in various medical settings.
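For readers comparing the three reported figures, the sketch below shows how accuracy, precision, and recall follow from pixel-level confusion counts in a binary segmentation. The counts are hypothetical and are not taken from the study; the function name is also an assumption.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted trachea pixels that are correct
    recall = tp / (tp + fn)     # share of true trachea pixels that were found
    return accuracy, precision, recall


# Hypothetical pixel counts, for illustration only.
accuracy, precision, recall = segmentation_metrics(tp=90, fp=10, fn=5, tn=95)
```

A recall higher than precision, as in the reported results, means the method rarely misses tracheal pixels but labels some background pixels as trachea.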
Video analytics systems conduct video preprocessing to filter out unnecessary frames and model inference using appropriately selected neural networks for high analytics speed. Video preprocessing is instruction-intensive computing (IIC) executed by the CPU, and model inference is data-intensive computing (DIC) executed by the GPU. In this paper, we show that the analytics accuracy of existing systems can vary widely in the field because of the dynamic IIC and DIC workloads of different contents in applications. Unfortunately, cameras have fixed CPU/GPU resources and cannot effectively adapt to workload dynamics. We develop Gemini, a new edge-side real-time video analytics system enhanced by a dual-image FPGA. We take advantage of the negligible image switching time of dual-image FPGAs, pre-configure one CPU image and one GPU image, and elastically multiplex the dual CPU-GPU resources in the time dimension. Gemini requires both hardware and software revisions. In hardware, we overcome the challenges of hardware-dependent application development, low communication efficiency between the microprocessor and the FPGA, and high programming complexity through hardware abstraction, an asynchronous data transfer mechanism, and stub-skeleton middleware. In software, we overcome the challenge of adapting to dynamic workloads with a bandit learning approach. We implement Gemini and show that it can improve the analytics accuracy to 90.35%.
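The abstract does not say which bandit algorithm Gemini uses. As a purely illustrative sketch, the code below applies UCB1, a common bandit strategy, to a hypothetical choice among three CPU/GPU time-share settings. The arms, the reward values, and all function names are assumptions and do not describe the actual system.

```python
import math


def ucb1_select(counts, totals, step):
    """Pick an arm: try each arm once, then maximise mean reward plus a bonus."""
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm
    scores = [
        totals[arm] / counts[arm] + math.sqrt(2.0 * math.log(step) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(reward_fn, n_arms, n_steps):
    """Run UCB1 for n_steps and return how often each arm was chosen."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for step in range(1, n_steps + 1):
        arm = ucb1_select(counts, totals, step)
        counts[arm] += 1
        totals[arm] += reward_fn(arm)
    return counts


# Hypothetical arms: three CPU/GPU time-share settings with assumed mean accuracies.
ASSUMED_ACCURACY = [0.2, 0.8, 0.5]
pulls = run_bandit(lambda arm: ASSUMED_ACCURACY[arm], n_arms=3, n_steps=200)
```

With these fixed rewards the second setting ends up chosen most often, while the others are still sampled occasionally. That occasional re-sampling is the property that lets a bandit learner notice when the workload, and therefore the best setting, changes.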
ISBN: 9798350349405; 9798350349399 (print)
This paper introduces a novel intra-prediction scheme for coding plenoptic video that effectively exploits the large correlation between the current and neighboring macropixel images. While the intra block copy method is well recognized as a promising coding tool for plenoptic video, it has fundamental issues: searching for block vectors (BVs) takes a long time, and encoding these BVs into the bitstream costs additional bits. Our method addresses both issues by pre-defining the prediction candidates to save encoding time and by signaling only the index of the prediction location instead of the BVs to reduce the overhead bits. Compared to HEVC, our method is experimentally shown to achieve average bitrate gains of about 19.70% and 11.99% under the AI-Main and RA-Main conditions, respectively. Moreover, it offers a better trade-off between complexity and coding performance than existing methods.