ISBN (digital): 9783031585357
ISBN (print): 9783031585340; 9783031585357
The detection of anomalies in video data is of great importance in various applications, such as surveillance and industrial monitoring. This paper introduces a novel approach, named MAAD-GAN, for video anomaly detection (VAD) utilizing Generative Adversarial Networks (GANs). The MAAD-GAN framework combines a Wide Residual Network (WRN) in the generator with a memory module to learn the normal patterns present in the training video dataset, enabling the generation of realistic samples. To address the challenge of detecting subtle anomalies and those with motion characteristics, we propose the integration of self-attention in the discriminator model. Our proposed model MAAD-GAN enhances the ability to distinguish between real and generated samples, ensuring that anomalous samples are distorted when reconstructed. Experimental evaluations show the effectiveness of MAAD-GAN as compared to traditional methods on UCSD (University of California, San Diego) Peds2, CUHK Avenue, and ShanghaiTech datasets.
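The abstract relies on anomalous frames being reconstructed poorly, but does not state how reconstruction quality becomes an anomaly score. A minimal sketch of one common convention in reconstruction-based VAD follows: compute per-frame PSNR and min-max normalise it over a clip. The function names `psnr` and `anomaly_scores`, the flat-list frame format, and the scoring rule itself are assumptions for illustration, not the MAAD-GAN authors' stated method.

```python
import math

def psnr(frame, recon, peak=1.0):
    """PSNR in dB between a frame and its reconstruction (flat pixel lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)
    mse = max(mse, 1e-10)  # clamp so a perfect reconstruction stays finite
    return 10.0 * math.log10(peak ** 2 / mse)

def anomaly_scores(frames, recons, peak=1.0):
    """Min-max normalise PSNR over a clip; poorly reconstructed frames score near 1."""
    values = [psnr(f, r, peak) for f, r in zip(frames, recons)]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no variation within the clip
    return [1.0 - (v - lo) / (hi - lo) for v in values]

# Toy clip: the same frame, reconstructed with increasing error (assumed data).
frame = [0.2, 0.4, 0.6, 0.8]
recons = [
    [0.21, 0.39, 0.61, 0.79],  # close reconstruction
    [0.25, 0.35, 0.65, 0.75],  # moderate error
    [0.60, 0.00, 0.90, 0.30],  # heavily distorted
]
scores = anomaly_scores([frame] * 3, recons)
```

Under this assumed rule, the best-reconstructed frame in a clip scores 0 and the worst scores 1, so a threshold on the score separates candidate anomalous frames.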
ISBN (print): 9798350349405; 9798350349399
Image classifiers for domain-specific tasks like Synthetic Aperture Radar Automatic Target Recognition (SAR ATR) and chest X-ray classification often rely on convolutional neural networks (CNNs). These networks, while powerful, experience high latency due to the number of operations they perform, which can be problematic in real-time applications. Many image classification models are designed to work with both RGB and grayscale datasets, but classifiers that operate solely on grayscale images are less common. Grayscale image classification has critical applications in fields such as medical imaging and SAR ATR. In response, we present a novel grayscale image classification approach using a vectorized view of images. By leveraging the lightweight nature of Multi-Layer Perceptrons (MLPs), we treat images as vectors, simplifying the problem to grayscale image classification. Our approach incorporates a single graph convolutional layer in a batch-wise manner, enhancing accuracy and reducing performance variance. Additionally, we develop a customized accelerator on FPGA for our model, incorporating several optimizations to improve performance. Experimental results on benchmark grayscale image datasets demonstrate the effectiveness of our approach, achieving significantly lower latency (up to 16x less on MSTAR) and competitive or superior performance compared to state-of-the-art models for SAR ATR and medical image classification.
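To make the idea of a batch-wise graph convolution over vectorized images concrete, here is a minimal sketch. It treats each flattened image in a batch as a graph node and mixes features between similar samples before a linear projection. The abstract does not say how the graph is built, so the cosine-similarity adjacency, the row normalisation, the ReLU, and the names `cosine` and `batch_graph_conv` are all assumptions, not the paper's design.

```python
import math

def cosine(u, v):
    """Cosine similarity of two flattened images (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def batch_graph_conv(batch, weight):
    """One graph-convolution step over a batch: ReLU(A_norm @ X @ W).

    batch  : N flattened images, each of length D (the node features X)
    weight : D x H projection matrix W (would be learned; fixed here)
    """
    n, d_in, d_out = len(batch), len(weight), len(weight[0])
    # Assumed adjacency: non-negative cosine similarity, with self-loops of 1.
    adj = [[1.0 if i == j else max(cosine(batch[i], batch[j]), 0.0)
            for j in range(n)] for i in range(n)]
    # Row-normalise so each node takes a weighted average of its neighbours.
    adj = [[a / sum(row) for a in row] for row in adj]
    # Aggregate neighbour features: A_norm @ X.
    agg = [[sum(adj[i][k] * batch[k][d] for k in range(n))
            for d in range(d_in)] for i in range(n)]
    # Project and apply ReLU: (A_norm @ X) @ W.
    return [[max(sum(agg[i][d] * weight[d][h] for d in range(d_in)), 0.0)
             for h in range(d_out)] for i in range(n)]

# Two identical 2-pixel "images" and a fixed 2x2 weight matrix (toy data).
out = batch_graph_conv([[1.0, 0.0], [1.0, 0.0]], [[1.0, -1.0], [0.0, 2.0]])
```

Row normalisation is used here for brevity; the symmetric normalisation common in graph convolutional networks would be an equally plausible choice.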
In various complex field environments, machine learning-based crop row detection faces challenges like rigidity and low adaptability. To address this issue, we integrated deep learning into agricultural analysis and established a diverse dataset of corn fields across various scenarios. By employing an end-to-end CNN model and predicting row and column anchors, we created a grid-like understanding of images, significantly streamlining the crop row detection process without the need for pixel-level segmentation. This innovative approach offers a novel method for comprehending the spatial structure of crop rows. Furthermore, we extended the concept of agricultural machinery movement core areas to our data annotation strategy, eliminating the need for pre-selecting ROI regions during crop row extraction. Experimental results demonstrate that our Row and Column Anchor Selection Classification (RCASC) method surpasses conventional approaches in terms of versatility, achieving an F1 score of 92.6%. It can autonomously extract agricultural machinery movement areas, with video stream processing frame rates exceeding 100 FPS and an average image processing time of approximately 10 ms. This method not only meets the real-time requirements for corn crop row recognition but also operates effectively in various special scenarios, offering a feasible solution for further advancing agricultural automation and precision.
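The anchor-based formulation above replaces pixel-level segmentation with a per-anchor classification over grid cells. A minimal sketch of how such predictions could be decoded into image coordinates is shown below. It covers row anchors only, and the logit layout, the trailing "no crop row" class, and the name `decode_row_anchors` are assumptions for illustration rather than the RCASC implementation.

```python
def decode_row_anchors(logits, anchor_ys, image_width):
    """Turn per-anchor grid-cell scores into (x, y) crop-row points.

    logits[r][c] : score of grid cell c at row anchor r; the LAST entry of
                   each row is an assumed "no crop row here" class
    anchor_ys    : pixel y-coordinate of each row anchor
    image_width  : image width in pixels
    """
    points = []
    for row_scores, y in zip(logits, anchor_ys):
        num_cells = len(row_scores) - 1  # exclude the background class
        best = max(range(len(row_scores)), key=row_scores.__getitem__)
        if best == num_cells:
            continue  # background wins: no crop row at this anchor
        # Map the winning cell index to the x-coordinate of its centre.
        x = (best + 0.5) * image_width / num_cells
        points.append((x, y))
    return points

# Three row anchors, four grid cells plus background, 400-px-wide image (toy data).
logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0],
    [0.0, 0.1, 3.0, 0.2, 0.5],
    [0.0, 0.0, 0.0, 0.0, 5.0],
]
points = decode_row_anchors(logits, [100, 200, 300], 400)
```

Because each anchor needs only an argmax over a few grid cells, decoding cost is small next to the CNN forward pass, which is consistent with the reported figure of roughly 10 ms per image (about 100 frames per second).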
With the development of communication technology and Internet technology, the popularity of mobile terminals and intelligent devices, as well as emerging multimedia applications such as virtual reality video and short...
Understanding speech production both visually and kinematically can inform second language learning system designs, as well as the creation of speaking characters in video games and animations. In this work, we introd...
ISBN (print): 9798331540661; 9798331540678
Due to technological advancements, numerous surveillance cameras have been installed in our everyday living spaces to enhance security measures. Assessing abnormalities within video recordings, particularly in crowded environments, presents a formidable challenge. Anomalous occurrences, arising from infrequent and uncommon behaviours, are characterized by deviations in nearby spatiotemporal positions. To bolster public safety, surveillance cameras are frequently deployed in crowded areas such as hospitals, banks, and shopping districts. The proposed system combines You Only Look Once (YOLO) and a 2D convolution layer (CONV2d) to efficiently detect unconventional human activities and abnormalities in real-time video footage. Employing computer vision and machine learning techniques, it scrutinizes video frames to identify potential threats or risks through the detection of abnormal behaviours. YOLO facilitates instantaneous object detection, while CONV2d effectively processes and analyses image data. By leveraging these technologies, the system is capable of monitoring and identifying human behaviour, thus enabling the real-time detection of abnormalities and potential threats. However, challenges persist regarding the placement of security cameras and the insufficient number of cameras compared to human monitors. Identifying abnormal events, such as crimes, illegal activities, and traffic accidents, remains a paramount duty in video surveillance, and our proposed system strives to achieve improved accuracy in real-time event identification.
Infrared imaging technology is widely used in military and civilian fields, but in practical applications, accurate and effective detection and tracking of infrared small targets is a bottleneck problem that needs to ...
Full-ring dual-modal ultrasound and photoacoustic imaging provide complementary contrasts, high spatial resolution, and a full view angle, and are more desirable in pre-clinical and clinical applications. However, two long-standing challenges exist in achieving high-quality video-rate dual-modal imaging. One is the increased data processing burden from the dense acquisition. The other is the object-dependent speed of sound variation, which may cause blurring, splitting artifacts, and low imaging contrast. Here, we develop a video-rate full-ring ultrasound and photoacoustic computed tomography (VF-USPACT) with real-time optimization of the speed of sound. We improve the imaging speed by selective and parallel image reconstruction. We determine the optimal sound speed via co-registered ultrasound imaging. Equipped with a 256-channel ultrasound array, the dual-modal system can optimize the sound speed and reconstruct dual-modal images at 10 Hz in real time. The optimized sound speed can effectively enhance the imaging quality under various sample sizes, types, or physiological states. In animal and human imaging, the system shows co-registered dual contrasts, high spatial resolution (140 μm), single-pulse photoacoustic imaging (< 50 μs), deep penetration (> 20 mm), full view, and adaptive sound speed correction. We believe VF-USPACT can advance many real-time biomedical imaging applications, such as vascular disease diagnosis, cancer screening, or neuroimaging. © 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
Visible light communication (VLC) operates on the principle of modulating light-emitting diodes (LEDs) for data transmission at frequencies imperceptible to the human eye. In vehicular communication, VLC leverages exi...
The non-contact heart rate detection system avoids direct contact between the sensor and the skin, improving portability, comfort and real-time heart rate monitoring. This paper presents an embedded-based non-contact ...