Search results - Inner Mongolia University Library

MAAD-GAN: Memory-Augmented Attention-Based Discriminator GAN for Video Anomaly Detection

8th International Conference on Computer Vision and Image Processing (CVIP)

Authors: Sethi, Anikeit; Saini, Krishanu; Singh, Rituraj; Tiwari, Aruna; Saurav, Sumeet; Singh, Sanjay; Chauhan, Vikas
Affiliations: Indian Inst Technol, Comp Sci & Engn, Indore 452020, Madhya Pradesh, India; CSIR CEERI, Intelligent Syst Grp, Pilani 333031, Rajasthan, India; Natl Taipei Univ Technol, Elect & Comp Sci, Taipei 106, Taiwan

ISBN: (digital) 9783031585357

ISBN: (print) 9783031585340; 9783031585357

The detection of anomalies in video data is of great importance in various applications, such as surveillance and industrial monitoring. This paper introduces a novel approach, named MAAD-GAN, for video anomaly detection (VAD) utilizing Generative Adversarial Networks (GANs). The MAAD-GAN framework combines a Wide Residual Network (WRN) in the generator with a memory module to learn the normal patterns present in the training video dataset, enabling the generation of realistic samples. To address the challenge of detecting subtle anomalies and those with motion characteristics, we propose the integration of self-attention in the discriminator model. Our proposed model MAAD-GAN enhances the ability to distinguish between real and generated samples, ensuring that anomalous samples are distorted when reconstructed. Experimental evaluations show the effectiveness of MAAD-GAN as compared to traditional methods on UCSD (University of California, San Diego) Peds2, CUHK Avenue, and ShanghaiTech datasets.

Keywords: Anomaly Detection; Generative Adversarial Networks; Deep Learning; Memory Network
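The MAAD-GAN abstract says a memory module learns normal patterns but does not say how the memory is addressed. The sketch below assumes a design common in memory-augmented anomaly detectors: a query feature is compared with each stored memory item by cosine similarity, the similarities are turned into softmax weights, and the read-out is the weighted sum of the items. The names `read_memory`, `memory`, and `query` are hypothetical and not taken from the paper.

```python
import math

def read_memory(query, memory):
    """Soft-address a memory bank (assumed design, not the paper's exact module).

    query:  list of floats (non-zero feature vector)
    memory: list of non-zero item vectors with the same length as query
    Returns (read_vector, attention_weights).
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    sims = [cosine(query, item) for item in memory]
    peak = max(sims)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The read-out is a convex combination of the stored "normal" items.
    read = [sum(w * item[d] for w, item in zip(weights, memory))
            for d in range(len(query))]
    return read, weights

# Toy example: two stored normal patterns and a query matching the first.
memory = [[1.0, 0.0], [0.0, 1.0]]
read, weights = read_memory([1.0, 0.0], memory)
```

Because the read-out can only be built from stored normal patterns, an anomalous query tends to be reconstructed poorly, which matches the abstract's statement that anomalous samples are distorted when reconstructed.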

A SINGLE GRAPH CONVOLUTION IS ALL YOU NEED: EFFICIENT GRAYSCALE IMAGE CLASSIFICATION

2024 International Conference on Image Processing

Authors: Fein-Ashley, Jacob; Wickramasinghe, Sachini; Zhang, Bingyi; Kannan, Rajgopal; Prasanna, Viktor
Affiliations: Univ Southern Calif, Los Angeles, CA 90007, USA; DEVCOM Army Res Off, Adelphi, MD, USA

ISBN: (print) 9798350349405; 9798350349399

Image classifiers for domain-specific tasks like Synthetic Aperture Radar Automatic Target Recognition (SAR ATR) and chest X-ray classification often rely on convolutional neural networks (CNNs). These networks, while powerful, experience high latency due to the number of operations they perform, which can be problematic in real-time applications. Many image classification models are designed to work with both RGB and grayscale datasets, but classifiers that operate solely on grayscale images are less common. Grayscale image classification has critical applications in fields such as medical imaging and SAR ATR. In response, we present a novel grayscale image classification approach using a vectorized view of images. By leveraging the lightweight nature of Multi-Layer Perceptrons (MLPs), we treat images as vectors, simplifying the problem to grayscale image classification. Our approach incorporates a single graph convolutional layer in a batch-wise manner, enhancing accuracy and reducing performance variance. Additionally, we develop a customized accelerator on FPGA for our model, incorporating several optimizations to improve performance. Experimental results on benchmark grayscale image datasets demonstrate the effectiveness of our approach, achieving significantly lower latency (up to 16x less on MSTAR) and competitive or superior performance compared to state-of-the-art models for SAR ATR and medical image classification.

Keywords: GCN; grayscale; MLP; low-latency
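The abstract above mentions a single graph convolutional layer applied batch-wise to vectorized images. As a rough illustration only, the sketch below applies one standard graph convolution, ReLU(D^-1/2 (A + I) D^-1/2 X W), to a toy batch in which each row of `batch` is one flattened image. How the paper builds the graph over a batch is not stated in the abstract, so the `adjacency` matrix, the identity `weights`, and all function names here are assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each sample keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def graph_conv(features, adj, weights):
    """One graph convolution over the samples of a batch, followed by ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]

# Toy batch: three flattened 2-pixel "images"; samples 0 and 1 are linked.
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the mixing is visible
out = graph_conv(batch, adjacency, weights)
```

In this toy case the two linked samples are averaged with each other while the isolated third sample passes through unchanged, which shows how a single layer mixes information across a batch.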

Efficient extraction of corn rows in diverse scenarios: A grid-based selection method for intelligent classification

COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2024, Vol. 218

Authors: Quan, Longzhe; Guo, Zhiming; Huang, Lili; Xue, Yi; Sun, Deng; Chen, Tianbao; Geng, Tianyu; Shi, Jianze; Hou, Pengbiao; He, Jinbin; Lou, Zhaoxia
Affiliations: Anhui Agr Univ, Sch Engn, Hefei 230036, Peoples R China; Northeast Agr Univ, Coll Engn, Harbin 150030, Peoples R China

In various complex field environments, machine learning-based crop row detection faces challenges like rigidity and low adaptability. To address this issue, we integrated deep learning into agricultural analysis and established a diverse dataset of corn fields across various scenarios. By employing an end-to-end CNN model and predicting row and column anchors, we created a grid-like understanding of images, significantly streamlining the crop row detection process without the need for pixel-level segmentation. This innovative approach offers a novel method for comprehending the spatial structure of crop rows. Furthermore, we extended the concept of agricultural machinery movement core areas to our data annotation strategy, eliminating the need for pre-selecting ROI regions during crop row extraction. Experimental results demonstrate that our Row and Column Anchor Selection Classification (RCASC) method surpasses conventional approaches in terms of versatility, achieving an F1 score of 92.6%. It can autonomously extract agricultural machinery movement areas, with video stream processing frame rates exceeding 100 FPS and an average image processing time of approximately 10 ms. This method not only meets the real-time requirements for corn crop row recognition but also operates effectively in various special scenarios, offering a feasible solution for further advancing agricultural automation and precision.

Keywords: Visual navigation; Deep Learning; Maize crop row detection; Early corn crops; Image classification

Research on frame prediction technology of video coding based on convolutional neural network

5th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2024

Authors: Zhang, Tianyu; Zhang, Qiang; Song, Ming; Yao, Zhenfu
Affiliations: Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing, China

ISBN: (print) 9798350373820

Driven by advances in communication and Internet technology, the spread of mobile terminals and smart devices, and emerging multimedia applications such as virtual reality video and short video that enrich daily life, the volume of video data is growing explosively. Although the current digital video coding standard HEVC meets the compression performance requirements of high-definition and ultra-high-definition digital video, it does not predict well for image blocks with complex texture or weak directionality. To improve prediction accuracy in existing video coding standards, this paper proposes a prediction method based on convolutional neural networks. The experimental results show that the proposed prediction algorithm achieves a 3.4% BD-rate saving and a 0.29 dB BD-PSNR improvement. © 2024 IEEE.

Keywords: Forecasting

Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Nguyen, Hong; Foley, Sean; Huang, Kevin; Shi, Xuan; Feng, Tiantian; Narayanan, Shrikanth
Affiliations: Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA 90089, United States

ISBN: (print) 9798350368741

Understanding speech production both visually and kinematically can inform second language learning system designs, as well as the creation of speaking characters in video games and animations. In this work, we introduce a data-driven method to visually represent articulator motion in Magnetic Resonance Imaging (MRI) videos of the human vocal tract during speech based on arbitrary audio or speech input. We leverage large pre-trained speech models, which are embedded with prior knowledge, to generalize the visual domain to unseen data using a speech-to-video diffusion model. Our findings demonstrate that the visual generation significantly benefits from the pre-trained speech representations. We also observed that evaluating phonemes in isolation is challenging but becomes more straightforward when assessed within the context of spoken words. Limitations of the current results include the presence of unsmooth tongue motion and video distortion when the tongue contacts the palate. The source code is available for the public at: https://***/Hong7Cong/***. © 2025 IEEE.

Keywords: Inverse problems; Real-time MRI; Speech production modeling; Speech-guided video; Video Diffusion Model

Machine Learning based Abnormal Human Behaviour Detection

2nd International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI)

Authors: Marichamy, D.; Sankar, M.; Sivaprakash, P.; Chithambaramane, R.; Charaan, R. M. Dilip; Ithayan, J. Vimala
Affiliations: Vel Tech Rangarajan Dr Sagunthala R&D Inst Sci & Dept Comp Sci & Engn, Chennai, Tamil Nadu, India

ISBN: (print) 9798331540661; 9798331540678

Due to technological advancements, numerous surveillance cameras have been installed in our everyday living spaces to enhance security measures. Assessing abnormalities within video recordings, particularly in crowded environments, presents a formidable challenge. Anomalous occurrences, arising from infrequent and uncommon behaviours, are characterized by deviations in nearby spatiotemporal positions. To bolster public safety, surveillance cameras are frequently deployed in crowded areas such as hospitals, banks, and shopping districts. The proposed system combines You Only Look Once (YOLO) and 2D convolution layer (CONV2d) to efficiently detect unconventional human activities and abnormalities in real-time video footage. Employing computer vision and machine learning techniques, it scrutinizes video frames to identify potential threats or risks through the detection of abnormal behaviours. YOLO facilitates instantaneous object detection, while CONV2d effectively processes and analyses image data. By leveraging these technologies, the system is capable of monitoring and identifying human behaviour, thus enabling the real-time detection of abnormalities and potential threats. However, challenges persist regarding the placement of security cameras and the insufficient number of cameras compared to human monitors. Identifying abnormal events, such as crimes, illegal activities, and traffic accidents, remains a paramount duty in video surveillance and our proposed system strives to achieve improved accuracy in real-time event identification.

Keywords: YOLO; Object detection; CONV2d; Abnormality detection; Human behaviour detection; Video surveillance

Multi scene infrared image processing based on fusion algorithm

6th Conference on Frontiers in Optical Imaging and Technology: Imaging Detection and Target Recognition

Authors: Wang, Shuwei; Xi, Youyou; Yang, Jinbao; Tong, Xiaojie; Yang, Chen
Affiliations: Beijing Institute of Environmental Characteristics, Beijing, China; 93114 Troops, Beijing, China

ISBN: (digital) 9781510679733

ISBN: (print) 9781510679726

Infrared imaging technology is widely used in military and civilian fields, but in practical applications, accurate and effective detection and tracking of infrared small targets is a bottleneck problem that needs to be solved urgently. In response to the problem that traditional algorithms are difficult to handle complex scenes with low signal-to-noise ratio and deep learning algorithms rely heavily on data, the proposed algorithm combines traditional algorithms with deep learning algorithms and is applied to detect and track infrared moving targets in various complex scenes, with resolutions ranging from 640 * 512 to 320 * 256 video sequences. At the same time, traditional algorithms include both single frame and multi frame detection methods. In order to avoid the problem of poor real-time performance, we selected the TMS320C6678 hardware platform and implemented simulation applications using a DSP+FPGA architecture. Experimental results have shown that this algorithm has excellent performance in object detection and tracking. © 2024 SPIE.

Keywords: Learning algorithms

Video-rate full-ring ultrasound and photoacoustic computed tomography with real-time sound speed optimization

BIOMEDICAL OPTICS EXPRESS, 2022, Vol. 13, No. 8, pp. 4398-4413

Authors: Zhang, Yachao; Wang, Lidai
Affiliations: City Univ Hong Kong, Dept Biomed Engn, Hong Kong 999077, Peoples R China; City Univ Hong Kong, Shenzhen Res Inst, Shenzhen 518057, Peoples R China

Full-ring dual-modal ultrasound and photoacoustic imaging provide complementary contrasts, high spatial resolution, full view angle and are more desirable in pre-clinical and clinical applications. However, two long-standing challenges exist in achieving high-quality video-rate dual-modal imaging. One is the increased data processing burden from the dense acquisition. Another one is the object-dependent speed of sound variation, which may cause blurry, splitting artifacts, and low imaging contrast. Here, we develop a video-rate full-ring ultrasound and photoacoustic computed tomography (VF-USPACT) with real-time optimization of the speed of sound. We improve the imaging speed by selective and parallel image reconstruction. We determine the optimal sound speed via co-registered ultrasound imaging. Equipped with a 256-channel ultrasound array, the dual-modal system can optimize the sound speed and reconstruct dual-modal images at 10 Hz in real-time. The optimized sound speed can effectively enhance the imaging quality under various sample sizes, types, or physiological states. In animal and human imaging, the system shows co-registered dual contrasts, high spatial resolution (140 μm), single-pulse photoacoustic imaging (< 50 μs), deep penetration (> 20 mm), full view, and adaptive sound speed correction. We believe VF-USPACT can advance many real-time biomedical imaging applications, such as vascular disease diagnosing, cancer screening, or neuroimaging. © 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

Keywords: Image metrics; Image quality; Image reconstruction; Mode conversion; Photoacoustic imaging; Spatial resolution

Dual-Channel Visible Light Communication System for Enhanced V2V Video Streaming

7th International Conference on Signal Processing and Information Security, ICSPIS 2024

Authors: Tettey, Daniel K.; Elamassie, Mohammed; Uysal, Murat
Affiliations: Özyeğin University, Electrical & Electronics Engineering, Istanbul, Turkey; Engineering Division, Abu Dhabi, United Arab Emirates

ISBN: (print) 9798350368673

Visible light communication (VLC) operates on the principle of modulating light-emitting diodes (LEDs) for data transmission at frequencies imperceptible to the human eye. In vehicular communication, VLC leverages existing vehicle lighting infrastructure, such as headlights and taillights, to transmit data. This enables the sharing of real-time video feeds from onboard vehicle cameras with other vehicles, allowing drivers to see beyond the immediate traffic ahead. In this paper, we present an experimental study that demonstrates vehicle-to-vehicle (V2V) video streaming by using both headlights as wireless transmitters. This approach reduces the likelihood of signal degradation or interruptions caused by obstacles, movement, or changing road conditions. By leveraging multiple light sources, the system ensures a more stable and consistent data flow, improving overall performance and robustness in dynamic vehicular environments. Our experimental setup utilizes modified software-defined radio platforms for baseband processing and a custom-designed frontend featuring truck low-beam LED headlights. A real-time video streaming demonstration is conducted with our prototype to validate the feasibility of dual-channel VLC for vehicular connectivity using both headlights. © 2024 IEEE.

Keywords: Radio communication

Embedded-Based Non-contact Heart Rate Detection System

8th International Conference on Computing, Control and Industrial Engineering, CCIE 2024

Authors: Ding, Ning; Zhang, Qi; Liu, Zhirong; Liang, Meiling; Liu, Henghui; Li, Linzhi
Affiliations: School of Electronic Engineering and Automation, Guilin University of Electronic Science and Technology, Guilin, China

ISBN: (print) 9789819769360

A non-contact heart rate detection system avoids direct contact between the sensor and the skin, improving portability and comfort while supporting real-time heart rate monitoring. This paper presents an embedded non-contact heart rate detection system in which the camera is mounted behind a mirror, so that users can read their heart rate while looking at the mirror. First, color video of the face is acquired by a USB camera attached to a Raspberry Pi; the face and its feature points are detected in the video, and the facial region of interest is cropped based on the detected feature points. Then, using OpenCV-based image processing, the feature area is split into its three RGB channels, and one channel is used as a one-dimensional signal for data processing; the periodically varying heart rate value is then extracted by Fourier transform and band-pass filtering. Finally, the system measures the heart rate value and displays the heart rate waveform. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.

Keywords: Image segmentation
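The last step in the heart rate abstract above, extracting a heart rate from a one-dimensional channel signal by Fourier transform and band-pass filtering, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the signal is the per-frame mean of one color channel over the face region, and the 0.7 to 4.0 Hz pass band (42 to 240 beats per minute) and the function name `estimate_heart_rate_bpm` are assumptions.

```python
import math

def estimate_heart_rate_bpm(signal, fps, low_hz=0.7, high_hz=4.0):
    """Return the strongest frequency of `signal` inside the pass band, in BPM.

    signal: per-frame mean intensity of one color channel (list of floats)
    fps:    camera frame rate in frames per second
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC component
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if freq < low_hz or freq > high_hz:
            continue  # band-pass step: skip bins outside the assumed band
        # Discrete Fourier transform evaluated at bin k only.
        re = sum(c * math.cos(2 * math.pi * k * i / n)
                 for i, c in enumerate(centered))
        im = -sum(c * math.sin(2 * math.pi * k * i / n)
                  for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0

# Synthetic 10 s trace at 30 fps: a 1.2 Hz pulse plus slow 0.1 Hz drift.
fps = 30.0
trace = [100.0
         + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps)
         + 2.0 * math.sin(2 * math.pi * 0.1 * i / fps)
         for i in range(300)]
bpm = estimate_heart_rate_bpm(trace, fps)
```

The slow drift is larger in amplitude than the pulse but lies below the pass band, so the 1.2 Hz component (72 beats per minute) is the one selected. On a real device a library FFT would be faster than this direct evaluation.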
