As camera quality improves and deployments move to areas with limited bandwidth, communication bottlenecks can impair the real-time constraints of intelligent transportation system applications such as video-based real-time pedestrian detection. Video compression reduces the bandwidth required to transmit the video but degrades its quality, and as video quality decreases, the accuracy of the vision-based pedestrian detection model decreases correspondingly. Furthermore, environmental conditions such as rain and night-time darkness limit how aggressively compression can be applied, making it harder to maintain high pedestrian detection accuracy. The objective of this study is to develop a real-time error-bounded lossy compression (EBLC) strategy that dynamically changes the video compression level according to the environmental conditions so as to maintain high pedestrian detection accuracy. We conduct a case study to show the efficacy of our dynamic EBLC strategy for real-time vision-based pedestrian detection under adverse environmental conditions. Our strategy dynamically selects the lossy compression error tolerances that maintain high detection accuracy across a representative set of environmental conditions. Analyses reveal that, under adverse environmental conditions, our dynamic EBLC strategy increases pedestrian detection accuracy by up to 14% and reduces communication bandwidth by up to 14x compared to the state of the practice. Moreover, we show that our dynamic EBLC strategy is independent of the pedestrian detection model and the environmental conditions, allowing other detection models and conditions to be easily incorporated.
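A minimal sketch of the condition-dependent error-bound selection this abstract describes, assuming a pre-profiled lookup from environmental condition to lossy-compression tolerance; the condition labels, tolerance values, and function names are illustrative placeholders, not the paper's.

# Hypothetical pre-profiled tolerances that keep detection accuracy above a target.
ERROR_TOLERANCE = {
    "clear_day": 0.10,  # aggressive compression is safe in good conditions
    "rain": 0.04,       # tighter bound preserves detail degraded by rain
    "night": 0.02,      # darkness calls for the most conservative bound
}
DEFAULT_TOLERANCE = 0.02  # fall back to the safest bound for unseen conditions


def select_error_bound(condition: str) -> float:
    """Return the lossy-compression error bound for the current condition."""
    return ERROR_TOLERANCE.get(condition, DEFAULT_TOLERANCE)


if __name__ == "__main__":
    for cond in ("clear_day", "rain", "night", "fog"):
        print(cond, "->", select_error_bound(cond))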
Traditional spaceborne synthetic aperture radar (SAR) imaging algorithms are primarily designed for stationary targets. However, in practical scenarios, ship targets are often affected by complex angular motions induc...
Fire detection and recognition is nowadays one of the major safety concerns for saving human lives. To address this, we propose a new model for early detection and recognition of the fire fo...
The tokenizer, serving as a translator that maps intricate visual data into a compact latent space, lies at the core of visual generative models. Based on the finding that existing tokenizers are tailored to image or vid...
This paper presents a GPU-based parallelisation of an optimised Versatile Video Coding (VVC) decoder adaptive loop filter (ALF) on a resource-constrained heterogeneous platform. The GPU is used comprehensively to maximise the degree of parallelism, allowing the programme to fully exploit the GPU's capabilities. The proposed approach accelerates the ALF computation by an average of two times compared to an already fully optimised version of the software decoder implementation on an embedded platform. Finally, this work presents an analysis of energy consumption, showing that the proposed methodology has a negligible impact on this key parameter.
Video semantic segmentation is a challenging vision task because its spatio-temporal characteristics are difficult to model while meeting real-time and accuracy requirements simultaneously. To tackle this problem, this paper proposes a novel optical-flow-based method. We propose an adaptive-threshold key-frame scheduling strategy that models temporal information by estimating inter-frame similarity. To ensure segmentation accuracy, we construct a convolutional neural network named Quick Network with attention (QNet-attention), a lightweight image semantic segmentation model with a spatial-pyramid-pooling attention module. The proposed network is further combined with optical flow estimation to realize a complete semantic segmentation framework. The performance of the proposed method is compared against existing benchmark methods, and the experimental results indicate that our method achieves a good balance between accuracy and speed.
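As a rough illustration of the adaptive-threshold key-frame scheduling idea, the sketch below declares a key frame when inter-frame similarity falls below a threshold that tracks recent similarity; the similarity measure, adaptation rule, and constants are assumptions rather than the paper's exact formulation, and the optical-flow warping of the reused segmentation is omitted.

import numpy as np


def frame_similarity(prev: np.ndarray, curr: np.ndarray) -> float:
    """Crude similarity in [0, 1] from the mean absolute pixel difference."""
    diff = np.mean(np.abs(prev.astype(np.float32) - curr.astype(np.float32)))
    return float(1.0 - diff / 255.0)


class KeyFrameScheduler:
    """Decide per frame whether to run full segmentation (key frame) or reuse
    the previous result (non-key frame, warped by optical flow in the paper)."""

    def __init__(self, init_threshold: float = 0.9, momentum: float = 0.95):
        self.threshold = init_threshold
        self.momentum = momentum

    def is_key_frame(self, prev: np.ndarray, curr: np.ndarray) -> bool:
        sim = frame_similarity(prev, curr)
        key = sim < self.threshold  # large scene change -> run the full network
        # Let the threshold track recent similarity: static scenes tighten it,
        # fast-moving scenes relax it (one plausible adaptation rule).
        self.threshold = self.momentum * self.threshold + (1 - self.momentum) * sim
        return key


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sched = KeyFrameScheduler()
    prev = rng.integers(0, 256, (64, 64)).astype(np.uint8)
    curr = prev.copy()
    curr[:8] = rng.integers(0, 256, (8, 64))  # perturb part of the frame
    print("key frame?", sched.is_key_frame(prev, curr))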
ISBN (print): 9798350374292; 9798350374285
Recently the amount of video produced by mobile devices has grown, as has the variety of video analytics services able to perform on-device classification, automated tagging, video retrieval, object tracking and similarity analysis. However, high computational complexity limits their use in real-time applications (such as object detection in Closed Circuit Television (CCTV) systems), may degrade the user experience through delays, or may raise privacy concerns when video content is shared with edge computing services. For these reasons, approaches that use the hardware Artificial Intelligence (AI) features of modern mobile platforms, such as the Qualcomm AI Engine, Samsung's AI Quantum Processor, or MediaTek's AI Processing Units, are of special interest. Online services for video search by an (optionally distorted) video fragment do exist, but they are very limited and bound to cloud-side processing. This inspired us to develop a core technology for near-duplicate video retrieval that would allow such a video search service to operate efficiently, be suitable for on-device execution, and enable many other practical applications (e.g. video provenance, authenticity checking, or recommendation services). We propose an adaptation of a modern Visual Transformer (VT) model for processing a partially decompressed video stream and matching a small set of reference frames (keyframes) against a pristine reference video. The adaptation of the novel DINO model [3] in our approach improves the detection of near-duplicates by 20% (from 66.1% to 88.3%) in comparison with the state-of-the-art DnS [13] and S2VS [12] models. The proposed models are robust against popular video content modifications, such as affine transformations and visual effects, even when the video is transcoded with the novel Essential Video Coding (EVC) codec. Our proposed solution also shortens processing time by up to 2.5 times in comparison with the approach that requires full video decoding.
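The keyframe-matching step can be pictured as below: each query keyframe is embedded and compared against embeddings of the reference video by cosine similarity. In the paper the embeddings come from a DINO-style Visual Transformer; here embed_frame is a stand-in random projection so the sketch runs without a model, and the aggregation into a single score is an assumption.

import numpy as np

rng = np.random.default_rng(0)
_PROJ = rng.standard_normal((32 * 32, 128)).astype(np.float32)  # stand-in "model"


def embed_frame(frame: np.ndarray) -> np.ndarray:
    """Placeholder embedding: flatten a 32x32 grayscale frame, project, L2-normalise."""
    v = frame.astype(np.float32).reshape(-1) @ _PROJ
    return v / (np.linalg.norm(v) + 1e-8)


def match_score(query_keyframes, reference_frames) -> float:
    """Mean over query keyframes of their best cosine similarity in the reference."""
    ref = np.stack([embed_frame(f) for f in reference_frames])
    best = [float(np.max(ref @ embed_frame(q))) for q in query_keyframes]
    return float(np.mean(best))


if __name__ == "__main__":
    reference = [rng.integers(0, 256, (32, 32)) for _ in range(20)]
    query = reference[5:8]  # keyframes taken straight from the reference video
    print("near-duplicate score:", round(match_score(query, reference), 3))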
ISBN (print): 9798350372977; 9798350372984
The precise detection of plant centres is important for growth monitoring, enabling the continuous tracking of plant development to discern the influence of diverse factors. It also holds significance for automated systems such as robotic harvesting, helping machines locate and engage with plants. In this paper, we explore the YOLOv4 (You Only Look Once) real-time neural network detector for plant centre detection. Our dataset, comprising over 12,000 images from 151 Arabidopsis thaliana accessions, is used to fine-tune the model. Evaluation on this dataset reveals the model's proficiency in centre detection across accessions, achieving an mAP of 99.79% at a 50% IoU threshold. The model demonstrates real-time processing capability, achieving a frame rate of approximately 50 FPS. This outcome underscores its rapid and efficient analysis of video or image data, showing practical utility in time-sensitive applications.
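For reference, the 50% IoU criterion behind the reported mAP can be illustrated as follows; the box format and example values are illustrative and not tied to the paper's dataset.

def iou(a, b) -> float:
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0


def is_true_positive(pred, gt, threshold: float = 0.5) -> bool:
    """A detection counts as correct when IoU with the ground truth is >= 0.5."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    print(is_true_positive((10, 10, 50, 50), (12, 14, 52, 54)))  # IoU ~0.75 -> True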
ISBN (print): 9798350318920; 9798350318937
Recent deep generative models (DGMs) such as generative adversarial networks (GANs) and diffusion probabilistic models (DPMs) have shown an impressive ability to generate high-fidelity photorealistic images. Although such images look appealing to human eyes, training a model purely on synthetic images for downstream image processing tasks like image classification often results in an undesired performance drop compared to training on real data. Previous works have demonstrated that enhancing a real dataset with synthetic images from DGMs can be beneficial. However, the improvements were limited to certain circumstances and were still not comparable to adding the same number of real images. In this work, we propose a new taxonomy to describe the factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset. We hypothesize that the Content Gap accounts for a large portion of the performance drop when using synthetic images from DGMs, and we propose strategies to better utilize them in downstream tasks. Extensive experiments on multiple datasets show that our method outperforms baselines on downstream classification tasks both when training on synthetic data only (Synthetic-to-real) and when training on a mix of real and synthetic data (Data Augmentation), particularly in the data-scarce scenario.
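A simple sketch of the Data Augmentation setting mentioned above: mix real images with DGM-generated synthetic ones at a chosen ratio before training. The mixing ratio, data representation, and function names are placeholders, and the paper's content-gap-aware selection of synthetic images is not reproduced here.

import random


def mix_real_and_synthetic(real, synthetic, synthetic_fraction=0.5, seed=0):
    """Return a shuffled list of (sample, source) pairs with the requested
    fraction of synthetic samples relative to the real set size."""
    rng = random.Random(seed)
    n_syn = int(len(real) * synthetic_fraction)
    chosen = rng.sample(synthetic, min(n_syn, len(synthetic)))
    mixed = [(x, "real") for x in real] + [(x, "synthetic") for x in chosen]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real = [f"real_{i}" for i in range(8)]
    synthetic = [f"syn_{i}" for i in range(20)]
    for sample, source in mix_real_and_synthetic(real, synthetic)[:5]:
        print(source, sample)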
ISBN (print): 9798350344868; 9798350344851
Although current deep neural network based no-reference video quality assessment (NR-VQA) methods can effectively simulate the human visual system (HVS), their interpretability is increasingly poor. Current methods extract only the low-level spatial and temporal features of the video and do not consider the impact of high-level semantics, even though high-level semantic information related to human subjective perception and to the video's own quality is perceived by the HVS. In this work, we design a multidimensional feature extractor (MDFE), which takes text descriptions of video quality factors as semantic guidance and uses the Contrastive Language-Image Pre-training (CLIP) model to perform zero-shot multidimensional feature extraction. We then propose a zero-shot feature extraction method based on semantic guidance (ZE-FESG), which treats the MDFE as a feature extractor and acquires the semantically corresponding features of the video by sliding over each frame. Extensive experiments show that the proposed ZE-FESG offers better interpretability and performance than current mainstream 2D-CNN based feature extraction methods for NR-VQA. The code will be released at https://***/xiao-mi-d/ZE-FESG.
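A sketch of CLIP-based zero-shot feature extraction guided by quality-related text prompts, applied frame by frame in the spirit of the MDFE; the prompt wording, checkpoint choice, and per-frame pooling are assumptions, and how the resulting features feed an NR-VQA regressor is not shown.

import torch
from transformers import CLIPModel, CLIPProcessor

PROMPTS = [  # hypothetical quality-factor descriptions used as semantic guidance
    "a sharp, clean video frame",
    "a blurry video frame",
    "a frame with heavy compression artifacts",
    "a noisy, low-light video frame",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def frame_quality_features(frame) -> torch.Tensor:
    """Return per-prompt similarity scores for one frame (a PIL image)."""
    inputs = processor(text=PROMPTS, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, len(PROMPTS))
    return logits.softmax(dim=-1).squeeze(0)


def video_quality_features(frames) -> torch.Tensor:
    """Slide over frames and stack the per-frame semantic features."""
    return torch.stack([frame_quality_features(f) for f in frames])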