ISBN:
(Print) 9798400716164
News broadcasters must produce engaging video clips more quickly than ever to secure their position in the market. This is due, in part, to the growing number of news sources and changes in media consumption amongst target audiences. This evolution has amplified the need to produce news clips quickly, a requirement that remains at odds with traditionally manual and time-consuming video editing processes. Despite advances in automating video news production, current systems have yet to meet the automation level and quality standards required for professional news broadcasting. Addressing this gap, we propose a novel transformer-based framework for automatically composing news clips to streamline the editing process. Our framework is built on a vision-language feature embedding mechanism and a cross-attention transformer architecture designed to generate multi-shot news clips that are semantically coherent with the editorial text and stylistically consistent with professional editing benchmarks. Our framework composes 2-minute news clips from source material ranging from 20 minutes to 2 hours in under 5 minutes on a single GPU. In our user study, target groups with different experience levels rated the generated videos on a 6-point Likert scale. Users rated the news clips generated by our framework with an average score of 4.13 and the manually edited news clips with an average score of 4.58.
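At the retrieval core of such a framework, candidate shots and the editorial text are compared in a shared vision-language embedding space. As a minimal sketch (not the paper's actual cross-attention model), the code below greedily fills a 2-minute clip with the shots whose embeddings are most similar to the text embedding; the function names and the greedy strategy are illustrative assumptions.

```python
import numpy as np

def select_shots(text_emb, shot_embs, shot_lens, target_len=120.0):
    """Greedily pick the shots most similar to the editorial-text
    embedding until the target clip length (seconds) is filled.
    Illustrative baseline only; the paper's framework uses a
    cross-attention transformer, not this greedy selection."""
    # Cosine similarity between the text and every candidate shot.
    t = text_emb / np.linalg.norm(text_emb)
    s = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    sims = s @ t
    order = np.argsort(-sims)            # best-matching shots first
    picked, total = [], 0.0
    for i in order:
        if total + shot_lens[i] > target_len:
            continue                     # shot would overshoot the budget
        picked.append(int(i))
        total += float(shot_lens[i])
    return sorted(picked), total         # keep source order for coherence
```

In practice the shot and text embeddings would come from a pretrained vision-language encoder; here they are plain vectors.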
ISBN:
(Print) 9798400701085
Low-Light Video Enhancement (LLVE) has received considerable attention in recent years. One of the critical requirements of LLVE is inter-frame brightness consistency, which is essential for maintaining the temporal coherence of the enhanced video. However, most existing single-image-based methods fail to address this issue, resulting in a flickering effect that degrades overall quality after enhancement. Moreover, 3D Convolutional Neural Network (CNN)-based methods, which are designed for video to maintain inter-frame consistency, are computationally expensive, making them impractical for real-time applications. To address these issues, we propose an efficient pipeline named FastLLVE that leverages the Look-Up Table (LUT) technique to maintain inter-frame brightness consistency effectively. Specifically, we design a learnable Intensity-Aware LUT (IA-LUT) module for adaptive enhancement, which addresses the low-dynamic problem in low-light scenarios. This enables FastLLVE to perform low-latency, low-complexity enhancement operations while maintaining high-quality results. Experimental results on benchmark datasets demonstrate that our method achieves State-Of-The-Art (SOTA) performance in terms of both image quality and inter-frame brightness consistency. More importantly, FastLLVE can process 1080p videos at 50+ Frames Per Second (FPS), which is 2x faster than SOTA CNN-based methods in inference time, making it a promising solution for real-time applications. The code is available at https://***/Wenhao-Li777/FastLLVE.
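The appeal of a LUT here is that enhancement becomes a pure table lookup: applying the same table to every frame gives an identical brightness mapping across frames by construction, which is exactly the inter-frame consistency property. The sketch below uses a fixed gamma table rather than FastLLVE's learned, intensity-aware IA-LUT; the function names and the gamma value are assumptions for illustration.

```python
import numpy as np

def build_gamma_lut(gamma=0.45, size=256):
    """A fixed brightening LUT. FastLLVE's IA-LUT is *learned* and
    intensity-aware; only the lookup mechanics are the same."""
    x = np.linspace(0.0, 1.0, size)
    return np.clip((x ** gamma) * 255.0, 0, 255).astype(np.uint8)

def enhance_video(frames, lut):
    """Apply one shared LUT to every frame of a uint8 video array.
    A pure table lookup (NumPy fancy indexing), so the brightness
    mapping is identical for all frames by construction."""
    return lut[frames]
```

Because the per-pixel cost is a single array index, this kind of mapping is what makes 50+ FPS on 1080p input plausible.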
This study aims to enhance the detection accuracy and efficiency of cotton bolls in complex natural environments. Addressing the limitations of traditional methods, we developed an automated detection system based on computer vision, designed to optimize performance under variable lighting and weather conditions. We introduced COTTON-YOLO, an improved model based on YOLOv8n, incorporating specific algorithmic optimizations and data augmentation techniques. Key innovations include the C2F-CBAM module to boost feature recognition capabilities, the Gold-YOLO neck structure for enhanced information flow and feature integration, and the WIoU loss function to improve bounding box precision. These advancements significantly enhance the model's environmental adaptability and detection precision. Comparative experiments with the baseline YOLOv8 model demonstrated substantial performance improvements with COTTON-YOLO, notably a 10.3% increase in the AP50 metric, validating its superiority in accuracy. Additionally, COTTON-YOLO showed efficient real-time processing capabilities and a low false detection rate in field tests. The model's performance was assessed in static and dynamic counting scenarios, showing high accuracy in static cotton boll counting and effective tracking of cotton bolls in video sequences using the ByteTrack algorithm, maintaining low false detection and ID-switch rates even in complex backgrounds.
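WIoU, like other IoU-based losses, starts from the plain overlap ratio between a predicted box and a ground-truth box. The snippet below computes only that base IoU term; WIoU's dynamic focusing weight is omitted, and the corner-coordinate box format is an assumption.

```python
def iou(box_a, box_b):
    """Plain IoU between two [x1, y1, x2, y2] boxes. WIoU, as used
    by COTTON-YOLO, adds a distance-based focusing weight on top of
    this base overlap term; that weighting is not shown here."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```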
The growing demand for real-time image processing on edge devices calls for novel approaches that balance computational efficiency with high performance. This paper introduces an integrated solution combining ShuffleN...
Security is a significant concern at all locations where CCTV cameras are installed. Because security is a top priority, considerable time and effort must be invested to keep track of everything. Soon, developments in comput...
This paper presents a method for early detection of dangerous conditions in the deep-water zone of a swimming pool based on video surveillance. We propose feature extraction, feature expression, and assessment criteria, including a method for evaluating normal swimming speed based on the time series of swimmers' positions, a method for assessing an upright state that is not limited by the camera angle, and rules for assessing a dangerous state. We collected real-life data from a swimming pool and conducted related experiments. Our method can easily and efficiently detect a swimmer in danger at an early stage and provide the necessary rescue reminders to lifeguards.
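The normal-swimming-speed criterion can be grounded in a centroid time series. The sketch below is a simplified stand-in for the paper's assessment rules: the function names, window size, and speed threshold are illustrative assumptions, and the upright-state rule is not modeled.

```python
import numpy as np

def mean_speed(centroids, fps=25.0, window=50):
    """Average speed (pixels/s) over the last `window` frames of a
    swimmer's centroid track. Only the time-series speed part of
    the paper's criteria is shown here."""
    pts = np.asarray(centroids[-(window + 1):], dtype=float)
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # per-frame displacement
    return steps.mean() * fps

def is_dangerous(centroids, fps=25.0, min_speed=10.0):
    """Flag a swimmer whose sustained speed falls below a threshold;
    an illustrative stand-in for the paper's danger-state rules."""
    return mean_speed(centroids, fps) < min_speed
```

A real deployment would combine this with the upright-state assessment before alerting a lifeguard, to avoid flagging swimmers who are simply resting at the wall.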
Unmanned aerial vehicles (UAVs) have the advantages of simple operation, quick response, flexible flight, long battery life, and low cost, and have become a conventional means of power inspection. However, the video sign...
ISBN:
(Print) 9798350376043; 9798350376036
Video skimming involves generating a concise representation of a video that captures all of its significant information. However, conventional skimming techniques often fail to capture the different shots in a video because they cannot detect scene changes or incorporate the hierarchical structure of video content. This work proposes an unsupervised hierarchical method for video skimming, called Hierarchical Time-aware Skimming (HieTaSkim), in which video content is modeled as a graph and an adaptive strategy is employed to produce hierarchical graph cuts. These cuts identify the most relevant video segments, or keyshots, allowing the extraction of frame sequences that convey the video's central message and resulting in a more effective and accurate video summary. Experimental results demonstrate that the proposed approach outperforms other state-of-the-art unsupervised methods for video skimming, achieving an F-score of 39.9 on the SumMe dataset, an improvement of at least 10%.
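Modeling video content as a graph means frames are nodes and edges carry frame-to-frame similarity; a shot boundary then shows up as a weak edge. The sketch below performs only a flat, threshold-based cut of that graph, not HieTaSkim's adaptive hierarchical cuts; the cosine-similarity features and the fixed threshold are assumptions.

```python
import numpy as np

def shot_boundaries(features, cut_threshold=0.5):
    """Cut the frame-adjacency graph wherever consecutive frame
    features are dissimilar. HieTaSkim builds hierarchical,
    adaptive graph cuts; this flat cut is only the base idea.
    Returns the indices where a new shot starts."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    # Edge weight between each frame and its successor (cosine sim).
    sims = (f[:-1] * f[1:]).sum(axis=1)
    return [i + 1 for i, s in enumerate(sims) if s < cut_threshold]
```

Keyshot selection would then score each segment between boundaries and keep the highest-scoring ones for the skim.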
ISBN:
(Print) 9781510673854; 9781510673847
In underwater exploration, Autonomous Underwater Vehicles (AUVs) face challenges due to the adverse effects of the aquatic environment on optical sensors, resulting in sub-optimal data acquisition. To overcome this, we propose a novel solution utilizing a Generative Adversarial Network (GAN) model. Rooted in the U-Net architecture, our model processes the low-quality AUV camera feed and generates enhanced representations of the underwater scene. The discriminator evaluates individual image patches, capturing high-frequency properties with fewer parameters and achieving a 15% improvement in model accuracy. This approach facilitates real-time preprocessing in visually guided underwater robot autonomy pipelines, overcoming challenges associated with underwater visibility.
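A patch-wise discriminator scores each local region of the image independently instead of emitting one global real/fake score, which is what keeps its parameter count small and its focus on high-frequency detail. The stub below only demonstrates the patch decomposition, using local contrast as a placeholder score; the actual discriminator is a learned CNN, and every name here is an assumption.

```python
import numpy as np

def patch_scores(image, patch=8):
    """Score a grayscale image patch-by-patch, as a patch-wise
    discriminator does: each output cell judges only one local
    region. Local standard deviation stands in for the learned
    realism score of the real CNN discriminator."""
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch
    # Split into a (gh, patch, gw, patch) grid of non-overlapping patches.
    grid = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    return grid.std(axis=(1, 3))   # one score per patch
```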
Skin color plays an important role in color image processing and human-computer interaction. However, factors such as rapidly changing illumination, varied color styles, and camera characteristics make skin detection a challenging task. In particular, meeting the real-time requirements of practical applications is difficult. In this paper, face detection and alignment are applied to select facial reference points for modeling the skin color distribution. Moreover, we propose the concept of a skin color model updating unit (SCMUU), together with a detection approach, based on the observation that the skin color distribution remains consistent across a range of frames. Redundant frame-by-frame updating is avoided by using one model for all frames within an SCMUU. When no reliable face is detected, two strategies are introduced to compensate and reduce the computational cost: if a similar previous SCMUU is found, its model parameters are reused; otherwise, fixed thresholds are used instead and the interval between two consecutive face detections is increased. In addition, the time-consuming steps are accelerated using a graphics processing unit (GPU) with CUDA. Experimental results show that, compared with other existing methods, the proposed method achieves good real-time performance and accuracy for skin detection on videos of various resolutions under different illumination conditions.
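The fixed-threshold fallback is commonly implemented as a rectangular decision box in the Cr/Cb plane of YCrCb space. The snippet below uses one classic set of bounds; the paper does not state its exact thresholds, so these values are assumptions, and the adaptive per-SCMUU model it normally uses is not shown.

```python
import numpy as np

def skin_mask_fixed(ycrcb):
    """Fixed-threshold skin detector on a YCrCb image array of
    shape (H, W, 3). Uses the classic Cr/Cb box
    (133 <= Cr <= 173, 77 <= Cb <= 127); an illustrative fallback
    for frames where no reliable face seeds the adaptive model."""
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)
```

Because this is a pure per-pixel comparison, it vectorizes trivially, which is also why it suits a GPU fallback path.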