ISBN:
(Print) 9798400716164
News broadcasters must produce engaging video clips more quickly than ever to secure their position in the market. This is due, in part, to the growing number of news sources and changes in media consumption amongst target audiences. This evolution has amplified the need to produce news clips quickly, a requirement that remains at odds with traditionally manual and time-consuming video editing processes. Despite advances in automating video news production, current systems have yet to meet the automation level and quality standards required for professional news broadcasting. Addressing this gap, we propose a novel transformer-based framework for automatically composing news clips to streamline the editing process. Our framework is built on a vision-language feature embedding mechanism and a cross-attention transformer architecture designed to generate multi-shot news clips that are semantically coherent with the editorial text and stylistically consistent with professional editing benchmarks. Our framework composes 2-minute news clips from source material ranging from 20 minutes to 2 hours in under 5 minutes on a single GPU. In our user study, target groups with different experience levels rated the generated videos on a 6-point Likert scale. Users rated the news clips generated by our framework with an average score of 4.13 and the manually edited news clips with an average score of 4.58.
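At the retrieval core of such a framework, candidate shots and the editorial text are compared in a shared vision-language embedding space. As a minimal sketch (not the paper's actual cross-attention model), the code below greedily fills a 2-minute clip with the shots whose embeddings are most similar to the text embedding; the function names and the greedy strategy are illustrative assumptions.

```python
import numpy as np

def select_shots(text_emb, shot_embs, shot_lens, target_len=120.0):
    """Greedily pick the shots most similar to the editorial-text
    embedding until the target clip length (seconds) is filled.
    Illustrative baseline only; the paper's framework uses a
    cross-attention transformer, not this greedy selection."""
    # Cosine similarity between the text and every candidate shot.
    t = text_emb / np.linalg.norm(text_emb)
    s = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    sims = s @ t
    order = np.argsort(-sims)            # best-matching shots first
    picked, total = [], 0.0
    for i in order:
        if total + shot_lens[i] > target_len:
            continue                     # shot would overshoot the budget
        picked.append(int(i))
        total += float(shot_lens[i])
    return sorted(picked), total         # keep source order for coherence
```

In practice the shot and text embeddings would come from a pretrained vision-language encoder; here they are plain vectors.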
ISBN:
(Print) 9798400701085
Low-Light Video Enhancement (LLVE) has received considerable attention in recent years. One of the critical requirements of LLVE is inter-frame brightness consistency, which is essential for maintaining the temporal coherence of the enhanced video. However, most existing single-image-based methods fail to address this issue, resulting in a flickering effect that degrades overall quality after enhancement. Moreover, 3D Convolutional Neural Network (CNN)-based methods, which are designed for video to maintain inter-frame consistency, are computationally expensive, making them impractical for real-time applications. To address these issues, we propose an efficient pipeline named FastLLVE that leverages the Look-Up Table (LUT) technique to maintain inter-frame brightness consistency effectively. Specifically, we design a learnable Intensity-Aware LUT (IA-LUT) module for adaptive enhancement, which addresses the low-dynamic problem in low-light scenarios. This enables FastLLVE to perform low-latency, low-complexity enhancement operations while maintaining high-quality results. Experimental results on benchmark datasets demonstrate that our method achieves State-Of-The-Art (SOTA) performance in terms of both image quality and inter-frame brightness consistency. More importantly, FastLLVE can process 1080p videos at 50+ Frames Per Second (FPS), which is 2x faster than SOTA CNN-based methods in inference time, making it a promising solution for real-time applications. The code is available at https://***/Wenhao-Li777/FastLLVE.
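The appeal of a LUT here is that enhancement becomes a pure table lookup: applying the same table to every frame gives an identical brightness mapping across frames by construction, which is exactly the inter-frame consistency property. The sketch below uses a fixed gamma table rather than FastLLVE's learned, intensity-aware IA-LUT; the function names and the gamma value are assumptions for illustration.

```python
import numpy as np

def build_gamma_lut(gamma=0.45, size=256):
    """A fixed brightening LUT. FastLLVE's IA-LUT is *learned* and
    intensity-aware; only the lookup mechanics are the same."""
    x = np.linspace(0.0, 1.0, size)
    return np.clip((x ** gamma) * 255.0, 0, 255).astype(np.uint8)

def enhance_video(frames, lut):
    """Apply one shared LUT to every frame of a uint8 video array.
    A pure table lookup (NumPy fancy indexing), so the brightness
    mapping is identical for all frames by construction."""
    return lut[frames]
```

Because the per-pixel cost is a single array index, this kind of mapping is what makes 50+ FPS on 1080p input plausible.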
This study aims to enhance the detection accuracy and efficiency of cotton bolls in complex natural environments. Addressing the limitations of traditional methods, we developed an automated detection system based on computer vision, designed to optimize performance under variable lighting and weather conditions. We introduced COTTON-YOLO, an improved model based on YOLOv8n, incorporating specific algorithmic optimizations and data augmentation techniques. Key innovations include the C2F-CBAM module to boost feature recognition capabilities, the Gold-YOLO neck structure for enhanced information flow and feature integration, and the WIoU loss function to improve bounding box precision. These advancements significantly enhance the model's environmental adaptability and detection precision. Comparative experiments with the baseline YOLOv8 model demonstrated substantial performance improvements with COTTON-YOLO, notably a 10.3% increase in the AP50 metric, validating its superiority in accuracy. Additionally, COTTON-YOLO showed efficient real-time processing capabilities and a low false detection rate in field tests. The model's performance was assessed in static and dynamic counting scenarios, showing high accuracy in static cotton boll counting and effective tracking of cotton bolls in video sequences using the ByteTrack algorithm, maintaining low false detection and ID-switch rates even in complex backgrounds.
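WIoU, like other IoU-based losses, starts from the plain overlap ratio between a predicted box and a ground-truth box. The snippet below computes only that base IoU term; WIoU's dynamic focusing weight is omitted, and the corner-coordinate box format is an assumption.

```python
def iou(box_a, box_b):
    """Plain IoU between two [x1, y1, x2, y2] boxes. WIoU, as used
    by COTTON-YOLO, adds a distance-based focusing weight on top of
    this base overlap term; that weighting is not shown here."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```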
The growing demand for real-time image processing on edge devices calls for novel approaches that balance computational efficiency with high performance. This paper introduces an integrated solution combining ShuffleN...
Security is a significant concern at all locations where CCTV cameras are installed. Because security is a top priority, considerable time and effort must be invested to keep track of everything. Soon, developments in comput...
This paper presents a method for early detection of dangerous conditions in the deep-water zone of a swimming pool based on video surveillance. We propose feature extraction, feature expression, and assessment criteria, including a method for evaluating normal swimming speed based on the time series of swimmers' positions, a method for assessing an upright state that is not limited by the camera angle, and rules for assessing a dangerous state. We collected real-life data from a swimming pool and conducted related experiments. Our method can easily and efficiently detect a swimmer in danger at an early stage and provide the necessary rescue reminders to lifeguards.
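The normal-swimming-speed criterion can be grounded in a centroid time series. The sketch below is a simplified stand-in for the paper's assessment rules: the function names, window size, and speed threshold are illustrative assumptions, and the upright-state rule is not modeled.

```python
import numpy as np

def mean_speed(centroids, fps=25.0, window=50):
    """Average speed (pixels/s) over the last `window` frames of a
    swimmer's centroid track. Only the time-series speed part of
    the paper's criteria is shown here."""
    pts = np.asarray(centroids[-(window + 1):], dtype=float)
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # per-frame displacement
    return steps.mean() * fps

def is_dangerous(centroids, fps=25.0, min_speed=10.0):
    """Flag a swimmer whose sustained speed falls below a threshold;
    an illustrative stand-in for the paper's danger-state rules."""
    return mean_speed(centroids, fps) < min_speed
```

A real deployment would combine this with the upright-state assessment before alerting a lifeguard, to avoid flagging swimmers who are simply resting at the wall.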
Unmanned aerial vehicles (UAVs) have the advantages of simple operation, quick response, flexible flight, long battery life, and low cost, and have become a conventional means of power inspection. However, the video sign...
ISBN:
(Print) 9798350376043; 9798350376036
Video skimming involves generating a concise representation of a video that captures all of its significant information. However, conventional skimming techniques often fail to capture the different shots in a video because they cannot detect scene changes or incorporate the hierarchical structure of video content. This work proposes an unsupervised hierarchical method for video skimming, called Hierarchical Time-aware Skimming (HieTaSkim), in which video content is modeled as a graph and an adaptive strategy is employed to produce hierarchical graph cuts. These cuts identify the most relevant video segments, or keyshots, allowing the extraction of frame sequences that convey the video's central message and resulting in a more effective and accurate video summary. Experimental results demonstrate that the proposed approach outperforms other state-of-the-art unsupervised methods for video skimming, achieving an F-score of 39.9 on the SumMe dataset, an improvement of at least 10%.
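Modeling video content as a graph means frames are nodes and edges carry frame-to-frame similarity; a shot boundary then shows up as a weak edge. The sketch below performs only a flat, threshold-based cut of that graph, not HieTaSkim's adaptive hierarchical cuts; the cosine-similarity features and the fixed threshold are assumptions.

```python
import numpy as np

def shot_boundaries(features, cut_threshold=0.5):
    """Cut the frame-adjacency graph wherever consecutive frame
    features are dissimilar. HieTaSkim builds hierarchical,
    adaptive graph cuts; this flat cut is only the base idea.
    Returns the indices where a new shot starts."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    # Edge weight between each frame and its successor (cosine sim).
    sims = (f[:-1] * f[1:]).sum(axis=1)
    return [i + 1 for i, s in enumerate(sims) if s < cut_threshold]
```

Keyshot selection would then score each segment between boundaries and keep the highest-scoring ones for the skim.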
ISBN:
(Print) 9781510673854; 9781510673847
In underwater exploration, Autonomous Underwater Vehicles (AUVs) face challenges due to the adverse effects of the aquatic environment on optical sensors, resulting in sub-optimal data acquisition. To overcome this, we propose a novel solution utilizing a Generative Adversarial Network (GAN) model. Rooted in the U-Net architecture, our model processes the low-quality AUV camera feed and generates enhanced representations of the underwater scene. The discriminator evaluates individual image patches, capturing high-frequency properties with fewer parameters and achieving a 15% improvement in model accuracy. This approach facilitates real-time preprocessing in visually guided underwater robot autonomy pipelines, overcoming challenges associated with underwater visibility.
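A patch-wise discriminator scores each local region of the image independently instead of emitting one global real/fake score, which is what keeps its parameter count small and its focus on high-frequency detail. The stub below only demonstrates the patch decomposition, using local contrast as a placeholder score; the actual discriminator is a learned CNN, and every name here is an assumption.

```python
import numpy as np

def patch_scores(image, patch=8):
    """Score a grayscale image patch-by-patch, as a patch-wise
    discriminator does: each output cell judges only one local
    region. Local standard deviation stands in for the learned
    realism score of the real CNN discriminator."""
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch
    # Split into a (gh, patch, gw, patch) grid of non-overlapping patches.
    grid = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    return grid.std(axis=(1, 3))   # one score per patch
```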
Skin color plays an important role in color image processing and human-computer interaction. However, factors such as rapidly changing illumination, varied color styles, and camera characteristics make skin detection a challenging task. In particular, meeting the real-time requirements of practical applications is difficult. In this paper, face detection and alignment are applied to select facial reference points for modeling the skin color distribution. Moreover, we propose the concept of a skin color model updating unit (SCMUU), together with a detection approach, based on the observation that the skin color distribution remains consistent across a range of frames. Redundant frame-by-frame updating is avoided by using one model for all frames within an SCMUU. When no reliable face is detected, two strategies are introduced to compensate and reduce the computational cost: if a similar previous SCMUU is found, its model parameters are reused; otherwise, fixed thresholds are used instead and the interval between two consecutive face detections is increased. In addition, the time-consuming steps are accelerated using a graphics processing unit (GPU) with CUDA. Experimental results show that, compared with other existing methods, the proposed method achieves good real-time performance and accuracy for skin detection on videos of various resolutions under different illumination conditions.
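The fixed-threshold fallback is commonly implemented as a rectangular decision box in the Cr/Cb plane of YCrCb space. The snippet below uses one classic set of bounds; the paper does not state its exact thresholds, so these values are assumptions, and the adaptive per-SCMUU model it normally uses is not shown.

```python
import numpy as np

def skin_mask_fixed(ycrcb):
    """Fixed-threshold skin detector on a YCrCb image array of
    shape (H, W, 3). Uses the classic Cr/Cb box
    (133 <= Cr <= 173, 77 <= Cb <= 127); an illustrative fallback
    for frames where no reliable face seeds the adaptive model."""
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)
```

Because this is a pure per-pixel comparison, it vectorizes trivially, which is also why it suits a GPU fallback path.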