Search results - Inner Mongolia University Library

Beyond Appearance: Multi-Frame Spatio-Temporal Context Memory Networks for Efficient and Robust Video Object Segmentation

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, Vol. 33, pp. 4853-4866

Authors: Dang, Jisheng; Zheng, Huicheng; Xu, Xiaohao; Wang, Longguang; Guo, Yulan. Affiliations: Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China; Minist Educ, Key Lab Machine Intelligence & Adv Comp, Guangzhou 510006, Peoples R China; Guangdong Prov Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China; Univ Michigan, Robot Inst, Ann Arbor, MI 48109, USA; Air Force Aviat Univ, Sch Elect Sci & Technol, Changchun 130022, Peoples R China; Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 510006, Peoples R China

Current video object segmentation approaches primarily rely on frame-wise appearance information to perform matching. Despite significant progress, reliable matching becomes challenging due to rapid changes of the object's appearance over time. Moreover, previous matching mechanisms suffer from redundant computation and noise interference as the number of accumulated frames increases. In this paper, we introduce a multi-frame spatio-temporal context memory (STCM) network to exploit discriminative spatio-temporal cues in multiple adjacent frames by utilizing a multi-frame context interaction module (MCI) for memory construction. Based on the proposed MCI module, a sparse group memory reader is developed to enable efficient sparse matching during memory reading. Our proposed method is generic and achieves state-of-the-art performance with real-time speed on benchmark datasets such as DAVIS and YouTube-VOS. In addition, our model exhibits robustness to sparse videos with low frame rates.
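
The abstract does not spell out how the sparse group memory reader selects its matches. As a rough illustration of sparse memory reading in general, the sketch below keeps only the top-k most similar memory entries for a query and applies a softmax over those entries alone. The top-k rule, the dot-product similarity, and the name `sparse_read` are assumptions made for this example, not the authors' design.

```python
import math


def sparse_read(query, keys, values, k=2):
    """Read from memory using only the k keys most similar to the query.

    query: list of floats; keys: list of key vectors; values: list of value
    vectors (one per key). Top-k selection is an illustrative assumption.
    """
    # Dot-product similarity between the query and every memory key.
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k best-matching memory entries.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax restricted to the selected entries (shifted for stability).
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    dim = len(values[0])
    # Weighted sum of the selected value vectors.
    return [
        sum((w / total) * values[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
```

With a small k, entries that match poorly never enter the weighted sum, which is one simple way to limit both computation and noise as the memory grows.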

Keywords: Memory management; Feature extraction; Object segmentation; Accuracy; Task analysis; Optical flow; Cognition; video object segmentation; spatio-temporal memory; multi-frame context interaction; sparse memory mechanism

Dense video event semantic annotation based on multimodal features

2024 16th International Conference on Graphics and Image Processing, ICGIP 2024

Authors: Lu, Hequn; Du, Zhenlong. Affiliation: Nanjing Tech University, China

ISBN: (digital) 9781510688780

ISBN: (print) 9781510688773

As a model of cross-media intelligence that combines computer vision and natural language processing, video semantic annotation facilitates automatic location of events in videos and describes video content in natural language. Unlike standard video annotation, dense video annotation requires the detection and description of multiple events in long videos, adding additional complexity to locate events in long videos. By proposing a dense video semantic annotation method based on deep learning, a single discrete tag sequence can be predicted for a given multimodal input, which includes a title tag for the event and a time tag representing the timestamp of the event. The proposed model uses unlabeled narrative videos for pre-training, and uses transcribed speech and corresponding timestamps as a weakly supervised source of dense video annotation to replace manual annotation information, which expands the size of available datasets. Furthermore, by fine-tuning the model, we can apply it to the problem of paragraph annotation, generating paragraph descriptions about the entire video. The results show that the proposed model can predict high-quality event descriptions and relatively accurate time boundaries in different scenarios. © 2025 SPIE.

Keywords: Semantics

VVC adaptive QP offset algorithm based on visual perception

3rd International Conference on Signal Image Processing and Communication, ICSIPC 2023

Authors: Jin, Siyu; Guan, XiaoHan; Liu, Zhi. Affiliation: North China University of Technology, Beijing, China

ISBN: (print) 9781510670945

The latest video coding standard, Versatile Video Coding (VVC), uses new coding tools to greatly improve compression efficiency. However, the adaptive QP module in the coding framework ignores the characteristics of the human visual system (HVS), resulting in coding results that differ from the real perception of the human eye. To incorporate visual perception into QP selection, this paper proposes an adaptive QP offset algorithm based on visual perception metrics. The first step is to design a QP offset algorithm based on the luminance, frequency, and time domain characteristics of the HVS. The visual perception index is designed to reflect the real perception of human eyes, and the index is then used to sort CTUs into different levels according to visual sensitivity. Experiments show that the algorithm achieves a 0.74% performance improvement in VMAF-based BD-Rate and about an 8% improvement in VMAF score compared to the original VVC. © 2023 SPIE.

Keywords: Video signal processing

SABA: Scale-adaptive Attention and Boundary Aware Network for real-time semantic segmentation

EXPERT SYSTEMS WITH APPLICATIONS, 2025, Vol. 282

Authors: Luo, Huilan; Liu, Chunyan; Shark, Lik-Kwan. Affiliations: Jiangxi Univ Sci & Technol, Sch Informat Engn, Ganzhou 341000, Peoples R China; Univ Cent Lancashire, Appl Digital Signal & Image Proc Res Ctr, Sch Engn, Preston, England

Balancing accuracy and speed is crucial for semantic segmentation in autonomous driving. While various mechanisms have been explored to enhance segmentation accuracy in lightweight deep learning networks, adding more mechanisms does not always lead to better performance and often significantly increases processing time. This paper investigates a more effective and efficient integration of three key mechanisms - context, attention, and boundary - to improve real-time semantic segmentation of road scene images. Based on an analysis of recent fully convolutional encoder-decoder networks, we propose a novel Scale-adaptive Attention and Boundary Aware (SABA) segmentation network. SABA enhances context through a new pyramid structure with multi-scale residual learning, refines attention via scale-adaptive spatial relationships, and improves boundary delineation using progressive refinement with a dedicated loss function and learnable weights. Evaluations on the Cityscapes benchmark show that SABA outperforms current real-time semantic segmentation networks, achieving a mean intersection over union (mIoU) of up to 76.7% and improving accuracy for 17 out of 19 object classes. Moreover, it achieves this accuracy at an inference speed of up to 83.4 frames per second, significantly exceeding real-time video frame rates. The code is available at https://***/liuchunyan66/SABA.

Keywords: Real-time semantic segmentation; Context enhancement; Multi-scale adaptive attention; Boundary awareness

Guided Linear Upsampling

ACM TRANSACTIONS ON GRAPHICS, 2023, Vol. 42, No. 4, pp. 1-12

Authors: Song, Shuangbing; Zhong, Fan; Wang, Tianju; Qin, Xueying; Tu, Changhe. Affiliation: Shandong Univ, Jinan, Peoples R China

Guided upsampling is an effective approach for accelerating high-resolution image processing. In this paper, we propose a simple yet effective guided upsampling method. Each pixel in the high-resolution image is represented as a linear interpolation of two low-resolution pixels, whose indices and weights are optimized to minimize the upsampling error. The downsampling can be jointly optimized in order to prevent missing small isolated regions. Our method can be derived from the color line model and local color transformations. Compared to previous methods, our method can better preserve detail effects while suppressing artifacts such as bleeding and blurring. It is efficient, easy to implement, and free of sensitive parameters. We evaluate the proposed method with a wide range of image operators, and show its advantages through quantitative and qualitative analysis. We demonstrate the advantages of our method for both interactive image editing and real-time high-resolution video processing. In particular, for interactive editing, the joint optimization can be precomputed, thus allowing for instant feedback without hardware acceleration.
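
The core representation in this abstract, a high-resolution pixel written as a linear interpolation of two low-resolution pixels, can be sketched as follows. The example assumes that the pair of indices and the weight are fitted on the guide image (where both resolutions are known) and then reused on the low-resolution processed result. The exhaustive search over a small candidate set and the names `fit_pixel` and `upsample_pixel` are illustrative stand-ins, not the optimization used in the paper.

```python
def upsample_pixel(low, idx_a, idx_b, weight):
    """Blend two low-resolution pixels: weight * a + (1 - weight) * b."""
    a, b = low[idx_a], low[idx_b]
    return tuple(weight * ca + (1.0 - weight) * cb for ca, cb in zip(a, b))


def fit_pixel(guide_value, low_guide, candidates):
    """Find the pair of low-res pixels and the weight that best match guide_value.

    For a fixed pair, the best weight has a closed form: project the guide
    colour onto the segment between the two low-res colours and clamp to [0, 1].
    The brute-force loop over pairs is an assumption made for clarity.
    """
    best = None
    for i in candidates:
        for j in candidates:
            a, b = low_guide[i], low_guide[j]
            diff = [ca - cb for ca, cb in zip(a, b)]
            denom = sum(d * d for d in diff)
            if denom == 0.0:
                w = 1.0  # identical colours: any weight gives the same result
            else:
                num = sum((g - cb) * d for g, cb, d in zip(guide_value, b, diff))
                w = min(1.0, max(0.0, num / denom))
            recon = tuple(w * ca + (1.0 - w) * cb for ca, cb in zip(a, b))
            err = sum((g - r) ** 2 for g, r in zip(guide_value, recon))
            if best is None or err < best[0]:
                best = (err, i, j, w)
    _, i, j, w = best
    return i, j, w
```

Once `fit_pixel` has produced `(i, j, w)` from the guide, `upsample_pixel(low_result, i, j, w)` applies the same blend to the low-resolution output of any image operator, which is where the speed-up of guided upsampling comes from.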

Keywords: guided upsampling; optimized downsampling; image processing

Adaptive 360° Video Streaming Over Wireless Communication Channels

9th International Conference on Frontiers of Signal Processing, ICFSP 2024

Authors: Valiandi, Ioanna; Pattichis, Marios S.; Kyriacou, Efthyvoulos; Panayides, Andreas S. Affiliations: VIDEOMICS Group, CYENS Centre of Excellence, Nicosia, Cyprus; University of New Mexico, Department of Electrical and Computer Engineering, Albuquerque, NM, United States; Cyprus University of Technology, Department of Electrical Eng., Computer Engineering and Informatics, Limassol, Cyprus

ISBN: (print) 9798350353235

360° video streaming is one of the prevalent communication technologies for enhancing user experience and has thus seen widespread adoption in virtual and mixed reality applications. However, delivering content at scale while securing the quality of wirelessly communicated 360° videos in real-time poses significant challenges. 360° videos come in ultra-high definition, necessitate unprecedented bitrate demands and involve high encoding complexity. The time-varying nature of underlying wireless channels further introduces a destabilizing factor, calling for video systems to seamlessly adjust to varying bandwidth throughput to maintain adequate quality of service and experience. To address this issue, in this study, we have developed a multi-objective optimization framework for real-time video encoding adaptation. The objective is to optimize both video quality and encoding efficiency while minimizing the required bitrate, subject to real-time application constraints. To achieve this, we relied on generating (offline) precise forward prediction models of video quality, bitrate demands, and encoding time, that can be used to select the optimum encoding configuration in real-time. To validate our methods, we implemented an adaptive video encoding controller, and ran emulations employing actual network traces from 5G mobile video streaming scenarios, using the popular open-source x264 and x265 codecs for video encoding. A dataset of 4K omnidirectional videos at 30 frames per second was used. © 2024 IEEE.
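
To make the selection step concrete, the sketch below picks an encoding configuration from precomputed predictions of quality, bitrate, and encoding time. It is a simplified stand-in for the multi-objective optimization described in the abstract: it filters by the available bandwidth and a per-frame time budget, then prefers the highest predicted quality and, among equals, the lowest bitrate. The function name, the dictionary layout, and all numbers in `EXAMPLE_PREDICTIONS` are invented for illustration.

```python
# Hypothetical model outputs; the values are placeholders, not measurements.
EXAMPLE_PREDICTIONS = {
    "x264_fast": {"quality": 80.0, "bitrate_kbps": 8000, "encode_ms": 20},
    "x265_medium": {"quality": 88.0, "bitrate_kbps": 6000, "encode_ms": 30},
    "x265_slow": {"quality": 92.0, "bitrate_kbps": 5500, "encode_ms": 60},
}


def select_config(predictions, bandwidth_kbps, max_encode_ms):
    """Return the name of the best feasible configuration, or None."""
    # Keep only configurations that fit the channel and the time budget.
    feasible = [
        (name, p)
        for name, p in predictions.items()
        if p["bitrate_kbps"] <= bandwidth_kbps and p["encode_ms"] <= max_encode_ms
    ]
    if not feasible:
        return None
    # Highest predicted quality first; among equals, the lowest bitrate.
    name, _ = max(
        feasible,
        key=lambda item: (item[1]["quality"], -item[1]["bitrate_kbps"]),
    )
    return name
```

At 30 frames per second the per-frame budget is roughly 33 ms, so a call such as `select_config(EXAMPLE_PREDICTIONS, 7000, 33)` rules out any configuration whose predicted encoding time exceeds that budget before quality is compared.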

Keywords: Video streaming

TQP: An Efficient Video Quality Assessment Framework for Adaptive Bitrate Video Streaming

IEEE ACCESS, 2024, Vol. 12, pp. 88264-88278

Authors: Aslam, Muhammad Azeem; Wei, Xu; Ahmed, Nisar; Saleem, Gulshan; Zhu, Shuangtong; Xu, Yimei; Hu, Hongfei. Affiliations: Xian Eurasia Univ, Sch Informat Engn, Xian 710065, Shaanxi, Peoples R China; Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130033, Jilin, Peoples R China; Univ Engn & Technol Lahore, Dept Comp Engn, Lahore 54890, Punjab, Pakistan; Lahore Garrison Univ, Dept Comp Sci, Lahore, Punjab, Pakistan

The increasing popularity of video streaming services and the widespread accessibility of high-speed internet underscore the importance of delivering cost-effective and seamless streaming experiences. Shared internet connections may lead to varying speeds, impacting Quality of Experience (QoE). Rate adaptation techniques aim to ensure smooth video transmission, but overly optimistic adaptations can compromise user experience. Objective video quality assessment is crucial for efficient rate adaptation to ensure smooth QoE. This research proposes a novel method incorporating temporal channel shifting into Convolutional Neural Networks (CNN) for video quality assessment while maintaining the computational simplicity of a 2D CNN model. The proposed approach relies on the EfficientNet architecture, which is initially pre-trained on quality-aware images and then fine-tuned using datasets of rate-adaptive videos. The model is trained and evaluated on two benchmark datasets, namely "Waterloo sQoE III" and "LIVE Netflix II," which consist of rate-adaptive videos annotated with subjective quality scores. Experimental results encompass the evaluation of Pearson, Spearman, and Kendall correlation coefficients, along with the computation time ratio for the proposed approach. The outcomes reveal competitive scores of 0.795, 0.652, 0.772, and 0.216 for the "LIVE Netflix II" dataset and 0.782, 0.713, 0.721, and 0.230 for the "Waterloo sQoE III" dataset. Our proposed method, compared to 24 approaches for "Waterloo sQoE III" and 25 for "LIVE Netflix II," attains the highest correlation scores while maintaining near-real-time processing efficiency. These results affirm the efficacy of our approach in accurately predicting human judgment (QoE) with computational efficiency.
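
The idea of temporal channel shifting can be illustrated with a small sketch: a fraction of each frame's feature channels is replaced by the same channels from the neighbouring frame, so a 2D network sees some temporal context at no extra multiply cost. The abstract does not give the shift proportions or directions; the choice below (one fold of channels taken from the next frame, one fold from the previous frame, zero padding at the ends) follows the commonly used temporal-shift design and is an assumption. Each float stands in for a whole feature map.

```python
def temporal_shift(frames, fold_div=4):
    """Shift a slice of channels one step along time; zero-pad the edges.

    frames: list over time, each item a list of per-channel values.
    fold_div: 1/fold_div of the channels moves in each direction (assumed).
    """
    t_len = len(frames)
    n_ch = len(frames[0])
    fold = n_ch // fold_div
    out = [[0.0] * n_ch for _ in range(t_len)]
    for t in range(t_len):
        for c in range(n_ch):
            if c < fold:
                src = t + 1      # first fold: borrow from the next frame
            elif c < 2 * fold:
                src = t - 1      # second fold: borrow from the previous frame
            else:
                src = t          # remaining channels stay in place
            if 0 <= src < t_len:
                out[t][c] = frames[src][c]
    return out
```

Because the shift only moves data, the per-frame cost of the following 2D convolutions is unchanged, which is consistent with the abstract's claim of keeping the computational simplicity of a 2D CNN.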

Keywords: Streaming media; Quality assessment; Video recording; Accuracy; Quality of experience; Computer architecture; Training; video quality; image quality assessment; rate adaption; video streaming; quality of experience; QoE

Research on the Video Processing Module based on Cameralink Interface Technology

2023 International Conference on Automation Control, Algorithm, and Intelligent Bionics, ACAIB 2023

Authors: Ling, Yunzhi; Zhu, Ziming; Chen, Wei. Affiliation: Jiangsu Institute of Automation, Lian Yun Gang, Jiangsu 222000, China

ISBN: (print) 9781510667662

To solve the problem of speed matching between image data output and acquisition, and to provide simple and flexible transmission for high-speed digital cameras and image acquisition cards, this paper proposes a video processing module based on CameraLink interface technology. By making full use of the internal resources of the FPGA, the algorithm is executed efficiently. The system supports an 85 MHz clock and an acquisition cache of no less than 64 MB, and performs real-time acquisition and processing of CameraLink images at 200 Hz/50 Hz (16 bit/8 bit). Moreover, it supports two channels of 50 Hz (8 bit/8 bit×3) CameraLink image output and converts the CameraLink signal to an SDI signal. The functional correctness and performance stability of the proposed video processing module are verified through function and pressure tests. © 2023 SPIE.

Keywords: Field programmable gate arrays (FPGA)

Design Optimization of Low Latency Transmission Based on Hi3559 Video Platform

16th International Conference on Digital Image Processing, ICDIP 2024

Authors: Huang, Qiaojie; Liu, Weijian. Affiliations: Guangdong Agriculture Industry Business Polytechnic, No.198 Yueken Road, Tianhe District, Guangzhou 510507, China; Guangzhou City University of Technology, No. 1 Xuefu Road, Huadu District, Guangzhou 510800, China

ISBN: (digital) 9781510682917

ISBN: (print) 9781510682900

To address the high latency of H.264/H.265 video transmission schemes, a design optimization scheme for low-latency H.264/H.265 transmission was studied, based on an analysis of the video data flow and of the encoding and decoding processes. Image acquisition and encoding time was optimized by reducing the number of read and write operations in the video cache pool, increasing the input image frame rate, using multi-encoder encoding in parallel, etc. Image encoding and display time was optimized by methods such as network data transmission delay optimization, multi-encoder parallel decoding, decoding and display delay optimization, decoding data path optimization, and display output optimization. The experimental results show that the optimal solution is to use VI (Video Input), VPSS (Video Processing Sub-System), VENC (Video Encoder) binding and online mode at the encoding end, and to use VDEC (Video Decoder), VPSS, VO (Video Output) binding and direct mode, with an end-to-end display delay of 2-4 frames, which greatly reduces the delay of H.264/H.265 video transmission. © 2024 SPIE.

Keywords: Encoding (symbols)

Fast Retrieval of Pharmaceutical Packaging Images Using Keypoint Matching with Angle and Scale Voting for Outlier Rejection

2024 Conference on Visual Communications and Image Processing

Authors: Zakaria, Yona; Ishiyama, Rui; Ishidera, Eiki; Matsui, Tomokazu; Yasumoto, Keiichi. Affiliations: Nara Inst Sci & Technol, Nara, Japan; Univ Dodoma, Dodoma, Tanzania; NEC Corp Ltd, Tokyo, Japan; RIKEN, Ctr Adv Intelligence Project, Tokyo, Japan

ISBN: (print) 9798331529543; 9798331529550

Counterfeit medicines present a severe public health threat, especially in low-resource countries where consumers lack reliable means to verify the medicines they purchase. Visual inspection of medicine packaging images through keypoint matching techniques offers a promising approach for detecting design inconsistencies that could indicate counterfeit products. However, conventional methods often struggle with high computational costs and reduced accuracy when processing images of varying quality and perspectives. To address these limitations, we propose the Angle and Scale Voting (ASVote) method, which enhances keypoint-based image matching by introducing a 2D voting mechanism that leverages relative angles and scales of the keypoints to eliminate false matches (outliers) while identifying consistent matches (inliers). This approach significantly improves both processing time and accuracy. Experiments on a real-world dataset of medicine packages show that ASVote improves processing time and accuracy, outperforming conventional methods.
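
A 2D vote over relative angle and relative scale can be sketched as below. Each putative match contributes one vote to a histogram cell indexed by the rotation between the two keypoints and the log ratio of their scales; matches in the most-voted cell are kept as inliers. The bin widths, the hard binning, and the name `vote_inliers` are assumptions made for this example and are not taken from the ASVote paper.

```python
import math
from collections import defaultdict


def vote_inliers(matches, angle_bin_deg=30.0, scale_bin_log2=0.5):
    """Return indices of matches that fall into the most-voted (angle, scale) bin.

    matches: list of (angle_q, scale_q, angle_r, scale_r) tuples, with angles
    in degrees and scales as positive keypoint sizes.
    """
    bins = defaultdict(list)
    for idx, (angle_q, scale_q, angle_r, scale_r) in enumerate(matches):
        # Rotation from query keypoint to reference keypoint, wrapped to [0, 360).
        rel_angle = (angle_r - angle_q) % 360.0
        # Scale change expressed in octaves so that doubling and halving are symmetric.
        rel_scale = math.log2(scale_r / scale_q)
        key = (
            int(rel_angle // angle_bin_deg),
            math.floor(rel_scale / scale_bin_log2),
        )
        bins[key].append(idx)
    if not bins:
        return []
    # The dominant bin corresponds to the most consistent rotation and scale.
    return max(bins.values(), key=len)
```

Voting of this kind needs a single pass over the matches, which is why it can be cheaper than iterative model fitting; a practical version would also have to handle votes that land near bin boundaries.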

Keywords: keypoint feature matching; RANSAC; USAC; image retrieval; visual inspection; counterfeit drugs
