Search Results - Inner Mongolia University Library

2024 International Conference on Image Processing

Authors: Tarchouli, Marwa; Guionnet, Thomas; Riviere, Marc; Hamidouche, Wassim; Outtas, Meriem; Deforges, Olivier. Affiliations: Ateme, Rennes, France; Univ Rennes, INSA Rennes, CNRS, IETR UMR 6164, Rennes, France

ISBN: (print) 9798350349405; 9798350349399

This paper proposes integrating residual blocks into neural representation for videos (NeRV)-based architectures to enhance the reconstruction of detailed patterns and high-level features. Additionally, a coding pipeline is introduced that places the implicit neural decoder in a real-life video streaming framework. Specifically, DeepCABAC is employed for model compression: a quantization scheme is applied, followed by the context-adaptive binary arithmetic coding (CABAC) entropy coding algorithm, to generate the bitstream. Our method outperforms NeRV, as well as x264 and x265, achieving BD-rate gains over NeRV of -12.06% using PSNR and -14.25% using MS-SSIM. Furthermore, it exhibits better subjective quality than NeRV, owing to enhanced high-level feature reconstruction. This behavior encourages applying our method to other NeRV-based models, such as E-NeRV.

Keywords: Implicit Neural Representation; Learned Video Compression; Video Compression
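
The BD-rate gains quoted above use the Bjøntegaard delta-rate metric. A minimal sketch of the classic cubic-fit variant follows, assuming at least four (bitrate, quality) points per codec; it is not the authors' evaluation code, and newer BD-rate tools use piecewise-cubic interpolation, so their numbers can differ slightly. Quality may be PSNR or MS-SSIM (often converted to dB); a negative result means the test codec needs less bitrate than the anchor for the same quality.

```python
import numpy as np


def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test):
    """Bjontegaard delta rate in percent (negative = bitrate saving vs. anchor)."""
    log_ra = np.log10(np.asarray(rate_anchor, dtype=np.float64))
    log_rt = np.log10(np.asarray(rate_test, dtype=np.float64))
    qa = np.asarray(qual_anchor, dtype=np.float64)
    qt = np.asarray(qual_test, dtype=np.float64)

    # Fit log10(rate) as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the overlapping quality interval only.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a, int_t = np.polyint(poly_a), np.polyint(poly_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Average log-rate difference, converted back to a percentage.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return float((10 ** avg_log_diff - 1) * 100)
```

For example, a test codec that reaches the same quality values at exactly half of the anchor's bitrates yields a BD-rate of -50%.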

SA-BiSeNet: Swap attention bilateral segmentation network for real-time inland waterways segmentation

IET Image Processing, 2023, Vol. 17, No. 1, pp. 166-177

Authors: Zhang, W. B.; Wu, C. Y.; Bao, Z. S. Affiliation: Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

The technology for autonomous navigation on inland waterways is worth investigating, and navigable water surface segmentation is a key part of this technology. Semantic segmentation methods based on deep learning can distinguish water surface areas from non-water surface areas. However, existing semantic segmentation methods cannot meet the requirements of the water surface segmentation task in terms of both segmentation precision and real-time performance. In this study, a Swap Attention Bilateral Segmentation Network (SA-BiSeNet) is proposed to improve segmentation performance while preserving inference speed, by using an attention mechanism to better fuse the features of the two branches of a dual-branch down-sampling network. Specifically, an innovative Swap Attention Module models the dependency between the features of the spatial detail branch and those of the semantic branch, thereby extending the receptive field of each branch to the other's global context. This design fuses features effectively and thus enhances feature representation. Experiments on the inland waterway dataset USVInland evaluated the segmentation precision and inference speed of SA-BiSeNet: it achieved 93.65% mean IoU while maintaining the same frame rate (fps) level as the baseline.

Keywords: image segmentation; feature extraction; semantic branches; water surface segmentation task; inland waterway dataset USVInland; real-time inland waterways segmentation; Neural nets; inference speed; water surface areas; nonwater surface areas; segmentation performance; SA-BiSeNet; Swap attention bilateral segmentation network; Optical, image and video signal processing; Computer vision and image processing techniques; innovative Swap Attention Module; real-time performance; object detection; deep learning (artificial intelligence); navigable water surface segmentation; semantic segmentation methods; Swap Attention Bilateral Segmentation Network; segmentation precision; geophysical image processing
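
The 93.65% result above is a mean IoU. A minimal sketch of the usual confusion-matrix computation for integer label maps (for example 0 = non-water, 1 = water) follows; the abstract does not say whether the authors average per image or pool over the whole USVInland test set, so this version simply pools every pixel it is given.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(num_classes * target + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both maps
    return float(np.mean(tp[present] / union[present]))
```

Skipping classes with an empty union avoids a division by zero when a frame contains no water at all.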

Exploring Action Recognition in Endoscopy Video Datasets

Conference on Real-Time Image Processing and Deep Learning

Authors: Tian, Yuchen; Paheding, Sidike; Azimi, Ehsan; Lee, Eung-Joo. Affiliations: Tufts Univ, 419 Boston Ave, Medford, MA 02155, USA; Fairfield Univ, 1073 N Benson Rd, Fairfield, CT, USA; Univ Arizona, 1200 E Univ Blvd, Tucson, AZ 85721, USA

ISBN: (print) 9781510673878; 9781510673861

Surgical image and video applications using endoscopic datasets have been actively investigated to develop advanced surgical assistant systems. These applications are particularly important for understanding surgical scenes during procedures. Specifically, segmentation techniques identify anatomical structures and surgical instruments, quality control methods help refine surgical techniques, and action recognition helps discern surgical steps. Advances in deep neural networks and the large training datasets now available have significantly improved performance across various downstream tasks. However, surgical action recognition remains underexplored. Existing methods face challenges in real-world settings, mainly because they adapt poorly to dynamic imaging environments. In this study, we present a framework for surgical action recognition in endoscopic datasets that leverages video masked autoencoders (VideoMAE), which have shown promise for video analysis with small datasets. Additionally, we incorporate a temporal data augmentation technique to represent diverse imaging conditions and to mitigate the problem of relying on low-quality, single-source data. In our experiments, we use VideoMAE V2 pre-trained on the Unlabeled Hybrid datasets and fine-tune the model on the CholecT45 dataset for validation. Our results show the effectiveness of the VideoMAE structure combined with focal loss, particularly for action recognition in surgical scenarios.

Keywords: Surgical action recognition; Video MAE; endoscopic datasets; CholecT45
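
The abstract pairs VideoMAE with focal loss. A minimal NumPy sketch of the standard sigmoid focal loss follows; the multi-label setup, the default `gamma = 2`, and the optional `alpha` weighting are assumptions for illustration, not settings reported in the paper. With `gamma = 0` and no `alpha` it reduces to ordinary binary cross-entropy, and larger `gamma` down-weights examples the model already classifies confidently.

```python
import numpy as np


def sigmoid_focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Mean focal loss for independent binary labels (multi-label setup)."""
    x = np.asarray(logits, dtype=np.float64)
    y = np.asarray(targets, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-x))              # predicted probability of label = 1
    p_t = np.where(y == 1, p, 1.0 - p)        # probability assigned to the true label
    p_t = np.clip(p_t, 1e-12, 1.0)            # avoid log(0)
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if alpha is not None:                     # optional positive/negative balance weight
        loss = loss * np.where(y == 1, alpha, 1.0 - alpha)
    return float(loss.mean())
```

For a confidently correct prediction, the `(1 - p_t) ** gamma` factor shrinks the loss far below the cross-entropy value, which is what lets rare, hard actions dominate the gradient.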

VCD: A VIDEO CONFERENCING DATASET FOR VIDEO COMPRESSION

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Naderi, Babak; Cutler, Ross; Khongbantabam, Nabakumar Singh; Hosseinkashi, Yasaman; Turbell, Henrik; Sadovnikov, Albert; Zou, Quan. Affiliation: Microsoft Corp, Redmond, WA 98052, USA

ISBN: (print) 9798350344868; 9798350344851

Commonly used datasets for evaluating video codecs are all of very high quality and are not representative of the video typically seen in video conferencing. We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing. VCD includes a wide variety of camera qualities and of spatial and temporal information. It covers both desktop and mobile scenarios and two types of video background processing. We report the compression efficiency of H.264, H.265, H.266, and AV1 in low-delay settings on VCD and compare it with the non-video-conferencing datasets UVG, MCL-JCV, and HEVC. The results show that source quality and scenario significantly affect the compression efficiency of all the codecs. VCD enables the evaluation and tuning of codecs for this important scenario. VCD is publicly available as an open-source dataset at https://***/microsoft/VCD.

Keywords: Video Dataset; Video Quality; Video Compression; Low-delay; Real-time Communication
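
VCD is characterized by its spatial and temporal information. The abstract does not say how these are measured; a common choice is the SI/TI definition of ITU-T Rec. P.910, sketched below under that assumption. SI is the maximum over frames of the standard deviation of the Sobel-filtered luma plane, and TI is the maximum over frames of the standard deviation of the difference between consecutive luma planes. Computing the Sobel response on interior pixels only is a simplification of border handling.

```python
import numpy as np


def _sobel_magnitude(frame):
    """Sobel gradient magnitude on interior pixels (the 1-pixel border is dropped)."""
    f = np.asarray(frame, dtype=np.float64)
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2])
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:])
    return np.hypot(gx, gy)


def si_ti(luma_frames):
    """P.910-style spatial (SI) and temporal (TI) information of a luma sequence."""
    frames = [np.asarray(f, dtype=np.float64) for f in luma_frames]
    si = max(_sobel_magnitude(f).std() for f in frames)
    if len(frames) < 2:
        ti = 0.0  # TI needs at least two frames
    else:
        ti = max((frames[n] - frames[n - 1]).std() for n in range(1, len(frames)))
    return float(si), float(ti)
```

A static, flat clip scores (0, 0); busy textures raise SI and motion or scene cuts raise TI, which is why a spread of both is useful when stressing codecs.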

Special Vehicle Classification Algorithm-Based System for Dedicated Parking Zone Violation Detection in South Korea

IEEE Access, 2025, Vol. 13, pp. 7883-7901

Authors: Park, Hyunseong; Kim, Kapyeol; Jeong, Incheol; Jung, Jungil; Cho, Jinsoo. Affiliations: Gachon Univ, Coll IT, Dept IT Convergence Engn, Seongnam Si 13120, South Korea; PCT Co Ltd, Seongnam Si 13449, South Korea; Gachon Univ, Dept Comp Engn, Seongnam Si 13120, South Korea

To address the problem of managing dedicated parking zones arising from the increasing number of electric vehicles and vehicles for the physically challenged, this paper proposes a license plate recognition (LPR)-based parking control system that combines the YOLO and MobileNet algorithms. These two algorithms are designed for real-time object detection and efficient preprocessing, respectively, and can operate in real time in resource-constrained edge-device environments. In tests using data from more than 51,000 vehicles, the system achieved an accuracy rate of 95.76% in classifying electric vehicles and 97.18% in classifying vehicles for the physically challenged. The average CPU and RAM utilizations of the system were 34.54% and 45.04%, respectively. In addition, the processing time per image was recorded as approximately 1.04 s, demonstrating its potential to run reliably on edge devices. These results are expected to facilitate the efficient resolution of parking management problems in smart cities and effective operation of parking zones reserved for electric vehicles and vehicles for the physically challenged.

Keywords: real-time systems; YOLO; Accuracy; License plate recognition; Classification algorithms; Artificial intelligence; Feature extraction; Computational modeling; Computational efficiency; Smart cities; bounding box; license plate recognition (LPR); MobileNet; optical character recognition (OCR); video processing
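
Once the YOLO/MobileNet pipeline above has classified a vehicle as electric, as a vehicle for the physically challenged, or as neither, the violation check itself is a small policy lookup. The sketch below is hypothetical: the zone and class names, the 0.9 confidence threshold, and the choice to defer low-confidence cases to a human are illustrative assumptions, not details from the paper.

```python
from enum import Enum
from typing import Optional


class Zone(Enum):
    EV_ONLY = "ev_only"              # reserved for electric vehicles
    ACCESSIBLE_ONLY = "accessible"   # reserved for vehicles for the physically challenged


class VehicleClass(Enum):
    ELECTRIC = "electric"
    ACCESSIBLE = "accessible"
    OTHER = "other"


# Which vehicle classes may park in each dedicated zone (hypothetical policy table).
ALLOWED = {
    Zone.EV_ONLY: {VehicleClass.ELECTRIC},
    Zone.ACCESSIBLE_ONLY: {VehicleClass.ACCESSIBLE},
}


def check_violation(zone: Zone, vehicle: VehicleClass, confidence: float,
                    min_confidence: float = 0.9) -> Optional[bool]:
    """True = violation, False = allowed, None = classifier too unsure; defer to review."""
    if confidence < min_confidence:
        return None
    return vehicle not in ALLOWED[zone]
```

Returning `None` rather than a guess keeps misclassifications, which the reported 95.76% and 97.18% accuracies imply will occur, from turning directly into penalties.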

OSBF: One-Sided Box Filter for Edge-Preserving Image Processing

IEEE Access, 2025, Vol. 13, pp. 61149-61160

Author: Gong, Yuanhao. Affiliation: Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China

The box filter is well known for image smoothing thanks to its effectiveness and computational efficiency. However, it cannot preserve edges. In contrast, edge-preserving methods cannot match the computational performance of the box filter. To tackle this issue, we present a one-sided box filter that preserves edges much better than the box filter while retaining similar computational performance. More specifically, we apply the box filter on nine one-sided local windows and then select the most likely candidate as the result. This selection introduces nonlinearity, which preserves edges and corners. Several numerical experiments confirm this edge-preserving property. The filter also inherits the box filter's complexity: constant, $O(1)$, with respect to the window size and linear, $O(N)$, with respect to the total number of pixels. We numerically confirm that it is the fastest of the edge-preserving methods compared, including classical and state-of-the-art approaches, and at least $10\times$ faster than the other edge-preserving methods. Thanks to its edge-preserving property and high computational performance, the proposed one-sided box filter can be deployed in a wide range of applications that require both edge preservation and high performance, such as real-time video processing, augmented reality, and view synthesis.

Keywords: image edge detection; Information filters; Shape; Windows; Smoothing methods; Kernel; Computational efficiency; Computational complexity; Transfer functions; real-time systems; Box filter; half window; OSBF; one-sided
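
The filtering step described above can be sketched as follows. The abstract fixes only that a box filter is applied on nine one-sided windows and the most likely candidate is kept; the window layout here (the full window plus four half windows and four quarter windows) and the selection rule (keep the candidate closest to the input pixel, as in side-window filtering) are assumptions. Box sums come from a summed-area table, which gives a fixed number of look-ups per pixel regardless of the radius `r`, and therefore $O(N)$ work overall.

```python
import numpy as np


def _windows(r):
    """Nine windows as inclusive (row_start, row_end, col_start, col_end) offsets (assumed layout)."""
    return [
        (-r, r, -r, r),  # full window (ordinary box filter)
        (-r, r, -r, 0),  # left half
        (-r, r, 0, r),   # right half
        (-r, 0, -r, r),  # upper half
        (0, r, -r, r),   # lower half
        (-r, 0, -r, 0),  # north-west quarter
        (-r, 0, 0, r),   # north-east quarter
        (0, r, -r, 0),   # south-west quarter
        (0, r, 0, r),    # south-east quarter
    ]


def one_sided_box_filter(img, r=2):
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    # Summed-area table with a leading zero row/column: sat[y, x] = padded[:y, :x].sum()
    sat = np.zeros((h + 2 * r + 1, w + 2 * r + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ys = np.arange(h)[:, None] + r  # centre rows in padded coordinates
    xs = np.arange(w)[None, :] + r  # centre columns in padded coordinates

    candidates = []
    for dy0, dy1, dx0, dx1 in _windows(r):
        y0, y1 = ys + dy0, ys + dy1 + 1  # half-open row range
        x0, x1 = xs + dx0, xs + dx1 + 1  # half-open column range
        total = sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]
        candidates.append(total / ((dy1 - dy0 + 1) * (dx1 - dx0 + 1)))

    stack = np.stack(candidates)                     # shape (9, h, w)
    pick = np.argmin(np.abs(stack - img), axis=0)    # candidate closest to the input
    return np.take_along_axis(stack, pick[None], axis=0)[0]
```

On a sharp step edge every pixel has a window lying entirely on its own side, so the step survives unchanged, whereas the plain box mean would blur it.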

Strong real-time transmission technology of lossless video based on Fibre Channel

2023 International Conference on Image, Signal Processing, and Pattern Recognition, ISPP 2023

Authors: Fu, Nian; Tong, Wentao; Zhao, Yupu. Affiliation: Wuhan Digital Engineering Research Institute, Hubei, Wuhan 430074, China

ISBN: (print) 9781510666351

To meet the demand for strong real-time display in large integrated video networks, this paper proposes a strong real-time transmission and display model for lossless video based on the Fibre Channel protocol. Raw video data from multi-standard sensors is accessed, normalized to a common video format, and packaged into Fibre Channel protocol data for long-distance lossless transmission and real-time display. Finally, multi-format raw video transmission and display based on the Fibre Channel FC-AE-ASM protocol is implemented on an FPGA acceleration platform. The technology supports real-time display of multi-format, multi-resolution raw data, with a maximum bandwidth of no less than 130 MB/s, a maximum resolution of no less than 1920 × 1080 @ 60 Hz, and a total system delay of no more than 0.79 ms. © 2023 SPIE.

Keywords: image communication systems
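
As a quick sanity check on the figures above, the payload rate of uncompressed video is width × height × frame rate × bytes per pixel. The abstract does not give the pixel format, nor whether MB means 10^6 or 2^20 bytes, so the formats below are illustrative assumptions (decimal MB), and Fibre Channel framing overhead is ignored.

```python
def raw_video_rate_mb_s(width, height, fps, bytes_per_pixel):
    """Uncompressed video payload rate in MB/s (1 MB = 10**6 bytes), protocol overhead ignored."""
    return width * height * fps * bytes_per_pixel / 1e6


# 1080p60 under three illustrative pixel formats.
for label, bpp in [("8-bit mono", 1), ("8-bit YUV 4:2:2", 2), ("8-bit RGB", 3)]:
    print(f"{label}: {raw_video_rate_mb_s(1920, 1080, 60, bpp):.1f} MB/s")
```

This gives about 124.4, 248.8, and 373.2 MB/s respectively, so whether 130 MB/s covers 1080p60 depends on the pixel format carried over the link.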

Real-time face perception based encoding strategy optimization method for UHD videos

IET Image Processing, 2023, Vol. 17, No. 9, pp. 2764-2779

Authors: Bi, Jiang; Wang, Lidong; Han, Yu; Zhou, Cheng. Affiliations: Beijing Radio & Televis Stn, Beijing, Peoples R China; Sumavis Software Technol Co Ltd, Beijing, Peoples R China; Beijing Radio & Televis Stn, 98 Jianguo Rd, Beijing 100022, Peoples R China

Face regions, which carry rich semantic information, appear frequently in videos. As video resolution increases dramatically, face regions inevitably attract more attention. This paper proposes a face perception based coding scheme to improve the visual quality of face regions in UHD videos. A specially tailored face perception model is first used to locate face regions precisely and quickly. Then, a face perception map is generated using a hierarchical mapping algorithm. Finally, the face perception map guides the optimization of the encoding process, including mode decision, block partitioning, and bit allocation. The proposed method is implemented in HEVC to demonstrate its effectiveness. Experimental results on a set of 4K test sequences show that the proposed method noticeably improves the objective and subjective quality of face regions while causing only a slight quality decline in the rest of the frame. Additionally, the computation required for mode decision and block partitioning is reduced, which saves encoding time.

Keywords: data compression; encoding; face recognition; image processing; signal processing; video coding; visual perception
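
The face perception map above steers bit allocation, among other encoder decisions. One common way to do region-of-interest bit allocation in an HEVC encoder is a per-CTU QP offset; the hypothetical sketch below turns face boxes into a per-CTU face-coverage weight and lowers the QP where faces are. The box format, `base_qp`, `max_offset`, and the linear weight-to-offset mapping are assumptions, and the paper's hierarchical mapping and its mode-decision and partitioning changes are not modeled.

```python
import numpy as np

CTU_SIZE = 64  # largest HEVC coding tree unit, in luma samples


def face_weight_map(height, width, face_boxes, ctu=CTU_SIZE):
    """Fraction of each CTU covered by faces; boxes are (x0, y0, x1, y1), end-exclusive."""
    mask = np.zeros((height, width), dtype=bool)
    for x0, y0, x1, y1 in face_boxes:
        mask[y0:y1, x0:x1] = True
    rows, cols = -(-height // ctu), -(-width // ctu)  # ceiling division
    weights = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            weights[r, c] = mask[r * ctu:(r + 1) * ctu, c * ctu:(c + 1) * ctu].mean()
    return weights


def ctu_qp_map(weights, base_qp=32, max_offset=6):
    """Lower QP (more bits) in proportion to face coverage; clip to 0..51 (8-bit HEVC)."""
    qp = base_qp - np.rint(max_offset * weights).astype(int)
    return np.clip(qp, 0, 51)
```

Partial CTUs at the right and bottom edges of a 3840 × 2160 frame are handled by averaging over whatever pixels they actually contain.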

TRICKVOS: A BAG OF TRICKS FOR VIDEO OBJECT SEGMENTATION

30th IEEE International Conference on Image Processing (ICIP)

Authors: Skartados, Evangelos; Georgiadis, Konstantinos; Yucel, M. Kerim; Ioannis, Koskinas; Domi, Armando; Drosou, Anastasios; Manganelli, Bruno; Saa-Garriga, Albert. Affiliations: CERTH Inst Informat Technol, Thessaloniki, Greece; Samsung Res UK, Staines, England

ISBN: (print) 9781728198354

Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects in which such methods can be improved: i) the supervisory signal, ii) pretraining, and iii) spatial awareness. We then propose TrickVOS, a generic, method-agnostic bag of tricks that addresses each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime, and iii) a cheap tracker that imposes spatial constraints on model predictions. Finally, we propose a lightweight network and show that, when trained with TrickVOS, it achieves results competitive with state-of-the-art methods on the DAVIS and YouTube benchmarks, while being one of the first STM-based SVOS methods that can run in real time on a mobile device.

Keywords: Video Object Segmentation; Pretraining; Space-time Memory Networks
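
The abstract mentions a cheap tracker that imposes spatial constraints on predictions but does not describe it. One hypothetical form, sketched below, takes the bounding box of the previous frame's mask, enlarges it by a relative margin, and zeroes foreground probabilities outside it; the box-based gate, the `margin` value, and the hard zeroing are all assumptions.

```python
import numpy as np


def box_from_mask(mask):
    """Tight (y0, y1, x0, x1) box around a boolean mask, end-exclusive; None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1


def constrain_prediction(prob, prev_mask, margin=0.2):
    """Suppress foreground probability outside an enlarged box around the previous mask."""
    box = box_from_mask(prev_mask)
    if box is None:
        return prob  # nothing to track; leave the prediction unchanged
    y0, y1, x0, x1 = box
    h, w = prob.shape
    dy = int(round((y1 - y0) * margin))
    dx = int(round((x1 - x0) * margin))
    gate = np.zeros_like(prob)
    gate[max(0, y0 - dy):min(h, y1 + dy), max(0, x0 - dx):min(w, x1 + dx)] = 1.0
    return prob * gate
```

Because the gate is only a box computed from the previous mask, it costs almost nothing per frame, in keeping with the real-time mobile target, but it can clip objects that move more than the margin between frames.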

Low-complexity CNN-based CU partitioning for intra frames

Journal of Real-Time Image Processing, 2023, Vol. 20, No. 4, p. 73

Authors: Rahimi, Yaser; Rezaei, Mehdi; Jafari, Pouria. Affiliation: Univ Sistan & Baluchestan, Dept Elect & Comp Engn, Zahedan, Iran

The High-Efficiency Video Coding (HEVC) standard achieves high compression efficiency at the expense of increased computational complexity. The HEVC encoder performs a hierarchical search, based on rate-distortion optimization, for the optimal Coding Unit (CU) partitioning. Various solutions have been proposed to reduce encoding time, and machine learning-based methods have proven more effective than the others. However, deep learning tools impose a relatively high computational load. This paper therefore designs a new low-complexity convolutional neural network, the Convolutional Neural Network-based CTU Partitioner (CNNCP), which reduces the computational complexity of HEVC encoding. CNNCP takes the CTU luminance component and the quantization parameter (QP) as inputs and outputs the CU depth matrix in a single pass. Because it does not follow the hierarchical approach, it has a fixed computation structure that facilitates the use of parallel processing tools. CNNCP has a simple structure with the fewest parameters and, thus, the lowest computational complexity. It was trained and tested on a large database covering all QP values. The results show that it reduces encoding time by more than 90%, making it suitable for real-time applications.

Keywords: Low complexity; Coding tree unit (CTU); Convolutional neural network (CNN); CU partitioning; High efficiency video coding (HEVC); Intra-picture prediction
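
CNNCP outputs a CU depth matrix for the whole CTU in one pass. To turn such a matrix into an actual quadtree partition, an encoder can walk the quadtree top-down and split a node whenever the matrix asks for a greater depth anywhere inside it. The sketch below assumes a 64 × 64 CTU, an 8 × 8 depth matrix (one entry per 8 × 8 block, depths 0 to 3), and splitting on the region maximum to tolerate inconsistent network outputs; these are assumptions, not details from the paper.

```python
import numpy as np


def depth_matrix_to_cus(depth, ctu_size=64, min_cu=8):
    """Convert a per-8x8-block CU depth matrix into (x, y, size) CUs in z-scan order."""
    depth = np.asarray(depth)
    cus = []

    def visit(x, y, size, d):
        # Slice of the depth matrix covered by this node, in min_cu units.
        r0, c0, n = y // min_cu, x // min_cu, size // min_cu
        region = depth[r0:r0 + n, c0:c0 + n]
        if size > min_cu and region.max() > d:
            half = size // 2
            for dy in (0, half):        # top row of children first
                for dx in (0, half):    # then left to right
                    visit(x + dx, y + dy, half, d + 1)
        else:
            cus.append((x, y, size))

    visit(0, 0, ctu_size, 0)
    return cus
```

An all-zero matrix yields a single 64 × 64 CU and an all-three matrix yields sixty-four 8 × 8 CUs; because the traversal has a fixed shape, it parallelizes across CTUs as easily as the network itself.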
