This study addresses deficiencies in the analysis of local parameters of target features in human motion video images during rapid local-feature extraction. These deficiencies lead to inaccurate description ...
Keyword Spotting (KWS), i.e., the capability to identify vocal commands as they are pronounced, is becoming one of the most important features of Human-Machine Interfaces (HMI), thanks in part to the pervasive diffusion of...
A spatial-temporal neural network video smoke detection algorithm is proposed to solve the problems associated with the incorrect classification of static smoke-like backgrounds in the face ...
ISBN (print): 9798400701085
Test-time adaptation (TTA) aims at boosting the generalization capability of a trained model by conducting self-/un-supervised learning during testing in real-world applications. Though TTA on image-based tasks has seen significant progress, TTA techniques for video remain scarce. Naively introducing image-based TTA methods into video tasks may achieve limited performance, since these methods do not consider the special nature of video tasks, e.g., the motion information. In this paper, we propose leveraging motion cues in videos to design a new test-time learning scheme for video classification. We extract spatial appearance and dynamic motion clip features using two sampling rates (i.e., slow and fast) and propose a fast-to-slow unidirectional alignment scheme to align fast motion and slow appearance features, thereby enhancing the motion encoding ability. Additionally, we propose a slow-fast dual contrastive learning strategy to learn a joint feature space for fast- and slow-sampled clips, guiding the model to extract discriminative video features. Lastly, we introduce a stochastic pseudo-negative sampling scheme that provides better adaptation supervision by selecting a more reliable pseudo-negative label than the pseudo-positive label used in prior TTA methods. This technique reduces the adaptation difficulty often caused by poor performance on out-of-distribution test data before adaptation. Our approach significantly improves performance on various video classification backbones, as demonstrated through extensive experiments on two benchmark datasets.
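As an illustration of the stochastic pseudo-negative sampling the abstract describes, the following is a minimal PyTorch sketch. The sampling distribution, function names, and the exact loss form are assumptions for illustration, not the paper's released code.

```python
# Minimal sketch of stochastic pseudo-negative sampling for test-time
# adaptation, assuming a classifier that returns logits for a batch of
# video clips. All names and the sampling distribution are illustrative.
import torch
import torch.nn.functional as F

def pseudo_negative_loss(logits: torch.Tensor) -> torch.Tensor:
    """Push probability mass away from a class sampled from the model's
    *least* confident predictions; on out-of-distribution test data this
    pseudo-negative is more reliable than trusting the argmax label."""
    probs = F.softmax(logits, dim=-1)                          # (B, C)
    # Sample a pseudo-negative class from the inverted probabilities,
    # so low-confidence classes are picked more often.
    neg_dist = (1.0 - probs) / (1.0 - probs).sum(dim=-1, keepdim=True)
    neg_labels = torch.multinomial(neg_dist, num_samples=1)    # (B, 1)
    # Minimize the probability assigned to the sampled pseudo-negative.
    p_neg = probs.gather(1, neg_labels)
    return -torch.log(1.0 - p_neg + 1e-8).mean()
```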
Video streaming services typically employ traditional codecs, such as H.264, to encode videos into multiple bitrate representations. These codecs are tightly limited by discrete quantization parameters (QPs), resulting ...
ISBN (print): 9781665405409
Video Object Segmentation (VOS) is a fundamental task in video recognition with many practical applications. It aims at predicting segmentation masks of multiple objects across an entire video. Recent VOS research has achieved remarkable performance. However, as a video processing task, the inference speed of a VOS method is also essential. VOS can be considered an extension of semantic segmentation from a static image to a dynamic image sequence. Following this idea, we propose a fast VOS framework based on YOLACT, a real-time static image segmentation framework. We employ a fast online training technique that extends YOLACT to handle dynamic video sequences, achieving competitive performance (77.2 J&F at 30.9 FPS on DAVIS17) among fast VOS methods. Moreover, by linearly combining mask bases to generate masks for arbitrary objects, our method can process multi-object videos with minimal extra computation.
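The YOLACT-style mask assembly this abstract relies on can be sketched in a few lines: each object's mask is a linear combination of shared mask bases (prototypes), so adding objects costs one coefficient vector each. Shapes and names below are assumptions, not the authors' code.

```python
# Illustrative YOLACT-style mask assembly: per-object masks are linear
# combinations of shared "mask bases", so multi-object inference is a
# single matrix multiplication over the prototype tensor.
import torch

def assemble_masks(prototypes: torch.Tensor,   # (K, H, W) shared mask bases
                   coeffs: torch.Tensor        # (N, K) one row per object
                   ) -> torch.Tensor:          # (N, H, W) object masks
    K, H, W = prototypes.shape
    # Weighted sum of the K prototypes per object, then a sigmoid.
    masks = coeffs @ prototypes.view(K, H * W)  # (N, H*W)
    return torch.sigmoid(masks.view(-1, H, W))
```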
Recently, deep learning models have become more prominent due to their tremendous performance on real-time tasks like face recognition, object detection, natural language processing (NLP), instance segmentation, image classification, gesture recognition, and video classification. Image captioning is one of the critical tasks in NLP and computer vision (CV): it converts an image to text, with the model automatically producing descriptive text for the input image. To this end, this article develops a Lightning Search Algorithm (LSA) with a Hybrid Convolutional Neural Network Image Captioning System (LSAHCNN-ICS) for NLP. The introduced LSAHCNN-ICS is an end-to-end model that employs a convolutional neural network (CNN)-based ShuffleNet as the encoder and an HCNN as the decoder. In the encoding part, the ShuffleNet model derives feature descriptors of the image. In the decoding part, the text description is generated by the proposed hybrid convolutional neural network (HCNN) model. To achieve improved captioning results, the LSA is applied as a hyperparameter tuning strategy, which represents the innovation of the study. The simulation analysis of the presented LSAHCNN-ICS technique is performed on benchmark databases, and the obtained results demonstrate the superiority of the LSAHCNN-ICS algorithm over other recent methods, with maximum Consensus-based Image Description Evaluation (CIDEr) scores of 43.60, 59.54, and 135.14 on the Flickr8k, Flickr30k, and MSCOCO datasets, respectively.
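To make the encoder/decoder split concrete, here is a schematic PyTorch sketch: a ShuffleNet encoder produces image feature descriptors and a convolutional decoder emits caption tokens. The 1-D CNN decoder is a generic stand-in for the paper's HCNN, the LSA hyperparameter search is omitted, and every name is an assumption.

```python
# Schematic encoder/decoder captioning model: ShuffleNet features condition
# a causal 1-D conv decoder (a placeholder for the paper's HCNN).
import torch
import torch.nn as nn
from torchvision.models import shufflenet_v2_x1_0

class CaptionSketch(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256):
        super().__init__()
        self.encoder = shufflenet_v2_x1_0(weights=None)
        self.encoder.fc = nn.Identity()        # expose 1024-d image features
        self.project = nn.Linear(1024, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Left-padded conv so each output step sees only earlier tokens.
        self.decoder = nn.Conv1d(embed_dim, embed_dim, kernel_size=3, padding=2)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        img_feat = self.project(self.encoder(images))           # (B, E)
        x = self.embed(tokens) + img_feat[:, None, :]           # (B, T, E)
        x = self.decoder(x.transpose(1, 2))[..., : tokens.size(1)]  # causal trim
        return self.out(x.transpose(1, 2))                      # (B, T, vocab)
```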
With the continuous advancement of seeker technology and image processing techniques, the precision of guided weapons has increasingly improved. However, due to the rigidly fixed structure between the seeker and the g...
ISBN (print): 9798350351439; 9798350351422
Diabetic retinopathy (DR) is a sight-threatening condition associated with diabetes, characterized by damage to the retinal blood vessels. Key to the automation of DR staging is the identification of various symptoms directly or closely associated with retinal blood vessels, as well as the count of these symptoms in the four quadrants of the retina separated by the optic disc. Precise identification of the optic disc (OD) and blood vessels in fundus images is therefore crucial for DR stage diagnosis but is often time-consuming and requires expert analysis. This study introduces a thresholding-based approach for the automated localization of the OD and the detection of blood vessels in fundus images of diabetic patients. Our algorithm is more robust than some deep learning-based algorithms, achieving more accurate results, particularly in advanced DR stages, where the resemblance between various symptoms and blood vessels complicates vessel extraction. Additionally, our computer vision system performs OD localization and blood vessel segmentation in real time. Experimental results on a dataset selected by an ophthalmologist from a Kaggle dataset to ensure data quality show that the proposed algorithm achieves accuracy above 94% for both OD localization and blood vessel detection, outperforming some state-of-the-art algorithms.
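For readers unfamiliar with thresholding-based vessel extraction, the following is a generic OpenCV sketch of the kind of pipeline the abstract describes (green channel, CLAHE, black-hat morphology, Otsu threshold). It is an illustration of the technique class, not the authors' exact algorithm; kernel sizes and parameters are assumptions.

```python
# Generic thresholding-based vessel segmentation for fundus images.
import cv2
import numpy as np

def segment_vessels(fundus_bgr: np.ndarray) -> np.ndarray:
    green = fundus_bgr[:, :, 1]              # vessels contrast best in green
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(green)
    # Black-hat emphasizes dark, thin structures (vessels) on a bright retina.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    blackhat = cv2.morphologyEx(enhanced, cv2.MORPH_BLACKHAT, kernel)
    # Otsu's threshold turns the enhanced vessel map into a binary mask.
    _, mask = cv2.threshold(blackhat, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Opening removes small speckle left by the threshold.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
```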
ISBN (print): 9798350318920; 9798350318937
We present a lightweight model for high-resolution portrait matting. The model does not use any auxiliary inputs such as trimaps or background captures, and it achieves real-time performance for HD videos and near-real-time performance for 4K. Our model is built upon a two-stage framework: a low-resolution network for coarse alpha estimation followed by a refinement network for local region improvement. However, a naive implementation of the two-stage model suffers from poor matting quality when no auxiliary inputs are used. We address this performance gap by leveraging the vision transformer (ViT) as the backbone of the low-resolution network, motivated by the observation that the tokenization step of ViT can reduce spatial resolution while retaining as much pixel information as possible. To inform local regions of their context, we propose a novel cross-region attention (CRA) module in the refinement network that propagates contextual information across neighboring regions. We demonstrate that our method achieves superior results and outperforms other baselines on three benchmark datasets while using only 1/20 of the FLOPs of the existing state-of-the-art model.
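The two-stage coarse-to-fine layout the abstract describes can be sketched as below. The ViT backbone and the cross-region attention module are reduced to injected placeholders; every module name and the residual-refinement formulation are assumptions, not the authors' implementation.

```python
# Schematic two-stage matting: a low-resolution network predicts a coarse
# alpha, then a refinement network improves local regions at full size.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageMatting(nn.Module):
    def __init__(self, coarse_net: nn.Module, refine_net: nn.Module,
                 scale: int = 4):
        super().__init__()
        self.coarse_net = coarse_net    # e.g. ViT encoder + small decoder
        self.refine_net = refine_net    # e.g. CNN with cross-region attention
        self.scale = scale

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Stage 1: coarse alpha at 1/scale resolution.
        small = F.interpolate(image, scale_factor=1 / self.scale,
                              mode="bilinear", align_corners=False)
        coarse = torch.sigmoid(self.coarse_net(small))          # (B, 1, h, w)
        coarse_up = F.interpolate(coarse, size=image.shape[-2:],
                                  mode="bilinear", align_corners=False)
        # Stage 2: refine locally, conditioned on image + coarse alpha.
        residual = self.refine_net(torch.cat([image, coarse_up], dim=1))
        return (coarse_up + residual).clamp(0, 1)
```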