ISBN:
(Print) 9781665468916
This paper presents an efficient and effective matting framework for human video clips. To alleviate the inefficiency of existing models, we propose using a refiner dedicated to error-prone regions and reducing the computation at higher resolutions, so the proposed framework achieves real-time performance on 1080p 60 fps videos. With its recurrent architecture, our model is aware of temporal information and produces temporally more consistent matting results than models that process each frame individually. Moreover, it contains a module for capturing semantic information, which makes our model easy to use without troublesome setup such as annotating trimaps or other additional inputs. Experiments show that our proposed method outperforms previous matting methods and reaches the state of the art on the VideoMatte240K dataset.
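To make the refinement idea concrete, below is a minimal sketch of restricting computation to error-prone regions: a coarse low-resolution matte is upsampled, an uncertainty proxy selects patches, and only those patches are re-processed. PyTorch, the uncertainty heuristic, the patch size, and the `refiner` callable are all illustrative assumptions, not the paper's implementation.

```python
# Illustrative patch-based refinement of a coarse alpha matte; all names,
# thresholds, and the uncertainty proxy are assumptions for this sketch.
import torch
import torch.nn.functional as F

def refine_error_regions(coarse_alpha, frame, refiner, err_thresh=0.1, patch=16):
    """Upsample a coarse matte, then re-run `refiner` only on patches whose
    estimated uncertainty exceeds a threshold (batch size 1 for brevity)."""
    alpha = F.interpolate(coarse_alpha, size=frame.shape[-2:],
                          mode="bilinear", align_corners=False)
    # Cheap "error-prone" proxy: alpha far from 0 or 1 (soft boundary pixels).
    uncertainty = alpha * (1 - alpha)
    grid = F.avg_pool2d(uncertainty, patch)          # patch-level uncertainty
    ys, xs = torch.nonzero(grid[0, 0] > err_thresh, as_tuple=True)
    for y, x in zip(ys.tolist(), xs.tolist()):
        t, l = y * patch, x * patch
        crop = torch.cat([frame[..., t:t + patch, l:l + patch],
                          alpha[..., t:t + patch, l:l + patch]], dim=1)
        alpha[..., t:t + patch, l:l + patch] = refiner(crop)
    return alpha
```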
A Pango FPGA-based solution for merging and processing multiple independent video streams and reconstructing a real-time display system is presented. The system handles each video stream individually, then produces a ...
A novel platform with motion video recognition for intelligent sport monitoring applications is studied in this manuscript. The action markers of human targets in sports video images are random. Combining im...
The rapid advancement of generative artificial intelligence (GAI) has led to the creation of transformative applications such as ChatGPT, which significantly boosts text processing efficiency and diversifies audio, im...
In this study, we developed a real-time vibration visualization system that can estimate and display vibration distributions at all frequencies in real time through parallel implementation of subpixel digital image correlation (DIC) computations with short-time Fourier transforms on a GPU-based high-speed vision platform. To help operators intuitively monitor high-speed motion, we introduced a two-step framework of high-speed video processing to obtain vibration distributions at hundreds of hertz and video conversion processing for the visualization of vibration distributions at dozens of hertz. The proposed system can estimate the full-field vibration displacements of 1920×1080 images in real time at 1000 fps and display their frequency responses in the range of 0–500 Hz on a computer at dozens of frames per second by accelerating phase-only DICs for full-field displacement measurement and video conversion. The effectiveness of this system for real-time vibration monitoring and visualization was demonstrated by conducting experiments on objects vibrating at dozens or hundreds of hertz.
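The core computation (subpixel displacement via phase-only correlation, followed by a short-time Fourier transform of the displacement series) can be sketched as follows. This is a simplified CPU version with NumPy/SciPy; the sampling rate matches the 1000 fps mentioned above, but the window length and the synthetic signal are assumptions.

```python
# Simplified phase-only correlation (POC) shift estimate plus an STFT of a
# displacement time series; subpixel refinement and GPU parallelism omitted.
import numpy as np
from scipy.signal import stft

def poc_shift(a, b):
    """Integer-pixel shift of patch b relative to a via phase-only correlation."""
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    cross /= np.abs(cross) + 1e-12               # keep phase information only
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap indices so negative shifts are reported correctly.
    return [p - s if p > s // 2 else p for p, s in zip(peak, corr.shape)]

# Frequency response of one pixel's displacement sampled at 1000 fps.
fs = 1000.0
disp = np.sin(2 * np.pi * 120 * np.arange(2048) / fs)  # synthetic 120 Hz vibration
freqs, times, Z = stft(disp, fs=fs, nperseg=256)
dominant = freqs[np.argmax(np.abs(Z).mean(axis=1))]    # ~120 Hz, within 0-500 Hz
```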
Robust and efficient detection of young "Yuluxiang" pear fruits poses a significant challenge in natural environments. This difficulty arises from factors such as the similar color between young fruits and the background, occlusion by branches and leaves, fruit denseness, and small fruit size. To achieve precise detection, a lightweight detection method named YOLO-CiHFC was proposed in this study. The CiR module was constructed using the Inverted Residual Mobile Block (iRMB) and the C2f module. The C2f modules of the YOLOv8n backbone and neck networks were all replaced with CiR modules to maintain low computational complexity and parameter count while enhancing the feature extraction and fusion capabilities of the model. Then, the HS-FPN structure was introduced to reconstruct the neck network, and Focaler-CIoU was adopted as the model's loss function. In comparison with YOLOv8n, the F1 score and average precision (AP) of YOLO-CiHFC improved by 0.25% and 1.77%, respectively. The inference time of YOLO-CiHFC (1.5 ms) was 0.2 ms faster than that of YOLOv8n, and its model size was 52.94% of the original model. Furthermore, YOLO-CiHFC was compared with common lightweight models such as YOLOv3-Tiny, YOLOv4-Tiny, YOLOv5n, and YOLOv7-Tiny. The results showed that YOLO-CiHFC achieved the best F1 score of 85.95% and AP of 88.00%, had the smallest model size of 3.15 MB, and obtained the best detection results across different scenarios. The model was deployed on a Jetson Nano at a real-time detection speed of 25.5 fps. The proposed YOLO-CiHFC method is not only lightweight but also improves detection accuracy and speed, and can provide methodological support for intelligent detection of young "Yuluxiang" pear fruits.
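The Focaler-CIoU term mentioned above combines the CIoU loss with a piecewise-linear remapping of IoU onto a difficulty band [d, u]. The sketch below follows the published Focaler-IoU formulation, L = L_CIoU + IoU - IoU_focaler; the band limits shown are illustrative defaults, not the values tuned in this study.

```python
# Sketch of Focaler-CIoU: the IoU inside the loss is remapped to focus on a
# chosen difficulty band [d, u]; d and u here are illustrative, not tuned.
import torch

def focaler_iou(iou: torch.Tensor, d: float = 0.0, u: float = 0.95) -> torch.Tensor:
    """Piecewise-linear map: 0 below d, 1 above u, linear in between."""
    return ((iou - d) / (u - d)).clamp(min=0.0, max=1.0)

def focaler_ciou_loss(ciou_loss: torch.Tensor, iou: torch.Tensor,
                      d: float = 0.0, u: float = 0.95) -> torch.Tensor:
    # L_Focaler-CIoU = L_CIoU + IoU - IoU_focaler (per the Focaler-IoU paper).
    return ciou_loss + iou - focaler_iou(iou, d, u)
```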
ISBN:
(Print) 9798331529543; 9798331529550
The increasing demand for high-quality, real-time visual communication and growing user expectations, coupled with limited network resources, necessitate novel approaches to semantic image communication. This paper presents a method to enhance semantic image communication that combines a novel lossy semantic encoding approach with spatially adaptive semantic image synthesis models. By developing a model-agnostic training augmentation strategy, our approach substantially reduces susceptibility to the distortion introduced during encoding, effectively eliminating the need for lossless semantic encoding. Comprehensive evaluation across two spatially adaptive conditioning methods and three popular datasets indicates that this approach enhances semantic image communication in very low bit-rate regimes.
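One way to read the "training augmentation strategy" is as training-time corruption of the semantic map that mimics lossy encoding. The sketch below is such a corruption, purely as an assumed example: nearest-neighbour down/upsampling stands in for the lossy codec, plus random label flips; the paper's actual encoding and augmentation may differ.

```python
# Illustrative augmentation that simulates lossy semantic-map encoding; the
# scale factor, flip rate, and class count are assumptions for this sketch.
import torch
import torch.nn.functional as F

def corrupt_semantic_map(labels, scale=0.25, flip_rate=0.02, num_classes=35):
    """labels: (B, 1, H, W) integer class map."""
    h, w = labels.shape[-2:]
    # Lossy "encode/decode": nearest-neighbour down- and up-sampling.
    small = F.interpolate(labels.float(), scale_factor=scale, mode="nearest")
    rec = F.interpolate(small, size=(h, w), mode="nearest").long()
    # Random label flips emulate residual coding errors.
    flip = torch.rand_like(rec, dtype=torch.float) < flip_rate
    rec[flip] = torch.randint(0, num_classes, (int(flip.sum()),), device=rec.device)
    return rec
```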
ISBN:
(Print) 9798350386660; 9798350386677
Deception is a prevalent human behavior that significantly impacts our perception of essential facts. Therefore, developing accurate deception detection technology holds great significance. However, current research on purely visual deception detection algorithms does not leverage deep learning methods to extract detailed features such as facial Action Units (AUs) and gaze angles, and the global information within facial video sequences is often overlooked. To address these limitations, this paper introduces a novel deception detection model that combines global and local facial features through attention mechanisms. First, the model focuses on local facial features, computing AU strength and gaze angle for each frame to create a multivariate time series for every video. A Siamese Transformer model employing patching then extracts deep temporal and channel features from the multivariate time series, and the occurrence frequency of five specific AUs is selected as a manual feature. Second, the model performs video understanding based on global facial features: local features are extracted from each frame using shallow CNNs with multiple receptive fields, and a video Transformer with spatiotemporally separated attention globally models the sequence of face frames. Finally, the extracted local and global facial features are concatenated and fed into a classifier to determine deception. Extensive experiments on existing datasets validate the outstanding performance of the proposed method.
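The "patching" step for the per-frame AU-strength and gaze-angle series can be illustrated in a PatchTST-style sketch; the channel count (17 AUs + 2 gaze angles), patch length, stride, and embedding size are assumptions, not the paper's settings.

```python
# Minimal patching of a multivariate time series into overlapping tokens;
# all dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch_len=16, stride=8, d_model=128):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x):                    # x: (batch, channels, time)
        # Slice each channel into overlapping patches along the time axis.
        patches = x.unfold(-1, self.patch_len, self.stride)
        # patches: (batch, channels, num_patches, patch_len)
        return self.proj(patches)            # (batch, channels, num_patches, d_model)

# e.g. 17 AU strengths + 2 gaze angles over 300 frames
series = torch.randn(4, 19, 300)
tokens = PatchEmbed()(series)                # (4, 19, 36, 128)
```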
ISBN:
(Digital) 9781665470506
ISBN:
(Print) 9781665470506
This paper describes the development of a tracker for wheelchair basketball players using special flashing LEDs and an omnidirectional camera. Our previous trackers required several image-processing steps to find multiple LEDs in the video captured by the omnidirectional camera, and these steps were time-consuming. This study uses convolutional neural networks to reduce the time it takes to find multiple LEDs in that video.
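A minimal version of replacing the hand-tuned LED search with a CNN could look like the following: a small fully convolutional network emits a per-pixel LED likelihood map, and peaks above a threshold give candidate positions. The architecture and threshold are illustrative assumptions, not the authors' network.

```python
# Tiny fully convolutional LED detector; layer sizes and the detection
# threshold are assumptions for this sketch.
import torch
import torch.nn as nn

class LEDHeatmapNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),            # per-pixel LED logit
        )

    def forward(self, frame):               # frame: (B, 3, H, W)
        return torch.sigmoid(self.net(frame))

def led_positions(heatmap, thresh=0.9):
    """Return (y, x) coordinates of pixels classified as LEDs (batch of 1)."""
    ys, xs = torch.nonzero(heatmap[0, 0] > thresh, as_tuple=True)
    return list(zip(ys.tolist(), xs.tolist()))
```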
Educational technology is increasingly focusing on real-time language learning. Prior studies have utilized Natural Language Processing (NLP) to assess students' classroom behavior by analyzing their reported feelings and thoughts, but they have not fully enhanced the feedback provided to instructors and peers. This research addresses that gap by combining two technologies, Federated 3D Convolutional Neural Networks (Fed 3D-CNN) and Long Short-Term Memory (LSTM) networks, and investigates classroom attitudes to enhance students' language competence. These technologies enable the modification of teaching strategies through text analysis and image recognition, providing comprehensive feedback on student interactions. For this study, the Multimodal Emotion Lines Dataset (MELD) and the eNTERFACE'05 dataset were selected: eNTERFACE contains 3D images of individuals, while MELD covers spoken dialogue patterns. To address overfitting, the SMOTE technique is used to balance the dataset through oversampling and undersampling. The study predicts human emotions using the Fed 3D-CNN, which excels at image processing by predicting personal information from various angles; federated learning with 3D-CNNs allows simultaneous training across multiple clients by leveraging both local and global weight changes. The NLP system identifies emotional language patterns in students, laying the foundation for this analysis. Although not all student feedback has been extensively studied in the literature, the Fed 3D-CNN and LSTM recommendations are valuable for extracting feedback-related information from audio and video. The proposed framework achieves a prediction accuracy of 97.72%, outperforming existing methods.
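The "local and global weight changes" of federated learning typically correspond to a FedAvg-style aggregation step, sketched below; this is the standard algorithm, not necessarily the paper's exact aggregation rule.

```python
# Standard FedAvg aggregation: client parameters are averaged into the global
# model, weighted by each client's sample count (everything cast to float
# for simplicity in this sketch).
import torch

def fed_avg(client_states, client_sizes):
    """client_states: list of state_dicts; client_sizes: samples per client."""
    total = float(sum(client_sizes))
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state

# Each round: broadcast the global state, train locally, then re-aggregate.
```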