ISBN (print): 9798350318920; 9798350318937
Personalized text-to-image (T2I) synthesis based on diffusion models has attracted significant attention in recent research. However, existing methods primarily concentrate on customizing subjects or styles, neglecting the exploration of global geometry. In this study, we propose an approach that focuses on the customization of 360-degree panoramas, which inherently possess global geometric properties, using a T2I diffusion model. To achieve this, we curate a paired image-text dataset specifically designed for the task and subsequently employ it to fine-tune a pre-trained T2I diffusion model with LoRA. Nevertheless, the fine-tuned model alone does not ensure continuity between the leftmost and rightmost sides of the synthesized images, a crucial characteristic of 360-degree panoramas. To address this issue, we propose a method called StitchDiffusion. Specifically, at each time step of the denoising process we perform two pre-denoising operations on the stitch block, which consists of the leftmost and rightmost image regions. Furthermore, global cropping is adopted to synthesize seamless 360-degree panoramas. Experimental results demonstrate the effectiveness of our customized model combined with the proposed StitchDiffusion in generating high-quality 360-degree panoramic images. Moreover, our customized model exhibits exceptional generalization ability in producing scenes unseen in the fine-tuning dataset. Code is available at https://***/littlewhitesea/StitchDiffusion.
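The stitch-block idea can be illustrated with a short sketch. The following Python is a loose, hypothetical reconstruction from the abstract alone: `denoise_step` stands in for one reverse-diffusion update, `block_w` for the stitch-block width, and the abstract's two pre-denoising passes per time step are collapsed into one for brevity; none of these names come from the paper's code.

```python
import torch

def stitch_denoise_step(latent, denoise_step, t, block_w=128):
    """One reverse-diffusion step with a wrap-around stitch block (sketch).

    latent: panoramic latent of shape (B, C, H, W).
    denoise_step: assumed callable applying one denoising update at timestep t.
    Note: the paper applies pre-denoising on the stitch block twice per
    time step; this sketch shows a single pass.
    """
    half = block_w // 2
    # Glue the rightmost and leftmost columns into one contiguous block,
    # so the model denoises the wrap-around seam as a single image.
    stitch = torch.cat([latent[..., -half:], latent[..., :half]], dim=-1)
    stitch = denoise_step(stitch, t)
    # Write the pre-denoised halves back to their positions across the seam.
    latent = latent.clone()
    latent[..., -half:] = stitch[..., :half]
    latent[..., :half] = stitch[..., half:]
    # Denoise the full panorama with the seam regions already updated.
    return denoise_step(latent, t)
```

Because the seam columns are denoised as one contiguous image before the global pass, the leftmost and rightmost regions converge to consistent content, which is what allows the final panorama to wrap around seamlessly.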
Image enhancement algorithms are complex, computationally intensive, and subject to strict real-time requirements; traditional serial CPU processing struggles to meet the real...
Photorealistic digital re-aging of faces in video is becoming increasingly common in entertainment and advertising. But the predominant 2D painting workflow often requires frame-by-frame manual work that can take days to accomplish, even by skilled artists. Although research on facial image re-aging has attempted to automate and solve this problem, current techniques are of little practical use as they typically suffer from facial identity loss, poor resolution, and unstable results across subsequent video frames. In this paper, we present the first practical, fully-automatic and production-ready method for re-aging faces in video images. Our first key insight is in addressing the problem of collecting longitudinal training data for learning to re-age faces over extended periods of time, a task that is nearly impossible to accomplish for a large number of real people. We show how such a longitudinal dataset can be constructed by leveraging the current state of the art in facial re-aging that, although failing on real images, does provide photoreal re-aging results on synthetic faces. Our second key insight is then to leverage such synthetic data and formulate facial re-aging as a practical image-to-image translation task that can be performed by training a well-understood U-Net architecture, without the need for more complex network designs. We demonstrate how the simple U-Net, surprisingly, allows us to advance the state of the art for re-aging real faces in video, with unprecedented temporal stability and preservation of facial identity across variable expressions, viewpoints, and lighting conditions. Finally, our new face re-aging network (FRAN) incorporates simple and intuitive mechanisms that provide artists with localized control and creative freedom to direct and fine-tune the re-aging effect, a feature that is critically important in real production pipelines and often overlooked in related research work.
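Framing re-aging as image-to-image translation with a plain U-Net can be sketched as follows. This is a minimal illustration, not FRAN itself: the toy network, the constant age-conditioning channels, and the residual ("delta") prediction are assumptions made for the example, not details confirmed by the abstract.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy encoder-decoder for illustration; the real network is larger."""
    def __init__(self, in_ch=5, out_ch=3, width=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def reage_frame(model, frame, age_in, age_out):
    """Re-age one normalized RGB frame (B, 3, H, W), values in [0, 1].

    Conditioning on source/target age via two constant extra channels
    (here arbitrarily normalized by 100) is an assumed scheme.
    """
    b, _, h, w = frame.shape
    src = torch.full((b, 1, h, w), age_in / 100.0, device=frame.device)
    dst = torch.full((b, 1, h, w), age_out / 100.0, device=frame.device)
    x = torch.cat([frame, src, dst], dim=1)
    # Predicting a residual rather than the full image tends to preserve
    # identity: the network only has to model the aging delta.
    return (frame + model(x)).clamp(0, 1)
```

Applied independently per frame, a translation network of this kind has no recurrent state, so temporal stability must come from the training data and the residual formulation rather than from explicit video modeling.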
Modern surveillance activities have become increasingly dependent on the continuous observation offered by CCTV systems. Still, with massive amounts of video data generated every minute, sifting through this inform...
Recent advancements in object tracking and detection algorithms within video surveillance systems have significantly enhanced their ability to identify potential threats and anomalous activities. Addressing sustainabi...
ISBN (digital): 9781510649408
ISBN (print): 9781510649408; 9781510649392
Image segmentation has been increasingly applied in medical settings as recent developments have skyrocketed the potential applications of deep learning. Urology, specifically, is one field of medicine that is primed for the adoption of a real-time image segmentation system, with the long-term aim of automating endoscopic stone treatment. In this project, we explored supervised deep learning models to annotate kidney stones in surgical endoscopic video feeds. In this paper, we describe how we built a dataset from the raw videos and how we developed a pipeline to automate as much of the process as possible. For the segmentation task, we adapted and analyzed three baseline deep learning models - U-Net, U-Net++, and DenseNet - to predict annotations on the frames of the endoscopic videos, with the best model achieving accuracy above 90%. To show clinical potential for real-time use, we also confirmed that our best trained model can accurately annotate new videos at 30 frames per second. Our results demonstrate that the proposed method justifies continued development and study of image segmentation for annotating ureteroscopic video feeds.
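To give a sense of the real-time annotation step, below is a generic frame-by-frame inference loop in Python. It is a sketch under stated assumptions: `model` is any trained PyTorch segmentation network producing single-channel logits, the 256x256 input size and the file paths are placeholders, and the red-overlay rendering is an illustrative choice, not the paper's pipeline.

```python
import cv2
import torch

def annotate_video(model, in_path, out_path, size=(256, 256), device="cuda"):
    """Segment each frame of a video and write an overlaid copy."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata missing
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    model.eval().to(device)
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Resize and normalize the frame for the network.
            x = cv2.resize(frame, size)
            x = torch.from_numpy(x).permute(2, 0, 1).float().div(255)
            # Threshold the logits into a binary stone mask.
            mask = model(x.unsqueeze(0).to(device)).sigmoid() > 0.5
            mask = mask.squeeze().cpu().numpy().astype("uint8") * 255
            mask = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
            # Blend a red highlight over the predicted stone pixels.
            overlay = frame.copy()
            overlay[mask > 0] = (0, 0, 255)
            out.write(cv2.addWeighted(overlay, 0.4, frame, 0.6, 0))
    cap.release()
    out.release()
```

At 30 FPS the per-frame budget is about 33 ms, which a small U-Net at 256x256 on a modern GPU fits comfortably; the resize and video I/O typically dominate the remaining time.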
Multimodal Sentiment Analysis has gained significant attention in Machine Learning. It is popular because it not only provides better results and a deeper understanding of context but also offers a valuable alterna...
In the past decade, convolutional neural networks (CNNs) have shown prominence for semantic segmentation. Although CNN models have very impressive performance, their ability to capture global representations is still insufficient, which results in suboptimal performance. Recently, Transformer achieved huge success in NLP tasks, demonstrating its advantages in modeling long-range dependencies, and it has also attracted tremendous attention from computer vision researchers, who reformulate image processing tasks as sequence-to-sequence prediction, albeit at the cost of deteriorated local feature detail. In this work, we propose a lightweight real-time semantic segmentation network called LETNet. LETNet combines a U-shaped CNN with Transformer effectively in a capsule embedding style to compensate for their respective deficiencies. Meanwhile, the elaborately designed Lightweight Dilated Bottleneck (LDB) module and Feature Enhancement (FE) module have a positive impact on training from scratch. Extensive experiments performed on challenging datasets demonstrate that LETNet achieves a superior balance of accuracy and efficiency. Specifically, it contains only 0.95M parameters and 13.6G FLOPs, yet yields 72.8% mIoU at 120 FPS on the Cityscapes test set and 70.5% mIoU at 250 FPS on the CamVid test set using a single RTX 3090 GPU. Source code will be available at https://***/IVIPLab/LETNet.
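The efficiency figures quoted above (parameter count and FPS) can be reproduced in spirit with a generic benchmarking harness like the one below. This is not the authors' evaluation code; the half-resolution Cityscapes input shape, warm-up count, and iteration count are assumptions for illustration.

```python
import time
import torch

def benchmark(model, shape=(1, 3, 512, 1024), device="cuda", iters=100):
    """Report parameter count and rough throughput for a segmentation model."""
    model.eval().to(device)
    params = sum(p.numel() for p in model.parameters()) / 1e6
    x = torch.randn(shape, device=device)
    # CUDA kernels run asynchronously, so synchronize around the timed loop.
    sync = torch.cuda.synchronize if device.startswith("cuda") else (lambda: None)
    with torch.no_grad():
        for _ in range(10):  # warm-up iterations stabilize GPU clocks/caches
            model(x)
        sync()
        start = time.time()
        for _ in range(iters):
            model(x)
        sync()
    fps = iters * shape[0] / (time.time() - start)
    print(f"{params:.2f}M params, {fps:.1f} FPS at {shape[2]}x{shape[3]}")
```

Measured FPS depends heavily on input resolution, batch size, and precision (FP16 vs. FP32), so published numbers are only comparable when those settings match.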
Due to the influence of weather, illumination, and other factors, the quality of UAV aerial images is often poor, so a real-time target detection method for UAV aerial images combined with X-ray digital imaging tech...
Due to the significance of the visual information exchanged in Internet of Video Things (IoVT) networks, attackers are constantly launching new attacks and attempting to exploit new vulnerabilities. One of the most commo...