Search Results - Inner Mongolia University Library

AMST: Object tracking based on collaborative framework with adaptive multi-strategy

Information Sciences, 2025, Vol. 718

Authors: Rui Xu, Si Chen, Yan Yan, Da-Han Wang, Shunzhi Zhu

Affiliations: Fujian Key Laboratory of Pattern Recognition and Image Understanding, School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen 361005, China

Deep object tracking can be modeled either by online learning, to adapt to appearance changes, or by offline learning, to achieve fast tracking speed. However, both online and offline learning trackers still struggle to cope continuously with challenging tracking scenes. Online learning trackers easily accumulate errors, and offline learning trackers often neglect changes in object appearance. To overcome this problem, we propose a novel object tracking method based on a collaborative framework with adaptive multi-strategy, which enables automatic switching between the online and offline learning trackers. In this framework, we first design the variable action set (VAS) module to adaptively choose among different trackers, tracking strategies, and template strategies using multiple action labels. Moreover, we incorporate the template update and selection (TUS) module, which dynamically updates and selects templates from a memory unit to adapt to object appearance changes. To update the action set and the memory unit, we further devise the online reliability evaluation (ORE) module. It evaluates both the tracking result and the tracker itself, and it also estimates the quality and quantity of templates. Comprehensive experiments on challenging short-term and long-term tracking benchmarks demonstrate the remarkable performance of the proposed method.

Keywords: Adaptive multi-strategy; Collaborative framework; Offline learning tracker; Online learning tracker; Template update and selection
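
The Python sketch below illustrates the memory-unit idea behind the TUS and ORE modules. It rests on several assumptions:

- Each candidate template arrives with a reliability score in [0, 1].
- Only templates scoring at least a threshold are admitted.
- The memory keeps the highest-scoring templates up to a fixed capacity.
- Selection returns the stored template with the highest cosine similarity to a query feature.

`TemplateMemory`, `min_score`, and the cosine-similarity rule are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class TemplateMemory:
    """Fixed-capacity template store (hypothetical sketch, not AMST's code)."""

    capacity: int = 5        # assumed maximum number of stored templates
    min_score: float = 0.6   # assumed reliability threshold for admission
    templates: list = field(default_factory=list)  # (score, feature) pairs

    def update(self, feature: np.ndarray, score: float) -> bool:
        """Admit a template only if its reliability score passes the threshold."""
        if score < self.min_score:
            return False  # unreliable result: leave the memory unchanged
        self.templates.append((score, feature))
        # Keep the highest-scoring templates; drop the rest beyond capacity.
        self.templates.sort(key=lambda item: item[0], reverse=True)
        del self.templates[self.capacity:]
        return True

    def select(self, query: np.ndarray) -> np.ndarray:
        """Return the stored template with the highest cosine similarity to query."""
        feats = np.stack([feat for _, feat in self.templates])
        norms = np.linalg.norm(feats, axis=1) * np.linalg.norm(query) + 1e-12
        return feats[int(np.argmax(feats @ query / norms))]


# Usage: four candidate templates with hypothetical reliability scores.
rng = np.random.default_rng(0)
memory = TemplateMemory(capacity=2)
for score in (0.9, 0.4, 0.7, 0.8):
    memory.update(rng.normal(size=8), score)
print([score for score, _ in memory.templates])  # [0.9, 0.8]
```

Gating admission on a reliability score keeps low-confidence frames out of the memory. This targets the error accumulation that the abstract attributes to online learning trackers.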

Attention-Guided Multi-scale Interaction Network for Face Super-Resolution

arXiv, 2024

Authors: Wan, Xujie; Li, Wenjie; Gao, Guangwei; Lu, Huimin; Yang, Jian; Lin, Chia-Wen

Affiliations: The Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210046, China; Key Laboratory of Artificial Intelligence, Ministry of Education, Shanghai 200240, China; The Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, China; The Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100080, China; The School of Automation, Southeast University, Nanjing 210096, China; The School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China; The Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan

Recently, CNN and Transformer hybrid networks have demonstrated excellent performance in face super-resolution (FSR) tasks. Because hybrid networks contain numerous features at different scales, fusing these multi-scale features and promoting their complementarity is crucial for enhancing FSR. However, existing hybrid network-based FSR methods ignore this and simply combine the Transformer and the CNN. To address this issue, we propose an attention-guided multi-scale interaction network (AMINet). It contains local and global feature interactions as well as feature interactions between the encoder and decoder phases. Specifically, we propose a Local and Global Feature Interaction Module (LGFI). LGFI promotes the fusion of global features with the local features of different receptive fields extracted by our Residual Depth Feature Extraction Module (RDFE). Additionally, we propose a Selective Kernel Attention Fusion Module (SKAF) to adaptively select fusions of different features within LGFI and between the encoder and decoder phases. This design lets multi-scale features flow freely within modules and between the encoder and decoder. It thereby promotes the complementarity of features at different scales to enhance FSR. Comprehensive experiments confirm that our method consistently performs well with lower computational cost and faster inference. Copyright © 2024, The Authors. All rights reserved.

Keywords: Decoding
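
The abstract describes SKAF only at a high level. The Python sketch below assumes it follows the selective-kernel pattern known from SKNet:

1. Sum the branches and pool globally.
2. Project the pooled vector to a compact descriptor.
3. Apply a softmax across branches to get per-channel fusion weights.

The matrices `w_reduce` and `w_select` stand in for learned parameters and are random here. This is an assumption about the mechanism, not AMINet's code.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def selective_kernel_fusion(branches, w_reduce, w_select):
    """SK-style fusion of B feature maps, each of shape (C, H, W).

    w_reduce (d, C) and w_select (B, C, d) stand in for learned weights.
    Returns the fused map (C, H, W) and the attention weights (B, C).
    """
    stacked = np.stack(branches)                     # (B, C, H, W)
    pooled = stacked.sum(axis=0).mean(axis=(1, 2))   # global average pool -> (C,)
    z = np.maximum(w_reduce @ pooled, 0.0)           # compact descriptor + ReLU -> (d,)
    attn = softmax(w_select @ z, axis=0)             # softmax across branches -> (B, C)
    fused = (attn[:, :, None, None] * stacked).sum(axis=0)
    return fused, attn


# Usage with random stand-ins for a local (CNN) and a global (Transformer) branch.
rng = np.random.default_rng(0)
C, H, W, d = 4, 8, 8, 2
local_feat = rng.normal(size=(C, H, W))
global_feat = rng.normal(size=(C, H, W))
fused, attn = selective_kernel_fusion(
    [local_feat, global_feat], rng.normal(size=(d, C)), rng.normal(size=(2, C, d))
)
print(fused.shape)  # (4, 8, 8)
```

Because the softmax runs across branches, each channel's weights sum to one. The fused map is therefore a per-channel convex combination of the local and global features.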

Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

arXiv, 2023

Authors: Li, Wenjie; Wang, Mei; Zhang, Kai; Li, Juncheng; Li, Xiaoming; Zhang, Yuhang; Gao, Guangwei; Deng, Weihong; Lin, Chia-Wen

Affiliations: The Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; The Computer Vision Lab, ETH Zürich, Zürich, Switzerland; The School of Communication and Information Engineering, Shanghai University, Shanghai, China; The Nanyang Technological University, Singapore; The Intelligent Visual Information Perception Laboratory, Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, China; The Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan

Face restoration (FR) is a specialized field within image restoration that aims to restore low-quality (LQ) face images to high-quality (HQ) face images. Recent advances in deep learning have led to significant progress in FR methods. In this paper, we begin by examining the prevalent factors responsible for real-world LQ images and introduce the degradation techniques used to synthesize LQ images. We also discuss notable benchmarks commonly used in the field. Next, we categorize FR methods by task and explain their evolution over time. Furthermore, we explore the various facial priors commonly used in the restoration process and discuss strategies to enhance their effectiveness. In the experimental section, we thoroughly evaluate the performance of state-of-the-art FR methods across various tasks using a unified benchmark and analyze their performance from different perspectives. Finally, we discuss the challenges faced in the field of FR and propose potential directions for future advancements. The open-source repository corresponding to this work can be found at https://***/24wenjie-li/Awesome-Face-Restoration. Copyright © 2023, The Authors. All rights reserved.

Keywords: Restoration
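
Synthetic LQ faces are commonly produced by a chain of blur, downsampling, additive noise and JPEG compression. The Python sketch below implements only the first three steps, on a grayscale image in [0, 1]:

- The kernel width, scale factor and noise level are illustrative assumptions.
- The JPEG step is omitted to avoid an image-codec dependency.
- `degrade` is a hypothetical helper, not taken from the survey's benchmark code.

```python
import numpy as np


def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def degrade(hq: np.ndarray, sigma=2.0, scale=4, noise_std=0.02, seed=0) -> np.ndarray:
    """Blur, downsample, and add Gaussian noise to a grayscale image in [0, 1]."""
    k = gaussian_kernel1d(sigma, radius=int(3 * sigma))
    pad = len(k) // 2
    padded = np.pad(hq, pad, mode="reflect")
    # Separable blur: rows then columns; 'valid' trims the padding back off.
    blurred = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, k, mode="valid")
    low = blurred[::scale, ::scale]  # naive downsampling by striding
    noisy = low + np.random.default_rng(seed).normal(0.0, noise_std, low.shape)
    return np.clip(noisy, 0.0, 1.0)  # JPEG compression step omitted


hq = np.random.default_rng(1).random((64, 64))  # stand-in for an HQ face crop
lq = degrade(hq)
print(hq.shape, "->", lq.shape)  # (64, 64) -> (16, 16)
```

Randomizing `sigma`, `scale`, and `noise_std` per sample is the usual way to widen the range of synthetic degradations. This sketch keeps them fixed for clarity.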

Open source simulation of fixational eye drift motion in OCT scans: Towards better comparability and accuracy in retrospective OCT motion correction

International workshop on Algorithmen - Systeme - Anwendungen, 2020

Authors: Nau, Merlin A.; Ploner, Stefan B.; Moult, Eric M.; Fujimoto, James G.; Maier, Andreas K.

Affiliations: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Department for Electrical Engineering and Computer Science and Research Laboratory for Electronics, Massachusetts Institute of Technology, Cambridge, United States

ISBN: (print) 9783658292669

Point-wise scanning modalities like Optical Coherence Tomography (OCT) or Scanning Laser Ophthalmoscopy suffer from distortions due to the perpetual motion of the eye. While various motion correction approaches have been proposed, the absence of ground truth displacements or images entails a lack of accurate and comparable evaluations. The purpose of this paper is to close this gap by initiating an open source framework for the simulation of realistic eye motion and the corresponding artificial distortion of scans. For the first time, this enables the community to a) create datasets with accessible ground truth and b) compare the correction of identical motion patterns in data acquired with different scanners or scan patterns. This paper extends previous work on simulation of fixational eye drift via a self-avoiding random walk in a potential to a continuous domain in time and space, allowing the derivation of smooth displacement fields. The model is demonstrated by presenting an exemplary motion path whose properties resemble the reported properties of recordings in the current literature on fixational eye motion. Furthermore, the artificial distortion of scans is demonstrated by showing a correspondingly distorted image of a virtual raster scan modeled according to the properties of an existing OCT scanner. All experiments can be reproduced and adapted to arbitrary scanner and raster-scan-pattern properties in the publicly available framework. Beyond that, the open source code provides a starting point for the community to integrate extensions like saccadic or axial eye motion. © Springer Fachmedien Wiesbaden GmbH, ein Teil von Springer Nature 2020.

Keywords: Optical tomography
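
The paper's model is continuous in time and space. The Python sketch below shows only the discrete idea it extends: a self-avoiding random walk in a potential on a square lattice. The lattice size, relaxation rate `relax`, potential weight `lam` and the quadratic shape of the potential are illustrative assumptions. `self_avoiding_drift` is a hypothetical helper, not part of the published framework.

```python
import numpy as np


def self_avoiding_drift(steps=200, size=51, relax=1e-3, lam=1.0, seed=0):
    """Discrete self-avoiding random walk in a quadratic potential.

    Each visited lattice site gains activation that slowly relaxes over time;
    the walker always moves to the 4-neighbour with the lowest activation plus
    potential, so it avoids its recent path while the potential keeps it near
    the centre. All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    centre = size // 2
    ii, jj = np.mgrid[0:size, 0:size]
    potential = lam * size * (((ii - centre) / centre) ** 2 + ((jj - centre) / centre) ** 2)
    activation = rng.random((size, size))  # random initial activation field
    pos = (centre, centre)
    path = [pos]
    for _ in range(steps):
        activation *= 1.0 - relax   # past visits fade
        activation[pos] += 1.0      # mark the current site as visited
        i, j = pos
        neighbours = [(i + di, j + dj) for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= i + di < size and 0 <= j + dj < size]
        pos = min(neighbours, key=lambda p: activation[p] + potential[p])
        path.append(pos)
    return np.array(path)


path = self_avoiding_drift()
step_lengths = np.abs(np.diff(path, axis=0)).sum(axis=1)
print(path.shape, set(step_lengths.tolist()))  # (201, 2) {1}
```

A lattice walk like this yields only integer positions. The continuous-domain extension described in the abstract is what makes smooth displacement fields derivable from the path.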

Structural Topology Refinement Network for Skeleton-Based Action Recognition

IEEE Transactions on Instrumentation and Measurement, 2025, Vol. 74

Authors: Wang, Rui; Jin, Jiayao; Chen, Ziheng; Wu, Cong; Wu, Xiao-Jun; Sebe, Nicu

Affiliations: Jiangnan University, School of Artificial Intelligence and Computer Science, Wuxi 214122, China; University of Trento, Department of Information Engineering and Computer Science, Trento 38123, Italy; Jiangnan University, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, China

The effectiveness of Graph Convolutional Networks (GCNs) has been widely demonstrated in skeleton-based action recognition. However, most existing GCN-based methods use a dense adjacency matrix to describe the structural information of the entire skeleton, i.e., a holistic representation, which neglects the discriminability of local patterns. To address this challenge, we propose a novel region descriptor that divides the skeleton into different local sections (i.e., left arm, right arm, left leg, right leg, torso, and head). The generated representations contain rich semantic information, enabling the model to better understand the action correlations between different body parts. Inspired by the success of manifold learning in nonlinear data characterization, we further introduce the Symmetric Positive Definite (SPD) matrix and a Riemannian neural network to capture the long-range statistical relationships among different topographies. These components form our Structural Topology Refinement Network (STRN). Extensive experiments on three benchmark datasets, namely NTU-60, NTU-120, and NW-UCLA, show the superiority of our proposed method over state-of-the-art methods. © 1963-2012 IEEE.

Keywords: Musculoskeletal system
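
The Python sketch below illustrates the two ingredients named in the abstract, under stated assumptions:

- Each body region is summarized by the covariance of its joint coordinates over time, regularized to be SPD.
- Each SPD matrix is mapped through the matrix logarithm, a common step before Euclidean layers in Riemannian networks.

The 12-joint toy skeleton and its region indices are hypothetical; the benchmark skeletons use a different joint layout. This is not the STRN architecture.

```python
import numpy as np

# Hypothetical joint indices for a toy 12-joint skeleton split into six regions.
REGIONS = {
    "head": [0, 1], "torso": [2, 3],
    "left_arm": [4, 5], "right_arm": [6, 7],
    "left_leg": [8, 9], "right_leg": [10, 11],
}


def region_spd(seq: np.ndarray, joints: list, eps: float = 1e-3) -> np.ndarray:
    """Covariance of one region's joint coordinates over time, regularized to be SPD.

    seq has shape (T, J, 3); each frame's region coordinates are flattened
    into one feature vector, and eps * I guarantees positive definiteness.
    """
    feats = seq[:, joints, :].reshape(seq.shape[0], -1)  # (T, 3 * len(joints))
    cov = np.cov(feats, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def log_map(spd: np.ndarray) -> np.ndarray:
    """Matrix logarithm via eigendecomposition, mapping SPD matrices to a flat space."""
    vals, vecs = np.linalg.eigh(spd)
    return (vecs * np.log(vals)) @ vecs.T


rng = np.random.default_rng(0)
seq = rng.normal(size=(30, 12, 3))  # 30 frames, 12 joints, xyz coordinates
descriptors = {name: log_map(region_spd(seq, idx)) for name, idx in REGIONS.items()}
print(descriptors["left_arm"].shape)  # (6, 6)
```

The `eps * I` term matters in practice. A region whose joints barely move would otherwise give a singular covariance, and its logarithm would be undefined.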

Vision Transformer for Parkinson’s Disease Classification using Multilingual Sustained Vowel Recordings

Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Authors: Daria Hemmerling, Marek Wodzinski, Juan Rafael Orozco-Arroyave, David Sztaho, Mateusz Daniol, Pawel Jemiolo, Magdalena Wojcik-Pedziwiatr

Affiliations: Daria Hemmerling, Marek Wodzinski, Mateusz Daniol, and Pawel Jemiolo are with the AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, Krakow, Poland; Information Systems Institute, University of Applied Sciences Western Switzerland (HES-SO Valais), Sierre, Switzerland; Universidad de Antioquia, Medellin, Colombia, and the Pattern Recognition Lab at the University of Erlangen, Erlangen, Germany; Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary; Department of Neurology, Andrzej Frycz Modrzewski Krakow University, Krakow, Poland

Parkinson’s disease (PD) is the second most prevalent neurodegenerative disease in the world. Thus, the early detection of PD has recently been the subject of several scientific and commercial studies. In this paper, we propose a pipeline that applies a Vision Transformer to mel-spectrograms for PD classification using multilingual sustained vowel recordings. Our transformer-based model reaches an F1-score of 0.78. This shows great potential for using voice as a single-modality biomarker for automatic PD detection, without language restrictions and across a wide range of vowels. The results of our study fall within the range of the estimated prevalence of voice and speech disorders in Parkinson’s disease, which is 70-90%. Our study demonstrates a high potential for adaptation in clinical decision-making, allowing for increasingly systematic and fast diagnosis of PD with the potential for use in *** relevance: There is an urgent need to develop a non-invasive biomarker of Parkinson’s disease that is effective enough to detect the onset of the disease. Such a biomarker would allow neuroprotective treatment to be introduced at the earliest possible stage and the results of that intervention to be followed. Voice disorders in PD are very frequent and are expected to be utilized as an early diagnostic biomarker. Voice analysis using deep neural networks opens new opportunities to assess the symptoms of neurodegenerative diseases, to support fast diagnosis, to guide treatment initiation, and to predict risk. The detection accuracy for voice biomarkers achieved by our method came close to the maximum achievable value.
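
The pipeline feeds mel-spectrograms of sustained vowels to a Vision Transformer. The Python sketch below shows only the input side. It builds a log-mel spectrogram from an HTK-style triangular filterbank, then cuts it into the flattened, non-overlapping patches that a ViT embeds as tokens.

The settings are assumptions; none of them is given in the abstract:

- 16 kHz audio
- 512-point FFT
- 10 ms hop
- 64 mel bands
- 16 x 16 patches

`to_patches` is a hypothetical helper. The transformer itself and the authors' preprocessing are not reproduced.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Log-mel spectrogram of shape (n_mels, frames) with triangular filters."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, n_fft // 2 + 1)
    # Filter edges spaced evenly on the mel scale, mapped to FFT bin indices.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):   # rising slope (empty if lo == mid)
            fbank[m - 1, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope (empty if mid == hi)
            fbank[m - 1, k] = (hi - k) / (hi - mid)
    return np.log(power @ fbank.T + 1e-10).T


def to_patches(spec, patch=16):
    """Split a (F, T) spectrogram into flattened, non-overlapping ViT patches."""
    f = (spec.shape[0] // patch) * patch
    t = (spec.shape[1] // patch) * patch
    grid = spec[:f, :t].reshape(f // patch, patch, t // patch, patch)
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


sr = 16000
vowel = np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)  # 1 s synthetic tone as a stand-in
spec = log_mel_spectrogram(vowel, sr=sr)
patches = to_patches(spec)
print(spec.shape, patches.shape)  # (64, 97) (24, 256)
```

Trailing frames that do not fill a whole patch are dropped here. Padding the time axis instead is an equally plausible choice.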

Editorial

International Journal of Bio-Inspired Computation, 2020, Vol. 16, No. 2, p. 67

Authors: Fang, Wei; Wu, Xiaojun

Affiliations: Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Department of Computer Science and Technology, Jiangnan University, 214122, China

NTIRE 2023 Image Shadow Removal Challenge Report

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023

Authors: Vasluianu, Florin-Alexandru; Seizinger, Tim; Timofte, Radu; Cui, Shuhao; Huang, Junshi; Tian, Shuman; Fan, Mingyuan; Zhang, Jiaqi; Zhu, Li; Wei, Xiaoming; Wei, Xiaolin; Luo, Ziwei; Gustafsson, Fredrik K.; Zhao, Zheng; Sjölund, Jens; Schön, Thomas B.; Dong, Xiaoyi; Zhang, Xi Sheryl; Li, Chenghua; Leng, Cong; Yeo, Woon-Ha; Oh, Wang-Taek; Lee, Yeo-Reum; Ryu, Han-Cheol; Luo, Jinting; Jiang, Chengzhi; Han, Mingyan; Wu, Qi; Lin, Wenjie; Yu, Lei; Li, Xinpeng; Jiang, Ting; Fan, Haoqiang; Liu, Shuaicheng; Xu, Shuning; Song, Binbin; Chen, Xiangyu; Zhang, Shile; Zhou, Jiantao; Zhang, Zhao; Zhao, Suiyi; Zheng, Huan; Gao, Yangcheng; Wei, Yanyan; Wang, Bo; Ren, Jiahuan; Luo, Yan; Kondo, Yuki; Miyata, Riku; Yasue, Fuma; Naruki, Taito; Ukita, Norimichi; Chang, Hua-En; Yang, Hao-Hsiang; Chen, Yi-Chung; Chiang, Yuan-Chun; Huang, Zhi-Kai; Chen, Wei-Ting; Chen, I-Hsiang; Hsieh, Chia-Hsuan; Kuo, Sy-Yen; Xianwei, Li; Fu, Huiyuan; Liu, Chunlin; Ma, Huadong; Fu, Binglan; He, Huiming; Wang, Mengjia; She, Wenxuan; Liu, Yu; Nathan, Sabari; Kansal, Priya; Zhang, Zhongjian; Yang, Huabin; Wang, Yan; Zhang, Yanru; Phutke, Shruti S.; Kulkarni, Ashutosh; Khan, Md Raqib; Murala, Subrahmanyam; Vipparthi, Santosh Kumar; Ye, Heng; Liu, Zixi; Yang, Xingyi; Liu, Songhua; Wu, Yinwei; Jing, Yongcheng; Yu, Qianhao; Zheng, Naishan; Huang, Jie; Long, Yuhang; Yao, Mingde; Zhao, Feng; Zhao, Bowen; Ye, Nan; Shen, Ning; Cao, Yanpeng; Xiong, Tong; Xia, Weiran; Li, Dingwen; Xia, Shuchen

Affiliations: Computer Vision Lab, IFI, CAIDAS, University of Würzburg, Germany; Computer Vision Lab, ETH Zürich, Switzerland; Meituan Group, China; Department of Information Technology, Uppsala University, Sweden; Institute of Automation, Chinese Academy of Sciences, Beijing, China; Nanjing, China; Maicro, Nanjing, China; Department of Artificial Intelligence Convergence, Sahmyook University, Seoul, Republic of Korea; Megvii Technology, China; University of Electronic Science and Technology of China, China; University of Macau, China; China; Toyota Technological Institute, Japan; Graduate Institute of Electronics Engineering, National Taiwan University, Taiwan; Department of Electrical Engineering, National Taiwan University, Taiwan; Graduate Institute of Communication Engineering, National Taiwan University, Taiwan; ServiceNow, United States; Beijing University of Posts and Telecommunications, Beijing, China; Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, China; Couger Inc.; Computer Vision and Pattern Recognition Lab, Indian Institute of Technology Ropar, Rupnagar, Punjab, India; Research Institute Singapore; National University of Singapore, Singapore; Research Institute Singapore; University of Sydney, Australia; Brain-Inspired Vision Laboratory, Information Science and Technology Institution, University of Science and Technology of China, China; State Key Laboratory of Fluid Power and Mechatronic Systems, School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China; Key Laboratory of Advanced Manufacturing Technology of Zhejiang Province, School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China; South China University of Technology, China

ISBN: (print) 9798350302493

This work reviews the results of the NTIRE 2023 Challenge on Image Shadow Removal. The solutions described here were proposed for a novel dataset that captures a wide range of object-light interactions. It consists of 1200 roughly pixel-aligned pairs of real shadow-free and shadow-affected images, captured in a controlled environment. The data was captured in a white-box setup, using professional equipment for lights and data acquisition sensors. The challenge had 144 registered participants, of which 19 teams were compared in the final ranking. The proposed solutions extend the work on shadow removal, improving on the performance level of state-of-the-art methods. © 2023 IEEE.

Keywords: Data acquisition

A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI

arXiv, 2025

Authors: Pérez-Toro, Paula Andrea; Arias-Vergara, Tomás; Xing, Fangxu; Liu, Xiaofeng; Stone, Maureen; Zhuo, Jiachen; Orozco-Arroyave, Juan Rafael; Nöth, Elmar; Hutter, Jana; Prince, Jerry L.; Maier, Andreas; Woo, Jonghye

Affiliations: Harvard Medical School, Massachusetts General Hospital, Boston, MA 02114, United States; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Bayern 91058, Germany; GITA Lab, Faculty of Engineering, Universidad de Antioquia, Medellín, Antioquia 050010, Colombia; Department of Radiology & Biomedical Imaging and Biomedical Informatics & Data Science, Yale University, New Haven, CT 06510, United States; Department of Neural and Pain Sciences, Department of Orthodontics and Pediatrics, University of Maryland School of Dentistry, Baltimore, MD 21210, United States; Department of Orthodontics and Pediatrics, University of Maryland School of Dentistry, Baltimore, MD 21201, United States; Smart Imaging Lab, Radiological Institute, University Hospital Erlangen, Erlangen, Bayern 91052, Germany; Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, United States

Understanding the relationship between vocal tract motion during speech and the resulting acoustic signal is crucial for aided clinical assessment and developing personalized treatment and rehabilitation strategies. Toward this goal, we introduce an audio-to-video generation framework for creating Real Time/cine-Magnetic Resonance Imaging (RT-/cine-MRI) visuals of the vocal tract from speech signals. Our framework first preprocesses RT-/cine-MRI sequences and speech samples to achieve temporal alignment, ensuring synchronization between visual and audio data. We then employ a modified stable diffusion model, integrating structural and temporal blocks, to effectively capture movement characteristics and temporal dynamics in the synchronized data. This process enables the generation of MRI sequences from new speech inputs, improving the conversion of audio into visual data. We evaluated our framework on healthy controls and tongue cancer patients by analyzing and comparing the vocal tract movements in synthesized videos. Our framework demonstrated adaptability to new speech inputs and effective generalization. In addition, positive human evaluations confirmed its effectiveness, with realistic and accurate visualizations, suggesting its potential for outpatient therapy and personalized simulation of vocal tract visualizations. © 2025, CC BY-NC-ND.

Keywords: Visualization
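
The abstract says the framework first temporally aligns RT-/cine-MRI sequences with speech samples, but it does not give the procedure. One simple way to pair the two streams is to cut one audio window per MRI frame, centred on that frame's timestamp. The Python sketch below shows this as an assumption:

- The 16 kHz sample rate and 25 frames/s rate are hypothetical values.
- `align_audio_to_frames` is an illustrative helper, not part of the authors' code.

```python
import numpy as np


def align_audio_to_frames(audio: np.ndarray, sr: int, fps: float, n_frames: int) -> np.ndarray:
    """Cut one audio window per video frame, centred on that frame's timestamp.

    Returns shape (n_frames, window) with window = round(sr / fps) samples;
    positions before the start or past the end of the signal are zero-padded.
    """
    window = int(round(sr / fps))
    half = window // 2
    padded = np.pad(audio, (half, window))  # zeros on both sides for edge frames
    out = np.empty((n_frames, window), dtype=audio.dtype)
    for f in range(n_frames):
        centre = int(round(f * sr / fps))  # sample index at frame f's timestamp
        # Index `centre` in `padded` is original sample `centre - half`,
        # so this slice spans [centre - half, centre - half + window).
        out[f] = padded[centre:centre + window]
    return out


sr, fps = 16000, 25.0  # hypothetical audio sample rate and MRI frame rate
audio = np.arange(2 * sr, dtype=float)  # 2 s ramp so sample values equal indices
windows = align_audio_to_frames(audio, sr, fps, n_frames=50)
print(windows.shape, windows[10, 0])  # (50, 640) 6080.0
```

Centring each window on its frame, rather than starting it there, keeps the acoustic context symmetric around the imaged articulator position.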

Photonic non-Bloch quadrupole topological insulators in coupled ring resonators

Physical Review A, 2021, Vol. 103, No. 6, p. 063507

Authors: Zekun Lin, Lu Ding, Shuyue Chen, Shan Li, Shaolin Ke, Xun Li, Bing Wang

Affiliations: Hubei Key Laboratory of Optical Information and Pattern Recognition, Wuhan Institute of Technology, Wuhan 430205, China; Wuhan National Laboratory for Optoelectronics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, China; Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ontario L8S 4K2, Canada

We investigate second-order topological phases in a two-dimensional ring resonator array in which each plaquette is threaded by a π gauge flux and an imaginary gauge field. The real and imaginary gauge fields are induced by shifting the displacement and by integrating gain or loss into the two half perimeters of the auxiliary rings. The system supports topological corner modes whose emergence is determined by the non-Bloch topological invariant due to skin effects. The bulk modes exhibit second-order skin effects in both the trivial and nontrivial phases. They accumulate at opposite corners depending on whether clockwise or counterclockwise modes are excited. By introducing an interface between regions with different imaginary gauge fields, we show that the bulk modes exist at the interface while the topological corner modes are localized at the physical corners. Furthermore, skin effects are also demonstrated in passive ring resonators. The study may find applications in lasers and broadband light trapping.

Keywords: Topological effects in photonic systems
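
An imaginary gauge field turns a reciprocal hopping t into the nonreciprocal pair t·e^(+g) and t·e^(-g). Under open boundaries, this piles every eigenmode up at one edge (the skin effect). The 1D Hatano-Nelson chain sketched below in Python shows this mechanism. It is a much simpler stand-in for the paper's 2D ring-resonator lattice, with no π flux and no corner modes. The chain length, t and g are arbitrary choices, and this is not the paper's model.

```python
import numpy as np


def hatano_nelson(n_sites: int, t: float = 1.0, g: float = 0.3) -> np.ndarray:
    """Open-boundary 1D chain whose hoppings carry an imaginary gauge field g.

    Hopping is t * exp(+g) one way and t * exp(-g) the other, so the
    Hamiltonian is non-Hermitian whenever g != 0. With this sign convention
    (h[n, n + 1] = t * exp(g)), modes localize at the low-index edge for g > 0.
    """
    h = np.zeros((n_sites, n_sites))
    for n in range(n_sites - 1):
        h[n, n + 1] = t * np.exp(g)
        h[n + 1, n] = t * np.exp(-g)
    return h


def left_half_weight(h: np.ndarray) -> np.ndarray:
    """Fraction of each right eigenvector's intensity lying in the left half."""
    _, vecs = np.linalg.eig(h)
    intensity = np.abs(vecs) ** 2
    half = h.shape[0] // 2
    return intensity[:half].sum(axis=0) / intensity.sum(axis=0)


reciprocal = left_half_weight(hatano_nelson(20, g=0.0))
skewed = left_half_weight(hatano_nelson(20, g=0.3))
print(reciprocal.round(2))  # every mode is split evenly: all 0.5
print(skewed.min())          # every mode piles up at the left edge (above 0.9)
```

The similarity transform ψ_n = e^(-gn) φ_n maps this chain onto an ordinary Hermitian one. This is why the eigenvalues stay real while every eigenvector acquires the same exponential envelope toward one end.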
