Search results - Inner Mongolia University Library

End-to-end object detection method based on shape correction and feature selection cross-attention

Journal of Electronic Imaging, 2025, vol. 34, no. 2

Authors: He, Lixin; Ge, Luqing; Cheng, Zhi; Wang, Xiaofeng; Yao, Guangzhuang; Hu, Zhi. Affiliations: Hefei University, School of Artificial Intelligence and Big Data, Collaborative Innovation Center for Computer Vision and Pattern Recognition, Hefei, China

DEtection TRansformer (DETR) and its variant models degrade the object detection performance due to the inability to provide the object position a priori and the lack of shape deviation supervision between prediction boxes and ground truth boxes, which makes some of the shape differences between prediction boxes and ground truth boxes too large. To address this problem, we propose an end-to-end object detection method (SF-DETR) based on shape correction and feature selection cross-attention. First, a feature selection layer is introduced to provide the model with a priori object position by combining classification confidence and intersection over union score to filter high-quality prediction boxes and constrain the computational range of decoder cross-attention. Second, the shape bias is numericalized and a shape correction loss function is proposed to supervise the shape bias and ensure that the shape bias is effectively corrected. Finally, we conducted experiments on three public datasets, MS COCO, Caltech Pedestrians, and BIT-Vehicle, and the experimental results show that the method in this paper not only provides the object position a priori effectively but also reduces the shape bias significantly, which improves the prediction box quality and object detection accuracy. © 2025 SPIE and IS&T.

Keywords: cross-attention; DEtection TRansformer; end-to-end object detection; shape correction
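
The abstract above says the feature selection layer combines classification confidence with an intersection over union score to keep high-quality prediction boxes, but it does not say how the two are combined. The sketch below is one possible reading, not the SF-DETR implementation: it assumes the two scores are multiplied and that a fixed number of boxes is kept. The function name `select_boxes` and the `top_k` parameter are hypothetical.

```python
def select_boxes(confidences, iou_scores, top_k):
    """Rank boxes by confidence * IoU score and return indices of the top_k.

    Assumption: the product is used as the joint quality score; the
    abstract only states that the two scores are combined.
    """
    scores = [c * u for c, u in zip(confidences, iou_scores)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Three candidate boxes: the first is confident but poorly localized.
kept = select_boxes([0.9, 0.6, 0.8], [0.3, 0.9, 0.7], top_k=2)
print(kept)
```

In a decoder, the kept indices would then limit which encoder features the cross-attention attends to; that step is omitted here.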

CodePhys: Robust Video-Based Remote Physiological Measurement Through Latent Codebook Querying

IEEE Journal of Biomedical and Health Informatics, 2025, vol. PP, p. PP

Authors: Chu, Shuyang; Xia, Menghan; Yuan, Mengyao; Liu, Xin; Seppanen, Tapio; Zhao, Guoying; Shi, Jingang. Affiliations: Xi'an Jiaotong University, School of Software Engineering, Xi'an, China; Tencent AI Lab, Shenzhen, China; Lappeenranta-Lahti University of Technology LUT, Computer Vision and Pattern Recognition Laboratory, Lappeenranta 53850, Finland; University of Oulu, Center for Machine Vision and Signal Analysis, Finland

Remote photoplethysmography (rPPG) aims to measure non-contact physiological signals from facial videos, which has shown great potential in many applications. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Although they can achieve acceptable results, the recovery of rPPG signal faces intractable challenges when interference from real-world scenarios takes place on facial video. Specifically, facial videos are inevitably affected by non-physiological factors (e.g., camera device noise, defocus, and motion blur), leading to the distortion of extracted rPPG signals. Recent rPPG extraction methods are easily affected by interference and degradation, resulting in noisy rPPG signals. In this paper, we propose a novel method named CodePhys, which innovatively treats rPPG measurement as a code query task in a noise-free proxy space (i.e., codebook) constructed by ground-truth PPG signals. We consider noisy rPPG features as queries and generate high-fidelity rPPG features by matching them with noise-free PPG features from the codebook. Our approach also incorporates a spatial-aware encoder network with a spatial attention mechanism to highlight physiologically active areas and uses a distillation loss to reduce the influence of non-periodic visual interference. Experimental results on four benchmark datasets demonstrate that CodePhys outperforms state-of-the-art methods in both intra-dataset and cross-dataset settings. © 2025 IEEE.

Keywords: Heart
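
The abstract above treats noisy rPPG features as queries that are matched against noise-free PPG features stored in a codebook. As a rough illustration only, the sketch below assumes the matching is a nearest-neighbour lookup under squared Euclidean distance, as in vector quantization; the abstract does not specify the matching rule, and the name `query_codebook` and the toy vectors are hypothetical.

```python
def query_codebook(feature, codebook):
    """Return (index, entry) of the codebook vector closest to `feature`.

    Assumption: closeness is squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(feature, codebook[i]))
    return index, codebook[index]


# Toy codebook of two clean feature vectors and one noisy query.
codebook = [[0.0, 1.0, 0.0], [1.0, 0.0, -1.0]]
index, clean = query_codebook([0.9, 0.1, -0.8], codebook)
print(index, clean)
```

The returned entry replaces the noisy feature, which is what lets the output stay inside the space spanned by clean signals.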

CodePhys: Robust Video-based Remote Physiological Measurement through Latent Codebook Querying

arXiv, 2025

Authors: Chu, Shuyang; Xia, Menghan; Yuan, Mengyao; Liu, Xin; Seppanen, Tapio; Zhao, Guoying; Shi, Jingang. Affiliations: The School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; The Tencent AI Lab, Shenzhen, China; The Computer Vision and Pattern Recognition Laboratory, Lappeenranta-Lahti University of Technology LUT, Lappeenranta 53850, Finland; The Center for Machine Vision and Signal Analysis, University of Oulu, Finland

Keywords: Heart

Using electromagnetic input for multi-user or two-handed spatial gestural interaction based on the digital compass

17th International Conference on Human-computer Interaction with Mobile Devices and Services, MobileHCI 2015

Authors: Yuksel, Kamer Ali; Baz, Ipek; Ozduman, Haluk. Affiliations: Computer Vision and Pattern Recognition Laboratory, Sabanci University, Turkey; Research Center for ICT German Turkish Advanced Technical University of Berlin, Germany

ISBN: (print) 9781450336529

Multiple researchers recently proposed the use of the digital compass embedded in mobile devices for touchless interaction in the 3D space around them. These methods overcome several limits imposed by other interaction techniques and were evaluated for a variety of uses. However, they do not support collaborative settings and are prone to dynamic noise caused by external conditions, as with most other sensor-based interaction techniques. In this paper, we propose the use of frequency-modulated electromagnets as an input medium for magnetic interaction to overcome its various constraints and further enable multi-user and two-handed input. Furthermore, we demonstrated the hardware design specifications of a novel input device, referred to as electromagnetic stylus, which is prototyped to conduct a user-study on the proposed method. Experimental results indicate that gestures performed simultaneously by four electromagnetic styli can accurately be recognized using a single magnetic field sensor, and dynamic noises can be substantially reduced. © 2015 ACM.

Keywords: Timing circuits
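
The abstract above reports that several frequency-modulated electromagnetic styli can be recognized with a single magnetic field sensor. One way this could work, assumed here rather than taken from the paper, is that each stylus is driven at its own frequency and the sensor signal is split by measuring the amplitude at each frequency. The function `tone_magnitude`, the sample rate, and the frequencies below are hypothetical.

```python
import math


def tone_magnitude(samples, sample_rate, freq):
    """Estimate the amplitude of a sinusoid at `freq` by direct correlation.

    This is a single-bin discrete Fourier transform; it is exact when the
    window holds a whole number of cycles of every tone present.
    """
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


# One second at 100 Hz: stylus A at 5 Hz (amplitude 1.0), stylus B at 12 Hz (0.5).
rate = 100
signal = [math.sin(2 * math.pi * 5 * i / rate) + 0.5 * math.sin(2 * math.pi * 12 * i / rate)
          for i in range(rate)]
for f in (5, 12, 20):
    print(f, round(tone_magnitude(signal, rate, f), 3))
```

Tracking each per-frequency amplitude over time would give one gesture signal per stylus from the same sensor.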

A new method for handwritten scene text detection in video

International Conference on Frontiers in Handwriting Recognition

Authors: Shivakumara, Palaiahnakote; Dutta, Anjan; Pal, Umapada; Tan, Chew Lim. Affiliations: School of Computing, National University of Singapore, Singapore; Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, India

ISBN: (print) 9780769542218

Handwritten text appears in many video images, so handwritten scene text detection in video is useful for applications such as efficient indexing and retrieval. Text lines in many video frames are also multi-oriented. To the best of our knowledge, there is no prior work on detecting multi-oriented handwritten text in video. In this paper, we present a new method based on maximum color difference and boundary growing for detection of multi-oriented handwritten scene text in video. The method computes the maximum color difference for the average of the R, G and B channels of the original frame to enhance the text information. The output of maximum color difference is fed to a K-means algorithm with K=2 to separate text and non-text clusters. Text candidates are obtained by intersecting the text cluster with the Sobel output of the original frame. To tackle the fundamental problem of different orientations and skews of handwritten text, a boundary growing method based on a nearest neighbor concept is employed. We evaluate the proposed method by testing on our own handwritten text database and publicly available video data (Hua's data). Experimental results obtained from the proposed method are promising. © 2010 IEEE.

Keywords: Color
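
Two steps in the abstract above are concrete enough to sketch: clustering the enhanced values with K-means (K=2), then intersecting the text cluster with an edge map. The code below is a minimal illustration on a flat list of pixel values, not the authors' implementation. It assumes the cluster with the higher mean is the text cluster and takes the edge mask as given instead of computing a Sobel output; the names `two_means` and `text_candidates` are hypothetical.

```python
def two_means(values, iterations=20):
    """1-D K-means with K=2. Returns True for members of the higher-mean cluster."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        labels = [abs(v - hi) < abs(v - lo) for v in values]
        high = [v for v, is_high in zip(values, labels) if is_high]
        low = [v for v, is_high in zip(values, labels) if not is_high]
        if not high or not low:
            break  # degenerate input: everything fell into one cluster
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return [abs(v - hi) < abs(v - lo) for v in values]


def text_candidates(enhanced, edge_mask):
    """Keep pixels that are in the (assumed) text cluster and on an edge."""
    in_text_cluster = two_means(enhanced)
    return [t and e for t, e in zip(in_text_cluster, edge_mask)]


enhanced = [2, 3, 50, 60, 1, 55]                   # toy enhanced values
edges = [True, False, True, False, False, True]   # toy edge mask
print(text_candidates(enhanced, edges))
```

The intersection discards bright regions with no edges and edges in dark background, which is why the abstract combines the two cues.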

Non-deterministic Behavior of Ranking-Based Metrics When Evaluating Embeddings

2nd International Workshop on Reproducible Research in Pattern Recognition, RRPR 2018

Authors: Nicolaou, Anguelos; Dey, Sounak; Christlein, Vincent; Maier, Andreas; Karatzas, Dimosthenis. Affiliations: Computer Vision Center, Edificio O, Campus UAB, Bellaterra 08193, Spain; Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

ISBN: (print) 9783030239862

Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis of the ambiguity quantized distances introduce and provide bounds on the effect. We demonstrate that it can have a measurable effect in empirical data in state-of-the-art systems. We also approach the phenomenon from a computer security perspective and demonstrate how someone being evaluated by a third party can exploit this ambiguity and greatly outperform a random predictor without even access to the input data. We also suggest a simple solution making the performance metrics, which rely on ranking, totally deterministic and impervious to such exploits. © Springer Nature Switzerland AG 2019.

Keywords: Embeddings
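
The ambiguity described in the abstract above can be reproduced with a small example: when two retrieved items have the same quantized distance, their relative order is arbitrary, so a ranking metric such as average precision can take more than one value. The sketch below computes the lowest and highest average precision over all orderings of tied items. It illustrates the effect only and is not the bound derivation or the fix proposed in the paper; the function names are hypothetical.

```python
from itertools import groupby


def average_precision(relevance):
    """Average precision for a ranked list of 0/1 relevance labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ap_bounds(distances, relevance):
    """Return (worst, best) average precision over orderings of tied distances."""
    pairs = sorted(zip(distances, relevance), key=lambda p: p[0])
    best, worst = [], []
    for _, group in groupby(pairs, key=lambda p: p[0]):
        labels = [rel for _, rel in group]
        best.extend(sorted(labels, reverse=True))  # relevant items first in a tie
        worst.extend(sorted(labels))               # relevant items last in a tie
    return average_precision(worst), average_precision(best)


# The second and third items share distance 0.5; only one of them is relevant.
print(ap_bounds([0.1, 0.5, 0.5, 0.9], [1, 0, 1, 0]))
```

Any evaluation that sorts with an unspecified tie order can land anywhere between the two values, which is the non-determinism the title refers to.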

A Holistic Approach for Recognition of Complete Urdu Ligatures Using Hidden Markov Models

Frontiers of Information Technology (FIT)

Authors: Israr Uddin; Imran Siddiqi; Shehzad Khalid. Affiliation: Center of Computer Vision and Pattern Recognition, Bahria University, Islamabad, Pakistan

Optical Character Recognition (OCR) is one of the continuously explored problems. Presently, commercial character recognizers are available that report close to 100% recognition rates on text in a number of scripts. Despite these advancements, OCR systems have yet to mature for cursive scripts like Urdu. This study presents a holistic technique for recognition of Urdu text in Nastaliq font using "complete" ligatures as recognition units. The term "complete" refers to a partial word including its main body and secondary components (dots and diacritic marks). Discrete Wavelet Transform (DWT) is employed as feature extractor, while a separate Hidden Markov Model (HMM) is trained for each ligature considered in our study. More than 2000 frequently used unique Urdu ligatures from the standard CLE (Center of Language Engineering) dataset are considered in our evaluations. The system achieves a promising accuracy of 88.87% on more than 10,000 partial words.

Keywords: Character recognition; Feature extraction; Hidden Markov models; Text recognition; Discrete wavelet transforms; Training; Optical character recognition software

Convex hull based approach for multi-oriented character recognition from graphical documents

Authors: Roy, Partha Pratim; Pal, Umapada; Lladós, Josep; Kimura, Fumitaka. Affiliations: Computer Vision Center, Universitat Autònoma De Barcelona, 08193 Bellaterra, Spain; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata - 108, India; Graduate School of Engineering, Mie University, 1577 Kurimamachiya, Mie 514-8504, Japan

ISBN: (print) 9781424421756

In this paper, we present a scheme for recognition of English characters in multi-scale and multi-oriented environments. Graphical documents such as maps consist of text lines that appear in different orientations. Sometimes, characters in a single word may follow a curvilinear path to annotate the graphical curve lines. For recognition of such multi-scale and multi-oriented characters, a Support Vector Machine (SVM) based scheme is presented in this paper. The feature used here is invariant to character orientation. Circular ring and convex hull have been used along with angular information of the contour pixels of the character to make the feature rotation invariant. We tested our proposed scheme on two different datasets. Combining the circular ring and convex hull features, we obtained 96.73% and 99.56% accuracy on these two datasets. © 2008 IEEE.

Keywords: Support vector machines

Evaluation of diffusion techniques for improved vessel visualization and quantification in three-dimensional rotational angiography

4th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2001

Authors: Meijering, Erik; Niessen, Wiro; Weickert, Joachim; Viergever, Max. Affiliations: Image Sciences Institute, University Medical Center Utrecht, Heidelberglaan 100, Utrecht NL-3584 CX, Netherlands; Computer Vision, Graphics and Pattern Recognition Group, Department of Mathematics and Computer Science, University of Mannheim, Mannheim D-68131, Germany

ISBN: (print) 3540426973

Three-dimensional rotational angiography (3DRA) is a promising imaging technique which yields high-resolution isotropic 3D images of vascular structures. Raw 3DRA images, however, usually suffer from a high noise level and the presence of other artifacts. For accurate visualization and quantification of vascular anomalies, noise reduction is therefore highly desirable. In this paper we analyze the effects of several linear and nonlinear filtering techniques for that purpose. From the results of in vitro experiments we conclude that edge-enhancing anisotropic diffusion is very suitable for the mentioned tasks. However, in view of the computational requirements of this technique, the regularized isotropic nonlinear diffusion scheme may be considered a useful alternative. © Springer-Verlag Berlin Heidelberg 2001.

Keywords: Diffusion
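
To show what nonlinear diffusion filtering does, the sketch below applies a simple one-dimensional Perona-Malik-style scheme: the diffusivity shrinks where the local gradient is large, so small fluctuations are smoothed while strong edges are largely kept. This is a swapped-in textbook variant without the regularization step, not either of the schemes evaluated in the paper above. The function name `diffuse_1d` and the parameters `kappa` and `dt` are hypothetical choices.

```python
def diffuse_1d(signal, steps=10, kappa=1.0, dt=0.2):
    """Explicit nonlinear diffusion on a 1-D signal with zero-flux boundaries."""
    u = list(signal)
    for _ in range(steps):
        # Flux between each pair of neighbours; diffusivity g falls with gradient size.
        flux = []
        for i in range(len(u) - 1):
            grad = u[i + 1] - u[i]
            g = 1.0 / (1.0 + (grad / kappa) ** 2)
            flux.append(g * grad)
        # Each sample gains the flux from its right and loses the flux to its left.
        updated = []
        for i in range(len(u)):
            right = flux[i] if i < len(u) - 1 else 0.0
            left = flux[i - 1] if i > 0 else 0.0
            updated.append(u[i] + dt * (right - left))
        u = updated
    return u


# Two noisy plateaus separated by a sharp edge between index 3 and index 4.
noisy = [0.0, 0.1, 0.0, 0.1, 10.0, 10.1, 10.0, 10.1]
print([round(v, 3) for v in diffuse_1d(noisy)])
```

A linear (Gaussian) filter would blur the step between the plateaus by the same amount as the noise, which is the trade-off the abstract's comparison of linear and nonlinear filters is about.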

3-D non-rigid motion estimation from image sequence based on Makov random field [Makov read Markov]

International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Ya-Ming Wang; Wen-Qing Huang; Kai Zheng. Affiliation: Research Center for Computer Vision and Pattern Recognition, Zhejiang University of Science and Technology, Hangzhou, China

ISBN: (print) 0780384032

We propose an approach to 3-D non-rigid motion estimation from an image sequence in this paper. First, with the establishment of feature point correspondence between consecutive image frames, the affine motion model and the central projection model are presented for local non-rigid motion. Then, in order to obtain the global motion parameters and overcome the ill-posed 3-D estimation problem, a Markov random field (MRF) framework is proposed. By incorporating the motion prior constraints into the MRF, the motion smoothness between local regions is reflected. This converts the ill-posed problem into a well-posed one and guarantees a robust solution. Experimental results on a synthetic image sequence demonstrate the feasibility of the proposed approach.

Keywords: Motion estimation; Image sequences; Shape; Deformable models; Computer vision; Parametric statistics; Robustness; Pattern recognition; Markov random fields; Image converters
