Search Results - Inner Mongolia University Library

International Conference on Awareness Science and Technology (iCAST)

Authors: Fengzhi Wang; Qinzhi Lv; Lijuan Liu. Affiliations: College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding

Passenger flow prediction is vital for intelligent transportation systems (ITS). Most studies focus on passenger flow prediction for an individual station and capture only temporal features, without considering spatial features. Constructing a passenger flow prediction model for multiple stations, or even a whole network, is more valuable for practical applications. Therefore, we develop a dynamic spatio-temporal network (DSTNet) with a self-attention (SA) mechanism for multi-station passenger flow prediction. A dynamic graph convolutional network (DGCN) is applied for spatial feature extraction, and a gated recurrent unit (GRU) is combined with it to learn the temporal features. SA is applied to further assign weights to the extracted spatio-temporal features. Experiments were conducted on passenger flow data from the Xiamen bus rapid transit (BRT) system. The results demonstrate that the proposed DSTNet with SA (SA-DSTNet) outperforms the baselines in the multi-station passenger flow prediction task.

EDEN: Deep Feature Distribution Pooling for Saimaa Ringed Seals Pattern Matching

2nd International Conference on Cyber-Physical Systems and Control, CPS and C 2021

Authors: Chelak, Ilia; Nepovinnykh, Ekaterina; Eerola, Tuomas; Kälviäinen, Heikki; Belykh, Igor. Affiliations: Peter the Great St. Petersburg Polytechnic University, Politechnicheskaya St. 29, St. Petersburg 195251, Russia; Computer Vision and Pattern Recognition Laboratory, Department of Computational Engineering, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, P.O. Box 20, Lappeenranta 53850, Finland

ISBN: (print) 9783031208744

In this paper, pelage pattern matching is considered to solve the individual re-identification of the Saimaa ringed seals. Animal re-identification, together with access to a large amount of image material through camera traps and crowd-sourcing, provides novel possibilities for animal monitoring and conservation. Image retrieval techniques, such as global pooling, can be used to solve the individual re-identification. However, current global pooling methods incorporate only the value distribution of features, losing spatial information. To overcome the problem, we propose a novel pooling approach that allows aggregating the local pattern features to get a fixed-size embedding vector that incorporates global features by taking into account their spatial distribution. This is obtained by eigendecomposition of covariances computed for probability mass functions representing feature maps. Embedding vectors can then be used to find the best match in the database of known individuals, allowing animal re-identification. The results show that the proposed pooling technique outperforms the existing methods on the challenging Saimaa ringed seal image data. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: Animal biometrics; Global pooling; Pattern matching; Saimaa ringed seals

NORPPA: NOvel Ringed Seal Re-Identification by Pelage Pattern Aggregation

IEEE Winter Applications and Computer Vision Workshops (WACVW)

Authors: Ekaterina Nepovinnykh; Tuomas Eerola; Heikki Kälviäinen; Ilia Chelak. Affiliations: Department of Computational Engineering, School of Engineering Sciences, Computer Vision and Pattern Recognition Laboratory (CVPRL), Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland; Department of Computer Science, Faculty of Science, University of Helsinki, Helsinki, Finland

We propose a method for Saimaa ringed seal (Pusa hispida saimensis) re-identification. Access to large image volumes through camera trapping and crowdsourcing provides novel possibilities for animal conservation and monitoring and calls for automatic methods for analysis, in particular when re-identifying individual animals from the images. The proposed method, NOvel Ringed seal re-identification by Pelage pattern Aggregation (NORPPA), utilizes the permanent and unique pelage pattern of Saimaa ringed seals and content-based image retrieval techniques. First, the query image is preprocessed, and each seal instance is segmented. Next, the seal's pelage pattern is extracted using a U-net encoder-decoder based method. Then, CNN-based affine invariant features are embedded and aggregated into Fisher Vectors. Finally, the cosine distance between the Fisher Vectors is used to find the best match from a database of known individuals. We perform extensive experiments with various modifications of the method on a challenging Saimaa ringed seal re-identification dataset. The proposed method is shown to produce the best re-identification accuracy on our dataset in comparison with alternative approaches.
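The final retrieval step named in the abstract above (cosine distance between a query vector and a database of known individuals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the identities `seal_a` to `seal_c`, and the three-dimensional toy vectors are all hypothetical stand-ins for real Fisher Vectors, which are far higher-dimensional.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def best_match(query, database):
    """Return (identity, distance) of the database entry closest to query."""
    return min(
        ((name, cosine_distance(query, vec)) for name, vec in database.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy embeddings standing in for aggregated Fisher Vectors.
database = {
    "seal_a": [1.0, 0.0, 0.0],
    "seal_b": [0.0, 1.0, 0.0],
    "seal_c": [1.0, 1.0, 0.0],
}
query = [0.9, 0.1, 0.0]

identity, distance = best_match(query, database)
print(identity, round(distance, 4))
```

Cosine distance ignores vector magnitude, so two embeddings that point in the same direction match regardless of scale; whether that property is what motivated the authors' choice is not stated in the abstract.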

Modeling Rumor Unidirectional Spreading from Online Social Networks to Offline

Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering

Authors: Qiyi Han; Yi Chen. Affiliations: Key Laboratory of Pattern Recognition and Intelligent Information Processing, School of Computer Science, Chengdu University, China; College of Electronic Engineering, Chengdu University of Information Technology, China

ISBN: (print) 9798400708305

Online social networks not only facilitate the dissemination of information but also increase the risk of rumors. This paper studies the unidirectional spread of rumors from online social networks to offline environments. To describe the dynamic process of rumor spreading, we derive a unidirectional coupled network structure and mean-field equations. We illustrate the behavior of rumor spreading under various scenarios using computer simulations. The simulations reveal that rumors in unidirectional coupled networks spread faster and wider than those in single-layer networks. Furthermore, with the assistance of unidirectional links, rumors tend to persist longer and cause more severe damage.

Activating More Pixels in Image Super-Resolution Transformer

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xiangyu Chen; Xintao Wang; Jiantao Zhou; Yu Qiao; Chao Dong. Affiliations: State Key Laboratory of Internet of Things for Smart City, University of Macau; Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; Shanghai Artificial Intelligence Laboratory; ARC Lab, Tencent PCG

Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, we find through attribution analysis that these networks can only utilize a limited spatial range of input information. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary advantages of being able to utilize global statistics and strong local fitting capability. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that the performance of this task can be greatly improved. Our overall method significantly outperforms the state-of-the-art methods by more than 1 dB.

Pose focus transformer meet inter-part relation

Expert Systems with Applications, 2024, vol. 240

Authors: Luo, Yanmin; Lin, Hongwei; Huang, Wenlin; Wang, Youjie; Du, Jixiang; Guo, Jing-Ming. Affiliations: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China; Maynooth International Engineering College, Fuzhou University, Fuzhou 350108, China; Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 10607, China

Human pose estimation in crowded scenes is a challenging task. Due to overlap and occlusion, it is difficult to infer pose clues from individual keypoints. We propose PFFormer, a new transformer-based approach that treats pose estimation as a hierarchical set prediction problem: it first focuses on human windows and coarsely predicts whole-body poses globally within them. In PFFormer, we design a Windows Clustering Transformer (WCT), which reorganizes the image windows by filtering the attentive windows and fusing the inattentive ones, allowing the transformer to focus on the important regions while reducing interference from the complex background; a global transformer then compensates for the loss of information. We then partition the learned body pose into a set of structural parts and apply the Inter-Part Relation Module (IPRM) to capture the correlation between multiple parts. These full-body poses and component features are refined at a finer level through the Part-to-Joint Decoder (PJD). Extensive experiments show that PFFormer performs favorably against its counterparts on challenging datasets, including COCO2017, CrowdPose, and OCHuman. The performance in crowded scenes, in particular, demonstrates the robustness of the proposed method in dealing with occlusion. © 2023 Elsevier Ltd

Keywords: Information filtering

UNIFORMER: UNIFIED TRANSFORMER FOR EFFICIENT SPATIOTEMPORAL REPRESENTATION LEARNING

10th International Conference on Learning Representations, ICLR 2022

Authors: Li, Kunchang; Wang, Yali; Gao, Peng; Song, Guanglu; Liu, Yu; Li, Hongsheng; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Shanghai AI Laboratory, Shanghai, China; SenseTime Research; The Chinese University of Hong Kong, Hong Kong

It is a challenging task to learn rich and multi-scale spatiotemporal semantics from high-dimensional videos, due to large local redundancy and complex global dependency between video frames. The recent advances in this research have been mainly driven by 3D convolutional neural networks and vision transformers. Although 3D convolution can efficiently aggregate local context to suppress local redundancy from a small 3D neighborhood, it lacks the capability to capture global dependency because of the limited receptive field. Alternatively, vision transformers can effectively capture long-range dependency through the self-attention mechanism, but they are limited in reducing local redundancy because of blind similarity comparison among all the tokens in each layer. Based on these observations, we propose a novel Unified transFormer (UniFormer) which seamlessly integrates the merits of 3D convolution and spatiotemporal self-attention in a concise transformer format, and achieves a preferable balance between computation and accuracy. Different from traditional transformers, our relation aggregator can tackle both spatiotemporal redundancy and dependency, by learning local and global token affinity respectively in shallow and deep layers. We conduct extensive experiments on the popular video benchmarks, e.g., Kinetics-400, Kinetics-600, and Something-Something V1&V2. With only ImageNet-1K pretraining, our UniFormer achieves 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600, while requiring 10× fewer GFLOPs than other state-of-the-art methods. For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy respectively. Code is available at https://***/Sense-X/UniFormer. © 2022 ICLR 2022 - 10th International Conference on Learning Representations. All rights reserved.

Keywords: Redundancy

A Survey of Person Re-identification Based on Deep Learning

10th International Conference on Computing and Pattern Recognition, ICCPR 2021

Authors: Tian, Zimin; Chen, Si; Wang, Da-Han; Lu, Junwen. Affiliations: School of Computer and Information Engineering, Xiamen University of Technology, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, China

ISBN: (print) 9781450390439

Person re-identification (Re-ID) has been a popular research topic in computer vision in recent years, and it has important application value in numerous fields, such as intelligent security. The person Re-ID task is to identify whether the pedestrians appearing under different cameras are the same person. Traditional person Re-ID methods mainly rely on manually designed features and have difficulty handling person occlusion, posture change, and illumination variation. With the wide application of deep learning, deep-learning-based person Re-ID has brought new ideas for solving these problems and has attracted wide attention from scholars. This paper summarizes and analyzes the latest research trends in person Re-ID based on deep learning. In our work, recent research on person Re-ID is coarsely categorized into supervised learning methods and unsupervised learning methods, according to whether the pedestrian images in the training set have real labels. We then describe the representative datasets used in the person Re-ID task. Finally, we conclude and discuss future directions for person Re-ID based on deep learning. © 2021 ACM.

Keywords: Unsupervised learning

KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing

arXiv, 2023

Authors: Huang, Jiancheng; Liu, Yifan; Qin, Jin; Chen, Shifeng. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

Text-conditioned image editing is a recently emerged and highly practical task, and its potential is immeasurable. However, most concurrent methods are unable to perform action editing, i.e., they cannot produce results that conform to the action semantics of the editing prompt and preserve the content of the original image. To solve the problem of action editing, we propose KV Inversion, a method that can achieve satisfactory reconstruction performance and action editing, which can solve two major problems: 1) the edited result can match the corresponding action, and 2) the edited object can retain the texture and identity of the original real image. In addition, our method does not require training the Stable Diffusion model itself, nor does it require scanning a large-scale dataset to perform time-consuming training. Copyright © 2023, The Authors. All rights reserved.

Keywords: Textures

CodePhys: Robust Video-based Remote Physiological Measurement through Latent Codebook Querying

arXiv, 2025

Authors: Chu, Shuyang; Xia, Menghan; Yuan, Mengyao; Liu, Xin; Seppanen, Tapio; Zhao, Guoying; Shi, Jingang. Affiliations: The School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; The Tencent AI Lab, Shenzhen, China; The Computer Vision and Pattern Recognition Laboratory, Lappeenranta-Lahti University of Technology LUT, Lappeenranta 53850, Finland; The Center for Machine Vision and Signal Analysis, University of Oulu, Finland

Remote photoplethysmography (rPPG) aims to measure non-contact physiological signals from facial videos, which has shown great potential in many applications. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Although they can achieve acceptable results, the recovery of rPPG signal faces intractable challenges when interference from real-world scenarios takes place on facial video. Specifically, facial videos are inevitably affected by non-physiological factors (e.g., camera device noise, defocus, and motion blur), leading to the distortion of extracted rPPG signals. Recent rPPG extraction methods are easily affected by interference and degradation, resulting in noisy rPPG signals. In this paper, we propose a novel method named CodePhys, which innovatively treats rPPG measurement as a code query task in a noise-free proxy space (i.e., codebook) constructed by ground-truth PPG signals. We consider noisy rPPG features as queries and generate high-fidelity rPPG features by matching them with noise-free PPG features from the codebook. Our approach also incorporates a spatial-aware encoder network with a spatial attention mechanism to highlight physiologically active areas and uses a distillation loss to reduce the influence of non-periodic visual interference. Experimental results on four benchmark datasets demonstrate that CodePhys outperforms state-of-the-art methods in both intra-dataset and cross-dataset settings. © 2025, CC BY.

Keywords: Heart
