Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Cai, Shen; Wu, Zhanhao; Guo, Lingxi; Wang, Jiachun; Zhang, Siyu; Yan, Junchi; Shen, Shuhan. Affiliations: Visual and Geometric Perception Lab, Donghua University, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China

In this paper, we present two fast and interpretable decomposition methods for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) transformations respectively. Under the minimal 4-point configuration, the first and the last similarity transformations in SKS are computed by two anchor points on target and source planes, respectively. Then, the other two point correspondences can be exploited to compute the middle kernel transformation with only four parameters. Furthermore, ACA uses three anchor points to compute the first and the last affine transformations, followed by computation of the middle core transformation utilizing the other one point correspondence. ACA can compute a homography up to a scale with only 85 floating-point operations (FLOPs), without even any division operations. Therefore, as a plug-in module, ACA facilitates the traditional feature-based Random Sample Consensus (RANSAC) pipeline, as well as deep homography pipelines estimating 4-point offsets. In addition to the advantages of geometric parameterization and computational efficiency, SKS and ACA can express each element of homography by a polynomial of input coordinates (7th degree to 9th degree), extend the existing essential Similarity-Affine-Projective (SAP) decomposition and calculate 2D affine transformations in a unified way. Source codes are released in https://***/cscvlab/SKS-Homography. Copyright © 2024, The Authors. All rights reserved.

Keywords: Linear systems

Design of surrogate models in civil engineering by neural networks

South Eastern European Design Automation, computer engineering, computer Networks and Social Media Conference (SEEDA-CECNSM)

Authors: Vojtěch Drahý; Radek Mařík; Heikki Kälviäinen. Affiliations: Department of Computer Science, Czech Technical University in Prague, Prague, Czech Republic; Department of Telecommunication Engineering, Czech Technical University in Prague, Prague, Czech Republic; Computer Vision and Pattern Recognition Laboratory, Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland

ISBN (digital): 9798331504489

ISBN (print): 9798331504496

We present a task from the critical infrastructure field in materials engineering. We created a surrogate model of a bridge construction object to determine the values of its material parameters. The work aims to use neural networks to conduct an initial investigation of the task and to identify the relevant aspects of applying machine learning. To reduce the computational complexity of the models, we designed specific neural networks whose architecture corresponds to the structure and characteristics of the processed data. Furthermore, we also address the interpretability and justification of the model's decision-making. The main contribution of the work is the replacement of an unknown or overly complex physical and mathematical description of material objects with a neural network model.

Keywords: Training; Social networking (online); Computational modeling; Neural networks; Focusing; Computer architecture; Machine learning; Feature extraction; Mathematical models; Computational complexity

Isoform Function Prediction Based on Heterogeneous Graph Attention Networks

2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023

Authors: Guo, Kuo; Li, Yifan; Chen, Hao; Shen, Hong-Bin; Yang, Yang. Affiliations: Shanghai Jiao Tong University, Key Lab. of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai 200240, China; Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai 200240, China; Carnegie Mellon University, School of Computer Science, Computational Biology Department, Pittsburgh, PA 15213, United States

ISBN (print): 9798350337488

Isoforms refer to different mRNA molecules transcribed from the same gene, which can be translated into proteins with varying structures and functions. Predicting the functions of isoforms is an essential topic in bioinformatics as it can provide valuable insights into the intricate mechanisms of gene regulation and biological processes. Conventionally, gene function labels are standardized in Gene Ontology (GO) terms. However, traditional methods for predicting isoform function are largely limited by the absence of isoform-specific labels, sparse annotations, and the vast number of GO terms. To address these issues, we propose HANIso, a deep learning-based method for isoform function prediction. HANIso leverages a pretrained protein language model to extract features from protein sequences. It also integrates heterogeneous information, such as isoform sequence features, GO annotations, and isoform interaction data, using a Heterogeneous Graph Attention Network (HAN). This allows the model to learn the importance of different sources of information and their semantic relationships through the attention mechanism. Our method can predict function labels at both the gene level and isoform level. We conduct experiments on two species datasets, and the results demonstrate that our method outperforms existing methods on both AUROC and AUPRC. HANIso has the potential to overcome the limitations of traditional methods and provide a more accurate and comprehensive understanding of isoform function. © 2023 IEEE.

Keywords: alternative splicing; gene ontology; heterogeneous graph attention network; isoform function prediction; protein language model

IEEE T-BIOM Editorial Board Changes

IEEE Transactions on Biometrics, Behavior, and Identity Science, 2024, vol. 7, no. 1, pp. 1-2

Authors: Naser Damer; Weihong Deng; Jianjiang Feng; Vishal M. Patel; Ajita Rattani; Mark Nixon. Affiliations: Smart Living & Biometric Technologies, Fraunhofer Institute for Computer Graphics Research IGD, Darmstadt, Germany; Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing, China; Department of Automation, Tsinghua University, Beijing, China; Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science and Engineering, College of Engineering, University of North Texas, Denton, TX, USA

NLFA: A Non Local Fusion Alignment Module for Multi-Scale Feature in Object Detection

3rd International Symposium on Automation, Mechanical and Design engineering, SAMDE 2022

Authors: Xue, Honghui; Ma, Jinshan; Cai, Zheyi; Fu, Junfang; Guo, Feng; Weng, Wei; Dong, Yunxin; Zhang, Zhenchang. Affiliations: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou, China; Fujian Zhongke Zhongxin Intelligent Technology Co. Ltd, Fuzhou, China; Fujian Newland Auto-ID Tech. Co. Ltd, Fuzhou, China; Department of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen, China

ISBN (print): 9783031400698

Recently, in pursuit of better detection results, researchers have widely turned to more convolutional layers and deeper networks. However, the growing number of down-sampling convolution and up-sampling operations generates feature maps of different scales, which makes it difficult to avoid losing detailed image information and causes the feature distributions at different scales to become misaligned. In particular, the loss and dislocation of target boundary information affect the features learned by the model and reduce accuracy. This paper proposes a feature alignment method based on the non-local idea and designs two modules: the Non Local Align Module (NLA) and the Channel Fusion Augment Module (CFA). A neighborhood calculation algorithm is also designed for the method, which strengthens the constraint on the calculation of boundary information. These two modules can be easily embedded into current mainstream object detection networks to improve detection performance. A network using our NLA and CFA modules achieves better results than the original model. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: Object detection

Pose focus transformer meet inter-part relation

Expert Systems with Applications, 2024, vol. 240

Authors: Luo, Yanmin; Lin, Hongwei; Huang, Wenlin; Wang, Youjie; Du, Jixiang; Guo, Jing-Ming. Affiliations: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China; Maynooth International Engineering College, Fuzhou University, Fuzhou 350108, China; Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 10607, China

Human pose estimation in crowded scenes is a challenging task. Due to overlap and occlusion, it is difficult to infer pose clues from individual keypoints. We propose PFFormer, a new transformer-based approach that treats pose estimation as a hierarchical set prediction problem that first focuses on human windows and coarsely predicts whole-body poses globally within them. In PFFormer, we design a Windows Clustering Transformer (WCT), which reorganizes the image windows by filtering the attentive windows and fusing the inattentive ones, allowing the transformer to focus on the important regions while reducing the interference from the complex background, followed by compensating for the loss of information with a global transformer. Then we partition the learned body pose into a set of structural parts and apply the Inter-Part Relation Module (IPRM) to capture the correlation between multiple parts. These full-body poses and component features are refined at a finer level through the Part-to-Joint Decoder (PJD). Extensive experiments show that PFFormer performs favorably against its counterparts on challenging datasets, including the COCO2017, CrowdPose, and OChuman datasets. The performance in crowded scenes, in particular, demonstrates the robustness of the proposed method to occlusion. © 2023 Elsevier Ltd

Keywords: Information filtering

Matching individual Ladoga ringed seals across short-term image sequences (vol 102, pg 957, 2022)

MAMMALIAN BIOLOGY, 2022, vol. 102, no. 3, p. 1045

Author: Nepovinnykh, E. Affiliations: Computer Vision and Pattern Recognition Laboratory, Department of Computational Engineering, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, P.O. Box 20, 53851 Lappeenranta, Finland; Department of Artificial Intelligence, Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya 29, Saint Petersburg, Russian Federation, 195251; Department of Computer Science and Computational Experiment, Southern Federal University, Rostov-on-Don, Russian Federation, 344006; Interregional Charitable Public Organization “Biologists for Nature Conservation” (BFNC), 24 line 3-7, Saint Petersburg, Russian Federation, 199106

Automated wildlife reidentification has attracted increasing attention in recent years as it provides a non-invasive tool to identify and to track individual wild animals over time. In this paper, the first steps are taken towards the automatic photo-identification of the Ladoga ringed seals (Pusa hispida ladogensis). A method is proposed that takes a sequence of images, each containing multiple individuals as the input, and produces cropped images of seals grouped based on one certain individual per group. The method starts by detecting each seal from the images and proceeds to matching the individual seals between the images. It is shown that high grouping accuracy can be obtained with a general-purpose image retrieval method on an image sequence taken from the same location within a relatively short period of time. Each resulting group contains multiple images of one individual with slightly different variations, for example, in pose and illumination. Utilizing these images simultaneously provides more information for the individual re-identification compared to the traditional approach, i.e., which utilizes just one image at a time. It is further demonstrated that a convolutional neural network based method can be used to extract the unique pelage patterns of the seals despite the low contrast. Finally, a method is proposed and experiments with the novel Ladoga ringed seals data are carried out to provide a proof-of-concept for the individual re-identification.

Keywords: Animal re-identification; Convolutional neural networks; Instance segmentation; Ladoga ringed seal; Photo-identification

Combining feature aggregation and geometric similarity for re-identification of patterned animals

arXiv, 2023

Authors: Immonen, Veikka; Nepovinnykh, Ekaterina; Eerola, Tuomas; Stewart, Charles V.; Kälviäinen, Heikki. Affiliations: Computer Vision and Pattern Recognition Laboratory, Department of Computational Engineering, School of Engineering Sciences, Lappeenranta-Lahti University of Technology LUT, Lappeenranta FI-53851, Finland; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, United States

Image-based re-identification of animal individuals allows gathering of information such as migration patterns of the animals over time. This, together with large image volumes collected using camera traps and crowdsourcing, opens novel possibilities to study animal populations. For many species, the re-identification can be done by analyzing the permanent fur, feather, or skin patterns that are unique to each individual. In this paper, we address the re-identification by combining two types of pattern similarity metrics: 1) pattern appearance similarity obtained by pattern feature aggregation and 2) geometric pattern similarity obtained by analyzing the geometric consistency of pattern similarities. The proposed combination makes it possible to efficiently utilize both the local and global pattern features, providing a general re-identification approach that can be applied to a wide variety of pattern types. In the experimental part of the work, we demonstrate that the method achieves promising re-identification accuracies for Saimaa ringed seals and whale sharks. © 2023, CC BY-NC-SA.

Keywords: Animals

Zero-Shot Audio Captioning Using Soft and Hard Prompts

IEEE Transactions on Audio, Speech and Language Processing, 2025, vol. 33, pp. 2045-2058

Authors: Yiming Zhang; Xuenan Xu; Ruoyi Du; Haohe Liu; Yuan Dong; Zheng-Hua Tan; Wenwu Wang; Zhanyu Ma. Affiliations: Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.; Department of Electronic Systems, Aalborg University, Aalborg, Denmark

In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test set from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, and this issue has received little attention. To address these issues, we propose a new zero-shot method for audio captioning. Our method is built on the contrastive language-audio pre-training (CLAP) model. During training, the model reconstructs the ground-truth caption using the CLAP text encoder. In the inference stage, the model generates text descriptions from the CLAP audio embeddings of given audio inputs. To enhance the ability of the model in transitioning from text-to-text generation to audio-to-text generation, we propose to use the mixed-augmentations-based soft prompt to learn more robust latent representations, leveraging instance replacement and embedding augmentation. Additionally, we introduce the retrieval-based acoustic-aware hard prompt to improve the cross-domain performance of the model by employing the domain-agnostic label information of sound events. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.

Keywords: Training; Decoding; Semantics; Data models; Acoustics; Electronic mail; Benchmark testing; Transformers; Robustness; Perturbation methods

From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education

arXiv, 2025

Authors: Zhang, Yi-Fan; Li, Hang; Song, Dingjie; Sun, Lichao; Xu, Tianlong; Wen, Qingsong. Affiliations: National Laboratory of Pattern Recognition, University of Chinese Academy of Sciences, China; Computer Science Department, Michigan State University, United States; CUHK-Shenzhen Natural Language Processing Group, China; Computer Science and Engineering, Lehigh University, United States; Squirrel Ai Learning Group

Large Language Models (LLMs), such as GPT-4, have demonstrated impressive mathematical reasoning capabilities, achieving near-perfect performance on benchmarks like GSM8K. However, their application in personalized education remains limited due to an overemphasis on correctness over error diagnosis and feedback generation. Current models fail to provide meaningful insights into the causes of student mistakes, limiting their utility in educational contexts. To address these challenges, we present three key contributions. First, we introduce MathCCS (Mathematical Classification and Constructive Suggestions), a multi-modal benchmark designed for systematic error analysis and tailored feedback. MathCCS includes real-world problems, expert-annotated error categories, and longitudinal student data. Evaluations of state-of-the-art models, including Qwen2-VL, LLaVA-OV, Claude-3.5-Sonnet and GPT-4o, reveal that none achieved classification accuracy above 30% or generated high-quality suggestions (average scores below 4/10), highlighting a significant gap from human-level performance. Second, we develop a sequential error analysis framework that leverages historical data to track trends and improve diagnostic precision. Finally, we propose a multi-agent collaborative framework that combines a Time Series Agent for historical analysis and an MLLM Agent for real-time refinement, enhancing error classification and feedback generation. Together, these contributions provide a robust platform for advancing personalized education, bridging the gap between current AI capabilities and the demands of real-world teaching. Copyright © 2025, The Authors. All rights reserved.

Keywords: Systematic errors
