In this paper, we present two fast and interpretable decomposition methods for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) transformations respectively. Under the min...
ISBN (digital): 9798331504489
ISBN (print): 9798331504496
We present a task from the critical infrastructure field in materials engineering. We created a surrogate model of a bridge construction object to determine the values of its material parameters. The work aims to use neural networks to conduct an initial investigation of the task and to identify the relevant aspects of applying machine learning. To reduce the computational complexity of the models, we designed specific neural networks whose architecture corresponds to the structure and characteristics of the processed data. Furthermore, we also address the interpretability and justification of the model’s decision-making. The main contribution of the work is the replacement of an unknown or overly complex physical and mathematical description of material objects with a neural network model.
Authors: Guo, Kuo; Li, Yifan; Chen, Hao; Shen, Hong-Bin; Yang, Yang
Shanghai Jiao Tong University, Key Lab. of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai 200240, China
Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai 200240, China
Carnegie Mellon University, School of Computer Science, Computational Biology Department, Pittsburgh, PA 15213, United States
Isoforms refer to different mRNA molecules transcribed from the same gene, which can be translated into proteins with varying structures and functions. Predicting the functions of isoforms is an essential topic in bio...
Recently, in pursuit of better detection results, adding more convolutional layers and building deeper networks has become a widely followed direction. However, more and more down-sampling convolution or up-sampling operations gene...
Human pose estimation in crowded scenes is a challenging task. Due to overlap and occlusion, it is difficult to infer pose clues from individual keypoints. We proposed PFFormer, a new transformer-based approach that t...
Automated wildlife re-identification has attracted increasing attention in recent years as it provides a non-invasive tool to identify and track individual wild animals over time. In this paper, the first steps are taken towards the automatic photo-identification of Ladoga ringed seals (Pusa hispida ladogensis). A method is proposed that takes as input a sequence of images, each containing multiple individuals, and produces cropped images of seals grouped so that each group contains a single individual. The method starts by detecting each seal in the images and proceeds to match the individual seals between the images. It is shown that high grouping accuracy can be obtained with a general-purpose image retrieval method on an image sequence taken from the same location within a relatively short period of time. Each resulting group contains multiple images of one individual with slight variations, for example, in pose and illumination. Utilizing these images simultaneously provides more information for individual re-identification than the traditional approach, which utilizes just one image at a time. It is further demonstrated that a convolutional neural network based method can be used to extract the unique pelage patterns of the seals despite the low contrast. Finally, a method is proposed and experiments with the novel Ladoga ringed seal data are carried out to provide a proof of concept for individual re-identification.
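The grouping step described in this abstract (detect each seal, then match detections across images so that each group holds one individual) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the retrieval method, so the sketch uses plain cosine similarity over hypothetical per-detection feature vectors, a hypothetical `threshold` parameter, and a simple greedy assignment. It is not the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_detections(features, threshold=0.9):
    """Greedily group detections (hypothetical helper, not from the paper).

    Each detection joins the existing group containing its most similar
    member, provided that similarity reaches `threshold`; otherwise it
    starts a new group. Returns a list of groups of detection indices.
    """
    groups = []
    for i, feat in enumerate(features):
        best_group, best_sim = None, threshold
        for group in groups:
            sim = max(cosine(feat, features[j]) for j in group)
            if sim >= best_sim:
                best_group, best_sim = group, sim
        if best_group is None:
            groups.append([i])
        else:
            best_group.append(i)
    return groups


# Toy features standing in for descriptors of cropped seal detections.
toy_features = [[1.0, 0.0], [0.0, 1.0], [0.99, 0.05], [0.02, 1.0]]
print(group_detections(toy_features))
```

With the toy vectors above, detections 0 and 2 end up in one group and detections 1 and 3 in another. A real pipeline would replace the toy vectors with descriptors produced by an image retrieval model.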
Image-based re-identification of animal individuals allows gathering of information such as migration patterns of the animals over time. This, together with large image volumes collected using camera traps and crowdso...
In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test set from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, and this issue has received little attention. To address these issues, we propose a new zero-shot method for audio captioning. Our method is built on the contrastive language-audio pre-training (CLAP) model. During training, the model reconstructs the ground-truth caption using the CLAP text encoder. In the inference stage, the model generates text descriptions from the CLAP audio embeddings of given audio inputs. To enhance the ability of the model in transitioning from text-to-text generation to audio-to-text generation, we propose to use the mixed-augmentations-based soft prompt to learn more robust latent representations, leveraging instance replacement and embedding augmentation. Additionally, we introduce the retrieval-based acoustic-aware hard prompt to improve the cross-domain performance of the model by employing the domain-agnostic label information of sound events. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.
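The idea of training on text embeddings and switching to audio embeddings at inference can be sketched as follows. This is an illustrative sketch only: the abstract names "embedding augmentation" without giving its form, so the Gaussian noise, the `noise_std` value, and the function names below are all assumptions, and instance replacement is not modeled. Plain Python lists stand in for CLAP embeddings.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def augment_embedding(text_emb, noise_std=0.1, rng=None):
    """Perturb a text embedding with Gaussian noise, then re-normalize.

    Assumption: noise injection is used here as one simple way to make a
    decoder trained on text embeddings tolerate audio embeddings, which
    lie near but not exactly on the text embeddings in the shared space.
    """
    rng = rng or random.Random()
    noisy = [x + rng.gauss(0.0, noise_std) for x in text_emb]
    return l2_normalize(noisy)


def decoder_input(text_emb=None, audio_emb=None, rng=None):
    """Pick the conditioning vector: augmented text in training, audio at inference."""
    if text_emb is not None:
        return augment_embedding(text_emb, rng=rng)
    return l2_normalize(audio_emb)


# Toy three-dimensional stand-ins for real embeddings.
train_vec = decoder_input(text_emb=[1.0, 0.0, 0.0], rng=random.Random(0))
infer_vec = decoder_input(audio_emb=[3.0, 4.0, 0.0])
print(train_vec, infer_vec)
```

Both conditioning vectors are unit length, so the decoder sees inputs on the same scale in training and at inference.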
Large Language Models (LLMs), such as GPT-4, have demonstrated impressive mathematical reasoning capabilities, achieving near-perfect performance on benchmarks like GSM8K. However, their application in personalized ed...