This study aims to develop a system for extracting crucial information from tire sidewalls using Optical Character recognition (OCR). Initially, images of tire were captured manually by smartphone cameras, including R...
详细信息
We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images yet real-world data are dominan...
详细信息
ISBN:
(纸本)9781665445092
We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images yet real-world data are dominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, solely relying on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to low-level visual cues. We propose a method to incorporate geometric consistency as an inductive bias to learn invariance and equivariance for photometric and geometric variations. With our novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of segmenting both things and stuff categories without any hyperparameter tuning or task-specific pre-processing. Our method largely outperforms existing baselines on COCO [31] and Cityscapes [8] with +17.5 Acc. and +4.5 mIoU. We show that PiCIE gives a better initialization for standard supervised training.
Multi-camera tracking (MCT) plays a crucial role in various computer vision applications. However, accurate tracking of individuals across multiple cameras faces challenges, particularly with identity switches. In thi...
详细信息
ISBN:
(数字)9798350365474
ISBN:
(纸本)9798350365481
Multi-camera tracking (MCT) plays a crucial role in various computer vision applications. However, accurate tracking of individuals across multiple cameras faces challenges, particularly with identity switches. In this paper, we present an efficient online MCT system that tackles these challenges through online processing. Our system leverages memory-efficient accumulated appearance features to provide stable representations of individuals across cameras and time. By incorporating trajectory validation using hierarchical agglomerative clustering (HAC) in overlapping regions, ID transfers are identified and rectified. Evaluation on the 2024 AI City Challenge Track 1 dataset [39] demonstrates the competitive performance of our system, achieving accurate tracking in both overlapping and non-overlapping camera networks. With a 40.3% HOTA score [29], our system ranked 9th in the challenge. The integration of trajectory validation enhances performance by 8% over the baseline, and the accumulated appearance features further contribute to a 17% improvement.
The back-propagation algorithm has long been the de-facto standard in optimizing weights and biases in neural networks, particularly in cutting-edge deep-learning models. Its widespread adoption in fields like natural...
详细信息
ISBN:
(数字)9798350365474
ISBN:
(纸本)9798350365481
The back-propagation algorithm has long been the de-facto standard in optimizing weights and biases in neural networks, particularly in cutting-edge deep-learning models. Its widespread adoption in fields like natural language processing.computer vision, and remote sensing has revolutionized automation in various tasks. The popularity of back-propagation stems from its ability to achieve outstanding performance in tasks such as classification, detection, and segmentation. Nevertheless, back-propagation is not without its limitations, encompassing sensitivity to initial conditions, vanishing gradients, overfitting, and computational complexity. The recent introduction of a forward-forward algorithm (FFA), which computes local goodness functions to optimize network parameters, alleviates the dependence on substantial computational resources and the constant need for architectural scaling. This study investigates the application of FFA for hyperspectral image classification. Experimental results and comparative analysis are provided with the use of the traditional back-propagation algorithm. Preliminary results show the potential behind FFA and its promises.
作者:
Guo, KuoLi, YifanChen, HaoShen, Hong-BinYang, YangShanghai Jiao Tong University
Key Lab. of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering Department of Computer Science and Engineering Shanghai200240 China Shanghai Jiao Tong University
Key Laboratory of System Control and Information Processing Ministry of Education of China Institute of Image Processing and Pattern Recognition Shanghai200240 China Carnegie Mellon University
School of Computer Science Computational Biology Department PittsburghPA15213 United States
Isoforms refer to different mRNA molecules transcribed from the same gene, which can be translated into proteins with varying structures and functions. Predicting the functions of isoforms is an essential topic in bio...
详细信息
Hyperspectral imaging offers the capacity to quickly and noninvasively monitor a food product's physical, chemical and morphological properties. Specim IQ is a handheld push broom camera with basic data handling a...
详细信息
This paper introduces our solution for the Track2 in AI City Challenge 2021 (AICITY21). The Track2 is a vehicle re-identification (ReID) task with both the real-world data and synthetic data. We mainly focus on four p...
详细信息
ISBN:
(纸本)9781665448994
This paper introduces our solution for the Track2 in AI City Challenge 2021 (AICITY21). The Track2 is a vehicle re-identification (ReID) task with both the real-world data and synthetic data. We mainly focus on four points, i.e. training data, unsupervised domain-adaptive (UDA) training, post-processing. model ensembling in this challenge. (1) Both cropping training data and using synthetic data can help the model learn more discriminative features. (2) Since there is a new scenario in the test set that dose not appear in the training set, UDA methods perform well in the challenge. (3) Post-processing.techniques including re-ranking, image-to-track retrieval, inter-camera fusion, etc, significantly improve final performance. (4) We ensemble CNN-based models and transformer-based models which provide different representation diversity. With aforementioned techniques, our method finally achieves 0.7445 mAP score, yielding the first place in the competition. Codes are available at https://***/michuanhaohao/AICITY2021_Track2_DMT.
The Alberta Infant Motor Scale (AIMS) is a well-known assessment scheme that evaluates the gross motor development of infants by recording the number of specific poses achieved. With the aid of the image-based pose re...
详细信息
In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve the performance of FSD without increasing the model complexity. For FSD, some fine-grained in...
详细信息
In recent years, deep dictionary learning (DDL)has attracted a great amount of attention due to its effectiveness for represen-tation learning and visual recognition. However, most existing methods focus on unsupervis...
详细信息
暂无评论