The classification loss functions used in deep neural network classifiers can be grouped into two categories based on maximizing the margin in either Euclidean or angular spaces. Euclidean distances between sample vec...
Accurate 3D object detection in LiDAR point clouds is crucial for autonomous driving systems. To achieve state-of-the-art performance, the supervised training of detectors requires large amounts of human-annotated dat...
The rapid development of intelligent transportation and the exponential growth of data traffic drive the emergence of more computation-intensive, latency-critical tasks in vehicles, bringing challenges to the task offloadi...
Test-Time-Training (TTT) is an approach to cope with out-of-distribution (OOD) data by adapting a trained model to distribution shifts occurring at test time. We propose to perform this adaptation via Activation Matching (ActMAD): we analyze activations of the model and align activation statistics of the OOD test data to those of the training data. In contrast to existing methods, which model the distribution of entire channels in the ultimate layer of the feature extractor, we model the distribution of each feature in multiple layers across the network. This results in more fine-grained supervision and lets ActMAD attain state-of-the-art performance on CIFAR-100C and ImageNet-C. ActMAD is also architecture- and task-agnostic, which lets us go beyond image classification and achieve a 15.4% improvement over previous approaches when evaluating a KITTI-trained object detector on KITTI-Fog. Our experiments highlight that ActMAD can be applied to online adaptation in realistic scenarios, requiring little data to attain its full performance.
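The alignment idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which statistics are matched or how the discrepancy is measured, so the choice of per-feature mean and variance, the L1 gap, and the names `feature_stats` and `alignment_loss` are all assumptions made here for illustration.

```python
from statistics import fmean, pvariance


def feature_stats(batch):
    """Per-feature mean and variance over a batch of flattened activations.

    `batch` is a list of samples, each a list of feature values.
    Using mean and variance is an assumption; the abstract only says
    "activation statistics".
    """
    columns = list(zip(*batch))  # one tuple per feature position
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def alignment_loss(test_layers, train_stats):
    """Sum over layers of the mean L1 gap between test and training statistics.

    `test_layers` holds one activation batch per layer; `train_stats` holds
    the matching (means, variances) pairs computed on training data.
    The L1 discrepancy is an illustrative choice, not taken from the source.
    """
    total = 0.0
    for batch, (train_mean, train_var) in zip(test_layers, train_stats):
        test_mean, test_var = feature_stats(batch)
        n = len(train_mean)
        total += sum(abs(a - b) for a, b in zip(test_mean, train_mean)) / n
        total += sum(abs(a - b) for a, b in zip(test_var, train_var)) / n
    return total


# Hypothetical toy data: one layer, two samples, two features.
train_layers = [[[0.0, 1.0], [2.0, 3.0]]]
train_stats = [feature_stats(layer) for layer in train_layers]
shifted_layers = [[[1.0, 2.0], [3.0, 4.0]]]  # every feature shifted by +1
loss = alignment_loss(shifted_layers, train_stats)
```

In a test-time adaptation loop, a quantity like `loss` would be minimized with respect to the model parameters on each incoming test batch. The sketch only computes the discrepancy and leaves the optimizer out.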
In the field of autonomous driving, self-training is widely applied to mitigate distribution shifts in LiDAR-based 3D object detectors. This eliminates the need for expensive, high-quality labels whenever the environm...
Recently, large-scale pre-trained vision and language (VL) models have set a new state-of-the-art (SOTA) in zero-shot visual classification, enabling open-vocabulary recognition of a potentially unlimited set of categories defined as simple language prompts. However, despite these great advances, the performance of these zero-shot classifiers still falls short of the results of dedicated (closed category set) classifiers trained with supervised fine-tuning. In this paper we show, for the first time, how to reduce this gap without any labels and without any paired VL data, using an unlabeled image collection and a set of texts auto-generated by a Large Language Model (LLM) that describe the categories of interest and effectively substitute for labeled visual instances of those categories. Using our label-free approach, we attain significant performance improvements over the zero-shot performance of the base VL model and other contemporary methods and baselines on a wide variety of datasets, demonstrating an absolute improvement of up to 11.7% (3.8% on average) in the label-free setting. Moreover, despite our approach being label-free, we observe 1.3% average gains over leading few-shot prompting baselines that do use 5-shot supervision.
Event knowledge graphs (EKGs), a special case of knowledge graphs (KGs), support event prediction and have proved useful in medical diagnosis and intelligent recommendation. To successfully build an EKG, knowledge representation learning is often required to compute the semantic links of entities and relationships in a low-dimensional space and to address data sparsity in knowledge acquisition, fusion and reasoning. This paper proposes a new EKG representation learning model that integrates event entity attributes and relation paths. The entity attribute knowledge comprises the entity type and the entity description. The initial entity vector is obtained by multiplying the entity semantic vector, the entity description representation vector and the entity type representation vector, and the representation of a relation path is obtained from the relation between event pairs. A translation-based model framework is then used to integrate and train all vectors to obtain the entity learning vector and the relation learning vector. Our method can generate more expressive learned representations and consequently enhance the inference performance of EKGs. Experiments on publicly available real-world EKG datasets show that our method achieves better performance than state-of-the-art models on two typical tasks.
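The two building blocks named in this abstract can be sketched as follows. This is a hedged illustration, not the paper's model: the abstract says the three vectors are multiplied but not how, so the element-wise product is an assumption, and the TransE-style distance stands in for the unspecified translation-based framework. The function names are hypothetical.

```python
import math


def initial_entity_vector(semantic, description, entity_type):
    """Combine three equal-length vectors by element-wise multiplication.

    Element-wise product is an assumption; the abstract only says the
    vectors are multiplied.
    """
    return [s * d * t for s, d, t in zip(semantic, description, entity_type)]


def translation_score(head, relation, tail):
    """TransE-style score: the L2 norm of (head + relation - tail).

    Lower scores mean the triple is more plausible. TransE is used here
    as a representative translation-based model, not as the paper's
    confirmed choice.
    """
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical toy vectors for illustration only.
entity = initial_entity_vector([1, 2], [3, 4], [5, 6])
score = translation_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
```

Training in a translation-based framework typically pushes the score of observed triples below that of corrupted ones by a margin. How relation paths enter the score is left out of this sketch because the abstract does not specify it.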
In this paper, a novel framework for 2D-to-3D human pose estimation from video is proposed by exploiting multi-scale multi-level spatial temporal features. To extract and exploit the rich features, the framework consi...
Following on from work by Babalola et al., it is shown that the sex of mice can be determined from x-ray images of the chest region alone using convolutional neural networks. The anatomical differences that may be...