ISBN:
(Print) 9789819787944; 9789819787951
Dataset distillation synthesizes a small dataset such that a model trained on it approximates the performance of a model trained on the original dataset. Recent studies on dataset distillation have focused primarily on the design of the optimization process, with methods such as gradient matching, feature alignment, and training trajectory matching. However, little attention has been given to the issue of underutilized regions in synthetic images. In this paper, we propose UDD, a novel approach that identifies and exploits the underutilized regions to make them informative and discriminative, and thus improves the utilization of the synthetic dataset. Technically, UDD involves two policies for searching underutilized regions under different conditions, i.e., a response-based policy and a data jittering-based policy. Compared with previous works, these two policies are utilization-sensitive, equipped with the ability to dynamically adjust the underutilized regions during the training process. Additionally, we analyze the current model optimization problem and design a category-wise feature contrastive loss, which can enhance the distinguishability of different categories and alleviate the shortcomings of the existing multi-formation methods. Experimentally, our method improves the utilization of the synthetic dataset and outperforms the state-of-the-art methods on various datasets, such as MNIST, FashionMNIST, SVHN, CIFAR-10, and CIFAR-100. For example, by mining the underutilized regions, the improvements on CIFAR-10 and CIFAR-100 are 4.0% and 3.7% over the next best method with IPC = 1.
ISBN:
(Digital) 9798331536626
ISBN:
(Print) 9798331536633
The continuous expansion of neural network sizes is a notable trend in machine learning, with transformer models exceeding 20 billion parameters in computer vision. This growth comes with rising demands for computational resources and large-scale datasets. Efficient techniques for transfer learning thus become an attractive option in setups with limited data, as in handwriting recognition. Recently, parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and weight-decomposed low-rank adaptation (DoRA), have gained widespread interest. In this paper, we explore tradeoffs in parameter-efficient transfer learning using the synthetically pretrained Transformer-Based Optical Character Recognition (TrOCR) model for handwritten text recognition with LoRA and DoRA. Additionally, we analyze the performance of full fine-tuning with a limited number of samples, scaling from a few-shot learning scenario up to using the whole dataset. We conduct experiments on the popular IAM Handwriting database as well as the historical READ 2016 dataset. We find that (a) LoRA/DoRA does not outperform full fine-tuning, contrary to a recent paper, and (b) LoRA/DoRA is not substantially faster than full fine-tuning of TrOCR.
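As a rough illustration of the low-rank adaptation idea named in this abstract, the sketch below applies the standard LoRA update W_eff = W + (alpha / r) * B A to a single weight matrix and compares trainable parameter counts. The function names and the 768 x 768 layer size are assumptions chosen for illustration; they are not taken from the paper or from TrOCR.

```python
# Minimal LoRA-style sketch in pure Python (illustrative names and shapes only).

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen weight, d_out x d_in
    a: trainable factor, r x d_in
    b: trainable factor, d_out x r
    """
    r = len(a)                 # rank of the update
    scale = alpha / r
    delta = matmul(b, a)       # low-rank update, d_out x d_in
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for one matrix: (full fine-tuning, LoRA)."""
    return d_out * d_in, r * (d_in + d_out)

# Hypothetical layer size and rank, chosen only for the comparison.
full, lora = trainable_params(768, 768, 8)
print(full, lora)
```

Although the number of trainable parameters drops sharply, the frozen weights still take part in every forward pass, so a smaller parameter count does not by itself imply a proportional reduction in training time.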
ISBN:
(Digital) 9798331536626
ISBN:
(Print) 9798331536633
This paper introduces Gate-Shift-Pose, an enhanced version of Gate-Shift-Fuse networks, designed for athlete fall classification in figure skating by integrating skeleton pose data alongside RGB frames. We evaluate two fusion strategies: early-fusion, which combines RGB frames with Gaussian heatmaps of pose keypoints at the input stage, and late-fusion, which employs a multi-stream architecture with attention mechanisms to combine RGB and pose features. Experiments on the FR-FS dataset demonstrate that Gate-Shift-Pose significantly outperforms the RGB-only baseline, improving accuracy by up to 40% with ResNet18 and 20% with ResNet50. Early-fusion achieves the highest accuracy (98.08%) with ResNet50, leveraging the model's capacity for effective multimodal integration, while late-fusion is better suited for lighter backbones like ResNet18. These results highlight the potential of multimodal architectures for sports action recognition and the critical role of skeleton pose information in capturing complex motion patterns.
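The early-fusion strategy described in this abstract can be sketched generically: render each pose keypoint as a 2D Gaussian heatmap and stack the heatmaps with the RGB channels at the input. The code below is a minimal sketch under assumptions. The one-heatmap-per-keypoint layout, the `sigma` value, and all function names are hypothetical and are not confirmed by the abstract.

```python
import math

def keypoint_heatmap(height, width, cx, cy, sigma=2.0):
    """Render one keypoint at pixel (cx, cy) as an H x W Gaussian heatmap with peak 1.0."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def early_fuse(rgb, keypoints, sigma=2.0):
    """Stack RGB channels with one heatmap channel per keypoint.

    rgb: list of 3 channels, each an H x W list of rows
    keypoints: list of (x, y) pixel coordinates
    Returns a list of 3 + K channels (assumed layout, for illustration).
    """
    height, width = len(rgb[0]), len(rgb[0][0])
    heatmaps = [keypoint_heatmap(height, width, x, y, sigma) for x, y in keypoints]
    return rgb + heatmaps

# Toy 6 x 8 frame with two hypothetical keypoints.
frame = [[[0.0] * 8 for _ in range(6)] for _ in range(3)]
fused = early_fuse(frame, [(2, 3), (5, 1)])
print(len(fused), len(fused[0]), len(fused[0][0]))
```

With this layout the first convolution of the backbone would need to accept 3 + K input channels instead of 3.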
ISBN:
(Print) 0769526462
The proceedings contain 201 papers. The topics discussed include: a camera-projector system for robot positioning by visual servoing; robust content-dependent photometric projector compensation; automatic interactive calibration of multi-projector-camera systems; robust and accurate visual echo cancellation in a full-duplex projector-camera system; discriminative patch selection using combinatorial and statistical models for patch-based object recognition; integrating co-occurrence and spatial contexts on patch-based scene segmentation; using spatio-temporal patches for simultaneous estimation of edge strength, orientation, and motion; integrating spatial and discriminant strength for feature selection and linear dimensionality reduction; fingerprint authentication device based on optical characteristics inside a finger; and empirical mode decomposition liveness check in fingerprint time series captures.
ISBN:
(Print) 0769525970
The proceedings contain 167 papers. The topics discussed include: incremental learning of object detectors using a visual shape alphabet; multiclass object recognition with sparse, localized features; unsupervised learning of categories from sets of partially matching image features; the layout consistent random field for recognizing and segmenting partially occluded objects; ultrasound-specific segmentation via correlation and statistical region-based active contours; principled hybrids of generative and discriminative models; a conic section classifier and its application to image datasets; learning non-metric partial similarity based on maximal margin criterion; distributed cost boosting on mis-classification cost; equivalence of non-iterative algorithms for simultaneous low rank approximations of matrices; and semi-supervised classification using linear neighborhood propagation.
ISBN:
(Print) 0769525970
The proceedings contain 151 papers. The topics discussed include: clustering appearance for scene analysis; fast compact city modeling for navigation pre-visualization; fusion of summation invariants in 3D human face recognition; deformation modeling for robust 3D face matching; locally linear models on face appearance manifolds with application to dual-subspace based classification; learning exemplar-based categorization for the detection of multi-view multi-pose objects; aligning ASL for statistical translation using discriminative word model; a graph based approach for naming faces in news photos; fast human detection using a cascade of histograms of oriented gradients; real-time hand pose recognition using low resolution depth images; automatic cast listing in feature-length films with anisotropic manifold space; and body localization in still images using hierarchical models and hybrid search.