ISBN (print): 9781467369640
The proceedings contain 602 papers. The topics discussed include: going deeper with convolutions; propagated image filtering; web scale photo hash clustering on a single machine; supervised discrete hashing; what do 15,000 object categories tell us about classifying and localizing actions?; landmarks-based kernelized subspace alignment for unsupervised domain adaptation; blur kernel estimation using normalized color-line priors; a light transport model for mitigating multipath interference in time-of-flight sensors; traditional saliency reloaded: a good old model in new shape; automatic construction of robust spherical harmonic subspaces; leveraging stereo matching with learning-based confidence measures; saliency detection via cellular automata; and efficient sparse-to-dense optical flow estimation using a learned basis and layers.
ISBN (digital): 9798331536626
ISBN (print): 9798331536633
The continuous expansion of neural network sizes is a notable trend in machine learning, with transformer models exceeding 20 billion parameters in computer vision. This growth comes with rising demands for computational resources and large-scale datasets. Efficient techniques for transfer learning thus become an attractive option in setups with limited data, as in handwriting recognition. Recently, parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and weight-decomposed low-rank adaptation (DoRA), have gained widespread interest. In this paper, we explore tradeoffs in parameter-efficient transfer learning using the synthetically pretrained Transformer-Based Optical Character Recognition (TrOCR) model for handwritten text recognition with LoRA and DoRA. Additionally, we analyze the performance of full fine-tuning with a limited number of samples, scaling from a few-shot learning scenario up to using the whole dataset. We conduct experiments on the popular IAM Handwriting database as well as the historical READ 2016 dataset. We find that (a) LoRA/DoRA does not outperform full fine-tuning, contrary to what a recent paper reports, and (b) LoRA/DoRA is not substantially faster than full fine-tuning of TrOCR.
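For readers unfamiliar with low-rank adaptation, the following is a minimal sketch of the general LoRA idea, not the paper's training setup. It assumes the usual formulation in which a frozen weight W0 is augmented by a scaled low-rank product, W = W0 + (alpha / r) * B A, with B initialized to zero. The helper names (`matmul`, `lora_weight`, `trainable_params`) and the toy sizes are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, b, a, alpha, r):
    """Return the merged weight W0 + (alpha / r) * B @ A."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def trainable_params(d_out, d_in, r):
    """Count trainable parameters for one layer: full fine-tuning vs. LoRA factors."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}


random.seed(0)
d_out, d_in, r, alpha = 6, 4, 2, 4  # toy sizes (assumed), not TrOCR's dimensions

# Frozen pretrained weight (random stand-in).
w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# B starts at zero, so the adapted layer initially equals the pretrained one.
b = [[0.0] * r for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]

merged = lora_weight(w0, b, a, alpha, r)
counts = trainable_params(768, 768, 8)  # illustrative layer size and rank
print(counts)
```

The parameter count shrinks sharply, but the frozen weights still take part in every forward pass, so fewer trainable parameters does not by itself imply fewer operations per training step.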
ISBN (digital): 9798331536626
ISBN (print): 9798331536633
This paper introduces Gate-Shift-Pose, an enhanced version of Gate-Shift-Fuse networks, designed for athlete fall classification in figure skating by integrating skeleton pose data alongside RGB frames. We evaluate two fusion strategies: early-fusion, which combines RGB frames with Gaussian heatmaps of pose keypoints at the input stage, and late-fusion, which employs a multi-stream architecture with attention mechanisms to combine RGB and pose features. Experiments on the FR-FS dataset demonstrate that Gate-Shift-Pose significantly outperforms the RGB-only baseline, improving accuracy by up to 40% with ResNet18 and 20% with ResNet50. Early-fusion achieves the highest accuracy (98.08%) with ResNet50, leveraging the model's capacity for effective multimodal integration, while late-fusion is better suited for lighter backbones like ResNet18. These results highlight the potential of multimodal architectures for sports action recognition and the critical role of skeleton pose information in capturing complex motion patterns.
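The early-fusion input described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes one heatmap channel per keypoint appended to the RGB channels in channels-first layout, and the function names, `sigma` value, and frame size are hypothetical. The abstract does not say whether the heatmaps are kept separate or merged into a single channel.

```python
import math


def keypoint_heatmap(height, width, cx, cy, sigma=2.0):
    """Render one keypoint as a 2D Gaussian with peak value 1.0 at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def early_fuse(rgb, keypoints, sigma=2.0):
    """Append one heatmap channel per keypoint to a channels-first RGB frame."""
    height, width = len(rgb[0]), len(rgb[0][0])
    heatmaps = [keypoint_heatmap(height, width, cx, cy, sigma)
                for cx, cy in keypoints]
    return rgb + heatmaps


# Toy 8x8 black frame with two keypoints given as (x, y) pixel coordinates.
rgb = [[[0.0] * 8 for _ in range(8)] for _ in range(3)]
keypoints = [(2, 3), (5, 6)]
fused = early_fuse(rgb, keypoints)
print(len(fused), "channels")
```

A network consuming this input would need its first convolution widened from 3 input channels to 3 plus the number of keypoints.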
ISBN (print): 9781467367592
In this paper, we study the problem of reproducing the light from a single image of an object covered with random specular microfacets on the surface. We show that such reflectors can be interpreted as a randomized mapping from the lighting to the image. Such specular objects have very different optical properties from both diffuse surfaces and smooth specular objects like metals, so we design a special imaging system to robustly and effectively photograph them. We present simple yet reliable algorithms to calibrate the proposed system and perform the inference. We conduct experiments to verify the correctness of our model assumptions and demonstrate the effectiveness of our pipeline.
ISBN (print): 9781538607336
In this paper we discuss and analyze possible futures for technologies in the field of computer vision (CV). Using a method we have coined speculative analysis, we take a broad look at research trends in the field to categorize risks, analyze which ones are most threatening and likely, and ultimately summarize conclusions for how the field may attempt to stem future harms caused by CV technologies. We develop narrative case studies to provoke dialogue and deeply explore the risk scenarios we found to be most probable and severe. We arrive at the position that there is serious potential for CV to cause discriminatory harm and exacerbate cybersecurity issues.
ISBN (print): 9781467367592
We address the task of articulated pose estimation from video sequences. We consider an interactive setting where the initial pose is annotated in the first frame. Our system synthesizes a large number of hypothetical scenes with different poses and camera positions by applying geometric deformations to the first frame. We use these synthetic images to generate a custom labeled training set for the video in question. This training data is then used to learn a regressor (for future frames) that predicts joint locations from image data. Notably, our training set is so accurate that nearest-neighbor (NN) matching on low-resolution pixel features works well. As such, we name our underlying representation "tiny synthetic videos". We present quantitative results on the Friends benchmark dataset that suggest our simple approach matches or exceeds the state of the art.
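The nearest-neighbor step mentioned above can be illustrated with a minimal sketch. It assumes grayscale frames stored as lists of rows, average pooling as the low-resolution feature, and squared Euclidean distance as the match criterion. None of these choices is confirmed by the abstract, and the function names and toy data are hypothetical.

```python
def downsample(image, factor):
    """Average-pool a 2D grayscale image (list of rows) by an integer factor."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]


def flatten(image):
    """Turn a 2D image into a flat feature vector."""
    return [v for row in image for v in row]


def nearest_pose(query, train_images, train_poses, factor=2):
    """Return the pose label of the training image closest to the query."""
    q = flatten(downsample(query, factor))

    def distance(img):
        t = flatten(downsample(img, factor))
        return sum((a - b) ** 2 for a, b in zip(q, t))

    best = min(range(len(train_images)), key=lambda i: distance(train_images[i]))
    return train_poses[best]


def blob(top, left):
    """Toy 4x4 image with a bright 2x2 block at (top, left)."""
    img = [[0.0] * 4 for _ in range(4)]
    for dy in range(2):
        for dx in range(2):
            img[top + dy][left + dx] = 1.0
    return img


# Stand-ins for synthetic training frames and their joint locations.
train_images = [blob(0, 0), blob(2, 2)]
train_poses = [[(0, 0), (1, 1)], [(2, 2), (3, 3)]]

query = blob(0, 0)
query[0][0] = 0.8  # slightly perturbed version of the first training frame
predicted = nearest_pose(query, train_images, train_poses)
print(predicted)
```

In this toy case the query differs from the first training frame by one pixel value, so its pooled features stay far closer to that frame than to the second.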
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
Given a machine learning model, adversarial perturbations transform images such that the model classifies them as an attacker-chosen class. Most research in this area has focused on adversarial perturbations that are imperceptible to the human eye. However, recent work has considered attacks that are perceptible but localized to a small region of the image. Under this threat model, we discuss both defenses that remove such adversarial perturbations and attacks that can bypass these defenses.
ISBN (print): 9781665448994
This work presents AFRIFASHION1600, an openly accessible contemporary African fashion image dataset containing 1600 samples labelled into 8 classes representing some African fashion styles. Each sample is a colour image of size 128 x 128. This is a niche dataset that aims to improve the visibility, inclusion, and familiarity of African fashion in computer vision. The AFRIFASHION1600 dataset is available here.
ISBN (print): 9781665487399
The aim of this paper is to demonstrate that a state-of-the-art feature matcher (LoFTR) can be made more robust to rotations by simply replacing the backbone CNN with a steerable CNN that is equivariant to translations and image rotations. It is experimentally shown that this boost is obtained without reducing performance on ordinary illumination and viewpoint matching sequences.