ISBN: 9781665487399 (print)
Tracking objects in soccer videos is essential for gathering both player and team statistics, whether the goal is to estimate the total distance run, ball possession, or team formation. Video processing can help automate the extraction of this information without any invasive sensor, and is hence applicable to any team in any stadium. Yet the availability of datasets to train learnable models, and of benchmarks to evaluate methods on a common testbed, is very limited. In this work, we propose a novel dataset for multiple object tracking composed of 200 sequences of 30 s each, representative of challenging soccer scenarios, and a complete 45-minute half for long-term tracking. The dataset is fully annotated with bounding boxes and tracklet IDs, enabling the training of MOT baselines in the soccer domain and a full benchmarking of those methods on our segregated challenge sets. Our analysis shows that multiple player, referee, and ball tracking in soccer videos is far from being solved, with several improvements required in cases of fast motion or severe occlusion.
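As a rough illustration of how bounding-box and tracklet-ID annotations could feed a statistic such as distance run, the following minimal sketch sums the displacement of each track's box centre between consecutive annotated frames. The row layout `(frame, track_id, x, y, w, h)` and the function name are assumptions made for this example, not the dataset's actual format, and the result is in pixels: converting it to metres would additionally need a camera-to-pitch calibration, which is not covered here.

```python
from collections import defaultdict
from math import hypot


def path_length_per_track(rows):
    """Sum box-centre displacement per track ID, in pixels.

    rows: iterable of (frame, track_id, x, y, w, h), where (x, y) is the
    top-left corner of the box. This layout is hypothetical.
    """
    tracks = defaultdict(list)
    for frame, track_id, x, y, w, h in rows:
        # Store the frame index with the box centre so we can sort by time.
        tracks[track_id].append((frame, x + w / 2.0, y + h / 2.0))

    lengths = {}
    for track_id, points in tracks.items():
        points.sort(key=lambda p: p[0])
        total = 0.0
        # Add the Euclidean distance between consecutive centres.
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:]):
            total += hypot(x2 - x1, y2 - y1)
        lengths[track_id] = total
    return lengths


example_rows = [
    (1, 7, 0, 0, 10, 10),
    (2, 7, 3, 4, 10, 10),
    (3, 7, 3, 4, 10, 10),
    (1, 9, 50, 50, 8, 8),
]
lengths = path_length_per_track(example_rows)
```

Here track 7 moves by a 3-4-5 step and then stands still, while track 9 appears in a single frame and therefore contributes no displacement.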
Multi-camera person tracking has gained significant attention in recent times, owing to its widespread application in surveillance scenarios. However, this task is challenging due to the variance viewpoints, heavy occ...
ISBN: 9781665448994 (print)
This paper presents results from the second Thermal Image Super-Resolution (TISR) challenge, organized in the framework of the Perception Beyond the Visible Spectrum (PBVS) 2021 workshop. For this second edition, the same thermal image dataset considered during the first challenge has been used; only the mid-resolution (MR) and high-resolution (HR) sets have been considered. The dataset consists of 951 training images and 50 testing images for each resolution. A set of 20 images for each resolution is kept aside for evaluation. The two evaluation methodologies proposed for the first challenge are used again in this edition. The first evaluation task consists of measuring the PSNR and SSIM between the obtained SR image and the corresponding ground truth (i.e., the HR thermal image downsampled by four). The second evaluation also measures the PSNR and SSIM, but in this case considers the x2 SR image obtained from the given MR thermal image; this evaluation compares the SR image with the semi-registered HR image, which was acquired with another camera. The results outperformed those from the first challenge, showing an improvement in both evaluation metrics.
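For readers unfamiliar with the first metric, the sketch below computes PSNR using its standard definition, 10 · log10(MAX² / MSE), on images stored as nested lists of pixel values. The function name and the 8-bit peak value of 255 are assumptions for this example; the abstract does not state the bit depth or the exact evaluation code used in the challenge, and SSIM is not covered here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of numeric pixel values. max_value is the
    assumed peak intensity (255 for 8-bit data).
    """
    if len(reference) != len(estimate) or any(
        len(a) != len(b) for a, b in zip(reference, estimate)
    ):
        raise ValueError("images must have the same shape")

    count = sum(len(row) for row in reference)
    if count == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(reference, estimate)
        for a, b in zip(row_a, row_b)
    )
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ground_truth = [[0, 255], [255, 0]]
super_resolved = [[0, 250], [255, 5]]
score = psnr(ground_truth, super_resolved)
```

In this toy case two of the four pixels are off by 5 grey levels, giving an MSE of 12.5; a higher PSNR indicates a closer match to the ground truth.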
ISBN: 9781728193601 (print)
This paper summarizes the top contributions to the first challenge on thermal image super-resolution (TISR), which was organized as part of the Perception Beyond the Visible Spectrum (PBVS) 2020 workshop. In this challenge, a novel thermal image dataset is considered together with state-of-the-art approaches evaluated under a common framework. The dataset used in the challenge consists of 1021 thermal images obtained from each of three distinct thermal cameras at different resolutions (low-resolution, mid-resolution, and high-resolution), resulting in a total of 3063 thermal images. From each resolution, 951 images are used for training and 50 for testing, while the 20 remaining images are used for two proposed evaluations. The first evaluation consists of downsampling the low-resolution, mid-resolution, and high-resolution thermal images by x2, x3, and x4 respectively, and comparing their super-resolution results with the corresponding ground truth images. The second evaluation consists of obtaining the x2 super-resolution from a given mid-resolution thermal image and comparing it with the corresponding semi-registered high-resolution thermal image. Out of 51 registered participants, 6 teams reached the final validation phase.
ISBN: 9781538607336 (print)
The proceedings contain 280 papers. The topics discussed include: the role of synchronic causal conditions in visual knowledge learning; attention-based natural language person retrieval; Singlets: multi-resolution motion singularities for soccer video abstraction; hockey action recognition via integrated stacked hourglass network; extraction and classification of diving clips from continuous video footage; accurate and efficient 3D human pose estimation algorithm using single depth images for pose analysis in golf; athlete pose estimation by a global-local network; continuous video to simple signals for swimming stroke detection with convolutional neural networks; application of computer vision and vector space model for tactical movement classification in badminton; automatic tactical adjustment in real-time: modeling adversary formations with radon-cumulative distribution transform and canonical correlation analysis; infrared variation optimized deep convolutional neural network for robust automatic ground target recognition; an algorithm for parallel reconstruction of jointly sparse tensors with applications to hyperspectral imaging; deep heterogeneous face recognition networks based on cross-modal distillation and an equitable distance metric; face presentation attack with latex masks in multispectral videos; privacy-preserving understanding of human body orientation for smart meetings; and fast, accurate thin-structure obstacle detection for autonomous mobile robots.
The proceedings contain 781 papers. The topics discussed include: exclusivity-consistency regularized multi-view subspace clustering; borrowing treasures from the wealthy: deep transfer learning through selective joint fine-tuning; the more you know: using knowledge graphs for image classification; dynamic edge-conditioned filters in convolutional neural networks on graphs; convolutional neural network architecture for geometric matching; deep affordance-grounded sensorimotor object recognition; on compressing deep models by low rank and sparse decomposition; unsupervised pixel-level domain adaptation with generative adversarial networks; photo-realistic single image super-resolution using a generative adversarial network; a practical method for fully automatic intrinsic camera calibration using directionally encoded light; elastic shape-from-template with spatially sparse deforming forces; and distinguishing the indistinguishable: exploring structural ambiguities via geodesic context.
ISBN: 9781538607336 (print)
In this paper we discuss and analyze possible futures for technologies in the field of computer vision (CV). Using a method we call speculative analysis, we take a broad look at research trends in the field to categorize risks, analyze which ones are most threatening and likely, and ultimately summarize conclusions on how the field may attempt to stem future harms caused by CV technologies. We develop narrative case studies to provoke dialogue and to explore in depth the risk scenarios we found to be most probable and severe. We arrive at the position that there is serious potential for CV to cause discriminatory harm and exacerbate cybersecurity issues.
ISBN: 9781538607336 (print)
Event-based vision, as realized by bio-inspired Dynamic Vision Sensors (DVS), is gaining popularity because it combines high temporal resolution, wide dynamic range, and power efficiency. Potential applications include surveillance, robotics, and autonomous navigation under uncontrolled environmental conditions. In this paper, we deal with event-based vision for 3D reconstruction of dynamic scene content by using two stationary DVS in a stereo configuration. We focus on a cooperative stereo approach and suggest an improvement over a previously published algorithm that reduces the measured mean error by over 50 percent. An available ground truth data set for stereo event data is used to analyze the algorithm's sensitivity to parameter variation and to compare it with competing techniques.
ISBN: 9781538607336 (print)
In this paper, we propose a method to detect abnormal events in human group activities. Our main contribution is a strategy that learns from very few videos by isolating the action and by using supervised learning. First, we subtract the background of each frame by modeling each pixel as a mixture of Gaussians (MoG), so that the higher-order learning is concentrated only on the foreground. Next, features are extracted from each frame using a convolutional neural network (CNN) that is trained to classify between normal and abnormal frames. These feature vectors are fed into a long short-term memory (LSTM) network to learn the long-term dependencies between frames. The LSTM is also trained to classify abnormal frames while extracting the temporal features of the frames. Finally, we classify the frames as abnormal or normal depending on the output of a linear SVM, whose inputs are the features computed by the LSTM.
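To make the first stage more concrete, the sketch below shows a deliberately simplified background model: a single running Gaussian per pixel rather than the mixture of Gaussians described in the abstract. A pixel is flagged as foreground when it lies more than `k` standard deviations from the background mean, and only background pixels update the model. The function names, the learning rate `alpha`, the threshold `k`, and the initial variance are all assumptions chosen for illustration; the CNN, LSTM, and SVM stages are not reproduced here.

```python
def make_background_model(first_frame, init_var=225.0):
    """Start one Gaussian per pixel: mean from the first frame, fixed variance."""
    mean = [[float(v) for v in row] for row in first_frame]
    var = [[init_var for _ in row] for row in first_frame]
    return mean, var


def foreground_mask(frame, mean, var, alpha=0.05, k=2.5):
    """Return a 0/1 foreground mask and update the model in place.

    Simplification: one Gaussian per pixel, not a mixture.
    """
    mask = []
    for i, row in enumerate(frame):
        mask_row = []
        for j, value in enumerate(row):
            diff = value - mean[i][j]
            # Foreground if the squared deviation exceeds k^2 * variance.
            is_foreground = diff * diff > (k * k) * var[i][j]
            if not is_foreground:
                # Slowly adapt the background statistics to this pixel.
                mean[i][j] += alpha * diff
                var[i][j] = max((1 - alpha) * var[i][j] + alpha * diff * diff, 1.0)
            mask_row.append(1 if is_foreground else 0)
        mask.append(mask_row)
    return mask


background = [[100, 100], [100, 100]]
mean, var = make_background_model(background)
mask = foreground_mask([[100, 200], [101, 100]], mean, var)
```

In this toy frame only the pixel that jumps from 100 to 200 is marked as foreground; the pixel that drifts to 101 is absorbed into the background model. Later stages of such a pipeline would then operate on the masked frames.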