ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Devices that can track both players and balls are critical to the performance of sports teams. Recently, significant effort has gone into building larger broadcast sports video datasets. However, broadcast videos do not show the entire pitch and provide only partial information about the game. Other camera perspectives, such as fish-eye and bird's-eye view (drone) cameras, can capture the whole field in a single frame. Unfortunately, no dataset with such data has been publicly shared until now. This paper proposes SoccerTrack, a dataset consisting of GNSS and bounding box tracking data annotated on video captured with an 8K-resolution fish-eye camera and a 4K-resolution drone camera. In addition to a benchmark tracking algorithm, we include code for camera calibration and other preprocessing. Finally, we evaluate the tracking accuracy across the GNSS, fish-eye camera, and drone camera data. SoccerTrack is expected to provide a more robust foundation for designing multi-object tracking (MOT) algorithms that are less reliant on visual cues and more reliant on motion analysis.
Innovations in computer vision algorithms for satellite image analysis can enable us to explore global challenges such as urbanization and land use change at the planetary level. However, domain shift is a common problem when trying to apply the models that drive these analyses to new areas, particularly in the developing world. If a model is trained with imagery and labels from one location, it usually will not generalize well to new locations where the content of the imagery and the data distributions are different. In this work, we consider the setting in which we have a single large satellite imagery scene over which we want to solve an applied problem: building footprint segmentation. Here, we do not necessarily need to worry about creating a model that generalizes past the borders of our scene but can instead train a local model. We show that surprisingly few labels are needed to solve the building segmentation problem with very high-resolution (0.5 m/px) satellite imagery with this setting in mind. Our best model, trained with just 527 sparse polygon annotations (the equivalent of 1500x1500 densely labeled pixels), has a recall of 0.87 over held-out footprints and an R² of 0.93 on the task of counting the number of buildings in 200x200 meter windows. We apply our models over high-resolution imagery in Amman, Jordan in a case study on urban change detection.
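The two metrics quoted in this abstract, recall over held-out footprints and R² on per-window building counts, can be illustrated with a minimal sketch. The abstract does not state how either metric was computed, so the formulas below (recall as TP / (TP + FN), R² as the coefficient of determination) and the function names are assumptions for illustration only.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of ground-truth items (e.g. held-out footprints) that were found."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined without ground-truth items")
    return true_positives / total


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the true counts around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot
```

Under this definition, a model that always predicts the mean count scores 0 and a perfect model scores 1.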
Single-Image Super-Resolution (SISR) is a classical computer vision problem that has benefited from recent advances in deep learning, especially in convolutional neural networks (CNNs). Although state-of-the-art methods improve the performance of SISR on several datasets, direct application of these networks in practice is still an issue because of their heavy computational load. For this reason, researchers have recently focused on more efficient yet high-performing network structures. The Information Multi-Distilling Network (IMDN) is one such highly efficient SISR network, combining high performance with low computational load. IMDN achieves this efficiency with several mechanisms: Intermediate Information Collection (IIC), which works in a global setting, and the Progressive Refinement Module (PRM) and Contrast Aware Channel Attention (CCA), which are employed in a local setting. These mechanisms, however, do not contribute equally to the efficiency and performance of IMDN. In this work, we propose the Global Progressive Refinement Module (GPRM) as a less parameter-demanding alternative to the IIC module for feature aggregation. To further decrease the number of parameters and floating-point operations (FLOPs), we also propose Grouped Information Distilling Blocks (GIDB). Using the proposed structures, we design an efficient SISR network called IMDeception. Experiments reveal that the proposed network performs on par with state-of-the-art models despite having a limited number of parameters and FLOPs. Furthermore, using grouped convolutions as a building block of GIDB leaves room for further optimization during deployment. To show its potential, the proposed model was deployed on an NVIDIA Jetson Xavier AGX, where it was shown to run in real time.
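Why grouped convolutions reduce the parameter count can be seen from simple arithmetic. The sketch below counts the weights of a single 2D convolution layer; the channel widths and group count in the usage note are hypothetical and are not taken from the IMDeception design.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int,
                  groups: int = 1, bias: bool = True) -> int:
    """Number of learnable parameters in one 2D convolution layer.

    With `groups` groups, each output channel only sees
    in_channels / groups input channels, so the weight count shrinks
    by a factor of `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (in_channels // groups) * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)
```

For an illustrative 64-to-64 channel 3x3 layer without bias, `conv2d_params(64, 64, 3, groups=1, bias=False)` gives 36864 weights, while `groups=4` gives 9216, a fourfold reduction.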
Organ-level instance segmentation (e.g., of individual leaves) based on computer vision techniques is a key step in the measurement of plant phenotypes. Because plant organs, especially leaves, are self-occluded and emerged-occluded, single-view images fail to capture some useful information. In contrast, 3D global images contain much more plant morphological information than single-view images, which is of great significance for plant phenotype research. In this paper, lettuce was taken as the research object: its 3D point clouds were acquired, and instance segmentation was carried out with a deep learning method. The results showed that the 3D point cloud of each leaf was segmented and identified accurately. Specifically, we constructed a lettuce point cloud dataset consisting of 620 real and synthetic point clouds and fused them together to train a 3D instance segmentation network, PartNet, which takes 3D point clouds directly as input and outputs the instance segmentation of the leaves. When tested on the 40 point clouds in the validation set, Average Precision (AP) reached 97.2% at an IoU threshold of 0.25 and 92.4% at an IoU threshold of 0.5, indicating that the constructed PartNet network has the potential to accurately segment 3D point cloud leaf instances for lettuce.
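The IoU thresholds mentioned above decide when a predicted leaf instance counts as a match for a ground-truth leaf. A minimal sketch follows, assuming each instance is represented as a set of point indices within one point cloud; this representation and the function names are illustrative assumptions, not the paper's evaluation code.

```python
def point_iou(predicted: set[int], truth: set[int]) -> float:
    """Intersection over union of two instances given as sets of point indices."""
    union = predicted | truth
    if not union:
        return 0.0  # two empty instances share nothing to compare
    return len(predicted & truth) / len(union)


def is_match(predicted: set[int], truth: set[int], threshold: float) -> bool:
    """A prediction counts as a true positive when its IoU meets the threshold."""
    return point_iou(predicted, truth) >= threshold
```

A prediction covering points 0 to 5 against a true leaf covering points 3 to 8 shares 3 of 9 points, an IoU of 1/3: it is a match at the 0.25 threshold but not at 0.5, which is why AP at 0.5 is the stricter of the two reported figures.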
Image anomaly detection aims to detect out-of-distribution instances. Most existing methods treat anomaly detection as an unsupervised task because anomalous training data and labels are usually scarce or unavailable....
Task-free continual learning is the subfield of machine learning that focuses on learning online from a stream whose distribution changes continuously over time. In contrast, previous works evaluate task-free continua...
Event cameras are a new type of vision sensor that incorporates asynchronous and independent pixels, offering advantages over traditional frame-based cameras such as high dynamic range and minimal motion blur. However...
Video surveillance-based automatic detection of motorcycle helmet usage can enhance the effectiveness of educational and enforcement initiatives aimed at boosting road safety. Current detection methods, however, have ...
Hyperspectral image (HSI) classification is the most vibrant area of research in the hyperspectral community because the rich spectral information contained in HSI can greatly aid in identifying objects of interest. However, the inherent non-linearity between materials and their corresponding spectral profiles brings two major challenges to HSI classification: interclass similarity and intraclass variability. Many advanced deep learning methods have attempted to address these issues with a region/patch-based approach instead of a pixel-based alternative. However, patch-based approaches assume that the pixels neighboring a target pixel within a fixed spatial window belong to the same class, and this assumption is not always true. To address this problem, we propose a new deep learning architecture, the Gramian Angular Field encoded Neighborhood Attention U-Net (GAF-NAU), for pixel-based HSI classification. The proposed method does not require regions or patches centered around a raw target pixel to perform 2D-CNN based classification. Instead, our approach transforms the 1D pixel vector in HSI into a 2D angular feature space using the Gramian Angular Field (GAF) and then embeds it in a new neighborhood attention network that suppresses irrelevant angular features while emphasizing the features pertinent to the HSI classification task. Evaluation results on three publicly available HSI datasets demonstrate the superior performance of the proposed model. The source code is available at https://***/MAIN-Lab/GAF-NAU/
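The step of turning a 1D pixel vector into a 2D angular representation can be sketched as follows. This is a minimal illustration of the summation variant of the Gramian Angular Field, in which values are rescaled to [-1, 1], mapped to angles with arccos, and combined as cos(phi_i + phi_j). The abstract does not say which GAF variant or rescaling GAF-NAU uses, so both are assumptions here, and the function name is hypothetical.

```python
import math


def gramian_angular_field(values: list[float]) -> list[list[float]]:
    """Encode a 1D vector as a 2D matrix G with G[i][j] = cos(phi_i + phi_j)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant vector cannot be rescaled to [-1, 1]")
    # Min-max rescale to [-1, 1] so that arccos is defined for every entry.
    scaled = [(2 * v - hi - lo) / (hi - lo) for v in values]
    # Clamp to guard against tiny floating-point overshoot.
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    # Polar encoding: each value becomes an angle in [0, pi].
    phi = [math.acos(s) for s in scaled]
    # Pairwise angular sums give a symmetric n x n matrix.
    return [[math.cos(a + b) for b in phi] for a in phi]
```

A spectral vector with n bands yields an n x n matrix, which is what allows 2D convolutional or attention layers to operate on a single pixel without borrowing its spatial neighbors.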
Dual-energy X-ray scanners are used for aviation security screening because they can discriminate between materials inside passenger baggage. To facilitate manual operator inspection, a pseudo-colouring is assigned to the effective composition of the material. Recently, paired image-to-image translation models based on conditional Generative Adversarial Networks (cGAN) have been shown to be effective for image colourisation. In this work, we investigate the use of such a model to translate from the raw X-ray energy responses (high, low, effective-Z) to the pseudo-coloured images and vice versa. Specifically, given N X-ray modalities, we train a cGAN conditioned on N - m domains to generate the remaining m representations. Our method achieves a mean squared error (MSE) of 16.5 and a structural similarity index (SSIM) of 0.9815 when using the raw modalities to generate the pseudo-colour representation. Additionally, raw X-ray high energy, low energy and effective-Z projections were generated from the pseudo-colour image with minimum MSE of 2.57, 5.63 and 1.43, and maximum SSIM of 0.9953, 0.9901 and 0.9921, respectively. Furthermore, we assess the quality of our synthesised pseudo-colour reconstructions by measuring how two object detection models, originally trained on real X-ray pseudo-colour images, perform on our generated pseudo-colour images. Interestingly, our generated pseudo-colour images yield marginally better detection performance than the corresponding real X-ray pseudo-colour images, showing that meaningful representations are synthesised and that these reconstructions are applicable to differing aviation security tasks.