ISBN: 9781665448994 (print)
Neural network designers have steadily improved accuracy by increasing model depth, introducing new layer types, and discovering new combinations of layers. A common element in many architectures is the distribution of the number of filters in each layer. Neural network models follow a design pattern of increasing the number of filters in deeper layers, as in LeNet, VGG, ResNet, MobileNet, and even automatically discovered architectures such as NASNet. It remains unknown whether this pyramidal distribution of filters is the best for different tasks and constraints. In this work we present a series of modifications to the distribution of filters in three popular neural network models and their effects on accuracy and resource consumption. Results show that, with this approach, some models improve by up to 8.9% in accuracy while reducing parameters by up to 54%.
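As a rough illustration of what a "distribution of filters" means here, the sketch below generates per-layer filter counts for the conventional pyramidal pattern and two alternatives. The function name `filter_distribution` and the template names (`pyramid`, `uniform`, `reverse`) are assumptions made for this example; the abstract does not say which modifications the authors evaluated.

```python
def filter_distribution(num_layers, base_filters, template="pyramid"):
    """Return a list of filter counts, one per layer (illustrative only)."""
    if num_layers < 1:
        raise ValueError("num_layers must be positive")
    # Conventional pattern: double the number of filters at each deeper layer.
    pyramid = [base_filters * 2 ** i for i in range(num_layers)]
    if template == "pyramid":
        return pyramid
    if template == "reverse":
        # Same counts as the pyramid, but with the widest layer first.
        return pyramid[::-1]
    if template == "uniform":
        # Same filter count in every layer.
        return [base_filters] * num_layers
    raise ValueError(f"unknown template: {template}")


for name in ("pyramid", "uniform", "reverse"):
    print(name, filter_distribution(4, 32, name))
```

Comparing such templates at a fixed depth is one way to vary parameter count and accuracy without changing the layer types.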
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Predicting outfit compatibility and retrieving complementary items are critical components of a fashion recommendation system. We present a scalable framework, OutfitTransformer, that learns the compatibility of the entire outfit and supports large-scale complementary item retrieval. We model outfits as an unordered set of items and leverage a self-attention mechanism to learn the relationships between items. We train the framework using a proposed set-wise outfit ranking loss to generate a target item embedding given an outfit and a target item specification. The generated target item embedding is then used to retrieve compatible items that match the outfit. Experimental results demonstrate that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks.
ISBN: 9798350365474 (print)
Multi-camera tracking (MCT) plays a crucial role in various computer vision applications. However, accurately tracking individuals across multiple cameras is challenging, particularly because of identity switches. In this paper, we present an efficient online MCT system that tackles these challenges through online processing. Our system leverages memory-efficient accumulated appearance features to provide stable representations of individuals across cameras and time. By incorporating trajectory validation using hierarchical agglomerative clustering (HAC) in overlapping regions, ID transfers are identified and rectified. Evaluation on the 2024 AI City Challenge Track 1 dataset [39] demonstrates the competitive performance of our system, which achieves accurate tracking in both overlapping and non-overlapping camera networks. With a 40.3% HOTA score [29], our system ranked 9th in the challenge. The integration of trajectory validation improves performance by 8% over the baseline, and the accumulated appearance features contribute a further 17% improvement.
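One plausible reading of "memory-efficient accumulated appearance features" is a running mean of per-frame embeddings, which stores only one vector and a counter per identity. The sketch below illustrates that idea; the class `AccumulatedFeature`, the running-mean update, and the cosine comparison are assumptions for illustration and are not taken from the paper.

```python
import math


class AccumulatedFeature:
    """Running mean of appearance embeddings for one tracked identity."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def update(self, feature):
        # Incremental mean: only the current mean and a counter are kept,
        # so memory does not grow with the length of the track.
        self.count += 1
        self.mean = [m + (f - m) / self.count for m, f in zip(self.mean, feature)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


track = AccumulatedFeature(dim=2)
track.update([1.0, 0.0])
track.update([0.0, 1.0])
print(track.mean)
```

A new detection could then be matched to the identity whose accumulated feature has the highest cosine similarity, subject to whatever validation the tracker applies.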
ISBN: 9781424439942 (print)
Temporal segmentation of human motion into actions is central to understanding and building computational models of human motion and activity recognition. Several issues contribute to the challenge of temporal segmentation and classification of human motion. These include the large variability in the temporal scale and periodicity of human actions, the complexity of representing articulated motion, and the exponential number of possible movement combinations. We provide initial results from investigating two distinct problems: classification of the overall task being performed, and the more difficult problem of classifying individual frames over time into specific actions. We explore first-person sensing through a wearable camera and Inertial Measurement Units (IMUs) for temporally segmenting human motion into actions and performing activity classification in the context of cooking and recipe preparation in a natural environment. We present baseline results for supervised and unsupervised temporal segmentation, and for recipe recognition, in the CMU-Multimodal activity database (CMU-MMAC).
ISBN: 9781479943098 (print)
In this paper we present a novel approach to detect groups in ego-vision scenarios. People in the scene are tracked through the video sequence, and their head pose and 3D location are estimated. Based on the concept of f-formation, we use orientation and distance to define an inherently social pairwise feature that describes the affinity of a pair of people in the scene. We apply a correlation clustering algorithm that merges pairs of people into socially related groups. Because social interactions shift constantly and orientations and distances can take on different meanings in different contexts, we learn the weight vector of the correlation clustering using Structural SVMs. We extensively test our approach on two publicly available datasets, showing encouraging results when detecting groups from first-person camera views.
ISBN: 9781538607336 (print)
Human action recognition from skeletal data is a hot research topic and is important in many open-domain applications of computer vision, thanks to recently introduced 3D sensors. In the literature, naive methods simply transfer off-the-shelf techniques from video to the skeletal representation. However, the current state of the art is contested between two different paradigms: kernel-based methods and feature learning with (recurrent) neural networks. Both approaches show strong performance, yet they exhibit heavy, but complementary, drawbacks. Motivated by this fact, our work aims to combine the best of the two paradigms by proposing an approach where a shallow network is fed with a covariance representation. Our intuition is that, as long as the dynamics are effectively modeled, there is no need for the classification network to be deep or recurrent in order to score favorably. We validate this hypothesis in a broad experimental analysis over 6 publicly available datasets.
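To make the idea of a covariance representation concrete, the sketch below computes the sample covariance of joint coordinates over the frames of a sequence and flattens its upper triangle into a fixed-length vector that a shallow classifier could consume. The function names and the choice of raw coordinates as features are assumptions for illustration; the abstract does not specify the exact descriptor.

```python
def covariance_descriptor(sequence):
    """Sample covariance matrix of a skeletal sequence.

    `sequence` is a list of frames; each frame is a flat list of joint
    coordinates (for example x, y, z of every joint). Returns a d x d
    matrix as nested lists, where d is the frame length.
    """
    n = len(sequence)
    if n < 2:
        raise ValueError("need at least two frames")
    d = len(sequence[0])
    mean = [sum(frame[j] for frame in sequence) / n for j in range(d)]
    centered = [[frame[j] - mean[j] for j in range(d)] for frame in sequence]
    # Unbiased estimate: divide by n - 1.
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def upper_triangle(matrix):
    """Flatten the upper triangle of a symmetric matrix into a vector."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i, size)]


frames = [[0.0, 0.0], [2.0, 4.0]]
print(upper_triangle(covariance_descriptor(frames)))
```

The descriptor length depends only on the number of coordinates per frame, not on the number of frames, which is what lets a non-recurrent network handle sequences of different lengths.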
ISBN: 9780769549903 (print)
Manifold learning has been used effectively in computer vision applications for dimensionality reduction that improves classification performance and reduces computational load. Grassmann manifolds are well suited to computer vision problems because they promote smooth surfaces where points are represented as subspaces. In this paper we propose Grassmannian Sparse Representations (GSR), a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations, using a least-squares loss with L1-norm minimization for optimal classification. We further introduce a new descriptor that we term Motion Depth Surface (MDS) and compare its classification performance against the traditional Motion History Image (MHI) descriptor. We demonstrate the effectiveness of GSR on computationally intensive 3D action sequences from the Microsoft Research 3D-Action and 3D-Gesture datasets.
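The least-squares loss with an L1 penalty mentioned above is the standard sparse coding objective, 0.5 * ||D x - y||^2 + lam * ||x||_1. The sketch below solves it in ordinary Euclidean space with iterative soft-thresholding (ISTA). It does not implement the Grassmannian part of GSR, and the solver choice and all names are assumptions for illustration rather than the authors' method.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(dictionary, signal, lam, step, iterations=100):
    """Minimize 0.5 * ||D x - y||^2 + lam * ||x||_1 by ISTA.

    `dictionary` is a list of m rows with k entries each, `signal` has
    length m. `step` should not exceed 1 / L, where L is the largest
    eigenvalue of D^T D.
    """
    m, k = len(dictionary), len(dictionary[0])
    x = [0.0] * k
    for _ in range(iterations):
        # Residual D x - y.
        residual = [
            sum(dictionary[i][j] * x[j] for j in range(k)) - signal[i]
            for i in range(m)
        ]
        # Gradient of the least-squares term: D^T (D x - y).
        gradient = [
            sum(dictionary[i][j] * residual[i] for i in range(m))
            for j in range(k)
        ]
        # Gradient step followed by the L1 proximal operator.
        x = [soft_threshold(x[j] - step * gradient[j], step * lam) for j in range(k)]
    return x


identity = [[1.0, 0.0], [0.0, 1.0]]
print(ista(identity, [3.0, 0.5], lam=1.0, step=1.0))
```

With an identity dictionary the solution is simply the soft-thresholded signal, which makes the sparsifying effect of the L1 term easy to see: the small coefficient is set exactly to zero.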
ISBN: 9781479943098 (print)
In the past decade, there has been a growing need for machine learning and computer vision components (segmentation, classification) in the hyperspectral imaging domain. Due to the complexity and size of hyperspectral imagery and the enormous number of wavelength channels, a need has emerged in this area for combining compact representations with image segmentation and superpixel estimation. Here, we present an approach to superpixel estimation in hyperspectral images that adapts the well-known UCM approach to hyperspectral volumes. This approach benefits from the channel information at each pixel of the hyperspectral image while obtaining a compact representation of the hyperspectral volume using principal component analysis. Our experimental evaluation demonstrates that the additional information in the spectral channels substantially improves superpixel estimation over a single "monochromatic" channel. Furthermore, superpixel estimation performed on the compact hyperspectral representation outperforms superpixel estimation performed on the entire volume.
ISBN: 9781728125060 (print)
Computer vision techniques that operate on hyper- and multispectral imagery benefit from the additional spectral information relative to those that exploit traditional RGB or monochromatic visual data. However, the increased volume of data to be processed brings additional memory, storage, and computational requirements. To address such limitations, previous work has introduced a wide range of techniques for dimensionality reduction. In this paper, we propose a framework for spectral band selection that is highly data- and computationally efficient. The method leverages a convolutional siamese network learned by optimizing a contrastive loss, and performs band selection based on the low-dimensional data embeddings produced by the network. We empirically demonstrate the efficacy of the method on an object detection task from aerial multispectral imagery. The results show that, in spite of the method's frugality, it produces very competitive band selection results against the evaluated competing techniques.
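For readers unfamiliar with the contrastive loss used to train siamese networks, the sketch below shows the commonly used pairwise form: similar pairs are pulled together, and dissimilar pairs are pushed apart until they are at least a margin away. The exact loss, margin, and pairing scheme used in the paper are not given in the abstract, so the function and its defaults are assumptions for illustration.

```python
import math


def contrastive_loss(embedding_a, embedding_b, same_class, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors.

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than `margin`.
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(embedding_a, embedding_b)))
    if same_class:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False))
print(contrastive_loss([0.0], [0.5], same_class=False))
```

In a siamese setup both embeddings come from the same network with shared weights, so minimizing this loss shapes a single embedding space.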
ISBN: 9781538661000 (digital)
ISBN: 9781538661000 (print)
Semantic segmentation of satellite images is one of the most challenging problems in computer vision, as it requires a model capable of capturing both local and global information at each pixel. Current state-of-the-art methods are based on Fully Convolutional Neural Networks (FCNN), most of which have two main components: an encoder, which is a pretrained classification model that gradually reduces the input spatial size, and a decoder, which transforms the encoder's feature map into a predicted mask of the original size. We change this conventional architecture to a model that makes use of full-resolution information. NU-Net is a deep FCNN that is able to capture wide field-of-view global information around each pixel while maintaining localized full-resolution information throughout the model. We evaluate our model on the Land Cover Classification and Road Extraction tracks in the DeepGlobe competition.