ISBN: 9781728125060 (print)
Most modern approaches for multiple people tracking rely on human appearance to exploit similarity between person detections. In this work, we propose an alternative tracking method that does not depend on visual appearance and is still capable of dealing with very dynamic motions and long-term occlusions. We make this feasible by: (i) incorporating additional information from body-worn inertial sensors, (ii) designing a neural network to relate person detections to orientation measurements and (iii) formulating a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. We evaluate our approach on several challenging tracking sequences and achieve a very high IDF1 score of 91.2%. We outperform appearance-based baselines in scenarios where appearance is less informative and are on par in situations with discriminative people appearance.
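For readers unfamiliar with the metric, IDF1 is commonly defined as the harmonic mean of identity precision and identity recall, which reduces to 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below only illustrates that formula; the function name and the counts in the usage line are hypothetical and are not taken from the paper's evaluation.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score from identity-level counts.

    idtp: identity true positives (detections matched to the correct identity)
    idfp: identity false positives
    idfn: identity false negatives
    """
    denominator = 2 * idtp + idfp + idfn
    # Guard against an empty sequence with no detections and no ground truth.
    return 2 * idtp / denominator if denominator else 0.0


# Hypothetical counts, chosen only to make the arithmetic easy to follow:
# 2*90 / (2*90 + 10 + 10) = 180 / 200
example_score = idf1(idtp=90, idfp=10, idfn=10)
```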
ISBN: 9781728193601 (print)
Voronoi diagrams are highly compact representations that are used in various graphics applications. In this work, we show how to embed a differentiable version of them, via a novel deep architecture, into a generative deep network. By doing so, we achieve a highly compact latent embedding that is able to provide much more detailed reconstructions, both in 2D and 3D, for various shapes. In this tech report, we introduce our representation and present a set of preliminary results comparing it with recently proposed implicit occupancy networks.
ISBN: 9781665448994 (print)
This paper describes a CNN in which all CNN-style 2D convolution operations that lower to matrix-matrix multiplication are fully binary. The network is derived from a common building block structure that is consistent with a constructive proof outline showing that binary neural networks are universal function approximators. A top-1 accuracy of 71.24% on the 2012 ImageNet validation set was achieved with a two-step training procedure, and implementation strategies optimized for binary operands are provided.
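The abstract does not spell out its implementation strategies, so the following is only a sketch of the widely used XOR/popcount identity for binary operands, not the paper's method. For two length-n vectors with entries in {+1, -1}, packed so that a set bit means +1, the dot product equals n minus twice the number of differing bits. All function names here are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set where values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed length-n +1/-1 vectors via XOR and popcount."""
    mismatches = bin(a_bits ^ b_bits).count("1")
    # Matching positions contribute +1, mismatching positions contribute -1.
    return n - 2 * mismatches


def binary_matmul(A, B):
    """Multiply an m x k matrix A by a k x n matrix B, both with +1/-1 entries."""
    k = len(B)
    packed_rows = [pack_bits(row) for row in A]
    packed_cols = [pack_bits([B[t][j] for t in range(k)]) for j in range(len(B[0]))]
    return [[binary_dot(r, c, k) for c in packed_cols] for r in packed_rows]
```

A convolution lowered to matrix-matrix multiplication (for example through an im2col-style rearrangement) could then use such a routine in place of floating-point multiply-accumulate.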
ISBN: 9781665448994 (print)
An important goal across most scientific fields is the discovery of causal structures underlying a set of observations. Unfortunately, causal discovery methods that are based on correlation or mutual information can often fail to identify causal links in systems that exhibit dynamic relationships. Such dynamic systems (including the famous coupled logistic map) exhibit 'mirage' correlations which appear and disappear depending on the observation window. This means not only that correlation is not causation but, perhaps counter-intuitively, that causation may occur without correlation. In this paper we describe Neural Shadow-Mapping, a neural-network-based method which embeds high-dimensional video data into a low-dimensional shadow representation, for subsequent estimation of causal links. We demonstrate its performance at discovering causal links from video representations of dynamic systems.
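The window dependence described above can be explored with a small simulation. The sketch below iterates a two-variable coupled logistic map and computes the Pearson correlation over consecutive windows. The parameter values, initial conditions, and function names are illustrative assumptions, not settings reported in the paper, and the code does not implement Neural Shadow-Mapping itself.

```python
import math


def coupled_logistic(steps, x0=0.4, y0=0.2, rx=3.8, ry=3.5, bxy=0.02, byx=0.1):
    """Iterate a coupled logistic map; bxy and byx are the (assumed) coupling strengths."""
    xs, ys = [x0], [y0]
    for _ in range(steps - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (rx - rx * x - bxy * y))  # y weakly drives x
        ys.append(y * (ry - ry * y - byx * x))  # x drives y more strongly
    return xs, ys


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def windowed_correlations(xs, ys, window):
    """Correlation of xs and ys over consecutive non-overlapping windows."""
    starts = range(0, len(xs) - window + 1, window)
    return [pearson(xs[s:s + window], ys[s:s + window]) for s in starts]


xs, ys = coupled_logistic(1000)
correlations = windowed_correlations(xs, ys, window=50)
```

Inspecting how `correlations` varies from window to window, even though the coupling terms never change, is one way to see why a single correlation estimate can be a poor guide to causal structure.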
ISBN: 9781728193601 (print)
Previous research on localizing a target region in an image referred to by a natural language expression has occurred within an object-centric paradigm. However, in practice, there may not be any easily named or identifiable objects near a target location. Instead, references may need to rely on basic visual attributes, such as color or geometric clues. An expression like "a red something beside a blue vertical line" could still pinpoint a target location. As such, we begin to explore the open challenge of computational object-agnostic reference by constructing a novel dataset and by devising a new set of algorithms that can identify a target region in an image when given a referring expression containing only basic conceptual features.
ISBN: 9781728193601 (print)
We define a new representation for immersed surfaces in R^3 by combining the SRNF and the induced surface metric. Using the L^2 metric on the space of SRNFs and the DeWitt metric on the space of surface metrics, we obtain a 3-parameter family of metrics that corresponds to the family of "elastic metrics" proposed by Jermyn et al. in [19] on the space of immersed surfaces. Similar to the original SRNF representation, this new representation results in an extrinsic distance function on the space of immersed surfaces that is easy to compute as it is given by an explicit formula. In addition to avoiding the degeneracy of the SRNF, it allows for a data-driven choice of the parameters of the metric, while still providing for fast and accurate registration of surfaces.
ISBN: 9781665448994 (print)
Lossy image compression causes a loss of texture, especially at low bitrates. To mitigate this problem, we propose a novel image compression method that utilizes a reference-based image super-resolution model. We use two image compression models and a self texture transfer model. The image compression models encode and decode a whole input image and selected reference patches. The reference patches are small but compressed with high quality. The self texture transfer model transfers the texture of reference patches into similar regions in the compressed image. The experimental results show that our method can reconstruct accurate texture by transferring the texture of reference patches.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Recent work such as StyleCLIP aims to harness the power of CLIP embeddings for controlled manipulations. Although these models are capable of manipulating images based on a text prompt, the success of the manipulation often depends on careful selection of the appropriate text for the desired manipulation. This limitation makes it particularly difficult to perform text-based manipulations in domains where the user lacks expertise, such as fashion. To address this problem, we propose a method for automatically determining the most successful and relevant text-based edits using a pre-trained StyleGAN model. Our approach consists of a novel mechanism that uses CLIP to guide beam-search decoding, and a ranking method that identifies the most relevant and successful edits based on a list of keywords. We also demonstrate the capabilities of our framework in several domains, including fashion.
ISBN: 9781728193601 (print)
Many real-world machine learning systems require the ability to continually learn new knowledge. Class incremental learning has recently received increasing attention as a solution towards this goal. However, existing methods often introduce assumptions to simplify the problem setting, and these assumptions rarely hold in real-world scenarios. In this paper, we formulate a Generalized Class Incremental Learning (GCIL) framework to systematically alleviate these restrictions, and introduce several novel realistic incremental learning scenarios. In addition, we propose a simple yet effective method, namely ReMix, which combines Exemplar Replay (ER) and Mixup to deal with different challenges in realistic GCIL setups. We demonstrate on CIFAR-100 that ReMix outperforms the state-of-the-art methods in different GCIL setups by significant margins without introducing additional computation cost.
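The abstract does not describe how ReMix combines its two ingredients, so the sketch below shows only one plausible way to pair exemplar replay with Mixup: stored exemplars are appended to the incoming batch, and each sample is then blended with a random partner using a coefficient drawn from a Beta distribution. The function names, the `alpha` default, and the sampling scheme are all assumptions for illustration.

```python
import random


def one_hot(label, num_classes):
    """One-hot encode an integer class label as a list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Standard Mixup: convex combination of two inputs and their label vectors."""
    x = [lam * p + (1 - lam) * q for p, q in zip(x_a, x_b)]
    y = [lam * p + (1 - lam) * q for p, q in zip(y_a, y_b)]
    return x, y


def replay_mixup_batch(new_batch, memory, num_classes, alpha=1.0, rng=random):
    """Build a training batch from new samples plus replayed exemplars, then apply Mixup.

    new_batch, memory: lists of (features, integer_label) pairs.
    """
    # Exemplar replay (assumed scheme): draw up to len(new_batch) stored exemplars.
    replayed = rng.sample(memory, min(len(memory), len(new_batch)))
    pool = list(new_batch) + replayed
    mixed = []
    for x_a, label_a in pool:
        x_b, label_b = rng.choice(pool)    # random Mixup partner from the same pool
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        mixed.append(mixup_pair(x_a, one_hot(label_a, num_classes),
                                x_b, one_hot(label_b, num_classes), lam))
    return mixed
```

Because Mixup here operates on the already assembled batch, it adds only element-wise arithmetic on top of replay, which is in keeping with the abstract's remark about avoiding additional computation cost.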