New SOC like the Xilinx Zynq 7045 allow researchers and developers to combine the advantages of writing software for control functionality and having accelerators in the FPGA logic for the number crunching. The dual c...
详细信息
ISBN:
(纸本)9780769549903
New SOC like the Xilinx Zynq 7045 allow researchers and developers to combine the advantages of writing software for control functionality and having accelerators in the FPGA logic for the number crunching. The dual core Cortex-A9 ARM processor runs with up to 1 GHz and the FPGA has up to 900 DSP slices allowing a performance of up to 1,334 GMACs. SCS is porting a lot of algorithms like SGM stereo [1], Stixel clustering or an optical flow [2] to such devices allowing new cars to see their environment and react appropriately. The new developed SCS Zynq 7045 module will allow accelerated development using this technology.
Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification on the network. We evaluate MFFs on hand gesture recognition tasks using three video datasets -Jester, ChaLearn LAP IsoGD and NVIDIA Dynamic Hand Gesture Datasets - which require capturing long-term temporal relations of hand movements. Our approach obtains very competitive performance on Jester and ChaLearn benchmarks with the classification accuracies of 96.28% and 57.4%, respectively, while achieving state-of-the-art performance with 84.7% accuracy on NVIDIA benchmark.
Machine learning models have been successfully applied to a wide range of applications including computervision, natural language processing, and speech recognition. A successful implementation of these models howeve...
详细信息
ISBN:
(纸本)9781665448994
Machine learning models have been successfully applied to a wide range of applications including computervision, natural language processing, and speech recognition. A successful implementation of these models however, usually relies on deep neural networks (DNNs) which are treated as opaque black-box systems due to their incomprehensible complexity and intricate internal mechanism. In this work, we present a novel algorithm for explaining the predictions of a DNN using adversarial machine learning. Our approach identifies the relative importance of input features in relation to the predictions based on the behavior of an adversarial attack on the DNN. Our algorithm has the advantage of being fast, consistent, and easy to implement and interpret. We present our detailed analysis that demonstrates how the behavior of an adversarial attack, given a DNN and a task, stays consistent for any input test data point proving the generality of our approach. Our analysis enables us to produce consistent and efficient explanations. We illustrate the effectiveness of our approach by conducting experiments using a variety of DNNs, tasks, and datasets. Finally, we compare our work with other well-known techniques in the current literature.
e propose a two-level system for apparent age estimation from facial images. Our system first classifies samples into overlapping age groups. Within each group, the apparent age is estimated with local regressors, who...
详细信息
ISBN:
(纸本)9781509014378
e propose a two-level system for apparent age estimation from facial images. Our system first classifies samples into overlapping age groups. Within each group, the apparent age is estimated with local regressors, whose outputs are then fused for the final estimate. We use a deformable parts model based face detector, and features from a pre-trained deep convolutional network. Kernel extreme learning machines are used for classification. We evaluate our system on the ChaLearn Looking at People 2016 - Apparent Age Estimation challenge dataset, and report 0.3740 normal score on the sequestered test set.
We present a new, publicly-available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k st...
详细信息
ISBN:
(数字)9781728125060
ISBN:
(纸本)9781728125060
We present a new, publicly-available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (chosen randomly from the 21 object models of the YCB dataset [1]) and flying distractors. Object and camera pose, scene lighting, and quantity of objects and distractors were randomized. Each provided view includes RGB, depth, segmentation, and surface normal images, all pixel level. We describe our approach for domain randomization and provide insight into the decisions that produced the dataset.
Recently, contrastive self-supervised learning has become a key component for learning visual representations across many computervision tasks and benchmarks. However, contrastive learning in the context of domain ad...
详细信息
ISBN:
(纸本)9781665448994
Recently, contrastive self-supervised learning has become a key component for learning visual representations across many computervision tasks and benchmarks. However, contrastive learning in the context of domain adaptation remains largely underexplored. In this paper, we propose to extend contrastive learning to a new domain adaptation setting, a particular situation occurring where the similarity is learned and deployed on samples following different probability distributions without access to labels. Contrastive learning learns by comparing and contrasting positive and negative pairs of samples in an unsupervised setting without access to source and target labels. We have developed a variation of a recently proposed contrastive learning framework that helps tackle the domain adaptation problem, further identifying and removing possible negatives similar to the anchor to mitigate the effects of false negatives. Extensive experiments demonstrate that the proposed method adapts well, and improves the performance on the downstream domain adaptation task.
We propose hinge-loss Markov random fields (HL-MRFs), a powerful class of continuous-valued graphical models, for high-level computervision tasks. HL-MRFs are characterized by log-concave density functions, and are a...
详细信息
ISBN:
(纸本)9780769549903
We propose hinge-loss Markov random fields (HL-MRFs), a powerful class of continuous-valued graphical models, for high-level computervision tasks. HL-MRFs are characterized by log-concave density functions, and are able to perform efficient, exact inference. Their templated hinge-loss potential functions naturally encode soft-valued logical rules. Using the declarative modeling language probabilistic soft logic, one can easily define HL-MRFs via familiar constructs from first-order logic. We apply HL-MRFs to the task of activity detection, using principles of collective classification. Our model is simple, intuitive and interpretable. We evaluate our model on two datasets and show that it achieves significant lift over the low-level detectors.
The number of digital images that needs to be acquired, analyzed, classified, stored and retrieved in the medical centers is exponentially growing with the advances in medical imaging technologic Accordingly medical i...
详细信息
ISBN:
(纸本)9781424439942
The number of digital images that needs to be acquired, analyzed, classified, stored and retrieved in the medical centers is exponentially growing with the advances in medical imaging technologic Accordingly medical image classification and retrieval has become a popular topic in the recent years. Despite many projects,focusing on this problem, proposed solutions are still far from being sufficiently accurate for real-life implementations. Interpreting medical image classification and retrieval as a multi-class classification task, in this work, we investigate the performance of five different feature types in a SVM-based learning framework-for classification of human body X-Ray images into classes corresponding to body parts. Our comprehensive experiments,show that four conventional feature types provide performances comparable to the literature with low per-class accuracies, whereas local binary patterns produce not only very good global accuracy but also good class-specific accuracies with respect to the features used in the literature.
Predicting outfit compatibility and retrieving complementary items are critical components for a fashion recommendation system. We present a scalable framework, Out-fitTransformer, that learns compatibility of the ent...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Predicting outfit compatibility and retrieving complementary items are critical components for a fashion recommendation system. We present a scalable framework, Out-fitTransformer, that learns compatibility of the entire out-fit and supports large-scale complementary item retrieval. We model outfits as an unordered set of items and leverage self-attention mechanism to learn the relationships between items. We train the framework using a proposed set-wise outfit ranking loss to generate a target item embedding given an outfit, and a target item specification. The generated target item embedding is then used to retrieve compatible items that match the outfit. Experimental results demonstrate that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks.
Multi-camera tracking (MCT) plays a crucial role in various computervision applications. However, accurate tracking of individuals across multiple cameras faces challenges, particularly with identity switches. In thi...
详细信息
ISBN:
(纸本)9798350365474
Multi-camera tracking (MCT) plays a crucial role in various computervision applications. However, accurate tracking of individuals across multiple cameras faces challenges, particularly with identity switches. In this paper, we present an efficient online MCT system that tackles these challenges through online processing. Our system leverages memory-efficient accumulated appearance features to provide stable representations of individuals across cameras and time. By incorporating trajectory validation using hierarchical agglomerative clustering (HAC) in overlapping regions, ID transfers are identified and rectified. Evaluation on the 2024 AI City Challenge Track 1 dataset [39] demonstrates the competitive performance of our system, achieving accurate tracking in both overlapping and non-overlapping camera networks. With a 40.3% HOTA score [29], our system ranked 9th in the challenge. The integration of trajectory validation enhances performance by 8% over the baseline, and the accumulated appearance features further contribute to a 17% improvement.
暂无评论