ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
The AdderNet was recently developed as a way to implement deep neural networks without needing multiplication operations to combine weights and inputs. Instead, absolute values of the difference between weights and inputs are used, greatly reducing the gate-level implementation complexity. Training of AdderNets is challenging, however, and the loss curves during training tend to fluctuate significantly. In this paper we propose the Conjugate Adder Network, or CAddNet, which uses the difference between the absolute values of conjugate pairs of inputs and the weights. We show that this can be implemented simply via a single minimum operation, resulting in a roughly 50% reduction in logic gate complexity as compared with AdderNets. The CAddNet method also stabilizes training as compared with AdderNets, yielding training curves similar to those of standard CNNs.
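A minimal sketch of why a conjugate-pair difference reduces to one minimum, assuming the conjugate pair is (x + w, x - w); the abstract does not spell out the exact formula. For real scalars, |x + w| - |x - w| = 2 · sign(x) · sign(w) · min(|x|, |w|), so apart from sign handling each term needs only a comparison. The function names are illustrative, not taken from the paper.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def addernet_response(x, w):
    """AdderNet-style output: negative L1 distance between inputs and weights."""
    return -sum(abs(xi - wi) for xi, wi in zip(x, w))


def conjugate_term(x, w):
    """Assumed CAddNet term: |x + w| - |x - w| for one input/weight pair."""
    return abs(x + w) - abs(x - w)


def min_term(x, w):
    """Same value computed with a single minimum and a sign product."""
    return 2 * sign(x) * sign(w) * min(abs(x), abs(w))


def caddnet_response(x, w):
    """Sum the min-form terms over all input/weight pairs."""
    return sum(min_term(xi, wi) for xi, wi in zip(x, w))


x = [1.0, -2.0, 0.5]
w = [0.5, 1.0, -1.5]
print(addernet_response(x, w))  # -5.5
print(caddnet_response(x, w))   # -2.0
```

The identity follows by cases: when x and w share a sign, |x + w| = |x| + |w| and |x - w| = ||x| - |w||, so the difference is 2 · min(|x|, |w|); with opposite signs the two absolute values swap roles and the difference becomes negative.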
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
We introduce a lightweight simulation and modeling framework, HMIway-env, for studying human-machine teaming in the context of driving. The goal of the framework is to accelerate the development of adaptive AI systems that can respond to individual driver states, traits, and preferences, by serving as a data-generation engine and training environment for learning personalized human-AI teaming policies. We extend highway-env, an OpenAI Gym-based simulator environment, to enable the specification of human driver behavior and the design of vehicle-driver interactions and outcomes. We describe one instance of our framework incorporating models for distracted and cautious driving, which we validate through crowd-sourced feedback, and show early experimental results toward training better intervention policies.
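To illustrate how driver behavior models can sit between an intended action and the action the simulated vehicle actually executes, here is a toy sketch in plain Python. It does not use the highway-env API. The classes `DistractedDriver` and `CautiousDriver`, the action names, and the rule that an alert halves the distraction rate are all hypothetical assumptions, not the models described in the paper.

```python
import random

# Hypothetical action names, loosely modeled on a discrete meta-action space.
IDLE, FASTER, SLOWER = "IDLE", "FASTER", "SLOWER"


class DistractedDriver:
    """Toy model: with probability p the driver misses a new intention and
    repeats the previous action, i.e. a delayed reaction."""

    def __init__(self, p_distracted, seed=0):
        self.p_distracted = p_distracted
        self.rng = random.Random(seed)
        self.prev_action = IDLE

    def act(self, intended, alerted=False):
        # Assumption: an alert from the AI assistant halves the distraction rate.
        p = self.p_distracted / 2 if alerted else self.p_distracted
        action = self.prev_action if self.rng.random() < p else intended
        self.prev_action = action
        return action


class CautiousDriver:
    """Toy model: the driver never accelerates, replacing FASTER with IDLE."""

    def act(self, intended, alerted=False):
        return IDLE if intended == FASTER else intended


def agreement_rate(driver, intentions, alerted):
    """Fraction of steps on which the executed action matches the intention."""
    hits = sum(driver.act(a, alerted) == a for a in intentions)
    return hits / len(intentions)


plan = [FASTER, IDLE, SLOWER, FASTER] * 50
print(agreement_rate(DistractedDriver(0.4, seed=1), plan, alerted=False))
print(agreement_rate(DistractedDriver(0.4, seed=1), plan, alerted=True))
```

In a setting like the one described, an intervention policy would be the component that learns when to set `alerted`, trading off the benefit of an alert against its cost to the driver.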
ISBN (print): 9780769549903
In recent years, with the advent of cheap and accurate RGBD (RGB plus Depth) active sensors like the Microsoft Kinect and devices based on time-of-flight (ToF) technology, there has been increasing interest in 3D-based applications. At the same time, several effective improvements to passive stereo vision algorithms have been proposed in the literature. Despite these advances and the frequent use of stereo vision in research, it is often perceived as a bulky and expensive technology not well suited to consumer applications. In this paper, we review a subset of state-of-the-art stereo vision algorithms that have the potential to fit a target computing architecture based on low-cost field-programmable gate arrays (FPGAs), without additional external devices (e.g., FIFOs, DDR memories, etc.). Mapping these algorithms onto such a low-power, low-cost architecture would make RGBD sensors based on stereo vision suitable for a wider class of application scenarios not currently addressed by this technology.
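The abstract does not name the specific algorithms reviewed. As a generic illustration of the kind of local stereo method often considered for FPGA implementation, the sketch below performs winner-takes-all block matching with a sum of absolute differences (SAD) along one rectified scanline, then converts disparity to depth with the pinhole relation Z = f · B / d. Only additions, absolute values, and comparisons are needed, which is what makes this family attractive for low-cost hardware. Function names and parameters are illustrative assumptions.

```python
import math


def sad_disparity_row(left, right, max_disp, radius=1):
    """Winner-takes-all SAD block matching on one rectified scanline.

    left, right: pixel intensities of the same row in the left/right images.
    Returns one integer disparity per pixel (0 where no full window fits).
    """
    n = len(left)
    disparities = [0] * n
    for x in range(radius, n - radius):
        best_cost, best_d = math.inf, 0
        for d in range(max_disp + 1):
            if x - d - radius < 0:
                break  # the matching window would leave the right image
            cost = sum(
                abs(left[x + k] - right[x - d + k])
                for k in range(-radius, radius + 1)
            )
            if cost < best_cost:  # ties keep the smaller disparity
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo model: Z = f * B / d (undefined for d = 0)."""
    return focal_px * baseline_m / d


right_row = list(range(0, 60, 5))  # strictly increasing, so matches are unique
left_row = [0, 0] + right_row[:-2]  # same row shifted by 2 px
print(sad_disparity_row(left_row, right_row, max_disp=4)[3:11])  # [2, 2, 2, 2, 2, 2, 2, 2]
print(depth_from_disparity(2, focal_px=700.0, baseline_m=0.1))
```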
ISBN (electronic): 9781538661000
ISBN (print): 9781538661000
Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data-level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of the spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification to the network. We evaluate MFFs on hand gesture recognition tasks using three video datasets (Jester, ChaLearn LAP IsoGD, and NVIDIA Dynamic Hand Gesture), which require capturing long-term temporal relations of hand movements. Our approach obtains very competitive performance on the Jester and ChaLearn benchmarks, with classification accuracies of 96.28% and 57.4%, respectively, while achieving state-of-the-art performance with 84.7% accuracy on the NVIDIA benchmark.
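One plausible reading of "fusing motion information into static images" is channel-wise stacking: optical-flow fields (horizontal and vertical components) are appended to an RGB frame as extra channels, so an existing network only needs its first layer widened to accept 3 + 2N channels. The sketch below assumes that layout with plain nested lists; how the flow is computed and how many flow frames are used are assumptions, not details from the abstract.

```python
def make_channel(height, width, value=0.0):
    """Create one H x W channel filled with a constant value."""
    return [[value] * width for _ in range(height)]


def motion_fused_frame(rgb, flows):
    """Append optical-flow channels to an RGB frame.

    rgb:   list of 3 channels, each an H x W nested list.
    flows: list of (flow_x, flow_y) channel pairs, e.g. flow computed between
           consecutive frames preceding the RGB frame (assumption).
    Returns a list of 3 + 2 * len(flows) channels.
    """
    height, width = len(rgb[0]), len(rgb[0][0])
    fused = list(rgb)
    for flow_x, flow_y in flows:
        for channel in (flow_x, flow_y):
            if len(channel) != height or len(channel[0]) != width:
                raise ValueError("flow channels must match the RGB resolution")
            fused.append(channel)
    return fused


rgb = [make_channel(4, 4, v) for v in (0.2, 0.5, 0.8)]
flows = [(make_channel(4, 4, 0.1), make_channel(4, 4, -0.1)) for _ in range(3)]
frame = motion_fused_frame(rgb, flows)
print(len(frame))  # 9 channels: 3 RGB + 3 flow pairs x 2
```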
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
The Visual Genome Dataset is the de facto standard dataset used in scene graph generation. It contains a large collection of images with corresponding object and relationship labels. We explore the linguistic aspect of the relationship predicates and find that very few symmetric/inverse relationships (for example, 'above' and 'under') are represented in the dataset. We believe this is linked to human spatial cognition, and posit that labelling bias stemming from human representations of relationships creates asymmetric relationship labels that span the whole dataset. We also perform a 2D topological analysis of the bounding boxes linked by different relationship predicates. This analysis sheds light on certain classes and their ambiguity: the more frequent classes are semantically overloaded and therefore quite confusing. Finally, we show that when the predicates are reduced to more linguistically and topologically well-defined spatial relationships, the performance of scene graph generation algorithms improves substantially, but scene graph generators are still far from perfect.
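A 2D topological analysis of bounding boxes can be made concrete by reducing each subject/object box pair to a coarse relation and checking whether swapping the two boxes yields the inverse relation. The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2) with y growing downward; the category set is a simplified, hypothetical choice, not the one used in the paper.

```python
# Boxes are (x1, y1, x2, y2) in image coordinates, with y growing downward.
INVERSE = {
    "inside": "contains", "contains": "inside",
    "above": "below", "below": "above",
    "left_of": "right_of", "right_of": "left_of",
    "overlap": "overlap",
}


def box_relation(a, b):
    """Coarse 2D topological relation of box a with respect to box b."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    if bx1 <= ax1 and by1 <= ay1 and ax2 <= bx2 and ay2 <= by2:
        return "inside"
    if ax1 <= bx1 and ay1 <= by1 and bx2 <= ax2 and by2 <= ay2:
        return "contains"
    # Disjoint boxes: vertical separation is checked before horizontal.
    if ay2 <= by1:
        return "above"
    if by2 <= ay1:
        return "below"
    if ax2 <= bx1:
        return "left_of"
    if bx2 <= ax1:
        return "right_of"
    return "overlap"


def is_consistent_pair(a, b):
    """True if swapping the boxes yields the inverse relation."""
    return box_relation(b, a) == INVERSE[box_relation(a, b)]


lamp, table, cup = (40, 10, 60, 40), (0, 50, 100, 90), (45, 55, 55, 70)
print(box_relation(lamp, table), box_relation(table, lamp))  # above below
print(box_relation(cup, table))  # inside
```

Applied to a labelled dataset, the same idea lets one tabulate, for each predicate, how often its boxes fall into each geometric category; a predicate spread across many categories is a candidate for being semantically overloaded. Identical boxes are a degenerate case that this simple scheme labels "inside" in both directions.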
ISBN (print): 9781665448994
Architectures based on siamese networks with triplet loss have shown outstanding performance on the image-based similarity search problem. This approach attempts to discriminate between positive (relevant) and negative (irrelevant) items. However, it suffers from a critical weakness: given a query, it cannot identify weakly relevant items, for instance, items of the same type as the query but with a different color or texture, which could be a serious limitation for many real-world search applications. Therefore, in this work, we present a quadruplet-based architecture that overcomes this weakness. Moreover, we present an instance of this quadruplet network, which we call Sketch-QNet, to address the color sketch-based image retrieval (CSBIR) problem, achieving new state-of-the-art results.
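A quadruplet objective can be sketched as two stacked hinge terms that impose the ordering positive < weakly relevant < negative in embedding distance from the anchor. The formulation below, with Euclidean distance and two margins, is one plausible construction assumed for illustration; it is not necessarily the loss used by Sketch-QNet, and the margin values are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def quadruplet_loss(anchor, positive, weak, negative, margin_pw=0.2, margin_wn=0.2):
    """Hinge loss enforcing d(a,p) + m1 <= d(a,w) and d(a,w) + m2 <= d(a,n)."""
    d_p = euclidean(anchor, positive)
    d_w = euclidean(anchor, weak)
    d_n = euclidean(anchor, negative)
    return max(0.0, d_p - d_w + margin_pw) + max(0.0, d_w - d_n + margin_wn)


anchor = (0.0, 0.0)
# Well-ordered embeddings: positive closest, weakly relevant in between.
print(quadruplet_loss(anchor, (0.1, 0.0), (1.0, 0.0), (3.0, 0.0)))  # 0.0
# Reversed ordering is penalized by both hinge terms.
print(quadruplet_loss(anchor, (3.0, 0.0), (1.0, 0.0), (0.1, 0.0)))
```

Compared with a triplet loss, the extra term gives the weakly relevant item (same type, different color or texture) its own band between the positive and negative items instead of forcing it into one of the two.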
ISBN (electronic): 9781538661000
ISBN (print): 9781538661000
This work identifies and addresses two important technical challenges in single-image super-resolution: (1) how to upsample an image without magnifying noise and (2) how to preserve large-scale structure when upsampling. We summarize the techniques we developed for our second-place entry in Track 1 (Bicubic Downsampling), seventh-place entry in Track 2 (Realistic Adverse Conditions), and seventh-place entry in Track 3 (Realistic Difficult) in the 2018 NTIRE Super-Resolution Challenge. Furthermore, we present new neural network architectures that specifically address the two challenges listed above: denoising and preservation of large-scale structure.
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
Interactive substitute recommendation for fashion products improves the online retail customer experience. Traditional fashion search platforms rely on product metadata to relate query products to the products to be retrieved. In this paper, we propose DAtRNet, an attribute representation network that disentangles the features of the query product. It is used to recommend attribute-aware substitute items based on the conditional similarity of the retrieved products. The proposed architecture relies on attribute-level similarity, providing fine-grained recommendations. In addition, we propose a concurrent axial attention mechanism that generates global information embeddings and adaptively recalibrates the soft attention masks. Overall, the end-to-end framework enables the system to disentangle the attribute features and deal with them independently, making it flexible with respect to one or multiple attributes. The proposed method outperforms the state of the art by a significant margin.
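Attribute-conditional similarity over a disentangled embedding can be sketched by giving each attribute its own slice of the feature vector and comparing only the slices the user cares about. The fixed slice layout in `ATTRIBUTE_SLICES`, the attribute names, and the averaged cosine score are hypothetical simplifications; DAtRNet learns its disentangled features (with attention), which this sketch does not model.

```python
import math

# Hypothetical layout: each attribute owns a fixed slice of the embedding.
ATTRIBUTE_SLICES = {"color": (0, 2), "texture": (2, 4), "shape": (4, 6)}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def conditional_similarity(query, candidate, attributes):
    """Average cosine similarity over the selected attribute sub-embeddings."""
    scores = []
    for name in attributes:
        start, end = ATTRIBUTE_SLICES[name]
        scores.append(cosine(query[start:end], candidate[start:end]))
    return sum(scores) / len(scores)


def rank_substitutes(query, catalog, attributes):
    """Sort (name, embedding) pairs by conditional similarity to the query."""
    return sorted(
        catalog,
        key=lambda item: conditional_similarity(query, item[1], attributes),
        reverse=True,
    )


query = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
catalog = [
    ("same_color", [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]),
    ("same_texture_shape", [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]),
]
print([name for name, _ in rank_substitutes(query, catalog, ["color"])])
print([name for name, _ in rank_substitutes(query, catalog, ["texture", "shape"])])
```

Because each attribute is scored independently, the same query can yield different substitute rankings depending on which attributes are selected.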
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
This work focuses on improving the Conv-GRU-based optical flow update within the DROID-SLAM framework. Prior optical flow models typically follow a UNet or coarse-to-fine architecture to extract long-range cross-correlation and context cues. This helps flow estimation in the presence of large motion and in challenging image regions, e.g., textureless regions. We propose modifications to the Conv-GRU module that follow the rationale of these prior models by integrating (Atrous) Spatial Pyramid Pooling and global self-attention into the Conv-GRU block. By enlarging the receptive field in this way, the model can integrate information from a larger context window, improving robustness even on inputs that contain challenging image regions. Extensive experiments show empirically that these modifications improve accuracy.
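The receptive-field argument can be quantified with simple arithmetic. A k × k kernel with dilation rate d spans k + (k - 1)(d - 1) pixels, and stacking stride-1 convolutions adds each layer's span minus one. The dilation rates 1, 6, 12, and 18 below are a common ASPP choice assumed for illustration, not values taken from this work. Global self-attention goes further: every output position attends to all positions, so its receptive field is the whole feature map regardless of depth.

```python
def effective_kernel(k, dilation):
    """Spatial extent of a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: sequence of (kernel_size, dilation) pairs.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


# Hypothetical ASPP branches: one 3x3 kernel at several dilation rates.
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel(3, rate))
# Two plain 3x3 convolutions versus a 3x3 followed by a dilated 3x3 (rate 6).
print(receptive_field([(3, 1), (3, 1)]), receptive_field([(3, 1), (3, 6)]))  # 5 15
```

A dilated 3 × 3 kernel keeps the parameter count of a plain 3 × 3 kernel while covering a much wider window, which is why ASPP-style branches are a cheap way to widen the context seen by each GRU update.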