ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Recent 3D room layout recovery approaches mostly concentrate on Manhattan layouts, where the vertical walls are mutually orthogonal, even though many real-world rooms have non-Manhattan layouts. This paper presents a room layout recovery method that generalizes across Manhattan and non-Manhattan worlds. Without introducing additional supervision, we extend current Manhattan layout recovery methods by predicting an extra surface normal feature, which is then used in an adaptive post-processing step to reconstruct layouts of arbitrary shapes. Experimental results show that our method substantially improves results on non-Manhattan layouts while generalizing across Manhattan and non-Manhattan layouts.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
In this paper, we present a super-resolution-based video coding scheme that compresses video data by combining traditional hybrid video coding with convolutional neural network-based video coding. During video encoding, downsampling reduces the resolution of the original video in both the horizontal and vertical directions to reduce the amount of video data, and convolutional neural network-based super-resolution is applied after decoding to restore the resolution of the reconstructed video. The core encoding and decoding processes use the latest video coding standard (i.e., VVC/H.266). The experimental results show that the proposed method can provide efficient coding performance while maintaining good visual quality.
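The downsample, encode/decode, upsample pipeline described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `toy_codec` is a hypothetical stand-in (uniform quantization) for the VVC/H.266 round trip, `upsample2x` uses nearest-neighbour replication where the paper uses a CNN super-resolution model, and the 2x factor and all function names are invented for the example.

```python
def downsample2x(frame):
    """Average each 2x2 block; halves the resolution in both directions.

    Assumes the frame (a list of rows) has even height and width.
    """
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def toy_codec(frame, step=8):
    """Hypothetical stand-in for the VVC encode/decode round trip.

    Uniform quantization is used only so that the sketch has a lossy stage.
    """
    return [[round(v / step) * step for v in row] for row in frame]


def upsample2x(frame):
    """Nearest-neighbour upsampling; placeholder for the CNN super-resolution."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def sr_coding_round_trip(frame, step=8):
    """Downsample before coding, then restore the resolution after decoding."""
    low_res = downsample2x(frame)
    decoded = toy_codec(low_res, step)
    return upsample2x(decoded)
```

The codec only ever sees a quarter of the original samples, which is where the data reduction comes from; the quality of the final video then depends on how well the upsampling stage recovers the lost detail.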
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
The majority of CNN architecture design is aimed at achieving high accuracy on public benchmarks by increasing complexity. Typically, these networks are over-specified by a large margin and can be optimized by a factor of 10-100x with only a small reduction in accuracy. Despite the increase in the computational power of embedded systems, these networks are still not suitable for embedded deployment. There is a strong need to optimize for hardware and to reduce network size by orders of magnitude for computer vision applications. This has led to a growing community focused on designing efficient networks. However, CNN architectures are evolving rapidly, and efficient architectures seem to lag behind. There is also a gap in understanding hardware architecture details and incorporating them into network design. The motivation of this paper is to systematically summarize efficient design techniques and provide guidelines for application developers. We also perform a case study by benchmarking various semantic segmentation algorithms for autonomous driving.
ISBN (print): 9781665448994
Although recent years have witnessed great advances in stereo image super-resolution (SR), the beneficial information provided by binocular systems has not been fully used. Since stereo images are highly symmetric under the epipolar constraint, in this paper we improve the performance of stereo image SR by exploiting symmetry cues in stereo image pairs. Specifically, we propose a symmetric bi-directional parallax attention module (biPAM) and an inline occlusion handling scheme to exchange cross-view information effectively. Then, we design a Siamese network equipped with a biPAM to super-resolve both views in a highly symmetric manner. Finally, we design several illuminance-robust losses to enhance stereo consistency. Experiments on four public datasets demonstrate the superior performance of our method.
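As a rough illustration of attention along the epipolar line, the sketch below computes both attention directions for a single image row of a rectified stereo pair from one shared score matrix. It is a generic parallax-attention sketch, not the biPAM from the paper: the dot-product scoring, the absence of learned projections and occlusion handling, and the names `parallax_attention_row` and `warp` are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def parallax_attention_row(left, right):
    """Attention between two views along one image row.

    left, right: lists of W feature vectors taken from the same row of a
    rectified stereo pair, so correspondences lie on that row.
    """
    scores = [[sum(a * b for a, b in zip(l, r)) for r in right] for l in left]
    # For each left pixel: a distribution over right pixels on the same row.
    right_to_left = [softmax(row) for row in scores]
    # The transposed scores give the opposite direction.
    left_to_right = [softmax(list(col)) for col in zip(*scores)]
    return right_to_left, left_to_right


def warp(attention, source):
    """Attention-weighted sum of source features for every target pixel."""
    dim = len(source[0])
    return [
        [sum(a * s[d] for a, s in zip(att_row, source)) for d in range(dim)]
        for att_row in attention
    ]
```

Because both directions share one score matrix, the two views are treated symmetrically, which is the property the abstract emphasizes.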
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
We present a self-supervised approach to recolorization of images from design-oriented domains. Our approach can recolor images based on image exemplars or target color palettes provided by a user. In contrast with previous approaches, our method can reproduce color palettes with luminance distributions that differ significantly from the input. It is also the first palette-based approach to distinguish between recolorings that match reflectance and those that match illumination, making it particularly well-suited to visualizing different aesthetic decisions in design applications. The key to our approach is first to learn latent representations for texture and color in a setting where self-supervision is especially straightforward, and then to learn a mapping to our color representation from input color palettes and scene illumination, which offers a more intuitive space for controlling and exploring recolorization.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
We propose SymDNN, a Deep Neural Network (DNN) inference scheme that segments an input image into small patches, replaces those patches with representative symbols, and uses the reconstructed image for CNN inference. This deconstruction of images, followed by reconstruction from cluster centroids trained on clean images, enhances robustness against adversarial attacks. The input transform used in SymDNN is learned from very large datasets, making it difficult for adaptive adversarial attacks to approximate. For example, SymDNN achieves 23% and 42% robust accuracy at L-infinity attack strengths of 8/255 and 4/255, respectively, against BPDA under a complete white-box setting, where most input-processing-based defenses break completely. SymDNN is not a future-proof adversarial defense that can defend against any attack, but it is one of the few readily usable defenses for resource-limited embedded systems that defends against a wide range of attacks.
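The patch-to-symbol transform described above can be sketched as below. This is a minimal illustration, not the SymDNN code: the codebook here is a hypothetical hand-written list, whereas the abstract says the centroids are learned by clustering clean images, and the non-overlapping patch layout, the squared-L2 distance, and the function names are assumptions.

```python
def nearest_centroid(patch, codebook):
    """Index of the codebook entry closest to the patch (squared L2 distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(patch, codebook[i]))


def symbolize(image, codebook, p=2):
    """Replace every non-overlapping p x p patch with its nearest centroid.

    image: list of rows of floats; height and width are assumed to be
    multiples of p. codebook: list of flattened p*p centroids (hypothetical;
    in practice these would come from clustering patches of clean images).
    Returns the symbol indices and the reconstructed image.
    """
    h, w = len(image), len(image[0])
    reconstructed = [[0.0] * w for _ in range(h)]
    symbols = []
    for y in range(0, h, p):
        for x in range(0, w, p):
            patch = [image[y + dy][x + dx] for dy in range(p) for dx in range(p)]
            k = nearest_centroid(patch, codebook)
            symbols.append(k)
            for i, value in enumerate(codebook[k]):
                reconstructed[y + i // p][x + i % p] = value
    return symbols, reconstructed
```

A small perturbation that does not move a patch closer to a different centroid leaves the reconstructed image unchanged, which gives some intuition for why such a transform can blunt adversarial noise before the CNN sees the input.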
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
The recent success of self-supervised learning relies on its ability to learn representations from self-defined pseudo-labels, which are then applied to several downstream tasks. Motivated by this ability, we present a deep image compression technique that learns the lossy reconstruction of raw images from the self-supervised representation learned by a SimCLR ResNet-50 architecture. Our framework uses a feature pyramid to achieve variable-rate compression of the image, with a self-attention map for the optimal allocation of bits. The paper examines the effects of contrastive self-supervised representations and the self-attention map on the distortion and perceptual quality of the reconstructed image. Experiments on different classes of images show that the proposed method outperforms other variable-rate deep compression models without compromising the perceptual quality of the images.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Video action recognition has been an active area of research for the past several years. However, the majority of research concentrates on recognizing a diverse range of activities in distinct environments. Driver Activity Recognition (DAR), on the other hand, is significantly more difficult because the distinctions between actions are much finer. Moreover, training robust DAR models requires diverse training data from multiple sources, which might not be feasible in a centralized setup due to privacy and security concerns. Furthermore, it is critical to develop efficient models because of the limited computational resources available on vehicular edge devices. Federated Learning (FL), which allows data parties to collaborate on machine learning models while preserving data privacy and reducing communication requirements, can be used to overcome these challenges. Despite significant progress on various computer vision tasks, FL for DAR has been largely unexplored. In this work, we propose an FL-based DAR model and extensively benchmark the model's performance on two datasets under various practical setups. Our results indicate that the proposed approach performs competitively under both centralized (non-FL) and decentralized (FL) settings.
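To make the federated setup concrete, the sketch below shows one server-side aggregation round in which only model parameters, never raw driver videos, leave the clients. The abstract does not state which aggregation rule the authors use; weighted federated averaging (FedAvg) is assumed here, and `fed_avg`, the flat parameter lists, and the sample counts are hypothetical.

```python
def fed_avg(client_weights, client_sizes):
    """Aggregate client models by sample-weighted averaging (FedAvg, assumed).

    client_weights: one flat list of parameters per client, all the same length.
    client_sizes: number of local training samples held by each client.
    Returns the new global parameter list.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

For instance, `fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second client three times as heavily as the first, so clients with more local data have more influence on the global model.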
ISBN (print): 9780769549903
This paper describes an architecture framework using heterogeneous hardware accelerators for embedded vision applications. This approach leverages recent single-chip heterogeneous FPGAs that combine powerful multicore processors with extensive programmable gate array fabric on the same die. We present a framework using an extensive library of pipelined real-time vision hardware accelerators and a service-based software architecture. This field-proven system design approach provides embedded vision developers with a powerful software abstraction layer for rapidly and efficiently integrating any of the hardware accelerators for applications such as image stabilization, moving target indication, contrast normalization enhancement, and others. The framework allows the service-based software to take advantage of the available hardware acceleration blocks and perform the remainder of the processing in software. As performance requirements increase, more hardware acceleration can be added to the FPGA fabric, thus offloading the main processor.