ISBN (print): 9798350349405; 9798350349399
Leveraging 3D semantics for direct 3D reconstruction has great potential that is yet to be unleashed. For instance, by assuming that walls are vertical and that the floor is planar and horizontal, we can correct distorted room shapes and eliminate local artifacts such as holes, pits, and hills. In this paper, we propose FAWN, a modification of truncated signed distance function (TSDF) reconstruction methods that accounts for scene structure by detecting the walls and floor in a scene and penalizing the corresponding surface normals for deviating from the horizontal and vertical directions. Implemented as a 3D sparse convolutional module, FAWN can be incorporated into any trainable pipeline that predicts a TSDF. Since FAWN requires 3D semantics only for training, it imposes no additional limitations on further use. We demonstrate that FAWN-modified methods use semantics more effectively than existing semantic-based approaches. In addition, we apply our modification to state-of-the-art TSDF reconstruction methods and demonstrate a quality gain on the SCANNET, ICL-NUIM, TUM RGBD, and 7SCENES benchmarks.
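The normal-penalty idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: the function names, the z-up convention, and the choice of 1 - |cos| for floor points and |cos| for wall points. It only shows that a floor normal should align with the vertical axis while a wall normal should be perpendicular to it.

```python
import math

def unit(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)

def normal_penalty(normal, label, up=(0.0, 0.0, 1.0)):
    """Hypothetical per-point penalty in [0, 1]; 0 means perfectly aligned.

    Assumed convention (not taken from the paper): `up` is the vertical axis.
    """
    n = unit(normal)
    # |cos| of the angle between the surface normal and the vertical axis.
    cos_up = abs(sum(a * b for a, b in zip(n, unit(up))))
    if label == "floor":
        return 1.0 - cos_up   # floor normals should be vertical
    if label == "wall":
        return cos_up         # wall normals should be horizontal
    return 0.0                # other classes are left unconstrained
```

In a trainable pipeline, a term of this kind would presumably be averaged over points labelled as wall or floor and added to the reconstruction loss during training only.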
ISBN (print): 9798350372267; 9798350372250
Transmitting visual data, such as videos and images, is a typical task in wireless communications. The semantic communication paradigm compresses large amounts of visual data at the semantic level so that they can be transmitted over channels with limited capacity. To improve semantic-aware image compression, we develop a deep neural network (DNN)-based architecture for semantic communication. An image is segmented into Region of Interest (ROI) and Region of Non-Interest (RONI) parts using semantic segmentation to obtain an ROI mask. The ROI and RONI segments are then compressed independently, with adjustable compression ratios, for wireless transmission. The reconstructed image is a fusion of the decompressed ROI and RONI segments. The proposed architecture improves the perceptual quality of transmitted images by allocating more bandwidth to the ROI parts, which contain more semantic information than the RONI parts. The compression ratio of the architecture is adjustable so that it can adapt to time-varying channel capacity. Critical semantic information within the ROI is compressed and transmitted independently to ensure accurate transmission even if the transmission of the RONI fails. Additionally, an ROI mask compressor is included to minimize the extra bandwidth caused by the irregular contours of ROI masks. Simulation results on fading channels show that the proposed system outperforms existing methods in terms of peak signal-to-noise ratio (PSNR).
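The split, compress, and fuse flow described above can be sketched in a few lines. This is a toy stand-in, not the DNN codec from the abstract: uniform quantization replaces the learned compressors, and the names `quantize`, `compress_region`, and `fuse` as well as the step sizes are hypothetical. A small step for the ROI and a large step for the RONI mimics allocating more bandwidth to the ROI.

```python
def quantize(value, step):
    """Uniform quantization: snap value to the nearest multiple of step."""
    return step * round(value / step)

def compress_region(image, mask, keep, step):
    """Quantize pixels whose mask entry equals `keep`; mark the rest as None."""
    return [
        [quantize(p, step) if m == keep else None for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

def fuse(roi, roni):
    """Rebuild the image: take the ROI pixel where present, else the RONI pixel."""
    return [
        [a if a is not None else b for a, b in zip(roi_row, roni_row)]
        for roi_row, roni_row in zip(roi, roni)
    ]

image = [[100, 101], [50, 200]]
roi_mask = [[1, 0], [0, 1]]          # 1 marks ROI pixels (assumed convention)

roi = compress_region(image, roi_mask, keep=1, step=2)     # fine step
roni = compress_region(image, roi_mask, keep=0, step=32)   # coarse step
reconstructed = fuse(roi, roni)
```

Because the two regions are handled separately, the ROI part can still be decoded when the RONI part is lost, which is the robustness property the abstract highlights.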
ISBN (print): 9798350344868; 9798350344851
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples, which are crafted by adding imperceptible perturbations to clean examples. With the wide application of DNNs to Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR), the vulnerability of SAR deep recognition models has attracted increasing attention. Existing works show that input transformation can effectively improve the black-box attack performance of adversarial examples, but little such work exists in the field of SAR-ATR. In this paper, we propose a novel input transformation attack called Image Mixing and Gradient Smoothing (IMGS), which is dedicated to attacking SAR images. IMGS mixes a small portion of another image into the input samples, in amplitude and in phase at different rates, and uses the Local Mean Square Error (LMSE) filter to smooth the gradient. Extensive experiments on the MSTAR dataset demonstrate that IMGS significantly outperforms other input transformation methods, which were originally designed for attacking visual images, in both white-box and black-box settings. The code is available at https://***/JHL-HUST/IMGS.
Unary computing is a relatively new method for implementing arbitrary nonlinear functions that uses unpacked thermometer number encoding, enabling much lower hardware costs. In its original form, unary computing provides no trade-off between accuracy and hardware cost. In this work, we propose a novel self-similarity-based method that optimizes the previous hybrid binary-unary work and gives it a trade-off between accuracy and hardware cost by introducing controlled levels of approximation. Looking for self-similarity between different parts of a function allows us to implement a very small subset of core unique subfunctions and derive the rest of the subfunctions from this core using simple linear transformations. We compare our method to previous works such as FloPoCo-LUT (lookup table), HBU (hybrid binary-unary), and FloPoCo-PPA (piecewise polynomial approximation) on several 8- to 12-bit nonlinear functions, including Log, Exp, Sigmoid, GELU, Sin, and Sqr, which are frequently used in neural networks and image processing applications. The area × delay hardware cost of our method is on average 32%-60% better than that of previous methods in both exact and approximate implementations. We also extend our method to multivariate nonlinear functions and show an average improvement of 78%-92% over previous work.
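For readers unfamiliar with thermometer encoding, the sketch below shows the encoding and one way a monotone non-decreasing function reduces to pure rewiring of thermometer bits. It is a generic illustration, not the self-similarity method proposed in the abstract; the function names and the toy function `f` are assumptions.

```python
def thermometer_encode(value, width):
    """Encode 0 <= value <= width as `value` ones followed by zeros."""
    if not 0 <= value <= width:
        raise ValueError("value out of range")
    return [1] * value + [0] * (width - value)

def thermometer_decode(bits):
    """The encoded value is simply the number of ones."""
    return sum(bits)

def build_wiring(f, in_width, out_width):
    """For a non-decreasing f, output bit j is 1 exactly when x >= t_j,
    where t_j is the smallest input with f(x) > j (None if there is none)."""
    return [
        next((x for x in range(in_width + 1) if f(x) > j), None)
        for j in range(out_width)
    ]

def apply_wiring(bits, wiring):
    """Evaluate f on a thermometer-coded input using only wire selection."""
    out = []
    for t in wiring:
        if t is None:
            out.append(0)            # f never exceeds j: constant 0
        elif t == 0:
            out.append(1)            # f always exceeds j: constant 1
        else:
            out.append(bits[t - 1])  # bits[t - 1] is 1 exactly when x >= t
    return out

def f(x):
    """Toy monotone function on 0..4 (hypothetical example)."""
    return (x * x) // 4

wiring = build_wiring(f, in_width=4, out_width=4)
```

No arithmetic happens at evaluation time: each output bit is either a constant or a copy of one input bit, which is the intuition behind the low hardware cost of unary datapaths for monotone functions.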
The symbiotic use of logarithmic approximation in floating-point (FP) multiplication can significantly reduce the hardware complexity of a multiplier. However, because of their unique error characteristics, it is difficult for a limited number of logarithmic FP multipliers (LFPMs) to fit a specific error-tolerant application, such as neural networks (NNs) or digital signal processing. This article proposes a design framework for generating LFPMs. We consider two FP representation formats with different mantissa ranges: the IEEE 754 Standard FP Format and the Nearest Power of Two FP Format. For both logarithm and anti-logarithm computation, the applicable input regions are first divided evenly into several intervals, and approximation methods with negative or positive errors are then developed for each sub-region. By using piecewise functions, different configurations of approximation methods across the applicable regions are created, leading to LFPMs with various trade-offs between accuracy and hardware cost. The variety of error characteristics of LFPMs is discussed, and the generic hardware implementation is illustrated. As case studies, two LFPM designs are presented and evaluated in JPEG compression and NN applications. They not only increase the classification accuracy but also achieve smaller power-delay products (PDPs) than the exact FP multiplier, while being more accurate than a recent logarithmic FP design.
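As background for logarithmic FP multiplication, the sketch below implements Mitchell's classic approximation, log2(1 + m) ≈ m for a mantissa fraction m in [0, 1), in software. It is not the piecewise framework from the abstract; the function name is hypothetical. It does illustrate why a multiply becomes an add of exponents and mantissas, and why the error has a characteristic sign: this particular approximation never overestimates the magnitude of the product.

```python
import math

def mitchell_multiply(a, b):
    """Approximate a * b using Mitchell's log2(1 + m) ~ m approximation."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    # math.frexp gives |x| = f * 2**e with 0.5 <= f < 1;
    # rewrite as |x| = (1 + m) * 2**(e - 1) with 0 <= m < 1.
    fa, ea = math.frexp(abs(a))
    fb, eb = math.frexp(abs(b))
    ma, mb = 2.0 * fa - 1.0, 2.0 * fb - 1.0
    # Approximate log2|a| + log2|b| by adding exponents and mantissa fractions.
    log_sum = (ea - 1) + (eb - 1) + ma + mb
    # Approximate antilog: 2**(k + frac) ~ (1 + frac) * 2**k.
    k = math.floor(log_sum)
    frac = log_sum - k
    return sign * math.ldexp(1.0 + frac, k)
```

For example, `mitchell_multiply(3.0, 3.0)` yields 8.0 instead of 9, close to the worst-case relative error of about 11.1% for this scheme. Combining approximations with positive and negative errors per sub-region, as the abstract describes, is one way to balance such one-sided error.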
ISBN (print): 9798350344868; 9798350344851
The rapid development of deep learning has driven a breakthrough in the performance of single image super-resolution (SISR). However, many existing works deepen the network in pursuit of better performance without considering that models with large parameter counts are ill-suited to practical production and deployment. Meanwhile, the Transformer, with its ability to model long-term dependencies, has entered the field of SISR, but its large memory consumption and long inference time suffer from the same problem. In this paper, we propose a Hybrid Convolution-Transformer (HCFormer) for lightweight single image super-resolution. HCFormer effectively combines convolution and the Transformer. Its core modules are the super-resolution feature extraction module (SRFEM) and the long-term dependency feature representation module (LDFRM), which are composed of a series of light-weight and efficient convolution blocks (LECB) and light-weight and efficient Transformer blocks (LETB), respectively. LECB extracts the potential super-resolution features in the input image through multi-scale residual convolutional operations, while LETB performs long-term dependency feature representation on the extracted features through a streamlined and improved Transformer. Extensive experimental results on five benchmark datasets, compared with state-of-the-art light-weight SISR methods, demonstrate the effectiveness and competitiveness of our proposed method.
ISBN (print): 9798350350920
Conditional normalizing flow (CNF) performs a series of reversible transformations to learn the distribution of the normal-light image, guided by conditional features from the low-light image, providing a novel solution for low-light image enhancement. However, most existing CNF-based methods rely entirely on convolutional neural networks (CNNs) to extract conditional features, and these concentrate only on representing local information. In addition, the invertible network in CNF executes some reversible transformations that act on only part of the features, which limits the expressive power of the CNF. To tackle these issues, this paper proposes a novel and powerful CNF-based method for low-light image enhancement, named multiscale local-global features guided normalizing flow (MLGFlow). Specifically, MLGFlow consists of a conditional encoder and an invertible network. In the conditional encoder, we design the multiscale local-global learning block (MLGB), which includes a dual-branch extraction module (DEM) and an attention-based fusion module (AFM) for extracting informative conditional features. DEM concentrates on capturing multiscale local and global features, and AFM further promotes feature fusion based on the channel-spatial attention mechanism. In the invertible network, we construct the conditional multi-affine coupling (CMAC) layer to perform sufficient reversible transformations, enhancing the expressive power of the model. Extensive experiments demonstrate that our proposed MLGFlow performs better than current state-of-the-art (SOTA) methods in terms of quantitative evaluation and visual quality.
The synchronous detection of visual features of small- and wide-field moving targets in complex dynamic environments has been a challenge in the field of moving target detection. Fortunately, the visual system of Drosophila flies can detect the visual features of small- and wide-field moving targets synchronously in complex dynamic environments, and thus provides a good paradigm for this task. However, little literature comprehensively analyses and verifies this. In this paper, we present a bio-inspired computing model for detecting visual features of small- and wide-field moving targets synchronously. The model consists of three stages. First, visual stimuli are perceived and divided into parallel ON and OFF pathways. Then, the feedback mechanism and the full Hassenstein-Reichardt correlator are applied to the Medulla neurons. Finally, the Lobula Columnar 11 is used to detect the visual features of small-field moving targets, i.e., the position, while the Lobula Plate Tangential Cell is used to detect the visual features of wide-field moving targets, i.e., the translational directional selectivity. Extensive experiments show that the proposed model can detect visual features of small- and wide-field moving targets synchronously. In addition, the proposed model improves the detection rate in small-field moving target detection by 17.18% compared with the traditional bio-inspired computing model, and its effectiveness is further verified by comparison with conventional moving target detection methods. Moreover, the proposed model can also effectively detect visual features of wide-field moving targets. The source code can be found at https://***/szhanghh/A-bio-inspired-visual-neural-computing-model.
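The full Hassenstein-Reichardt correlator mentioned in the second stage can be sketched for two neighbouring inputs. This is a textbook-style toy, not the model from the abstract: the delay is a pure shift by `k` samples (an assumption; biological models typically use a low-pass filter), and the function names are hypothetical. Each half multiplies the delayed signal of one input by the undelayed signal of the other, and the two mirror-symmetric halves are subtracted.

```python
def delay(signal, k):
    """Shift a signal right by k samples, padding the start with zeros."""
    if not 0 <= k <= len(signal):
        raise ValueError("delay must be between 0 and the signal length")
    return [0.0] * k + list(signal[:len(signal) - k])

def hr_correlator(left, right, k=1):
    """Full (two-arm) Hassenstein-Reichardt correlator.

    Output is positive for motion from `left` to `right`, negative for the
    opposite direction.
    """
    if len(left) != len(right):
        raise ValueError("signals must have the same length")
    delayed_left = delay(left, k)
    delayed_right = delay(right, k)
    return [
        delayed_left[t] * right[t] - delayed_right[t] * left[t]
        for t in range(len(left))
    ]

# A bright spot passes the left input at t=2 and the right input at t=3.
left_input = [0, 0, 1, 0, 0, 0]
right_input = [0, 0, 0, 1, 0, 0]
rightward = sum(hr_correlator(left_input, right_input))
leftward = sum(hr_correlator(right_input, left_input))
```

The sign of the summed response encodes motion direction, which is the directional selectivity that the wide-field stage builds on.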
Research on visibility detection on foggy days is of great significance to both road traffic and air transport safety. Based on meteorological and video data collected from an airport, a deep recurrent neural network (RNN) model was established in this study to predict visibility. First, the Fourier Transform was used to extract feature variables from the video data. Then, Principal Component Analysis was used to reduce the dimension of the features. After that, 462 sets of sample data, including image features, air pressure, temperature, and wind speed, were used as inputs to train the RNN model. Comparing the predicted results with the actual visibility data and with other state-of-the-art methods shows that the proposed model makes up for the deficiency of models based only on meteorological or image data and achieves higher accuracy across different grades of visibility. When the meteorological data are taken into account, the accuracy of the RNN model improves by 18.78%. In addition, the influence of the meteorological factors on the predicted visibility was analysed with the aid of correlation analysis: for fog at night, temperature is the dominant factor affecting visibility.
In the reversible data hiding (RDH) community, researchers often train CNN-based predictors with the Mean Square Error (MSE) loss function to evaluate the differences between original and predicted images. As a result, the prediction network parameters are optimized for all pixels without distinction. Considering that prediction errors in smooth areas are selected first from the prediction error set for reversible data hiding, in this letter we propose to introduce a smoothness factor into the MSE loss function. The smoothness factor, which is used in steganography to evaluate the pixel smoothness of an image, is adopted as the loss weight in the new loss function, taking large values in smooth areas and small values in textured areas. Experimental results show that CNN-based predictors trained with the proposed loss function predict pixels in smooth areas more accurately than those trained with the original loss function. As a bonus, better embedding performance is achieved in comparison with recent typical CNN-based RDH methods.
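A smoothness-weighted MSE of the kind described can be sketched as follows. The abstract does not define the smoothness factor, so the weight used here, 1 / (1 + local variance) over a pixel and its 4-neighbours computed on the original image, is purely an assumption chosen to be large in smooth areas and small in textured ones. All function names are hypothetical.

```python
def local_variance(img, r, c):
    """Variance of a pixel and its 4-neighbours, clipped at the image border."""
    h, w = len(img), len(img[0])
    coords = ((r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    vals = [img[rr][cc] for rr, cc in coords if 0 <= rr < h and 0 <= cc < w]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def smoothness_weights(img):
    """Assumed smoothness factor: 1 in flat areas, close to 0 in textured ones."""
    return [
        [1.0 / (1.0 + local_variance(img, r, c)) for c in range(len(img[0]))]
        for r in range(len(img))
    ]

def weighted_mse(original, predicted, weights):
    """MSE in which each squared error is scaled by its smoothness weight."""
    num = 0.0
    den = 0.0
    for row_o, row_p, row_w in zip(original, predicted, weights):
        for o, p, w in zip(row_o, row_p, row_w):
            num += w * (o - p) ** 2
            den += w
    return num / den
```

With uniform weights this reduces to the ordinary MSE, so the weighting only changes training where smooth and textured regions coexist, shifting the predictor's effort toward the smooth pixels that RDH embeds into first.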