Convolutional neural networks (CNNs) have achieved impressive success in the multi-modal image processing (MIP) area. However, many existing CNN approaches fuse the features of the target and guidance images only once, which may cause a loss of information. To alleviate this problem, we present a multi-level bilateral interactive attention network (MBIAN) to fuse the features of the target and guidance images by their progressive interaction at different levels. Concretely, for each level, a bilateral interactive attention block (BIAB) is proposed to fuse the information of target and guidance images and refine their features. As the core component of our BIAB, a novel bilateral interactive attention layer (BIAL) is designed, where target and guidance images can mutually determine the attention weights. In addition, in each BIAB, long and short local shortcuts are employed to further facilitate the flow of information. Numerical experiments are conducted for three different problems, including panchromatic guided multi-spectral image super-resolution, near-infrared guided RGB image denoising, and flash-guided no-flash image denoising. The results demonstrate the versatility and superiority of MBIAN in terms of quantitative metrics and visual inspection, against 14 popular and state-of-the-art methods.
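The mutual weighting idea behind the bilateral interactive attention layer can be sketched roughly as follows. This is a minimal numpy illustration, not the paper's implementation: `w_t` and `w_g` stand in for learned (e.g. 1x1 convolution) weights, and a sigmoid gate replaces whatever normalization MBIAN actually uses; the point is only that each modality's features produce the attention map that gates the *other* modality.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bial(f_t, f_g, w_t, w_g):
    """Sketch of a bilateral interactive attention layer (BIAL-style):
    guidance features determine the attention gating the target features,
    and vice versa. f_t, f_g: (n, d) feature matrices; w_t, w_g: (d, d)
    illustrative learned projections."""
    a_t = sigmoid(f_g @ w_g)   # guidance -> attention weights for target
    a_g = sigmoid(f_t @ w_t)   # target   -> attention weights for guidance
    return f_t * a_t, f_g * a_g
```

Because the gates lie in (0, 1), each output is an element-wise attenuation of the corresponding input, with the attenuation pattern decided by the other modality.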
For multi-modal image processing, network interpretability is essential due to the complicated dependency across modalities. Recently, a promising research direction for interpretable networks is to incorporate dictionary learning into deep learning through an unfolding strategy. However, existing multi-modal dictionary learning models are both single-layer and single-scale, which restricts their representation ability. In this paper, we first introduce a multi-scale multi-modal convolutional dictionary learning (M²CDL) model, which is performed in a multi-layer strategy, to associate different image modalities in a coarse-to-fine manner. Then, we propose a unified framework, namely DeepM²CDL, derived from the M²CDL model for both multi-modal image restoration (MIR) and multi-modal image fusion (MIF) tasks. The network architecture of DeepM²CDL fully matches the optimization steps of the M²CDL model, which endows each network module with good interpretability. Different from handcrafted priors, both the dictionary and sparse feature priors are learned through the network. The performance of the proposed DeepM²CDL is evaluated on a wide variety of MIR and MIF tasks, showing its superiority over many state-of-the-art methods both quantitatively and qualitatively. In addition, we visualize the multi-modal sparse features and dictionary filters learned by the network, which demonstrates the good interpretability of the DeepM²CDL network.
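The unfolding strategy the abstract refers to turns each iteration of a sparse-coding solver into a network layer. A minimal sketch, assuming a plain ISTA iteration on a matrix dictionary (the paper's dictionaries are convolutional and multi-scale, and its exact update is not reproduced here):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_step(z, D, y, step, lam):
    """One ISTA iteration for min_z 0.5*||D z - y||^2 + lam*||z||_1.
    In an unfolded network, each such step becomes a layer whose
    dictionary D and threshold lam are learned from data."""
    grad = D.T @ (D @ z - y)
    return soft_threshold(z - step * grad, step * lam)
```

With a step size below 1 / ||D||², iterating this update monotonically decreases the objective, which is what makes the layer-per-iteration correspondence well defined.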
ISBN:
(Print) 9781510674219; 9781510674202
Machine learning has played a major role in various applications, including visual SLAM and thermal image processing. In this paper, we discuss the possibility of generating a thermal map using LWIR images and a deep-learning-based visual SLAM network, and the value such a thermal map can create. We summarize the advantages and applicability of various deep-learning-based visual SLAM methods and confirm the results of NICE-SLAM, which generates the dense map of greatest interest. Applying visual SLAM technology requires a time series, scene repetition, and images of each scene from various angles. However, most LWIR datasets consist of a single shot per scene or of unidirectional driving data. To address this, we created a scenario using an LWIR driving dataset and constructed a repetitive route by repeating traversals. RGB-Depth SLAM mapping was performed on the constructed dataset, the results were evaluated, and the limitations of the current approach were discussed. Finally, we summarize future directions for creating stable 3D thermal maps in indoor and outdoor environments by resolving these limitations.
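The route-repetition step described above (turning a unidirectional driving sequence into one with scene revisits) can be sketched as a simple list transform; the helper name and the forward-then-reverse scheme are illustrative assumptions, not the paper's exact procedure:

```python
def make_repeated_route(frames, repeats=2):
    """Turn a unidirectional frame sequence into a looped route:
    each traversal goes forward and then back (interior frames only,
    so endpoints are not duplicated at the seams), repeated `repeats`
    times. This gives SLAM multiple observations of each scene from
    both travel directions."""
    out = []
    for _ in range(repeats):
        out += frames + frames[-2:0:-1]  # forward pass, then return leg
    return out
```

For example, a three-frame sequence `[a, b, c]` with two repeats yields `[a, b, c, b, a, b, c, b]`, so every scene is seen more than once.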
This paper focuses on a novel strategy for increasing robustness with respect to local optima when using Mutual Information (MI) in multi-modal image registration. This is realized by integrating additional geometry information into the cost function. The main innovation is a generalization of multi-metric registration approaches by means of a metric homotopy. In particular, we realize a method that automatically determines the parameters of the metric homotopy. To construct a cost function independent of the choice of optimizer, the weighting is defined as a function of one of the metrics instead of optimizer steps. In addition, a differentiable cost function is developed. Compared to the commonly used technique of running an intensity-based registration at different resolutions, the proposed method is three times faster with unchanged accuracy. It is also shown that, in the presence of large landmark errors, the proposed method is more accurate than an approach in which both similarity functionals are applied one after the other. The method is evaluated on 3D multi-modal human brain datasets from the Retrospective Image Registration Evaluation (RIRE) project. The evaluation is performed using the RIRE project's evaluation website, making the registration results of the proposed method easily comparable to other methods; the presented results are therefore also available online on the RIRE project page.
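The key design point, a blend weight that depends on one of the metric values rather than on optimizer iterations, can be illustrated with a toy cost. The logistic weight and the `alpha` parameter below are assumptions for illustration only, not the paper's formulation:

```python
import numpy as np

def homotopy_cost(mi, geom, alpha=5.0):
    """Toy metric homotopy: blend an MI-based term and a geometric term.
    The weight w is a differentiable function of the MI value itself,
    so the combined cost is independent of the optimizer's step count
    and remains smooth for gradient-based optimization."""
    w = 1.0 / (1.0 + np.exp(-alpha * mi))  # higher MI -> rely more on MI term
    return w * mi + (1.0 - w) * geom
```

Since `w` lies strictly in (0, 1), the combined value is always a convex combination of the two metric values, and the transition between them is smooth rather than switched at a fixed iteration.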