ISBN (Print): 9781450397056
The segmentation-based approach is an essential direction of scene text detection because it can detect arbitrarily shaped or curved text, and it has attracted increasing attention from researchers. However, extensive research has shown that segmentation-based methods are disturbed by adjoining pixels and cannot effectively identify text boundaries. To tackle this problem, we propose ResAsapp Conv, a convolution structure built on the PSE algorithm. It provides receptive fields of different scales over the object and enables the network to recognize text boundaries effectively. The method's effectiveness is validated on three benchmark datasets: CTW1500, Total-Text, and ICDAR2015. In particular, on CTW1500, a dataset full of long curved text in a wide variety of scenes that is hard to distinguish, our network achieves an F-measure of 81.2%.
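A minimal sketch (PyTorch assumed) of the kind of residual, multi-dilation convolution block the abstract describes: parallel atrous branches give the feature map receptive fields of several sizes before a 1x1 fusion, with a residual connection preserving the input. The class name ResAsappConv, the branch count, and the dilation rates are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class ResAsappConv(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        # one atrous branch per dilation rate -> different receptive fields
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 fusion of the concatenated multi-scale branches
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1, bias=False)

    def forward(self, x):
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(multi_scale)   # residual keeps the original signal

x = torch.randn(1, 64, 128, 128)
print(ResAsappConv(64)(x).shape)  # torch.Size([1, 64, 128, 128])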
ISBN (Print): 9781450397056
Semantic segmentation of remote sensing images usually faces the problems of unbalanced foreground and background, large variation in object scales, and significant similarity between different classes. The fully convolutional encoder-decoder architecture based on FCN has become the de facto standard for semantic segmentation and is also prevalent for remote sensing images. However, because of the limitations of CNNs, the encoder cannot capture global contextual information, which is extraordinarily important for the semantic segmentation of remote sensing images. In this paper, the CNN-based encoder is therefore replaced by a Swin Transformer to obtain rich global contextual information. In addition, for the CNN-based decoder, we propose a multi-level connection module (MLCM) that fuses high-level and low-level semantic information so that the feature maps carry more semantics, and a multi-scale upsample module (MSUM) that joins the upsampling process to better recover image resolution and produce better segmentation results. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method.
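A minimal sketch (PyTorch assumed) of the kind of high-/low-level fusion the MLCM performs in the decoder: the coarse, semantically rich map is upsampled to the fine map's resolution, concatenated with it, and projected. The channel numbers and the concat-plus-1x1 fusion are illustrative assumptions, not the paper's exact module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFusion(nn.Module):
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.project = nn.Conv2d(low_ch + high_ch, out_ch, 1, bias=False)
        self.refine = nn.Sequential(
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
        )

    def forward(self, low_level, high_level):
        # bring the coarse, semantically rich map up to the fine map's size
        high_level = F.interpolate(high_level, size=low_level.shape[-2:],
                                   mode='bilinear', align_corners=False)
        fused = self.project(torch.cat([low_level, high_level], dim=1))
        return self.refine(fused)

low = torch.randn(1, 96, 128, 128)    # low-level (fine) features
high = torch.randn(1, 384, 32, 32)    # high-level (coarse) features
print(MultiLevelFusion(96, 384, 128)(low, high).shape)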
The accuracy of skin lesion segmentation is of great significance for subsequent clinical diagnosis. To improve segmentation accuracy, some pioneering works embedded multiple complex modules or used huge Transformer frameworks, but because of limited computing resources these large models are not suitable for real clinical environments. To address the coexisting challenges of precision and lightweight design, we propose a visual saliency guided network (VSGNet) for skin lesion segmentation, which generates saliency images of skin lesions through the efficient attention mechanism of biological vision and guides the network to quickly locate the target area, thereby alleviating the localization difficulties of skin lesion segmentation tasks. VSGNet consists of three parts: a Color Constancy module, a Saliency Detection module, and an Ultra Lightweight Multi-level Interconnection Network (ULMI-Net). Specifically, ULMI-Net uses a U-shaped network as its skeleton and includes an Adaptive Split Channel Attention (ASCA) module that simulates the parallel mechanism of the biological dual visual pathway, and a Channel-Spatial Parallel Attention (CSPA) module inspired by the multi-level interconnection structure of the visual cortices. Through these modules, ULMI-Net balances the efficient extraction and multi-scale fusion of global and local features and aims to achieve excellent segmentation results at the lowest cost in parameters and computational complexity. We validate the effectiveness and robustness of the proposed VSGNet on three publicly available skin lesion segmentation datasets (ISIC2017, ISIC2018, and PH2). The experimental results show that, compared with other state-of-the-art methods, VSGNet improves the Dice and mIoU metrics by 1.84% and 3.34%, respectively, while reducing the number of parameters and the computational complexity by 196× and 106×. This paper thus constructs VSGNet by integrating the biological vision mechanism.
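A minimal sketch (PyTorch assumed) of a channel-spatial parallel attention block in the spirit of the CSPA module: one path reweights channels from a global pooling descriptor, the other produces a spatial mask, and the two attended maps are summed. The reduction ratio, kernel size, and fusion by addition are illustrative assumptions rather than the paper's exact design.

import torch
import torch.nn as nn

class ChannelSpatialParallelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # channel path: global descriptor -> bottleneck -> per-channel weights
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # spatial path: one-channel mask over the feature map
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.channel_gate(x) + x * self.spatial_gate(x)

x = torch.randn(2, 32, 64, 64)
print(ChannelSpatialParallelAttention(32)(x).shape)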
ISBN (Print): 9781450397056
Handwritten mathematical expression recognition (HMER) is a challenging task due to the complex two-dimensional structure of mathematical expressions and the similarity of handwritten texts. Most existing methods for HMER only consider single-scale features and ignore multi-scale features, which are very important to HMER. The few works that have explored the fusion of multi-scale features in HMER rely on an extra branch that brings more parameters and computation. In this paper, we propose an end-to-end method that integrates multi-scale features in a unified model. Specifically, we customize Dense Atrous Spatial Pyramid Pooling (DenseASPP) and integrate it into our backbone network to capture multi-scale features of the input image while expanding the receptive fields. Moreover, we add a symbol classifier trained with focal loss to better discriminate and recognize similar symbols and further improve the performance of HMER. Experiments on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2014, 2016, and 2019 benchmarks show that the proposed method achieves superior performance to most state-of-the-art methods, demonstrating its effectiveness.
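A minimal sketch (PyTorch assumed) of the focal loss used for the auxiliary symbol classifier: it down-weights easy, well-classified symbols so training concentrates on visually similar, easily confused ones. The gamma and alpha values are common defaults, not necessarily the paper's settings.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    ce = F.cross_entropy(logits, targets, reduction='none')  # -log p_t per sample
    p_t = torch.exp(-ce)                                     # probability of the true class
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()        # easy samples contribute little

logits = torch.randn(8, 101)             # e.g. 101 symbol classes (illustrative)
targets = torch.randint(0, 101, (8,))
print(focal_loss(logits, targets))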
We present a new algorithm based on Dual Graph Contraction (DGC) to transform the Run Graph into its Minimum Line Property Preserving (MLPP) form which, when implemented in parallel, requires O(log(longestcurve)) step...
ISBN (Print): 9781450397056
Scene text recognition has proven to be highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed for handling scene text with perspective distortion and curved shapes. Nevertheless, most of these methods only consider single-scale features and do not take multi-scale features into account. Meanwhile, existing text recognition methods are mainly designed for English text, ignoring the pivotal role of Chinese text. In this paper, we propose an end-to-end method that integrates multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adopt and customize Dense Atrous Spatial Pyramid Pooling (DenseASPP) in our backbone network to capture multi-scale features of the input image while extending the receptive fields. Moreover, we add Squeeze-and-Excitation (SE) blocks to capture attentional features with global information and further improve CSTR performance. Experimental results on Chinese scene text datasets demonstrate that the proposed method efficiently mitigates the loss of contextual information caused by varying text scales and outperforms state-of-the-art approaches.
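A minimal sketch (PyTorch assumed) of a standard Squeeze-and-Excitation block as added to the recognition backbone: global average pooling squeezes each channel to a single statistic, and a small bottleneck produces per-channel weights that carry global context. The reduction ratio of 16 is the usual default, not necessarily the paper's choice.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze, then excite
        return x * weights                                       # recalibrate channels

x = torch.randn(4, 256, 8, 32)   # a typical text-recognition feature map shape
print(SEBlock(256)(x).shape)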
Under some special conditions, the P3P problem can have 1, 2, 3, or 4 solutions, and if the 3 control points and the optical center lie on a circle, the problem is indeterminate. In this paper, using a Monte Carlo approach with up to 1 million samples, it is shown that the probabilities of the P3P problem having one, two, three, and four solutions are 0.9993, 0.0007, 0.0000, and 0.0000, respectively. The result confirms the well-known fact that in most cases the P3P problem has a unique solution.
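A minimal sketch (NumPy and OpenCV assumed) of a Monte Carlo experiment of this kind: random control points are projected under a random pose, the P3P problem is solved with cv2.solveP3P, and the number of returned solutions is tallied. The sampling ranges and the 10,000-trial budget here are illustrative; the paper samples up to one million configurations, so the printed proportions are not expected to reproduce its exact figures.

import numpy as np
import cv2

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])          # illustrative pinhole intrinsics
counts = {1: 0, 2: 0, 3: 0, 4: 0}
rng = np.random.default_rng(0)

for _ in range(10_000):
    pts3d = rng.uniform(-1.0, 1.0, (3, 3))                 # 3 random control points
    rvec = rng.uniform(-np.pi, np.pi, 3)                   # random rotation (axis-angle)
    tvec = np.array([0.0, 0.0, 5.0]) + rng.uniform(-0.5, 0.5, 3)
    pts2d, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
    n, _, _ = cv2.solveP3P(pts3d, pts2d, K, None, flags=cv2.SOLVEPNP_P3P)
    if n in counts:
        counts[n] += 1                                     # tally the solution count

total = sum(counts.values())
print({k: v / total for k, v in counts.items()})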
ISBN (Print): 9780819469601
Aiming at the characteristics of weak, small targets in infrared images, an algorithm based on Least Squares Support Vector Machines (LS-SVM) is presented to fuse long-wave and mid-wave infrared images and detect targets. The image intensity surface in the neighborhood of every pixel of the original long-wave and mid-wave infrared images is fitted by a mapped LS-SVM, and the long-wave and mid-wave gradient images are then obtained by the LS-SVM with a radial basis kernel function. A fusion rule is established according to the features of the gradient images. Finally, the fused image is segmented and targets are detected with a contrast threshold. Compared with wavelet-based and morphological fusion detection algorithms, and in cases where the target is affected by decoys, the experimental results demonstrate that the proposed LS-SVM-based fusion and detection approach for weak, small targets is reliable and efficient.
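A minimal sketch (NumPy assumed) of the LS-SVM surface-fitting step: the intensity values in a small pixel neighborhood are regressed against pixel coordinates with an RBF-kernel LS-SVM, whose solution is a single linear system. The kernel width and regularization constant are illustrative assumptions, and the gradient computation and fusion rule of the paper are not reproduced here.

import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # LS-SVM regression dual: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]            # alpha, bias

def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# fit the intensity surface of a 5x5 pixel neighborhood (toy data)
yy, xx = np.mgrid[0:5, 0:5]
coords = np.stack([xx.ravel(), yy.ravel()], axis=1).astype(float)
intensity = np.sin(coords[:, 0] / 2.0) + coords[:, 1] * 0.1
alpha, b = lssvm_fit(coords, intensity)
print(lssvm_predict(coords, alpha, b, coords)[:5])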
When simplifying a mesh acquired from a three-dimensional laser scanner, it is more important to preserve the boundary and quality of the region of interest than of other regions, and the algorithm must not be sensitive to the noise introduced in practical applications. In this paper, we present a novel vertex merging mesh simplification algorithm based on region segmentation. The algorithm is divided into two stages: segmentation and simplification. After the 3D color mesh is segmented into different regions, vertices are classified as region-boundary vertices, which can only be merged into other region-boundary vertices in order to guarantee the completeness of the regions' boundaries, or as region-inner vertices. Iterative vertex merging is then applied with a region-weighted error metric, which enables controllable simplification. We demonstrate our method on several examples of a 3D color human head mesh.
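A minimal sketch (plain Python with NumPy assumed) of the merge rule described above: a region-boundary vertex may only collapse onto another region-boundary vertex, and the collapse cost is scaled by a per-region weight so the region of interest is simplified more conservatively. The data layout, distance-based cost, and weight values are illustrative assumptions, not the paper's error metric.

import numpy as np

def can_merge(v_src, v_dst):
    # boundary vertices must stay on the boundary to preserve region outlines
    if v_src["is_boundary"] and not v_dst["is_boundary"]:
        return False
    return True

def merge_cost(v_src, v_dst, region_weights):
    dist = np.linalg.norm(np.asarray(v_src["pos"]) - np.asarray(v_dst["pos"]))
    return region_weights[v_src["region"]] * dist   # region weight penalizes collapses in ROI

region_weights = {"face": 4.0, "background": 1.0}   # protect the region of interest
a = {"pos": [0.0, 0.0, 0.0], "is_boundary": True,  "region": "face"}
b = {"pos": [0.1, 0.0, 0.0], "is_boundary": True,  "region": "face"}
c = {"pos": [0.1, 0.1, 0.0], "is_boundary": False, "region": "background"}
print(can_merge(a, c), can_merge(a, b), merge_cost(a, b, region_weights))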
Semantic segmentation is one of the most important research directions in computer vision and has a wide range of applications in autonomous driving, medical imaging, intelligent security, and more. Unsupervised domain adaptation has been a mainstream research topic in recent years: it uses a large number of labeled source samples to complete the segmentation task in a target domain that has no labeled samples. In this paper, we propose a prototype-guided unsupervised domain adaptation method for semantic segmentation based on the ProDA model. Because labeled target samples and the prior probability are lacking, a prototype distance loss on the target domain is proposed to optimize the feature distribution by measuring the distance between features and the updated prototypes and by designing an adaptive threshold strategy. Meanwhile, a smoothing loss is proposed to alleviate the impact of source samples on the model and improve the prediction performance of the network. Experiments on the GTA5-to-Cityscapes scenario show that, compared with the original model, the proposed loss optimization improves mIoU by 1.52.
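A minimal sketch (PyTorch assumed) of a prototype-distance style loss on unlabeled target features: each feature is pulled toward its nearest class prototype, but only when it is already closer than an adaptive, batch-dependent threshold (here the batch median), which acts as a simple confidence filter. The distance measure and threshold rule are illustrative assumptions, not the paper's exact formulation.

import torch

def prototype_distance_loss(features, prototypes):
    # features: (N, D) target-domain features; prototypes: (C, D) class prototypes
    dists = torch.cdist(features, prototypes)   # (N, C) pairwise distances
    nearest, _ = dists.min(dim=1)               # distance to the nearest prototype
    threshold = nearest.median()                # adaptive, batch-dependent threshold
    confident = nearest <= threshold            # keep only confident samples
    if confident.sum() == 0:
        return features.new_tensor(0.0)
    return nearest[confident].mean()

features = torch.randn(64, 256)
prototypes = torch.randn(19, 256)    # e.g. 19 Cityscapes classes
print(prototype_distance_loss(features, prototypes))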