This paper describes a quick 3D needle segmentation algorithm for 3D ultrasound (US) data. The algorithm includes the 3D Quick Randomized Hough Transform (3DQRHT), which is based on the 3D Randomized Hough Transform and coars...
ISBN: 9781450397056 (print)
Semantic segmentation based on deep learning plays an important role in helping doctors identify brain tumor regions and formulate treatment plans. Popular automated segmentation methods for brain tumors include 2D and 3D convolutional networks. The 3D networks give better results but significantly increase the number of parameters and the computational cost. In this paper, we propose a lightweight brain tumor segmentation network composed of 3D inverted residual modules, which can significantly reduce the computational complexity of 3D models. Built on a lightweight depthwise separable convolution, our 3D inverted residual module extracts high-dimensional brain tumor features through an intermediate expansion layer, thus improving performance. On the brain tumor dataset BraTS 2018, our network achieves Dice scores of 80.8%, 90.7%, and 84.3% (for ET, WT, and TC, respectively) with only 0.68M parameters and 51.46G FLOPs. The results show that our method can significantly reduce the complexity of the 3D model while achieving very competitive performance.
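To see why the parameter count drops, the weights of a plain 3D convolution can be compared with those of an inverted residual block. The sketch below assumes a MobileNetV2-style layout (1x1x1 expansion, k x k x k depthwise convolution, 1x1x1 projection) and ignores biases and normalization layers. The function names, the channel width of 32, and the expansion factor of 4 are illustrative assumptions, not values taken from the paper.

```python
def conv3d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k x k 3D convolution (no bias)."""
    return k ** 3 * c_in * c_out


def inverted_residual3d_params(c_in: int, c_out: int,
                               expansion: int = 4, k: int = 3) -> int:
    """Weights in an assumed MobileNetV2-style 3D inverted residual block.

    Biases and normalization parameters are ignored.
    """
    hidden = c_in * expansion
    expand = c_in * hidden        # 1x1x1 pointwise expansion
    depthwise = k ** 3 * hidden   # one k x k x k filter per hidden channel
    project = hidden * c_out      # 1x1x1 pointwise projection
    return expand + depthwise + project
```

With 32 input and output channels, `conv3d_params(32, 32)` gives 27648 weights, while `inverted_residual3d_params(32, 32)` gives 11648, even though the block works internally on 128 channels.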
ISBN: 9781450397056 (print)
Vehicle re-identification (Re-ID) aims to retrieve a target vehicle from a large dataset of vehicle images captured by multiple cameras. Most vehicles are difficult to recognize under low resolution, occlusion, and viewpoint change, which makes vehicle Re-ID challenging. Existing work usually uses additional attribute information, such as color, viewpoint, and model, to distinguish different vehicles. However, this requires expensive manual annotation. Therefore, we propose a three-branch network based on an attention mechanism and local-global feature association (AM-LGFA) to improve the accuracy of vehicle Re-ID. In the global branch, the global features of the vehicle are extracted. In the attention branch, a multi-scale channel attention module is introduced to suppress irrelevant information and extract important channel features. In the local branch, the features extracted from the backbone are divided into stripe features along the horizontal direction, and each stripe feature is then connected with the global information to enhance the context between features. Finally, the features extracted from the three branches are concatenated to form the feature representation used in the test phase. The experimental results show that the features extracted by the AM-LGFA network are complementary. The effectiveness of this method is verified on two challenging public datasets, VehicleID and VeRi-776.
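The local-branch idea of splitting features into horizontal stripes and attaching global information to each stripe can be sketched in plain Python. This is a minimal illustration, assuming average pooling and simple concatenation as the "connection"; the abstract does not specify either, and the names `mean_pool` and `stripe_features` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(rows: List[Vector]) -> Vector:
    """Average a list of equally sized feature vectors element-wise."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def stripe_features(row_feats: List[Vector], num_stripes: int) -> List[Vector]:
    """Split per-row features into horizontal stripes and append the global feature.

    row_feats holds one feature vector per image row (top to bottom).
    Concatenation with the global average is an assumed form of association.
    """
    if num_stripes <= 0 or len(row_feats) % num_stripes != 0:
        raise ValueError("number of rows must be divisible by num_stripes")
    height = len(row_feats) // num_stripes
    global_feat = mean_pool(row_feats)
    fused = []
    for s in range(num_stripes):
        local_feat = mean_pool(row_feats[s * height:(s + 1) * height])
        fused.append(local_feat + global_feat)  # list concatenation
    return fused
```

For four rows `[[1, 2], [3, 4], [5, 6], [7, 8]]` and two stripes, each stripe yields a four-dimensional vector: its own two-dimensional average followed by the shared global average.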
Tracking the same person across multiple cameras is an important task in multi-camera systems. It is also desirable to re-identify individuals who have previously been seen by a single camera. This paper addresses the problem by re-identifying the same individual in two different datasets, both of which represent challenging video surveillance scenarios. Local descriptors are used for image description, support vector machines are employed for high classification performance, and an efficient Bag of Features approach is used for image representation. In this way, robustness against low resolution, occlusion, and changes in pose, viewpoint, and illumination is achieved at very low computational cost. The evaluation gives promising results in situations where the number of individuals in a multi-camera system varies continuously.
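The Bag of Features representation mentioned above can be illustrated as follows. This is a minimal sketch, assuming a precomputed codebook and Euclidean nearest-neighbor assignment; the abstract does not state which descriptors or codebook size are used, and the function names are hypothetical. The resulting histogram is what a classifier such as an SVM would take as input.

```python
from typing import List, Sequence


def nearest_codeword(descriptor: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])),
    )


def bof_histogram(descriptors: Sequence[Sequence[float]],
                  codebook: Sequence[Sequence[float]]) -> List[float]:
    """Normalized histogram of codeword assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(codebook)  # image without descriptors
    return [c / total for c in counts]
```

Because the histogram has a fixed length regardless of how many local descriptors an image produces, images of different sizes and with partial occlusion map to comparable vectors.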
Imbalance between the classes of a dataset is an important issue in classification problems. One well-known data balancing technique is artificial oversampling, which increases the size of the dataset. In this research, multinomial classification was applied to features recorded from a single ECG (electrocardiograph) sensor. A Dirichlet process (a Dirichlet distribution over the cumulative distribution function of each data partition) was used to model the distribution of the newly generated data while also taking the statistical properties of the existing data into account. After data balancing, the classifier reached 77.21% classification accuracy (CA) and 90.9% area under the ROC curve (AUC).
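The abstract does not spell out the sampling procedure, so the following is not the authors' method. It is a simpler, swapped-in illustration of how Dirichlet-distributed weights can generate synthetic minority samples that stay within the range of the existing data: each new sample is a convex combination of a few existing ones. All names and parameters (`k`, `alpha`, `seed`) are assumptions.

```python
import random
from typing import List, Optional, Sequence


def dirichlet_weights(n: int, alpha: float = 1.0,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Draw one sample from a symmetric Dirichlet(alpha) via normalized gamma draws."""
    rng = rng or random.Random()
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def oversample(minority: Sequence[Sequence[float]], n_new: int,
               k: int = 3, alpha: float = 1.0, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples as Dirichlet-weighted mixes of k minority samples.

    Illustrative stand-in only; requires k <= len(minority).
    """
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        parents = rng.sample(list(minority), k)
        weights = dirichlet_weights(k, alpha, rng)
        synthetic.append(
            [sum(w * p[j] for w, p in zip(weights, parents)) for j in range(dim)]
        )
    return synthetic
```

Since the weights are non-negative and sum to one, every synthetic feature value lies between the smallest and largest value of that feature among the chosen parents.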
ISBN: 9781450397056 (print)
The segmentation-based approach is an essential direction in scene text detection. Because it can detect arbitrarily shaped or curved text, it has attracted increasing attention from researchers. However, extensive research has shown that segmentation-based methods are disturbed by adjoining pixels and cannot effectively identify text boundaries. To tackle this problem, we propose a ResAsapp Conv based on the PSE algorithm. This convolution structure provides fields of view of the object at different scales, which allows it to recognize text boundaries effectively. The method's effectiveness is validated on three benchmark datasets: CTW1500, Total-Text, and ICDAR2015. In particular, on CTW1500, a dataset full of long curved text in all kinds of scenes that is hard to distinguish, our network achieves an F-measure of 81.2%.
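For reference, the F-measure reported for text detection is conventionally the harmonic mean of precision and recall. The helper below is a generic sketch of that formula; the abstract does not report the underlying precision and recall, so no values from the paper are reproduced here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # no correct detections at all
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean penalizes imbalance: a detector with perfect precision but 50% recall scores about 0.667, not 0.75.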
ISBN: 9781450397056 (print)
Semantic segmentation of remote sensing images usually faces three problems: foreground-background imbalance, large variation in object scale, and strong similarity between different classes. The FCN-based fully convolutional encoder-decoder architecture has become the de facto standard for semantic segmentation, and it is also prevalent for remote sensing images. However, because of the limitations of CNNs, the encoder cannot obtain global contextual information, which is extremely important for the semantic segmentation of remote sensing images. In this paper, we therefore replace the CNN-based encoder with Swin Transformer to obtain rich global contextual information. In addition, for the CNN-based decoder, we propose a multi-level connection module (MLCM) that fuses high-level and low-level semantic information so that the feature maps carry more semantic information, and we add a multi-scale upsample module (MSUM) to the upsampling process to better recover image resolution and produce better segmentation results. The experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method.
The accuracy of skin lesion segmentation is of great significance for subsequent clinical diagnosis. To improve segmentation accuracy, some pioneering works embed multiple complex modules or use a large Transformer framework, but because computing resources are limited, these types of large models are not suitable for the actual clinical environment. To reconcile accuracy with a lightweight design, we propose a visual saliency guided network (VSGNet) for skin lesion segmentation. It generates saliency images of skin lesions through the efficient attention mechanism of biological vision and guides the network to quickly locate the target area, which addresses the localization difficulties in skin lesion segmentation tasks. VSGNet includes three parts: a Color Constancy module, a Saliency Detection module, and an Ultra Lightweight Multi-level Interconnection Network (ULMI-Net). Specifically, ULMI-Net uses a U-shaped network as its skeleton and includes the Adaptive Split Channel Attention (ASCA) module, which simulates the parallel mechanism of the dual pathways in biological vision, and the Channel-Spatial Parallel Attention (CSPA) module, which is inspired by the multi-level interconnection structure of the visual cortices. Through these modules, ULMI-Net balances efficient extraction and multi-scale fusion of global and local features, and aims to achieve excellent segmentation results at the lowest cost in parameters and computational complexity. We validate the effectiveness and robustness of the proposed VSGNet on three publicly available skin lesion segmentation datasets (ISIC2017, ISIC2018, and PH2). The experimental results show that, compared with other state-of-the-art methods, VSGNet improves the Dice and mIoU metrics by 1.84% and 3.34%, respectively, while reducing the number of parameters and the computational complexity by 196× and 106×, respectively. This paper constructs the VSGNet integrating the biological vision m
We present a new algorithm based on Dual Graph Contraction (DGC) to transform the Run Graph into its Minimum Line Property Preserving (MLPP) form which, when implemented in parallel, requires O(log(longestcurve)) step...
ISBN: 9781450397056 (print)
Handwritten mathematical expression recognition (HMER) is a challenging task because of the complex two-dimensional structure of mathematical expressions and the similarity of handwritten symbols. Most existing methods for HMER consider only single-scale features and ignore multi-scale features, which are very important for HMER. A few works have explored the fusion of multi-scale features in HMER, but they rely on an extra branch that adds parameters and computation. In this paper, we propose an end-to-end method that integrates multi-scale features in a unified model. Specifically, we adapt Dense Atrous Spatial Pyramid Pooling (DenseASPP) to our backbone network to capture multi-scale features of the input image while expanding the receptive field. Moreover, we add a symbol classifier trained with focal loss to better discriminate and recognize similar symbols, which further improves HMER performance. Experiments on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2014, 2016, and 2019 show that the proposed method outperforms most state-of-the-art methods, demonstrating its effectiveness.
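The role of focal loss in separating similar symbols can be seen from its standard per-sample form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below uses that standard form; the default `gamma` and `alpha` are common choices and are assumptions here, since the abstract does not give the paper's settings.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample, given the predicted probability of its true class.

    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)
```

With `gamma = 2`, a confidently correct prediction (`p_true = 0.9`) contributes about 1% of its cross-entropy loss, so training focuses on hard cases such as visually similar symbols.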