Detecting arbitrarily shaped text is a challenging task due to the significant variation in text shape, size, and aspect ratio, as well as the complexity of scene backgrounds. Enhancing feature extraction capabilities is essential for boosting text detection accuracy. However, traditional text feature extraction methods face several issues, including insufficient multiscale feature fusion, limited information transfer between different feature levels, and constrained receptive field expansion when using asymmetric convolutional kernels for long text detection. To address these challenges, this article introduces an arbitrarily shaped scene text detector called the semantic-information space sharing interaction network (S3INet). The proposed network uses the semantic-information space sharing module (S3M) to generate a single-level feature map that captures multiscale features with rich semantic information and prominent foreground elements. In addition, we propose the multibranch parallel asymmetric convolutional module (MPACM) group to strengthen the representation of text features, thereby further improving text detection performance. Extensive experimental evaluations on five publicly available natural scene text datasets (CTW-1500, Total-Text, MSRA-TD500, ICDAR2015, and ICDAR2017-MLT) and two traffic text datasets (CTST-1600 and TPD) demonstrate the superiority of our method. The results indicate that S3INet significantly outperforms most existing state-of-the-art methods in both accuracy and robustness. The code will be released at: https://***/runminwang/S3INet.
ISBN: 9781538633540 (print)
A novel neural network architecture, BLSTM-Inception v1, is proposed for text classification. It consists mainly of the BLSTM-Inception module, which has two parts, and a global max pooling layer. In the first part, the forward and backward sequences of BLSTM hidden states are concatenated as two channels, rather than added into a single channel. The second part contains parallel asymmetric convolutions of different scales that extract nonlinear features of multi-granular n-gram phrases from the two channels. Global max pooling then converts variable-length text into a fixed-length vector. The proposed architecture achieves excellent results on four text classification tasks, including sentiment classification and subjectivity classification, and in particular improves by nearly 1.5% on the sentence polarity dataset from Pang and Lee compared to BLSTM-2DCNN.
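The data flow described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`conv_over_time`, `global_max_pool`, `encode`), the choice of n x 1 kernels spanning n time steps, the uniform placeholder weights of 1/n, and the ReLU nonlinearity are all assumptions made for illustration. The sketch only shows how two channels, parallel convolutions at several scales, and global max pooling combine to give a vector whose length does not depend on the text length.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # shape: (time_steps, hidden_units)


def conv_over_time(channels: Sequence[Matrix], n: int) -> Matrix:
    """Apply an n x 1 convolution along the time axis, summed over channels.

    Assumption: uniform weights of 1/n stand in for learned kernel weights.
    """
    steps, hidden = len(channels[0]), len(channels[0][0])
    if steps < n:
        raise ValueError("sequence is shorter than the kernel span")
    out = []
    for t in range(steps - n + 1):
        row = []
        for j in range(hidden):
            s = sum(ch[t + k][j] for ch in channels for k in range(n)) / n
            row.append(max(0.0, s))  # ReLU nonlinearity (assumed)
        out.append(row)
    return out


def global_max_pool(feature_map: Matrix) -> List[float]:
    """Take the maximum over time for each hidden unit."""
    return [max(column) for column in zip(*feature_map)]


def encode(forward: Matrix, backward: Matrix, scales=(1, 2, 3)) -> List[float]:
    """Keep forward/backward states as two channels, run one branch per
    scale in parallel, pool each branch, and concatenate the results."""
    channels = [forward, backward]  # two channels, not an element-wise sum
    vector: List[float] = []
    for n in scales:
        vector.extend(global_max_pool(conv_over_time(channels, n)))
    return vector


# Two inputs of different length (3 and 5 time steps, 2 hidden units each).
short_fwd = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0]]
short_bwd = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
long_fwd = short_fwd + [[0.5, 0.5], [2.0, 1.0]]
long_bwd = short_bwd + [[0.0, 0.0], [1.0, 1.0]]

short_vec = encode(short_fwd, short_bwd)
long_vec = encode(long_fwd, long_bwd)
print(len(short_vec), len(long_vec))
```

Both encoded vectors have length `len(scales) * hidden_units`, so a downstream classifier sees a fixed-size input regardless of sentence length. In a trained model the placeholder weights would be replaced by learned kernels.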