Scene Chinese text recognition has recently attracted increasing attention. Mainstream scene text recognition methods perform very well on English but are considerably limited on Chinese because of inter-class similarity, intra-class variability, and the complex combination of components in scene Chinese text. In this paper, we design Adaptive Position Encoding (APE) to enhance the model's ability to perceive spatial information. Based on APE, we design a local attention module (LAM) and a global attention module (GAM). Specifically, LAM captures local features to identify common characteristics among characters of the same category, addressing intra-class variability. Meanwhile, GAM captures global features to identify the subordination relationships among Chinese character components. Integrating LAM and GAM combines local and global features, which makes it possible to find differences in detail among features that are fundamentally similar and thus addresses inter-class similarity. Further, we adopt a transformer encoder-decoder structure to handle the vast variety of Chinese characters. Based on the local/global attention modules and the transformer encoder-decoder framework, we devise a novel sequence-to-sequence Local and Global Attention Network (LGANet), in which both the backbone and the encoder/decoder are composed of attention mechanisms. Experiments on the Chinese scene dataset show that LGANet reaches a recognition accuracy of 77.3% and a normalized edit distance of 88.6%, both of which are state-of-the-art (SOTA) results, as shown in Fig. 1.
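The two metrics reported above can be illustrated with a short sketch. The abstract does not define the normalized edit distance, so this assumes one common convention for scene text benchmarks: one minus the mean of the Levenshtein distance divided by the longer string's length. The function names are illustrative and the sample strings are not from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_score(preds, targets):
    """Assumed definition: 1 - mean(ED(pred, target) / max(len(pred), len(target)))."""
    total = 0.0
    for p, t in zip(preds, targets):
        longest = max(len(p), len(t))
        if longest > 0:
            total += edit_distance(p, t) / longest
    return 1.0 - total / len(preds)


def accuracy(preds, targets):
    """Fraction of predictions that match the target string exactly."""
    return sum(p == t for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    preds = ["中国银行", "停车场"]
    targets = ["中国很行", "停车场"]
    print(accuracy(preds, targets))
    print(normalized_edit_score(preds, targets))
```

Under this convention a prediction with one wrong character still earns partial credit, which is why the normalized score can be well above the exact-match accuracy.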
Computer-assisted medical care can benefit from lung region segmentation. Numerous methods provide end-to-end solutions that employ convolutional neural networks to segment lung regions from images. Low contrast, unpredictable appearance, and other problems in medical images reduce the accuracy of existing methods. To overcome these issues, a multi-scale dilated convolution (MSDC) module is added to the shortcut connection to fuse multi-scale features with various receptive fields and obtain more global information about the lung area. Moreover, a local attention module that includes channel attention and spatial attention is proposed to give more weight to the lung area and lower the influence of the background. Several lung segmentation datasets are used to evaluate segmentation performance qualitatively and quantitatively. The experimental results show that the segmentation accuracy of our model exceeds that of many recent image segmentation methods.
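The idea of fusing features with different receptive fields can be sketched in one dimension. This is not the paper's MSDC module: the kernel weights, the dilation rates, and fusion by element-wise summation are all assumptions made for illustration, and a real model would use learned 2D convolutions.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: dilation * (kernel_size - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel sizes)."""
    half = dilation * (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_fuse(signal, kernel, dilations):
    """Run one branch per dilation rate and fuse by summation (assumed fusion rule)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for d in (1, 2, 4):
        print(d, effective_kernel(3, d))
    print(multi_scale_fuse(impulse, [1, 1, 1], dilations=(1, 2)))
```

Larger dilation rates widen the receptive field without adding weights, which is why a branch per rate yields features at several scales from the same input.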
Recognizing occluded facial expressions in the wild poses a significant challenge. Most previous approaches rely solely on either global or local feature-based methods, leading to the loss of relevant expression features. To address this issue, a feature fusion residual attention network (FFRA-Net) is proposed. FFRA-Net consists of a multi-scale module, a local attention module, and a feature fusion module. The multi-scale module divides the intermediate feature map equally into several sub-feature maps along the channel dimension. A convolution operation is then applied to each of these sub-feature maps to obtain diverse global features. The local attention module divides the intermediate feature map into several sub-feature maps along the spatial dimension. A convolution operation is then applied to each of these sub-feature maps, and local key features are extracted through the attention mechanism. The feature fusion module integrates global and local expression features and establishes residual links between inputs and outputs to compensate for the loss of fine-grained features. Finally, two occlusion expression datasets (FM_RAF-DB and SG_RAF-DB) were constructed based on the RAF-DB dataset. Extensive experiments demonstrate that the proposed FFRA-Net achieves excellent results on four datasets: FM_RAF-DB, SG_RAF-DB, RAF-DB, and FERPLUS, with accuracies of 77.87%, 79.50%, 88.66%, and 88.97%, respectively. Thus, the approach presented in this paper demonstrates strong applicability to occluded facial expression recognition (FER).
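The two ways of dividing the intermediate feature map can be sketched with plain nested lists. The abstract gives neither the number of channel groups nor the spatial grid size, so `groups`, `rows`, and `cols` are hypothetical parameters, and the per-branch convolution and attention steps are left out.

```python
def make_feature_map(channels, height, width):
    """Toy C x H x W map whose values encode their own (c, h, w) position."""
    return [[[c * 100 + h * 10 + w for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


def split_channels(fmap, groups):
    """Split a C x H x W map into `groups` equal sub-maps along the channel axis."""
    channels = len(fmap)
    if channels % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    size = channels // groups
    return [fmap[g * size:(g + 1) * size] for g in range(groups)]


def split_spatial(fmap, rows, cols):
    """Split a C x H x W map into a rows x cols grid of patches, in row-major order."""
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % rows != 0 or width % cols != 0:
        raise ValueError("height and width must be divisible by the grid size")
    ph, pw = height // rows, width // cols
    patches = []
    for r in range(rows):
        for c in range(cols):
            patches.append([[row[c * pw:(c + 1) * pw]
                             for row in channel[r * ph:(r + 1) * ph]]
                            for channel in fmap])
    return patches


def shape(tensor):
    """Return (C, H, W) for a nested-list feature map."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


if __name__ == "__main__":
    fmap = make_feature_map(4, 4, 4)
    print([shape(s) for s in split_channels(fmap, groups=2)])
    print([shape(p) for p in split_spatial(fmap, rows=2, cols=2)])
```

A channel split keeps the full spatial extent in every sub-map, which suits global features, while a spatial split keeps all channels but restricts each sub-map to one region, which suits local features.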