As one of the most important components of a transmission tower, the insulator serves two functions, electrical insulation and wire fixing, and therefore directly affects the operation of the power system. Defects in in...
To solve the visual semantic understanding bias and multimodal semantic bias in multimodal named entity recognition, the Confidence Learning Guides Label Fusion for Multimodal Named Entity Recognition (...
In recent years, a series of methods have been proposed to use image semantics to assist in extracting named entities. However, in these multi-modal named entity recognition methods, there are problems of visual seman...
The Transformer has become a widely used deep learning model in computer vision applications, alongside Convolutional Neural Networks. Its ability to capture long-term dependencies through the self-attention mechanism has mad...
Background: Detecting the mandibular fracture region is very important. However, the size of the fracture region varies with anatomical position, fracture site and degree of force, which makes it difficult to locate and recognize the fracture region accurately. Methods: To solve these problems, this paper proposes the M3YOLOv5 model. Three feature enhancement strategies are designed to improve the model's ability to locate and recognize the mandibular fracture region. Firstly, a Global-Local Feature Extraction Module (GLFEM) is designed. By effectively combining a Convolutional Neural Network (CNN) with a Transformer, it compensates for the CNN's insufficient ability to extract global information and improves the model's ability to localize the fracture region. Secondly, to improve the interaction of context information, a Deep-Shallow Feature Interaction Module (DSFIM) is designed. In this module, the spatial information in the shallow feature layer is embedded into the deep feature layer by the spatial attention mechanism, and the semantic information in the deep feature layer is embedded into the shallow feature layer by the channel attention mechanism, which improves the model's ability to recognize the fracture region. Finally, a Multi-scale Multi receptive-field Feature Mixing Module (MMFMM) is designed. This module uses deep separate convolution chains, which are composed of multiple layers with different scales and different dilation coefficients. This provides a richer receptive field for the model and improves its ability to detect fracture regions of different scales. Results: The precision, mAP, recall and F1 value of the M3YOLOv5 model on the mandibular fracture CT data set are 97.18%, 96.86%, 94.42% and 95.58%, respectively. The experimental results show that the M3YOLOv5 model performs better than the mainstream detection models. Conclusion: The M3YOLOv5 model can effectiv
In this paper, we present a structured literature mapping of the state-of-the-art of vehicular perception methods and approaches using inertial sensors. An in-depth investigation and classification were performed empl...
Sketch-based 3D shape retrieval has always been a hot research topic in the computer vision community. The main challenge is to alleviate the cross-modality discrepancies such that the retrieval accuracy can be improv...
Background and Objective: The instance segmentation of impacted teeth in oral panoramic X-ray images is a research hotspot. However, impacted teeth in panoramic X-ray images lead to teeth deformities, low contrast between...
Image fusion is a research hotspot in the field of computer vision; it obtains complementary information from images of different modalities and presents it in a comprehensive, highly information-cohesive fused image. Encoder-decoder networks are an important technical method in the field of image fusion. This paper reviews the research status and development of encoder-decoder networks in the field of image fusion from the following aspects. Firstly, 18 image fusion-related datasets and 22 open-source codes of deep learning-based image fusion are summarized. Secondly, the model structure of encoder-decoder networks and the basic principles of image fusion based on the encoder-decoder network are described. Thirdly, encoder-decoder network fusion models are grouped into 4 categories: Convolutional Auto-encoder network (CAE Net), Sparse Auto-encoder network (SAE Net), U-shape Encoder-decoder network (U-net) and Variational Auto-encoder network (VAE Net). Within U-net, there are 5 types: U-net incorporating a residual connection mechanism (Res U-net), U-net incorporating a dense connection mechanism (Dens U-net), U-net incorporating a multiscale connection mechanism (MS U-net), U-net incorporating an attention mechanism (Att U-net) and U-net combined with a Generative Adversarial Network (GAN U-net). Fourthly, the existing image fusion evaluation metrics are summarized from both subjective and objective aspects. Fifthly, the application of encoder-decoder networks in 5 kinds of image fusion, namely multi-exposure image fusion, multi-focus image fusion, infrared and visible image fusion, medical image fusion and remote sensing image fusion, is summarized. Finally, the main challenges faced by encoder-decoder networks in the image fusion domain are discussed, together with an outlook on future directions. This paper systematically analyzes the application of encoder-decoder networks in image fusion, which has posi
Chinese named entity recognition (CNER) in the judicial domain is an important basic task for the intelligent analysis and processing of massive numbers of documents. Entities in this domain have a more complicated structure than the common...