Search Results - Inner Mongolia University Library

5th International Conference on Computer Information Science and Application Technology, CISAT 2022

Authors: Huang, Wenming; Li, Tingting; Xiao, Yannan; Wen, Yayuan; Deng, Zhenrong. Affiliations: School of Computer and Information Security, Guilin University of Electronic Technology, Guangxi, Guilin, China; Guangxi Key Laboratory of Image and Graphics Intelligent Processing, Guangxi, Guilin, China; College of Electronic Engineering, Guangxi Normal University, Guangxi, Guilin, China

ISBN: (print) 9781510660076

As one of the most important components on a transmission tower, the insulator serves two functions, electrical insulation and wire fixing, and directly affects the operation of the power system. Defects in insulators can shorten the service life of transmission lines. UAV aerial photography of electric power towers suffers from problems such as a small number of defective insulator samples, small area, large aspect ratio of insulator strings, and variable inclination angle. Coupled with environmental factors such as light, interference, and distance, these problems lead to low detection accuracy for insulator defects. To address them, an improved YOLOv5 insulator defect detection algorithm is proposed. First, the aerial images are screened and data augmentation is used to obtain a sufficient number of defective insulator images, which enriches the dataset and avoids model overfitting. Second, the convolutional attention module CBAM is introduced to improve the expression ability of defective insulator features and strengthen the network's ability to identify targets. Finally, the Leaky ReLU activation function in the hidden layers of the original YOLOv5 algorithm is replaced by the Mish function to improve the generalization ability of the network. The experimental results show that, compared with the original YOLOv5 algorithm, the mean average precision mAP (IoU=0.5) of the improved algorithm increases by 7.8%, which effectively reduces the false detections and missed detections of the original algorithm. Compared with other mainstream object detection algorithms, the proposed algorithm detects insulator defects more effectively. © 2022 SPIE.
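
The activation swap described in the abstract above can be illustrated with a minimal sketch. It compares Leaky ReLU with the standard Mish definition, x * tanh(softplus(x)). The negative slope of 0.01 is an assumption chosen for illustration, since the abstract does not state the value used in the paper, and the function names are hypothetical.

```python
import math


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: identity for x >= 0, a small linear slope otherwise.

    The slope value is an assumption; the abstract does not give one.
    """
    return x if x >= 0 else negative_slope * x


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # Print both activations side by side for a few sample inputs.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  leaky_relu={leaky_relu(x):+.4f}  mish={mish(x):+.4f}")
```

Unlike Leaky ReLU, Mish is smooth everywhere and non-monotonic for negative inputs, which is the property usually cited when it is preferred.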

Keywords: Defects

CLGLF: Confidence Learning Guides Label Fusion for Multimodal Named Entity Recognition Method

SSRN, 2024

Authors: Wang, Hairong; Wang, Tong; Wang, Yiyan; Chen, Fangping. Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

To address visual semantic understanding bias and multimodal semantic bias in multimodal named entity recognition, the Confidence Learning Guides Label Fusion for Multimodal Named Entity Recognition (CLGLF) method is proposed. The method invokes the BLIP-2 pre-trained model to generate image captions, concatenates them with the input texts, and encodes them jointly to achieve multimodal feature fusion. Candidate labels and text labels are obtained by decoding the multimodal representations and the text representations, respectively. After the two groups of labels are aligned with a KL divergence loss function, a confidence score is calculated to evaluate the quality of the multimodal representation, and a confidence threshold is set to screen out biased candidate labels. The text labels at the corresponding positions then replace the biased candidate labels to achieve label fusion and complete the multimodal named entity recognition. To verify the proposed method, experiments are carried out on the Twitter-2015 and Twitter-2017 multimodal datasets, and the results are compared with 7 mainstream methods, such as MSB and UMT. The experimental results show the effectiveness of the CLGLF method. © 2024, The Authors. All rights reserved.
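
The label fusion step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the tag set, the use of the maximum class probability as the confidence score, and the threshold of 0.6 are all hypothetical, since the abstract does not specify them.

```python
import math
from typing import List, Sequence

# Hypothetical tag set, for illustration only.
LABELS = ["O", "B-PER", "I-PER", "B-LOC"]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same tags."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fuse_labels(
    multimodal_probs: List[Sequence[float]],
    text_probs: List[Sequence[float]],
    threshold: float = 0.6,
) -> List[str]:
    """Keep the multimodal candidate label when it is confident enough,
    otherwise fall back to the text-only label at the same position.

    Assumption: confidence is the largest class probability per token.
    """
    fused = []
    for mm, tx in zip(multimodal_probs, text_probs):
        confidence = max(mm)
        source = mm if confidence >= threshold else tx
        best = max(range(len(source)), key=source.__getitem__)
        fused.append(LABELS[best])
    return fused


# Three tokens; the second multimodal prediction is low-confidence.
mm_probs = [[0.05, 0.90, 0.03, 0.02], [0.40, 0.10, 0.35, 0.15], [0.90, 0.05, 0.03, 0.02]]
tx_probs = [[0.10, 0.80, 0.05, 0.05], [0.10, 0.05, 0.80, 0.05], [0.85, 0.05, 0.05, 0.05]]

if __name__ == "__main__":
    print(fuse_labels(mm_probs, tx_probs))
    print([round(kl_divergence(m, t), 4) for m, t in zip(mm_probs, tx_probs)])
```

In this toy input only the second token falls below the threshold, so only that position takes its label from the text branch.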

Keywords: Semantics

Visual Semantic Enhancement for Multi-Modal Named Entity Recognition

SSRN, 2024

Authors: Wang, Tong; Wang, Hairong; Xu, Xi; Chen, Fangping. Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; Key Laboratory of Image and Graphics Intelligent Processing, State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

In recent years, a series of methods have been proposed that use image semantics to assist in extracting named entities. However, these multi-modal named entity recognition methods suffer from missing visual semantics and weak semantic constraints in the multi-modal representation. Therefore, a visual semantic enhancement method for multi-modal named entity recognition (EVS-MNER) is proposed. In this method, multiple visual features are used for collaborative representation to obtain more complete visual semantics. The semantic information in the multiple visual features is integrated through the multi-modal feature fusion module to generate visual semantic features, and the semantics are passed to the text features to obtain a semantically enhanced multi-modal text representation. The visual entity classifier is used to decode the visual semantic features, which realizes the semantic consistency constraint on visual features. The multi-task label decoder is called to mine the fine-grained semantics of the multi-modal text representation and the text features, and the problem of semantic bias is addressed by joint decoding to further improve the accuracy of named entity recognition. On the two multimodal entity recognition datasets Twitter-2015 and Twitter-2017, the proposed method is compared with 10 methods such as PCEN and M3S. The experimental results show that the proposed method is effective. © 2024, The Authors. All rights reserved.

Keywords: Semantics

A Comprehensive Analysis of Recent Advancements in Visual Transformer Research for Image Classification

SSRN, 2023

Authors: Peng, Bin; Bai, Jing; Li, Wenjing; Ma, Xiangyu; Xi, Shuting. Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; National Ethnic Affairs Commission Image Graphics Intelligent Processing Laboratory, Yinchuan 750021, China

The Transformer has become a widely used deep learning model in computer vision applications, alongside Convolutional Neural Networks. Its ability to capture long-term dependencies through the self-attention mechanism has made it a popular choice in image classification tasks. In this paper, we offer a holistic review of the current state of Visual Transformer research in the context of image classification. First, an overview of the fundamental architecture and principles of the Transformer is provided. Then, a comprehensive review of recent advancements in visual transformers for traditional image classification is conducted. Specifically, we focus on three key aspects of research: computational efficiency, performance improvement, and training optimization. Additionally, the domain-specific image classification applications of the Transformer are summarized and analyzed. Finally, we highlight the achievements and challenges encountered in visual transformer research for image classification, and provide insights for future research. The comprehensive review and further analysis of the recent advancements and their potential impact on the field contribute to the ongoing progress in visual transformer research and lay the groundwork for further developments in image classification. © 2023, The Authors. All rights reserved.
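
The self-attention mechanism mentioned above can be sketched as scaled dot-product attention over a short token sequence. This is a simplified illustration: the learned query, key, and value projections and the multi-head structure of a real Transformer are omitted, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Three tokens (for example, image patches) with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

if __name__ == "__main__":
    for row in self_attention(tokens, tokens, tokens):
        print([round(v, 4) for v in row])
```

Because every output row mixes all value vectors, each token can draw on information from any other position in one step, which is the long-range dependency property the abstract refers to.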

Keywords: Computer vision

M3YOLOv5: Feature enhanced YOLOv5 model for mandibular fracture detection

Computers in Biology and Medicine, 2024, vol. 173, pp. 108291-108291

Authors: Zhou, Tao; Wang, Hongwei; Du, Yuhu; Liu, Fengzhen; Guo, Yujie; Lu, Huiling. Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; School of Medical Information and Engineering, Ningxia Medical University, Yinchuan 750004, China; Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

Background: Detecting the mandibular fracture region is very important. However, the size of the fracture region varies with anatomical position, site, and degree of force, which makes it difficult to locate and recognize the fracture region accurately. Methods: To solve these problems, the M3YOLOv5 model is proposed in this paper. Three feature enhancement strategies are designed to improve the model's ability to locate and recognize the mandibular fracture region. Firstly, a Global-Local Feature Extraction Module (GLFEM) is designed. By effectively combining a Convolutional Neural Network (CNN) and a Transformer, it compensates for the CNN's insufficient ability to extract global information and improves the model's ability to localize the fracture region. Secondly, to improve the interaction of context information, a Deep-Shallow Feature Interaction Module (DSFIM) is designed. In this module, the spatial information in the shallow feature layer is embedded into the deep feature layer by the spatial attention mechanism, and the semantic information in the deep feature layer is embedded into the shallow feature layer by the channel attention mechanism, which improves the model's fracture region recognition ability. Finally, a Multi-scale Multi receptive-field Feature Mixing Module (MMFMM) is designed. This module uses deep separate convolution chains composed of multiple layers with different scales and different dilation coefficients. This provides a richer receptive field for the model and improves its ability to detect fracture regions of different scales. Results: The precision rate, mAP value, recall rate and F1 value of the M3YOLOv5 model on the mandibular fracture CT data set are 97.18%, 96.86%, 94.42% and 95.58%, respectively. The experimental results show that the M3YOLOv5 model performs better than the mainstream detection models. Conclusion: The M3YOLOv5 model can effectiv

Keywords: Fracture
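
The abstract of this record describes convolution chains with different dilation coefficients as a way to enrich the receptive field. A minimal sketch of that arithmetic follows, assuming stride-1 convolutions stacked in sequence; the kernel sizes and dilation rates are illustrative values, not the ones used in the paper.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 convolutions.

    Each layer is (kernel_size, dilation). A layer widens the field by
    (kernel_size - 1) * dilation positions.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# Illustrative chains: three 3-wide layers, plain versus dilated.
plain_chain = [(3, 1), (3, 1), (3, 1)]
dilated_chain = [(3, 1), (3, 2), (3, 4)]

if __name__ == "__main__":
    print("plain:", receptive_field(plain_chain))
    print("dilated:", receptive_field(dilated_chain))
```

With the same number of layers and parameters, the dilated chain covers roughly twice the span of the plain one, which is why mixing dilation rates helps with targets of different scales.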

Vehicular Perception Based on Inertial Sensing: A Structured Mapping of Approaches and Methods

SN Computer Science, 2020, vol. 1, no. 5, pp. 1-24

Authors: Menegazzo, Jeferson; von Wangenheim, Aldo. Affiliations: Graduate Program in Computer Science (PPGCC), Image Processing and Computer Graphics Lab (LAPIX), Department of Informatics and Statistics (INE), Brazilian Institute for Digital Convergence (INCoD), Federal University of Santa Catarina (UFSC), Santa Catarina, Florianópolis, Brazil

In this paper, we present a structured literature mapping of the state of the art in vehicular perception methods and approaches using inertial sensors. An in-depth investigation and classification were performed employing the results of a systematic literature review. The analysis focused on identifying methods that capture signals provided by inertial sensors such as accelerometers and gyroscopes to recognize transient or persistent events associated with the vehicle’s movement. We classified these events into vehicular exteroception, associated with potholes, cracks, speed bumps, pavement type, and conservation state; and vehicular proprioception, associated with lane change, braking, skidding, aquaplaning, and turning right or left. Through the comprehensive study of publications in a 7-year time window, in addition to the methods, we have also identified their dependency factors, hardware platforms and applications. © 2020, Springer Nature Singapore Pte Ltd.

Keywords: Driver behaviour; Driving style; Inertial sensors; Intelligent transport systems; Road conditions; Road surface anomaly

PAGML: Precise Alignment Guided Metric Learning for Sketch-Based 3D Shape Retrieval

SSRN, 2023

Authors: Bai, Shaojin; Bai, Jing; Xu, Hao; Tuo, Jiwen; Liu, Min. Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; National Ethnic Affairs Commission Image Graphics Intelligent Processing Laboratory, Yinchuan 750021, China; School of Mechanical Engineering, Purdue University, West Lafayette 47907, United States

Sketch-based 3D shape retrieval has long been a hot research topic in the computer vision community. The main challenge is to alleviate the cross-modality discrepancies so that retrieval accuracy can be improved. In this paper, we propose a novel Precise Alignment Guided Metric Learning (PAGML) method based on a master-auxiliary cross-modality retrieval framework. An auxiliary learning network is developed to indirectly guide the master learning model to extract features with rich semantic information, so as to achieve a semantic alignment between the cross-modality data. Furthermore, considering that unbalanced data distributions lead to poor uniformity in the common embedding space, a loss function dedicated to imbalanced cross-modality data is designed to achieve a rigid alignment between sketches and 3D shapes of the same category by pulling their rich semantic representations to the rigid center of the category. As a result, a more precise alignment between the cross-modality embedding features of the same category is approached gradually, which further alleviates the cross-modality discrepancies and improves the cross-modality retrieval accuracy. Extensive experiments on two public benchmark datasets demonstrate that the proposed PAGML surpasses the state-of-the-art methods in retrieval accuracy and generalizes well to unseen classes. © 2023, The Authors. All rights reserved.

Keywords: Embeddings

Teeth YOLACT: A Model Fusing Deep-Shallow, Local-Global and Edge Features

SSRN, 2024

Authors: Zhou, Tao; Wang, Yaxing; Chai, Wenwen; Pan, Yunfeng; Zhang, Zhe; Lu, Huiling; Xia, Yong. Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; School of Medical Information and Engineering, Ningxia Medical University, Yinchuan 750004, China; Key Laboratory of Image and Graphics Intelligent Processing, State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China; School of Computer, Northwestern Polytechnical University, Xi’an 710072, China

Background and Objective: The instance segmentation of impacted teeth in oral panoramic X-ray images is a research hotspot. However, impacted teeth in panoramic X-ray images lead to teeth deformities and low contrast between the tooth and periodontal tissue. In this paper, a Teeth YOLACT instance segmentation model is *** main contributions of this paper are as follows: Firstly, a Multi-scale Res-Transformer Module (MRTM) is designed. In this module, depth-wise separable convolutions with different receptive fields are used to improve the model's sensitivity to lesion size, and the Vision Transformer is used to improve the model's perception of global features. Secondly, the Context Interaction-awareness Module (CIaM) is designed to fuse deep and shallow features. The shallow spatial features are guided by the deep semantic features. The shallow spatial features are then embedded into the deep semantic features, and the cross-weighted attention mechanism is used to efficiently aggregate the deep and shallow features, so that richer context information is obtained. Thirdly, the Edge-preserving perception Module (E2PM) is designed to enhance the teeth edge features. The first-order differential operator is used to obtain the tooth edge weight, which improves the perception of tooth edge features. The shallow spatial feature is fused by linear mapping, weight concatenation and matrix multiplication operations to preserve the tooth edge information. Finally, comparison experiments and ablation experiments are conducted on the oral panoramic X-ray image *** results show that the APdet, APseg, ARdet, ARseg, mAPdet and mAPseg indicators of the proposed model are 89.9%, 91.9%, 77.4%, 77.6%, 72.8% and 73.5%, *** are some positive significances about the dental computer-aided diagnosis based on panoramic oral X-ray images. © 2024, The Authors. All rights reserved.
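
The abstract above mentions a first-order differential operator for obtaining an edge weight. As a rough illustration only, the sketch below uses forward differences along both axes and normalizes the gradient magnitude to [0, 1]. The choice of forward differences, the border handling, and the normalization are assumptions; the abstract does not state which operator the E2PM module uses.

```python
from typing import List

Image = List[List[float]]


def edge_weights(image: Image) -> Image:
    """Per-pixel edge weight from first-order forward differences.

    The gradient magnitude is scaled so that the strongest edge has
    weight 1.0. Border pixels reuse their own value as the neighbour,
    which gives a zero difference there.
    """
    h, w = len(image), len(image[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = image[y][min(x + 1, w - 1)] - image[y][x]
            dy = image[min(y + 1, h - 1)][x] - image[y][x]
            grad[y][x] = (dx * dx + dy * dy) ** 0.5
    peak = max(max(row) for row in grad)
    if peak == 0:
        return grad  # flat image: no edges anywhere
    return [[g / peak for g in row] for row in grad]


# Toy image with a vertical step edge between columns 1 and 2.
step_image = [[0, 0, 1, 1] for _ in range(3)]

if __name__ == "__main__":
    for row in edge_weights(step_image):
        print(row)
```

A weight map of this kind could then be used to emphasise boundary pixels when shallow spatial features are fused, in the spirit of the edge-preserving step the abstract describes.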

Keywords: Computer aided diagnosis

Research Progress of Encoder-Decoder Network in Image Fusion

SSRN, 2023

Authors: Zhou, Tao; Zhang, Xiangxiang; Cheng, Qianru; Li, Qi; Lu, Huiling. Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China; School of Science, Ningxia Medical University, Yinchuan 750004, China

Image fusion is a research hotspot in the field of computer vision. It obtains complementary information from images of different modalities and presents it in a comprehensive, highly information-cohesive fused image. Encoder-decoder networks are an important technical method in the field of image fusion. This paper reviews the research status and development of encoder-decoder networks in the field of image fusion from the following aspects. Firstly, the 18 image fusion-related datasets and the 22 open-source codes of deep learning-based image fusion are summarized. Secondly, the model structure of encoder-decoder networks and the basic principles of image fusion based on the encoder-decoder network are described. Thirdly, encoder-decoder network fusion models are summarized into 4 categories: Convolutional Auto-encoder network (CAE Net), Sparse Auto-encoder network (SAE Net), U-shape Encoder-decoder network (U-net) and Variational Auto-encoder network (VAE Net). Within U-net, there are 5 types: U-net incorporating residual connection mechanism (Res U-net), U-net incorporating dense connection mechanism (Dens U-net), U-net incorporating multiscale connection mechanism (MS U-net), U-net incorporating attention mechanism (Att U-net) and U-shaped Encoder-decoder networks (U-net) combined with Generative Adversarial Network (GAN U-net). Fourthly, the existing image fusion evaluation metrics are summarized from both subjective and objective aspects. Fifthly, the applications of encoder-decoder networks in 5 kinds of image fusion, namely multi-exposure image fusion, multi-focus image fusion, infrared and visible image fusion, medical image fusion and remote sensing image fusion, are summarized. Finally, there is a discussion of the main challenges faced by encoder-decoder networks in the image fusion domain and an outlook on future directions. This paper systematically analyzed the application of encoder-decoder networks in the image fusion areas, which has posi

Keywords: Deep learning

Named Entity Recognition in Chinese Judicial Domain Based on Self-Attention Mechanism and IDCNN

8th International Conference on Digital Home, ICDH 2020

Authors: Huang, Wenming; Zhang, Juan; Xiao, Yannan; Han, Zheng; Deng, Zhenrong. Affiliations: Guilin University of Electronic Technology, School of Computer Science and Information Security, Guilin, China; Key Laboratory of Intelligent Processing of Computer Image and Graphics, Guilin, China; Guilin Bank, Guilin, China

ISBN: (print) 9781728192345

Chinese named entity recognition (CNER) in the judicial domain is an important basic task for the intelligent analysis and processing of massive documents. Entities in this domain have a more complicated structure than common named entities, and their categories are more varied. However, general methods cannot solve the problem of domain-specific identification. In this paper, we combine the self-attention mechanism and an iterated dilated convolutional neural network (IDCNN) for CNER in the judicial domain. The bidirectional gated recurrent unit (BiGRU) model is used to automatically learn the contextual semantic information of the text and handle the long-distance dependence of the sequence. The model introduces the IDCNN to extract the key features of the contextual semantic information and capture finer-grained semantic information in the underlying texts. The self-attention mechanism is used to analyze the relationship between characters, and the problem of semantic dilution in long sequences is effectively addressed by means of dynamic weights. The optimal tag sequence is calculated by integrating conditional random fields (CRF), which further improves the recognition ability of the model. Finally, by analyzing the characteristics of legal documents, a new data set is annotated and fine-grained named entity recognition is realized. The experimental results on our corpus show that the proposed method can effectively identify the entities in legal documents and improve performance in the judicial field. © 2020 IEEE.
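
The CRF step mentioned above, which selects the optimal tag sequence, is commonly implemented with Viterbi decoding. The following is a minimal sketch of that standard technique, not the authors' code: the three-tag set and all scores are made-up values, and a real model would take the emission scores from the network and learn the transition scores during training.

```python
from typing import List

# Hypothetical tag indices for illustration: 0 = O, 1 = B, 2 = I.


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag path.

    emissions[t][j] scores tag j at position t;
    transitions[i][j] scores moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(n_tags), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# The O -> I transition is heavily penalized, as an I tag should follow B or I.
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.5, 0.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]

if __name__ == "__main__":
    print(viterbi(emissions, transitions))
```

Picking the best tag independently at each position would start this toy sequence with O followed by I, which is an invalid pair; decoding over the whole sequence avoids that by accepting a slightly lower score on the first token.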

Keywords: Semantics
