Medical code prediction is an important method for analyzing unstructured electronic health records (EHRs). It relies on a disease classification system introduced by the World Health Organization (WHO) and associates each electronic medical record (EMR) with its corresponding medical codes. The main challenges of automatic medical code prediction are long clinical texts and a long-tailed label distribution. Prior studies have designed pre-trained language models (PLMs) for medical code prediction. However, these methods either did not address the input text length limit or did not propose additional methods for learning better code representations to handle the long-tailed label distribution. In this work, we propose a novel contrastive learning framework with a large language model (CL-LLM) for medical code prediction. CL-LLM uses a large language model (LLM) to address the long-tailed label distribution problem by designing a prompt method that generates synonymous code descriptions. In addition, CL-LLM uses contrastive learning to inject these synonyms into the code description encoder, which improves the model's prediction of few-shot labels. To handle long input text, we use the PLM ClinicalBERT as the clinical text encoder and apply split pooling to segment long input text. We conducted experiments on the public MIMIC-III and MIMIC-IV datasets. The results on both datasets show that our model outperforms previous state-of-the-art methods for automatic ICD coding. We also conducted ablation experiments to demonstrate the importance of each component of CL-LLM. To further verify performance on rare labels, we test our method on the MIMIC-III RARE50 dataset, where it also achieves strong results.
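The abstract names split pooling but does not describe it in detail. The following is a minimal sketch of the general idea only, assuming that a long token sequence is cut into fixed-length segments, each segment is encoded separately, and the segment vectors are merged by element-wise max pooling. The function names, the choice of max pooling, and `toy_encoder` (a stand-in for a real encoder such as ClinicalBERT) are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def split_into_segments(token_ids: Sequence[int], segment_len: int) -> List[List[int]]:
    """Cut a long token sequence into consecutive segments of at most segment_len tokens."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(token_ids[i:i + segment_len])
            for i in range(0, len(token_ids), segment_len)]


def max_pool(segment_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over the per-segment vectors, giving one document vector."""
    if not segment_vectors:
        raise ValueError("need at least one segment vector")
    return [max(column) for column in zip(*segment_vectors)]


def toy_encoder(segment: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a real text encoder: returns [mean, max] of the token ids."""
    return [sum(segment) / len(segment), float(max(segment))]


# Ten "tokens" split into segments of at most four tokens each.
tokens = list(range(1, 11))
segments = split_into_segments(tokens, segment_len=4)

# Encode each segment independently, then pool into a single document vector.
doc_vector = max_pool([toy_encoder(s) for s in segments])
```

With a real encoder, `segment_len` would be bounded by the encoder's maximum input length, so no part of the clinical note has to be truncated.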
The International Classification of Diseases (ICD) is a widely used standard for disease classification, health monitoring, and medical data analysis. Deep learning-based automated ICD coding has gained attention because manual coding is time-consuming and costly. The main challenges of automated ICD coding include imbalanced label distribution, code hierarchy, and noisy texts. Recent works have used the code hierarchy or code descriptions to obtain better label representations and thereby address the imbalanced label distribution. However, these methods are still ineffective and redundant, since they only interact with a constant label representation. In this work, we introduce a novel Hyperbolic Graph Convolutional Network with Contrastive Learning (HGCN-CL) to address these problems and the shortcomings of previous methods. We apply a hyperbolic graph convolutional network to ICD coding to capture the hierarchical structure of codes, which avoids the large distortions that arise when a hierarchical structure is embedded with a conventional graph convolutional network. In addition, we introduce contrastive learning for automatic ICD coding: code features are injected into the text encoder to generate hierarchy-aware positive samples, which addresses the problem of interacting with constant code features. We conduct experiments on the public MIMIC-III and MIMIC-II datasets. The results on MIMIC-III show that HGCN-CL outperforms previous state-of-the-art methods for automatic ICD coding, achieving improvements of 2.7% and 3.6%, respectively, over the previous best results (Hypercore). We also provide ablation experiments and hierarchy visualization to verify the effectiveness of the components of our model.
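To illustrate why hyperbolic space suits hierarchies, the sketch below computes the standard geodesic distance in the Poincaré ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The abstract does not say which model of hyperbolic space HGCN-CL uses, so the Poincaré ball is an assumption here, and `poincare_distance` is an illustrative helper rather than part of the described method.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit (Poincare) ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_dist / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two pairs of points separated by the same angle: one pair near the origin,
# one pair near the boundary of the ball.
near_origin = poincare_distance([0.1, 0.0], [0.0, 0.1])
near_boundary = poincare_distance([0.9, 0.0], [0.0, 0.9])
```

Distances grow rapidly toward the boundary of the ball, so the many leaf codes of a tree-like hierarchy can be placed far apart from one another while staying close to their shared parent, which is difficult to achieve in a low-dimensional Euclidean embedding.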
A common challenge when using deep neural network models for automatic ICD coding is that they may handle unseen clinical texts poorly, especially when they are trained on only a limited number of examples. This is because these models rely solely on the patterns and relationships present in the training data and may not be able to incorporate additional knowledge about the relationships between medical entities. To address this issue, we introduce KG-MultiResCNN (Knowledge Guided Multi-filter Residual Convolutional Neural Network), a model that combines training examples with external knowledge from the Wikidata Knowledge Graph (KG) in order to better capture the relationships between medical entities. The KG is a structured database that contains a wealth of information about various entities, including medical concepts and their relationships with one another. By incorporating this external knowledge into our model, we improve its ability to predict ICD codes for new clinical texts. In our experiments with the MIMIC-III dataset, the KG-MultiResCNN model significantly outperformed the baseline approaches. This demonstrates the effectiveness of using external knowledge, in addition to training examples, to improve the performance of deep learning models for automatic ICD coding.
The task of automatic ICD (International Classification of Diseases) coding involves assigning appropriate ICD codes to electronic health records. Due to the long-tailed distribution of ICD codes, current methods perform poorly on rare diseases, also known as few-shot codes. We therefore turn to transfer learning to address the challenge of ICD coding with limited instances. In this paper, we examine the opportunities in few-shot ICD coding and propose a new solution, the evidence-representation-based meta-network (EPEN). Our model has two key innovations: (i) we design an evidence representation for diseases, based on the observation that the same disease can present with different symptoms in different individuals, and (ii) we construct a meta-network that memorizes category knowledge from common diseases and applies it to rare diseases. Extensive experiments show that EPEN performs better than previous methods on both frequent ICD codes and infrequent ICD codes (few-shot codes). Furthermore, EPEN exhibits more stable performance, as evidenced by an improvement in both the mean and the range of the F1-score.
The International Classification of Diseases is a standard in medical coding. It contains the information and descriptions of diseases in a hierarchical structure, and finding the International Classification of Disea...
Automatic International Classification of Diseases (ICD) coding is a method of automatically classifying diseases with a computer program, based on rules of etiology and clinical presentation, and representing them as codes. These codes are widely used to assist in medical reimbursement and in reporting patient health status. With the application of machine learning and deep learning, the accuracy of automatic ICD coding methods has improved considerably. However, this progress has been accompanied by problems such as insufficient pre-training on text in the models and computational complexity that increases along with prediction accuracy. In this work, we propose an approach called TF-GCN to counter these problems. First, a more accurate and concise feature representation is obtained by extracting features from both clinical records and ICD codes with a transformer-based model. Second, the node features, the document features, and the relationships between them in the clinical records are fed into a graph convolutional network (GCN) for training. Next, a pseudo-labeling attention mechanism is added to eliminate the noise generated during feature extraction. Finally, the features of the clinical records are compared with the features of the ICD codes by similarity to obtain the classification results. This not only reduces computational redundancy but also yields more accurate classification features. On the real-world MIMIC-III dataset, we compare the proposed algorithm with 11 automatic ICD coding methods to validate the performance of TF-GCN. According to the experimental findings, the proposed approach outperforms them on the standard evaluation metrics MiF (0.589), MiAUC (0.989), and P@8 (0.758).
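For readers unfamiliar with the reported metrics, the sketch below shows how two of them are commonly computed in multi-label ICD coding. It assumes that MiF denotes micro-averaged F1 and that P@8 is precision over the eight highest-scoring codes, averaged over documents. The function names and the toy codes are illustrative only and are unrelated to the reported numbers.

```python
from typing import List, Sequence, Set


def micro_f1(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all documents."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def precision_at_k(gold: Sequence[Set[str]], ranked: Sequence[List[str]], k: int) -> float:
    """Mean over documents of the fraction of the top-k ranked codes that are correct."""
    if k <= 0:
        raise ValueError("k must be positive")
    per_doc = [sum(1 for code in r[:k] if code in g) / k for g, r in zip(gold, ranked)]
    return sum(per_doc) / len(per_doc) if per_doc else 0.0


# Toy example with two documents (codes are placeholders).
gold_codes = [{"401.9", "428.0"}, {"250.00"}]
predicted_codes = [{"401.9", "584.9"}, {"250.00"}]
ranked_codes = [["401.9", "584.9", "428.0"], ["250.00", "401.9", "428.0"]]

f1 = micro_f1(gold_codes, predicted_codes)
p_at_2 = precision_at_k(gold_codes, ranked_codes, k=2)
```

Micro-averaging pools counts across all codes, so frequent codes dominate the score, which is why results on rare codes are often reported separately.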