ISBN (Print): 9783031349522; 9783031349539
Healthcare systems currently store a large amount of clinical data, mostly unstructured textual information such as electronic health records (EHRs). Manually extracting valuable information from these documents is costly for healthcare professionals. For example, when a patient first arrives at an oncology clinical analysis unit, clinical staff must extract information about the type of neoplasm in order to assign the appropriate clinical specialist. Automating this task is equivalent to text classification in natural language processing (NLP). In this study, we attempt to extract the neoplasm type by processing Spanish clinical documents. A private corpus of 23,704 real clinical cases was processed to extract the three most common types of neoplasm in Spain: breast, lung and colorectal neoplasms. We developed methodologies based on state-of-the-art text classification approaches: strategies based on machine learning with bag-of-words features, on embedding models in a supervised setting, and on bidirectional recurrent neural networks with convolutional layers (C-BiRNN). The results show that NLP methods are extremely helpful for the task of neoplasm type extraction. In particular, the 2-BiGRU model with a convolutional layer and pre-trained fastText embeddings obtained the best performance, with macro-averaged scores (more representative than the micro-average given the unbalanced data) of 0.981 precision, 0.984 recall and 0.982 F1-score.
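Below is a minimal sketch of the kind of architecture the best-performing model describes (a frozen pre-trained fastText embedding layer, a 1D convolutional layer, and two stacked bidirectional GRU layers feeding a three-class softmax), written with Keras. The hyperparameters, the vocabulary size, and the random embedding matrix are illustrative assumptions rather than values reported by the paper.

```python
# Sketch of a C-BiRNN-style classifier: frozen fastText embeddings -> Conv1D ->
# two stacked BiGRU layers -> softmax over the three neoplasm classes.
import numpy as np
from tensorflow.keras import initializers, layers, models

VOCAB_SIZE = 50_000      # assumed vocabulary size
MAX_LEN = 512            # assumed maximum token length of a clinical note
EMB_DIM = 300            # fastText vectors are typically 300-dimensional
NUM_CLASSES = 3          # breast, lung, colorectal

# In practice this matrix would be filled by looking up each vocabulary token in a
# pre-trained Spanish fastText model; random values stand in for it here.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMB_DIM)).astype("float32")

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMB_DIM,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False)(inputs)
x = layers.Conv1D(128, kernel_size=5, padding="same", activation="relu")(x)
x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)  # BiGRU #1
x = layers.Bidirectional(layers.GRU(64))(x)                         # BiGRU #2
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```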
Existing supervised methods for error detection require access to clean labels in order to train the classification models. This is difficult to achieve in practical scenarios. While the majority of the error detectio...
BACKGROUND: Disability, especially in children, is a very important and current problem. Lack of proper diagnosis and care increases the difficulty for children to adapt to disabilities. Disabled children have many problems with basic activities of daily living. Therefore, it is very important to support diagnosticians and physiotherapists in recognizing self-care problems in children. OBJECTIVE: The aim of this paper is to extract classification and action rules, useful for those who work with children with disabilities. METHODS: First, features and their impact on the accuracy of classification are determined. Then, two models are built: one with all features and one with selected ones. For these models the classification rules are extracted. Finally, action rules are mined and the next step in treatment process is predicted. RESULTS: Seventeen features with the greatest impact on classifying a child into a particular group of self-care problems were identified. Based on the implemented algorithms, decision and action rules were obtained. CONCLUSIONS: The obtained model, selected attributes and extracted classification and action rules can support the work of therapists and direct their work to those areas of disability where even a minimal reduction of features would be of great benefit to the children.
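As an illustration of the rule-extraction workflow described here (feature selection followed by classification-rule mining), the following sketch uses scikit-learn with synthetic placeholder data; it is not the authors' pipeline, and action-rule mining, which typically relies on dedicated algorithms, is not reproduced.

```python
# Select the most informative features, then print human-readable classification
# rules from a shallow decision tree. Dataset, feature names, and the three
# self-care problem groups are placeholders assumed for demonstration.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.random((200, 40))                 # 200 hypothetical children, 40 features
y = rng.integers(0, 3, size=200)          # 3 hypothetical self-care problem groups
feature_names = [f"item_{i}" for i in range(X.shape[1])]

# Step 1: keep the 17 features with the highest mutual information with the class.
selector = SelectKBest(mutual_info_classif, k=17).fit(X, y)
selected = [feature_names[i] for i in selector.get_support(indices=True)]

# Step 2: fit a shallow tree on the reduced data and print its decision rules.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(selector.transform(X), y)
print(export_text(tree, feature_names=selected))
```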
ISBN (Digital): 9798331529246
ISBN (Print): 9798331529253
This study explores the significant impact of corporate financial information disclosure on investor decision-making and economic policy-making. Financial fraud can distort information and disrupt market order; therefore, the identification of financial fraud has long been a research focus. Previous studies have mainly relied on financial and non-financial data disclosed by enterprises, whereas Internet information is more indicative for identifying financial fraud. However, using Internet data raises copyright problems, crawler technology is not an optimal solution, and information disclosure and transaction costs further limit its economic feasibility. To address these issues, this article adopts privacy-preserving machine learning, which avoids legal, technical, and economic barriers by exchanging model parameters instead of raw data. Based on 16,112 samples from 2012 to 2020, this paper collects financial, non-financial and Internet information and constructs three models: Model 1 is based only on financial and non-financial data, Model 2 adds Internet information on this basis, and Model 3 combines two privacy-preserving algorithms, SecureBoost and a vertical neural network. The experimental results show that Model 2 improves accuracy by 7% to 10% compared to Model 1, and Model 3 further improves performance while ensuring data privacy. This paper theoretically and empirically verifies the necessity of introducing Internet information and the application potential of privacy-preserving machine learning in financial fraud detection.
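The sketch below illustrates only the vertical (feature-partitioned) neural network idea: each party encodes its own features locally and only the intermediate representations are combined, so raw data never crosses party boundaries. It omits SecureBoost and the cryptographic protections a real privacy-preserving deployment would require, and all dimensions and data are assumed for illustration.

```python
# Conceptual sketch of a vertical split neural network (no encryption shown):
# Party A holds financial/non-financial features, Party B holds Internet-derived
# features; only embeddings are passed to the top model that scores fraud risk.
import torch
import torch.nn as nn

class PartyEncoder(nn.Module):
    """Local sub-model; raw features never leave the owning party."""
    def __init__(self, in_dim, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
    def forward(self, x):
        return self.net(x)

enc_a, enc_b = PartyEncoder(30), PartyEncoder(10)
top = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))  # fraud score

params = list(enc_a.parameters()) + list(enc_b.parameters()) + list(top.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

xa, xb = torch.randn(64, 30), torch.randn(64, 10)   # synthetic mini-batch
y = torch.randint(0, 2, (64, 1)).float()            # fraud / non-fraud labels

opt.zero_grad()
logits = top(torch.cat([enc_a(xa), enc_b(xb)], dim=1))  # only embeddings are shared
loss_fn(logits, y).backward()
opt.step()
```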
With the development of computer software and hardware system, machine learning methods are more and more used in various industries of social development. In the aspect of stock index prediction, the current predicti...
Large Language models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex ...
ISBN (Digital): 9798331527662
ISBN (Print): 9798331527679
This paper aims to explore a new visual and text information fusion algorithm that can effectively improve the accuracy and efficiency of sentiment analysis by combining the advantages of a trusted fine-grained alignment model and the Faster R-CNN algorithm. First, the paper proposes a visual object detection mechanism based on the Faster R-CNN algorithm, which can accurately identify and locate the key emotion-expressing elements in images. Then, the alignment model is used to associate each visual object with its corresponding text description, realizing cross-modal information fusion. This fusion not only considers the explicit emotional cues in the visual information, but also mines the implicit emotional tendencies in the text, thus providing more comprehensive and detailed sentiment analysis results. To verify the effectiveness of the proposed algorithm, a series of simulation experiments are designed and run on several public datasets. The experimental results show that, compared with traditional single-modal analysis methods, the proposed fusion algorithm significantly improves the performance of sentiment classification tasks and shows stronger robustness and adaptability, especially when dealing with complex emotion expressions and ambiguous text. The research results not only provide a new technical path in the field of multi-modal sentiment analysis, but also offer more reliable technical support for related application scenarios such as product review analysis and social media public opinion monitoring.
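A hedged sketch of the detection-then-fusion idea follows: a pre-trained Faster R-CNN from torchvision localizes salient regions, each confident region is re-encoded into a visual vector, and that vector is concatenated with a placeholder text embedding before a small sentiment head. The trusted fine-grained alignment model itself is not reproduced; the fusion head and all dimensions are illustrative assumptions.

```python
# Detect regions with Faster R-CNN, pool per-region ResNet-18 embeddings into one
# visual vector, and fuse it with a (placeholder) text embedding for sentiment.
import torch
import torch.nn as nn
import torchvision
from torchvision.transforms.functional import resize

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
backbone = torchvision.models.resnet18(weights="DEFAULT").eval()
backbone.fc = nn.Identity()                         # expose 512-d crop embeddings

image = torch.rand(3, 480, 640)                     # stand-in for a real RGB image
with torch.no_grad():
    det = detector([image])[0]                      # dict with boxes, labels, scores
    crops = []
    for box, score in zip(det["boxes"], det["scores"]):
        x1, y1, x2, y2 = box.int().tolist()
        if score > 0.7 and x2 > x1 and y2 > y1:     # keep confident, non-empty regions
            crop = resize(image[:, y1:y2, x1:x2], [224, 224])
            crops.append(backbone(crop.unsqueeze(0)))
    visual = (torch.cat(crops).mean(dim=0, keepdim=True)
              if crops else torch.zeros(1, 512))    # pooled visual representation

text = torch.randn(1, 300)                          # placeholder sentence embedding
fusion_head = nn.Sequential(nn.Linear(512 + 300, 128), nn.ReLU(),
                            nn.Linear(128, 3))      # negative / neutral / positive
print(fusion_head(torch.cat([visual, text], dim=1)).shape)
```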
ISBN (Digital): 9798350379945
ISBN (Print): 9798350379952
Many recent advances in deep learning have benefited significantly from training sets that are both larger and more diverse. Nevertheless, collecting huge datasets for medical imaging remains a challenge due to privacy concerns and the expense of labelling. Through data augmentation, it is feasible to significantly increase the quantity and variety of data available for training without actually collecting additional samples. Data augmentation techniques range from straightforward transformations such as cropping, padding, and flipping to more complex generative models, and these transformations are surprisingly powerful despite their apparent simplicity. Different augmentation procedures are likely to behave differently depending on the nature of the input and the visual task being performed. As a result, medical imaging probably calls for specialised augmentation algorithms that can produce plausible data samples and enable effective regularization of deep neural networks. This paper reviews different data augmentation techniques.
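As a concrete illustration of the simple, transformation-based augmentations mentioned (cropping, padding, flipping, and mild photometric changes), the following sketch composes a torchvision pipeline; the specific operations, sizes, and probabilities are illustrative choices, not recommendations from the survey.

```python
# Compose a basic augmentation pipeline and apply it to an image tensor.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.Pad(16, padding_mode="reflect"),       # pad before cropping
    transforms.RandomCrop(224),                       # random spatial crop
    transforms.RandomHorizontalFlip(p=0.5),           # left-right flip
    transforms.RandomRotation(degrees=10),            # small rotations
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

image = torch.rand(3, 224, 224)        # stand-in for a medical image tensor
augmented = augment(image)             # a new, plausible training sample
print(augmented.shape)
```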
ISBN (Print): 9781450395298
Socio-demographic information is usually only accessible at relatively coarse spatial resolutions. However, its availability at finer granularities is of substantial interest for several stakeholders, since it enhances the formulation of informed hypotheses on the distribution of population indicators. Spatial disaggregation methods aim to compute these fine-grained estimates, often using regression algorithms that employ ancillary data to re-distribute the aggregated information. However, since disaggregation tasks are ill-posed, and given that examples of disaggregated data at the target geospatial resolution are seldom available, model training is particularly challenging. We propose to address this problem through a self-supervision framework that iteratively refines initial estimates from seminal disaggregation heuristics. Specifically, we propose to co-train two different models, using the results from one model to train/refine the other. By doing so, we are able to explore complementary views of the data. We assessed the use of co-training with a fast regressor based on random forests that takes individual raster cells as input, together with a more expressive model, based on a fully-convolutional neural network, that takes raster patches as input. We also compared co-training against the use of self-training with a single model. In experiments involving the disaggregation of a socio-demographic variable collected for Continental Portugal, the results show that our co-training approach outperforms alternative disaggregation approaches, including methods based on self-training or on co-training two similar fully-convolutional models. Co-training is effective at exploring the characteristics of both regression algorithms, leading to a consistent improvement across different types of error metrics.
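The following is a schematic sketch of the iterative refinement behind such a co-training loop, under stated simplifications: a random forest plays the cell-level role, while an MLP stands in for the fully-convolutional patch model (which would require raster patches rather than tabular features). Data, features, and the number of iterations are placeholder assumptions.

```python
# Two regressors iteratively refine each other's estimates of a disaggregated
# variable, starting from a heuristic initialisation. Synthetic data throughout.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((5000, 8))                       # ancillary features per raster cell
y_init = rng.random(5000)                       # initial disaggregation heuristic

model_a = RandomForestRegressor(n_estimators=100, random_state=0)
model_b = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)

targets = y_init.copy()
for it in range(5):                             # iterative refinement
    model_a.fit(X, targets)                     # A learns from current estimates
    targets = model_b.fit(X, model_a.predict(X)).predict(X)  # B refines A's view
    # In the paper's setting, the refined per-cell estimates would additionally be
    # rescaled so they still sum to the known aggregate value of each source zone.
print(targets[:5])
```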
In March 2020, World Health Organization (WHO) recognized COVID-19 as a pandemic and urged governments to exert maximum efforts to prevent its spreading through political decisions together with public awareness campa...