ISBN (Print): 9783031349522; 9783031349539
Healthcare systems currently store a large amount of clinical data, mostly unstructured textual information such as electronic health records (EHRs). Manually extracting valuable information from these documents is costly for healthcare professionals. For example, when a patient first arrives at an oncology clinical analysis unit, clinical staff must extract information about the type of neoplasm in order to assign the appropriate clinical specialist. Automating this task is equivalent to text classification in natural language processing (NLP). In this study, we attempt to extract the neoplasm type by processing Spanish clinical documents. A private corpus of 23,704 real clinical cases was processed to extract the three most common types of neoplasm in Spain: breast, lung and colorectal neoplasms. We developed methodologies based on state-of-the-art text classification approaches: strategies based on machine learning with bag-of-words features, on embedding models in a supervised setting, and on bidirectional recurrent neural networks with convolutional layers (C-BiRNN). The results show that NLP methods are extremely helpful for the task of neoplasm type extraction. In particular, the 2-BiGRU model with a convolutional layer and pre-trained fastText embeddings obtained the best performance, with macro-averaged scores (more representative than the micro-average given the unbalanced data) of 0.981 precision, 0.984 recall and 0.982 F1-score.
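Below is a minimal sketch of the kind of architecture the best-performing model describes (a frozen pre-trained fastText embedding layer, a 1D convolutional layer, and two stacked bidirectional GRU layers feeding a three-class softmax), written with Keras. The hyperparameters, the vocabulary size, and the random embedding matrix are illustrative assumptions rather than values reported by the paper.

```python
# Sketch of a C-BiRNN-style classifier: frozen fastText embeddings -> Conv1D ->
# two stacked BiGRU layers -> softmax over the three neoplasm classes.
import numpy as np
from tensorflow.keras import initializers, layers, models

VOCAB_SIZE = 50_000      # assumed vocabulary size
MAX_LEN = 512            # assumed maximum token length of a clinical note
EMB_DIM = 300            # fastText vectors are typically 300-dimensional
NUM_CLASSES = 3          # breast, lung, colorectal

# In practice this matrix would be filled by looking up each vocabulary token in a
# pre-trained Spanish fastText model; random values stand in for it here.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMB_DIM)).astype("float32")

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMB_DIM,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False)(inputs)
x = layers.Conv1D(128, kernel_size=5, padding="same", activation="relu")(x)
x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)  # BiGRU #1
x = layers.Bidirectional(layers.GRU(64))(x)                         # BiGRU #2
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```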
Existing supervised methods for error detection require access to clean labels in order to train the classification models. This is difficult to achieve in practical scenarios. While the majority of the error detectio...
BACKGROUND: Disability, especially in children, is a very important and current problem. Lack of proper diagnosis and care increases the difficulty for children to adapt to disabilities. Disabled children have many problems with basic activities of daily living. Therefore, it is very important to support diagnosticians and physiotherapists in recognizing self-care problems in children. OBJECTIVE: The aim of this paper is to extract classification and action rules, useful for those who work with children with disabilities. METHODS: First, features and their impact on the accuracy of classification are determined. Then, two models are built: one with all features and one with selected ones. For these models the classification rules are extracted. Finally, action rules are mined and the next step in treatment process is predicted. RESULTS: Seventeen features with the greatest impact on classifying a child into a particular group of self-care problems were identified. Based on the implemented algorithms, decision and action rules were obtained. CONCLUSIONS: The obtained model, selected attributes and extracted classification and action rules can support the work of therapists and direct their work to those areas of disability where even a minimal reduction of features would be of great benefit to the children.
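As an illustration of the rule-extraction workflow described here (feature selection followed by classification-rule mining), the following sketch uses scikit-learn with synthetic placeholder data; it is not the authors' pipeline, and action-rule mining, which typically relies on dedicated algorithms, is not reproduced.

```python
# Select the most informative features, then print human-readable classification
# rules from a shallow decision tree. Dataset, feature names, and the three
# self-care problem groups are placeholders assumed for demonstration.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.random((200, 40))                 # 200 hypothetical children, 40 features
y = rng.integers(0, 3, size=200)          # 3 hypothetical self-care problem groups
feature_names = [f"item_{i}" for i in range(X.shape[1])]

# Step 1: keep the 17 features with the highest mutual information with the class.
selector = SelectKBest(mutual_info_classif, k=17).fit(X, y)
selected = [feature_names[i] for i in selector.get_support(indices=True)]

# Step 2: fit a shallow tree on the reduced data and print its decision rules.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(selector.transform(X), y)
print(export_text(tree, feature_names=selected))
```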
ISBN (Digital): 9798331529246
ISBN (Print): 9798331529253
This study explores the significant impact of corporate financial information disclosure on investor decision-making and economic policy-making. Financial fraud can distort information and disrupt market order; therefore, the identification of financial fraud has long been a research focus. Previous studies have mainly relied on financial and non-financial data disclosed by enterprises, whereas Internet information is more indicative for identifying financial fraud. However, using Internet data raises copyright problems, crawler technology is not an optimal solution, and information disclosure and transaction costs further limit its economic feasibility. To address these issues, this article adopts privacy-preserving machine learning, which avoids legal, technical, and economic barriers by exchanging model parameters instead of raw data. Based on 16,112 samples from 2012 to 2020, this paper collects financial, non-financial and Internet information and constructs three models: Model 1 is based only on financial and non-financial data, Model 2 adds Internet information on this basis, and Model 3 combines two privacy-preserving algorithms, SecureBoost and a vertical neural network. The experimental results show that Model 2 improves accuracy by 7% to 10% compared to Model 1, and Model 3 further improves performance while ensuring data privacy. This paper theoretically and empirically verifies the necessity of introducing Internet information and the application potential of privacy-preserving machine learning in financial fraud detection.
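The sketch below illustrates only the vertical (feature-partitioned) neural network idea: each party encodes its own features locally and only the intermediate representations are combined, so raw data never crosses party boundaries. It omits SecureBoost and the cryptographic protections a real privacy-preserving deployment would require, and all dimensions and data are assumed for illustration.

```python
# Conceptual sketch of a vertical split neural network (no encryption shown):
# Party A holds financial/non-financial features, Party B holds Internet-derived
# features; only embeddings are passed to the top model that scores fraud risk.
import torch
import torch.nn as nn

class PartyEncoder(nn.Module):
    """Local sub-model; raw features never leave the owning party."""
    def __init__(self, in_dim, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
    def forward(self, x):
        return self.net(x)

enc_a, enc_b = PartyEncoder(30), PartyEncoder(10)
top = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))  # fraud score

params = list(enc_a.parameters()) + list(enc_b.parameters()) + list(top.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

xa, xb = torch.randn(64, 30), torch.randn(64, 10)   # synthetic mini-batch
y = torch.randint(0, 2, (64, 1)).float()            # fraud / non-fraud labels

opt.zero_grad()
logits = top(torch.cat([enc_a(xa), enc_b(xb)], dim=1))  # only embeddings are shared
loss_fn(logits, y).backward()
opt.step()
```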
With the development of computer software and hardware system, machine learning methods are more and more used in various industries of social development. In the aspect of stock index prediction, the current predicti...
Large Language models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex ...
ISBN (Digital): 9798331527662
ISBN (Print): 9798331527679
This paper aims to explore a new visual and text information fusion algorithm that can effectively improve the accuracy and efficiency of sentiment analysis by combining the advantages of a trusted fine-grained alignment model and the Faster R-CNN algorithm. First, the paper proposes a visual object detection mechanism based on the Faster R-CNN algorithm, which can accurately identify and locate the key emotion-expressing elements in images. Then, the alignment model is used to associate each visual object with its corresponding text description, realizing cross-modal information fusion. This fusion not only considers the explicit emotional cues in the visual information, but also mines the implicit emotional tendencies in the text, thus providing more comprehensive and detailed sentiment analysis results. To verify the effectiveness of the proposed algorithm, a series of simulation experiments are designed and run on several public datasets. The experimental results show that, compared with traditional single-modal analysis methods, the proposed fusion algorithm significantly improves the performance of sentiment classification tasks and shows stronger robustness and adaptability, especially when dealing with complex emotion expressions and ambiguous text. The research results not only provide a new technical path in the field of multi-modal sentiment analysis, but also offer more reliable technical support for related application scenarios such as product review analysis and social media public opinion monitoring.
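A hedged sketch of the detection-then-fusion idea follows: a pre-trained Faster R-CNN from torchvision localizes salient regions, each confident region is re-encoded into a visual vector, and that vector is concatenated with a placeholder text embedding before a small sentiment head. The trusted fine-grained alignment model itself is not reproduced; the fusion head and all dimensions are illustrative assumptions.

```python
# Detect regions with Faster R-CNN, pool per-region ResNet-18 embeddings into one
# visual vector, and fuse it with a (placeholder) text embedding for sentiment.
import torch
import torch.nn as nn
import torchvision
from torchvision.transforms.functional import resize

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
backbone = torchvision.models.resnet18(weights="DEFAULT").eval()
backbone.fc = nn.Identity()                         # expose 512-d crop embeddings

image = torch.rand(3, 480, 640)                     # stand-in for a real RGB image
with torch.no_grad():
    det = detector([image])[0]                      # dict with boxes, labels, scores
    crops = []
    for box, score in zip(det["boxes"], det["scores"]):
        x1, y1, x2, y2 = box.int().tolist()
        if score > 0.7 and x2 > x1 and y2 > y1:     # keep confident, non-empty regions
            crop = resize(image[:, y1:y2, x1:x2], [224, 224])
            crops.append(backbone(crop.unsqueeze(0)))
    visual = (torch.cat(crops).mean(dim=0, keepdim=True)
              if crops else torch.zeros(1, 512))    # pooled visual representation

text = torch.randn(1, 300)                          # placeholder sentence embedding
fusion_head = nn.Sequential(nn.Linear(512 + 300, 128), nn.ReLU(),
                            nn.Linear(128, 3))      # negative / neutral / positive
print(fusion_head(torch.cat([visual, text], dim=1)).shape)
```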
ISBN (Digital): 9798350379945
ISBN (Print): 9798350379952
Many recent advances in deep learning have benefited significantly from training sets that are both larger and more diverse. Nevertheless, collecting huge datasets for medical imaging remains a challenge due to privacy concerns and the expense of labelling. Through data augmentation, it is feasible to significantly increase the quantity and variety of data available for training without actually collecting additional samples. Data augmentation techniques range from straightforward transformations such as cropping, padding, and flipping to more complex generative models, and these transformations are surprisingly powerful despite their apparent simplicity. Different augmentation procedures are likely to behave differently depending on the nature of the input and the visual task being performed. As a result, medical imaging probably calls for specialised augmentation algorithms that can produce plausible data samples and enable effective regularization of deep neural networks. This paper reviews different data augmentation techniques.
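As a concrete illustration of the simple, transformation-based augmentations mentioned (cropping, padding, flipping, and mild photometric changes), the following sketch composes a torchvision pipeline; the specific operations, sizes, and probabilities are illustrative choices, not recommendations from the survey.

```python
# Compose a basic augmentation pipeline and apply it to an image tensor.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.Pad(16, padding_mode="reflect"),       # pad before cropping
    transforms.RandomCrop(224),                       # random spatial crop
    transforms.RandomHorizontalFlip(p=0.5),           # left-right flip
    transforms.RandomRotation(degrees=10),            # small rotations
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

image = torch.rand(3, 224, 224)        # stand-in for a medical image tensor
augmented = augment(image)             # a new, plausible training sample
print(augmented.shape)
```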
ISBN (Print): 9781450395298
Socio-demographic information is usually only accessible at relatively coarse spatial resolutions. However, its availability at finer granularities is of substantial interest for several stakeholders, since it enhances the formulation of informed hypotheses on the distribution of population indicators. Spatial disaggregation methods aim to compute these fine-grained estimates, often using regression algorithms that employ ancillary data to re-distribute the aggregated information. However, since disaggregation tasks are ill-posed, and given that examples of disaggregated data at the target geospatial resolution are seldom available, model training is particularly challenging. We propose to address this problem through a self-supervision framework that iteratively refines initial estimates from seminal disaggregation heuristics. Specifically, we propose to co-train two different models, using the results from one model to train/refine the other. By doing so, we are able to explore complementary views of the data. We assessed the use of co-training with a fast regressor based on random forests that takes individual raster cells as input, together with a more expressive model, based on a fully-convolutional neural network, that takes raster patches as input. We also compared co-training against the use of self-training with a single model. In experiments involving the disaggregation of a socio-demographic variable collected for Continental Portugal, the results show that our co-training approach outperforms alternative disaggregation approaches, including methods based on self-training or on co-training two similar fully-convolutional models. Co-training is effective at exploring the characteristics of both regression algorithms, leading to a consistent improvement across different types of error metrics.
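The following is a schematic sketch of the iterative refinement behind such a co-training loop, under stated simplifications: a random forest plays the cell-level role, while an MLP stands in for the fully-convolutional patch model (which would require raster patches rather than tabular features). Data, features, and the number of iterations are placeholder assumptions.

```python
# Two regressors iteratively refine each other's estimates of a disaggregated
# variable, starting from a heuristic initialisation. Synthetic data throughout.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((5000, 8))                       # ancillary features per raster cell
y_init = rng.random(5000)                       # initial disaggregation heuristic

model_a = RandomForestRegressor(n_estimators=100, random_state=0)
model_b = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)

targets = y_init.copy()
for it in range(5):                             # iterative refinement
    model_a.fit(X, targets)                     # A learns from current estimates
    targets = model_b.fit(X, model_a.predict(X)).predict(X)  # B refines A's view
    # In the paper's setting, the refined per-cell estimates would additionally be
    # rescaled so they still sum to the known aggregate value of each source zone.
print(targets[:5])
```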
In March 2020, World Health Organization (WHO) recognized COVID-19 as a pandemic and urged governments to exert maximum efforts to prevent its spreading through political decisions together with public awareness campa...