Search Results - Inner Mongolia University Library

Improving diagnostic accuracy using agent-based distributed data mining system

INFORMATICS FOR HEALTH & SOCIAL CARE, 2013, Vol. 38, No. 3, pp. 182-195

Authors: Sridhar, S. (Anna Univ, Dept Informat Sci & Technol, Chennai 600025, Tamil Nadu, India)

This paper investigates the use of data mining techniques to improve the accuracy of diagnostic systems. Data mining algorithms aim to discover patterns and extract useful knowledge from facts recorded in databases. Expert systems are generally constructed to automate diagnostic procedures. The learning component uses data mining algorithms to extract the expert system rules from the database automatically, which can assist clinicians in acquiring knowledge. As the number and variety of data sources are increasing dramatically, another way to acquire knowledge from databases is to apply various data mining algorithms that extract knowledge from data. As data sets are inherently distributed, the distributed system uses agents to transport the trained classifiers and uses meta-learning to combine the knowledge. Commonsense reasoning is also used in association with distributed data mining to obtain better results. Combining human expert knowledge and data mining knowledge improves the performance of the diagnostic system. This work suggests a framework for combining human knowledge with the knowledge gained by better data mining algorithms on a renal and gallstone data set.

Keywords: Expert systems; distributed data mining; knowledge integration
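
The abstract above describes agents carrying trained classifiers between sites and a meta-learning step that combines them. The abstract does not specify the combiner, so the following is only a minimal sketch that assumes the simplest possible one, a majority vote. The three site rules and the `calcium` and `ph` fields are hypothetical placeholders, not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def combine(site_classifiers, record):
    """Ask every site's classifier for a label and combine them by voting."""
    return majority_vote([clf(record) for clf in site_classifiers])


# Hypothetical classifiers, each assumed to be trained at a different site.
def site_a(record):
    return "stone" if record["calcium"] > 10.0 else "normal"


def site_b(record):
    return "stone" if record["ph"] < 5.5 else "normal"


def site_c(record):
    return "stone" if record["calcium"] > 9.0 and record["ph"] < 6.0 else "normal"


record = {"calcium": 9.5, "ph": 5.8}
label = combine([site_a, site_b, site_c], record)
print(label)
```

Only the classifiers move between sites in this sketch; the raw records stay where they were collected.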

Evaluation Platform for DDM Algorithms With the Usage of Non-Uniform Data Distribution Strategies

INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGIES AND SYSTEMS APPROACH, 2022, Vol. 15, No. 1, pp. 1-23

Authors: Markiewicz, Mikolaj; Koperwas, Jakub (Warsaw Univ Technol, Warsaw, Poland)

Huge amounts of data are collected in numerous independent data storage facilities around the world. However, how the data is distributed between physical locations remains unspecified. Downloading all of the data for the purpose of processing is undesirable and sometimes even impossible. Various methods have been proposed for performing data mining tasks, but the main problem is the lack of an objective strategy for comparing them. The authors present current research on a novel evaluation platform for distributed data mining (DDM) algorithms. The proposed platform opens up a new field to evaluate algorithms in terms of the quality of the results, transfer used, and speed, but also for the use of a non-uniform data distribution among independent nodes during algorithm evaluation. This work introduces the term 'data partitioning strategy', referring to a specific, not necessarily uniform data distribution. A brief evaluation of three clustering algorithms is also reported, showing the usability and simplicity of identifying differences in processing with the use of the platform.

Keywords: Algorithm Evaluation; Benchmarking Platform; Classification; Clustering; Data Partitioning Strategies; Distributed Data Mining; Distributed Processing
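
As a rough illustration of what a non-uniform 'data partitioning strategy' could look like, the sketch below assigns each record to a node with unequal probabilities. The function name `partition_non_uniform`, the weights, and the seed are assumptions made for this example; they are not the platform's API.

```python
import random


def partition_non_uniform(records, weights, seed=0):
    """Assign each record to one node, choosing node i with probability
    proportional to weights[i]. Returns one list of records per node."""
    rng = random.Random(seed)  # fixed seed so an experiment can be repeated
    nodes = [[] for _ in weights]
    node_ids = range(len(weights))
    for record in records:
        target = rng.choices(node_ids, weights=weights, k=1)[0]
        nodes[target].append(record)
    return nodes


records = list(range(1000))
nodes = partition_non_uniform(records, weights=[0.7, 0.2, 0.1], seed=42)
print([len(n) for n in nodes])  # sizes are skewed toward the first node
```

Running the same algorithm on partitions drawn with different weights is one way to see how sensitive it is to data skew.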

Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization

32nd ACM International Conference on Information and Knowledge Management (CIKM)

Authors: Zhang, Qixin; Deng, Zengde; Jian, Xiangru; Chen, Zaiyi; Hu, Haoyuan; Yang, Yu (City Univ Hong Kong, Hong Kong, Peoples R China; Cainiao Network, Hangzhou, Peoples R China; Univ Waterloo, Waterloo, ON, Canada)

ISBN: (print) 9798400701245

Maximizing a monotone submodular function is a fundamental task in data mining, machine learning, economics, and statistics. In this paper, we present two communication-efficient decentralized online algorithms for the monotone continuous DR-submodular maximization problem, both of which reduce the number of per-function gradient evaluations and per-round communication complexity from $T^{3/2}$ to $1$. The first one, One-shot Decentralized Meta-Frank-Wolfe (Mono-DMFW), achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. As far as we know, this is the first one-shot and projection-free decentralized online algorithm for monotone continuous DR-submodular maximization. Next, inspired by the non-oblivious boosting function [29], we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm, which attains a $(1-1/e)$-regret of $O(\sqrt{T})$. To the best of our knowledge, this is the first result to obtain the optimal $O(\sqrt{T})$ against a $(1-1/e)$-approximation with only one gradient inquiry for each local objective function per step. Finally, various experimental results confirm the effectiveness of the proposed methods.

Keywords: distributed data mining; online learning; submodular maximization
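
For readers unfamiliar with the terms in this abstract, the standard definitions can be written as follows. This is a generic, single-learner statement with notation chosen here ($\mathcal{X}$, $\mathcal{K}$, $f_t$, $x_t$); the paper's decentralized setting, where each node holds a local objective, may define regret differently.

```latex
% DR-submodularity of a differentiable f on a box \mathcal{X} = \prod_i [0, u_i]:
% gradients shrink coordinate-wise as the point grows (diminishing returns).
\nabla f(x) \ge \nabla f(y) \quad \text{for all } x, y \in \mathcal{X} \text{ with } x \le y .

% Monotonicity:
f(x) \le f(y) \quad \text{for all } x \le y .

% \alpha-regret after T rounds over a constraint set \mathcal{K}, with \alpha = 1 - 1/e:
\mathcal{R}_{\alpha}(T) = \alpha \max_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x) - \sum_{t=1}^{T} f_t(x_t) .
```

Under this definition, a bound of $O(\sqrt{T})$ means the average per-round shortfall against the $(1-1/e)$-scaled best fixed decision shrinks at rate $O(1/\sqrt{T})$, while $O(T^{4/5})$ shrinks more slowly, at rate $O(T^{-1/5})$.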

Incentive-Compatible Privacy-preserving Distributed Data Mining

IEEE 13th International Conference on Data Mining (ICDM)

Authors: Kantarcioglu, Murat (Univ Texas Dallas, Dept Comp Sci, Dallas, TX 75230, USA)

ISBN: (print) 9780769551098

The quantity of data that is captured, collected, and stored by a wide variety of organizations is growing at an exponential rate. The potential for such data to support scientific discovery and optimization of existing systems is significant, but only if it can be integrated and analyzed in a meaningful way by a wide range of investigators. While many believe that data sharing is desirable, there are also privacy and security concerns, rooted in ethics and the law, that often prevent many legitimate and noteworthy applications. In this talk, we will provide an overview of research on how to integrate and mine large amounts of privacy-sensitive distributed data without violating such constraints. In particular, we will discuss how to incentivize data sharing in privacy-preserving distributed data mining applications. This work will draw upon examples from the biomedical domain and discuss recent research on privacy-preserving mining of genomic databases.

Keywords: distributed data mining incentives privacy Privacy data mining incentives data Sharing scientific discovery Distribute Speaking Ethics

A Game Theory based Repeated Rational Secret Sharing Scheme for Privacy Preserving Distributed Data Mining

10th International Conference on Security and Cryptography (SECRYPT)

Authors: Nanavati, Nirali R.; Jinwala, Devesh C. (Sardar Vallabhbhai Natl Inst Technol, Surat, India)

ISBN: (print) 9789897581311

Collaborative data mining has become very useful today with the immense increase in the amount of data collected and the increase in competition. This in turn increases the need to preserve the participants' privacy. A number of approaches have been proposed that use Secret Sharing for privacy preservation in Secure Multiparty Computation (SMC) in different setups and applications. The different multiparty scenarios may have parties that are semi-honest, rational or malicious. A number of approaches have been proposed for semi-honest parties in this setup. The problem, however, is that in reality we have to deal with parties that act in their self-interest and are rational. These rational parties may try to attain maximum gain without disrupting the protocol. Also, these parties, if cautioned, would correct themselves to have maximum individual gain in the future. Thus we propose a new practical game-theoretic approach with three novel punishment policies, with the primary advantage that it avoids the use of expensive techniques like homomorphic encryption. Our proposed approach is applicable to the secret sharing scheme among rational parties in distributed data mining. We have theoretically analysed the proposed novel punishment policies for this approach. We have also implemented our scheme in Java and evaluated it empirically. We compare the proposed punishment policies in terms of the number of rounds required to attain the Nash equilibrium, with eventually no bad rational nodes, for different percentages of initial bad nodes.

Keywords: Privacy; Game Theory; Secure Multiparty Computation; Rational Secret Sharing; distributed data mining
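
The secret-sharing building block mentioned in the abstract can be illustrated with a secure sum over additive shares. This is a minimal sketch for semi-honest parties only: it does not model rational behaviour or the paper's punishment policies, and the modulus `PRIME` and the function names are assumptions chosen for the example.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus; private values must be smaller than this


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Each party shares its value; party j only ever sees the j-th shares."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the j-th share received from every party.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partial_sums) % PRIME


print(secure_sum([12, 30, 7]))  # 49
```

Any single share is uniformly random, so a party learns nothing about another party's value from the shares it receives; only the final total is disclosed.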

A Distributed Data Mining Framework Accelerated with Graphics Processing Units

International Conference on Cloud Computing and Big Data (CLOUDCOM-ASIA)

Authors: Nam-Luc Tran; Dugauthier, Quentin; Skhiri, Sabri (Euranova R&D, Mont St Guibert, Belgium)

ISBN: (print) 9781479928293

In the context of processing high volumes of data, recent developments have led to numerous models and frameworks for distributed processing running on clusters of commodity hardware. On the other hand, the Graphics Processing Unit (GPU) has seen much enthusiastic development as a device for general-purpose intensive parallel computation. In this paper we propose a framework which combines both approaches, and we evaluate the relevance of having nodes in a distributed processing cluster that make use of GPUs for further fine-grained parallel processing. We have engineered parallel and distributed versions of two data mining problems, the naive Bayes classifier and the k-means clustering algorithm, to run on the framework and have evaluated the performance gain. Finally, we also discuss the requirements and perspectives of integrating GPUs in a distributed processing cluster, introducing a fully distributed heterogeneous computing cluster.

Keywords: GPU algorithm data mining distributed data mining distributed processing kmeans naive bayes processing distributed processing data mining Graphics Processing Unit GRAPPER PICK UP Frameworks Bayesian classifier
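
Naive Bayes distributes well because training reduces to counting, and counts from separate partitions can simply be added. The sketch below shows that idea in plain Python for categorical features; it is not the paper's framework and uses no GPU. The function names and the toy rows are assumptions for illustration.

```python
from collections import Counter


def local_counts(rows):
    """Count class labels and (label, feature index, value) triples on one node.
    `rows` is a list of (features_tuple, label) pairs."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(label, index, value)] += 1
    return class_counts, feature_counts


def merge_counts(partials):
    """Add up the counters produced by every node."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in partials:
        total_class.update(class_counts)    # Counter.update adds counts
        total_feature.update(feature_counts)
    return total_class, total_feature


node_1 = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
node_2 = [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no")]
merged = merge_counts([local_counts(node_1), local_counts(node_2)])
print(merged[0])
```

The merged counters are identical to those computed on the pooled data, so the resulting probability estimates do not depend on how the rows were split across nodes.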

Heterogeneous Federated Learning via Grouped Sequential-to-Parallel Training

27th International Conference on Database Systems for Advanced Applications (DASFAA)

Authors: Zeng, Shenglai; Li, Zonghang; Yu, Hongfang; He, Yihong; Xu, Zenglin; Niyato, Dusit; Yu, Han (Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu, Peoples R China; Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China; Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore)

ISBN: (digital) 9783031001260

ISBN: (print) 9783031001260; 9783031001253

Federated learning (FL) is a rapidly growing privacy-preserving collaborative machine learning paradigm. In practical FL applications, local data from each data silo reflect local usage patterns. Therefore, there exists heterogeneity of data distributions among data owners (a.k.a. FL clients). If not handled properly, this can lead to model performance degradation. This challenge has inspired the research field of heterogeneous federated learning, which currently remains open. In this paper, we propose a data heterogeneity-robust FL approach, FEDGSP, to address this challenge by leveraging a novel concept of dynamic Sequential-to-Parallel (STP) collaborative training. FEDGSP assigns FL clients to homogeneous groups to minimize the overall distribution divergence among groups, and increases the degree of parallelism by reassigning more groups in each round. It also incorporates a novel Inter-Cluster Grouping (ICG) algorithm to assist in group assignment, which uses the centroid equivalence theorem to simplify the NP-hard grouping problem and make it solvable. Extensive experiments have been conducted on the non-i.i.d. FEMNIST dataset. The results show that FEDGSP improves the accuracy by 3.7% on average compared with seven state-of-the-art approaches, and reduces the training time and communication overhead by more than 90%.

Keywords: Federated learning; distributed data mining; Heterogeneous data; Clustering-based learning

Privacy-Accuracy Trade-Off in Differentially-Private Distributed Classification: A Game Theoretical Approach

IEEE TRANSACTIONS ON BIG DATA, 2021, Vol. 7, No. 4, pp. 770-783

Authors: Xu, Lei; Jiang, Chunxiao; Qian, Yi; Li, Jianhua; Zhao, Youjian; Ren, Yong (Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China; Tsinghua Univ, Tsinghua Space Ctr, Beijing 100084, Peoples R China; Univ Nebraska, Dept Elect & Comp Engn, Omaha, NE 68182, USA; Shanghai Jiao Tong Univ, Coll Informat Secur, Shanghai 201203, Peoples R China; Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China; Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China)

The privacy issues arising in data mining applications have attracted much attention. In the context of distributed data mining, a major concern of each participant is that its private data may be disclosed to other participants or a third party. To protect privacy, one can apply a differential privacy approach to perturb the data before sharing them with others, which generally has a negative effect on the mining result. Thus there is a trade-off between privacy and the quality of the mining result. In this paper, we study a distributed classification scenario where a mediator builds a classifier based on the perturbed query results returned by a number of users. We propose a game theoretical approach to analyze how users choose their privacy budgets. Specifically, interactions among users are modeled as a game in satisfaction form, and an algorithm is proposed for users to learn the satisfaction equilibrium (SE) of the game. Experimental results demonstrate that, when the differences among users' expectations are not significant, the proposed learning algorithm can converge to an SE, at which every user achieves a balance between the accuracy of the classifier and the preserved privacy.

Keywords: Games; Data privacy; Privacy; Distributed databases; Protocols; Game theory; distributed data mining; differential privacy; game theory; satisfaction equilibrium; equilibrium learning
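
The privacy-accuracy trade-off described in the abstract can be seen in the textbook Laplace mechanism, where a smaller privacy budget epsilon means more noise on each query answer. The sketch below is a generic illustration, not the paper's protocol or its game model; the counting query with sensitivity 1 and the helper names are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials
    with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Perturb a counting query answer with noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    """Average absolute error of the perturbed answer over repeated queries."""
    rng = random.Random(seed)
    total = sum(abs(private_count(true_count, epsilon, rng) - true_count)
                for _ in range(trials))
    return total / trials


# A stricter budget (small epsilon) costs accuracy; a looser one preserves it.
print(mean_abs_error(100, epsilon=0.1), mean_abs_error(100, epsilon=10.0))
```

A user choosing epsilon is therefore choosing a point on this curve, which is the decision the abstract models as a game among users.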

Distributed data mining patterns and services: an architecture and experiments

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2012, Vol. 24, No. 15, pp. 1751-1774

Authors: Cesario, Eugenio; Talia, Domenico (ICAR CNR, I-87036 Arcavacata Di Rende, CS, Italy; Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy)

Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high-level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta-learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work was to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. In order to validate such an approach, we also presented the implementation of two clustering algorithms on the developed architecture. In particular, distributed k-means and distributed expectation maximization were used as pilot examples to show the suitability of the implemented service-oriented framework. An extensive evaluation of its performance was provided. Copyright (c) 2011 John Wiley & Sons, Ltd.

Keywords: grid computing; distributed data mining; OGSA; WSRF
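
One common way to distribute k-means, which may or may not match the paper's grid services, is for each node to compute per-cluster sums and counts locally and for a coordinator to merge them into new centroids. The sketch below shows a single iteration under that assumption; `local_stats` and `global_update` are hypothetical names.

```python
def local_stats(points, centroids):
    """On one node: assign each point to its nearest centroid and return
    the per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in points:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def global_update(stats, centroids):
    """At the coordinator: merge node statistics into new centroids.
    An empty cluster keeps its previous centroid."""
    k, dim = len(centroids), len(centroids[0])
    updated = []
    for j in range(k):
        n = sum(counts[j] for _, counts in stats)
        if n == 0:
            updated.append(list(centroids[j]))
        else:
            updated.append([sum(sums[j][d] for sums, _ in stats) / n
                            for d in range(dim)])
    return updated


node_1 = [(0.0, 0.0), (0.0, 2.0)]
node_2 = [(10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
stats = [local_stats(node_1, centroids), local_stats(node_2, centroids)]
print(global_update(stats, centroids))  # [[0.0, 1.0], [10.0, 11.0]]
```

Only k sums and k counts leave each node per iteration, so the communication cost is independent of how many points a node holds.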

A Multi-Perspective Distributed Mining Framework for Scalable Search Spell Correction

IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)

Authors: Li, Yutong; Bommaganti, Hari; Yadava, Himanshu (Apple Inc, Cupertino, CA 95014, USA)

ISBN: (print) 9781665408981

Acquiring high-quality misspelling data at large scale to train quality search speller models is key but challenging. Synthetic data generation approaches, a major focus in the literature, are usually linguistics-dependent, challenging in domain adaptation, and centered around empirical choices based on error patterns generalized from limited annotated datasets. Mining-based approaches, on the other hand, are not sufficiently studied and do not ensure ground-truth corrections. Both methodologies also lack focus on other strategic considerations which matter for the final quality of the data and model. We introduce a novel, comprehensive and production-proven distributed mining framework which is able to generate large-scale quality data to train search speller models. The enabling method eliminates dependency on human-judged data, and fully scales exploring, training and deploying high-quality speller models with maximal efficiency. The work has been demonstrated by production launches of spell correction to worldwide markets for Apple Maps search. Our approach should also help general synthetic data generation approaches in applicable domains to remove their dependency on human annotation.

Keywords: scalable machine learning; spell correction; distributed data mining; auto-complete; search
