As establishing a fraud detection mechanism in the healthcare industry is an evolving challenge, this research work proposes a comprehensive approach for predicting the most probable fraudulent claims with the help of tra...
ISBN: (print) 9789811013409
The proceedings are a collection of research papers presented at the International Conference on Data Engineering 2013 (DaEng-2013), a conference dedicated to addressing the challenges in the areas of databases, information retrieval, data mining and knowledge management, thereby presenting a consolidated view to interested researchers in these fields. The goal of the conference was to bring together researchers and practitioners from academia and industry to focus on advanced data engineering concepts and to establish new collaborations in these areas. The topics of interest include, but are not limited to: database theory; data management; data mining and warehousing; data privacy and security; information retrieval, integration and visualization; information systems; knowledge discovery in databases; mobile, grid and cloud computing; knowledge-based systems; knowledge management; and Web data, services and intelligence.
Optimizing real-life scenarios facilitates knowledge building. Developing a knowledge model for optimizing certain output criteria enhances the benefits manyfold. Even a non-profit sector like education needs ...
ISBN: (print) 9783319914527; 9783319914510
Co-location patterns, that is, subsets of spatial features whose instances are frequently located together, are particularly valuable for discovering spatial dependencies. Although many spatial co-location pattern mining approaches have been proposed, their computational cost remains high. In this paper, we propose an iterative mining framework based on MapReduce to mine co-location patterns efficiently from massive spatial data. Our approach searches for co-location patterns in parallel by expanding ordered cliques, and it generates no candidate sets. Extensive experiments on synthetic and real-world datasets show that the proposed method is efficient and scalable for massive spatial data, and is faster than other parallel methods.
ISBN: (print) 9781538680346
For MapReduce jobs in a data center, network traffic is generated in the shuffle phase, causing an east-west communication bottleneck. To address this problem, an optimization scheme is proposed that aggregates related network traffic flows into local areas. First, the pre-scheduling characteristics of MapReduce jobs are extracted, the communication activity degree of a job is defined, and jobs are divided into two types: communication-active and communication-inactive. A Bayesian classifier with active learning is then used as the prediction model; once trained on sample data, it determines the type of a job. Communication-active jobs are deployed in the same rack to improve network bandwidth utilization. Experimental results on a small-scale data center show that the proposed communication optimization scheme has a significant effect on shuffle-intensive jobs, reaching 4.2%-5.6%. With larger amounts of input data, the scheme is more robust and can effectively reduce east-west communication delay in the data center.
ISBN: (print) 9781538666142
The explosion of scientific data generated from large-scale simulations and advanced sensors makes scientific workflows more complex and more data-intensive. Supporting these data-intensive workflows on HPC systems presents new challenges in data management due to their scale, coordination behaviors and overall complexity. In this paper, we present the Tiered Data Management System (TDMS) to accelerate scientific workflows on a tiered storage architecture. TDMS utilizes the throughput and capacity characteristics of each storage tier for efficient data sharing. Moreover, TDMS provides customized data management strategies for different workflow data access patterns to make full use of the advantages of each storage tier. We build a prototype and deploy it on a representative HPC system. We evaluate the performance of TDMS with realistic workflows, and the experiments show that the customized data management strategies can optimize I/O performance and provide a 1.6x speedup for data-intensive workflows compared with the Lustre file system.
ISBN: (print) 9781450365123
The data mining standard process divides a data mining project into six phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment. The goal of the data understanding phase is to understand the original data. At present, there are relatively few studies on this phase. In practical applications, some visualization methods are usually used to understand the original data. Therefore, we propose a systematic process for data understanding that makes full use of visualization technology to help users understand the data. In addition, we revise the DP (Density Peaks) algorithm to identify high-density regions, and integrate it into the data understanding process. The experimental results show that the data understanding process proposed in this paper is effective.
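As a rough illustration of how density peaks point to high-density regions, the following minimal sketch computes the two quantities used by the standard Density Peaks formulation: a local density rho (the number of neighbours within a cutoff distance) and delta (the distance to the nearest point of higher density). It is not the revised algorithm described in the abstract, whose details are not given here; the function name `density_peaks`, the cutoff parameter `d_c` and the sample points are assumptions made for this example.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) for each point, following the standard
    Density Peaks definitions (not the revised variant in the abstract).

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point with strictly higher rho;
               for a point with no denser neighbour, the largest
               distance from it to any other point
    """
    n = len(points)
    # Full pairwise Euclidean distance matrix (fine for small inputs).
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Hypothetical data: four corners of a unit square, its centre, and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
rho, delta = density_peaks(points, d_c=0.8)
centre = rho.index(max(rho))  # index of the densest point
```

Points that combine a high rho with a large delta are candidate centres of high-density regions; in this toy data the centre of the square is the only such point, while the outlier has a large delta but zero density.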
ISBN: (print) 9783319839301
This book constitutes the refereed proceedings of the 9th IFIP TC 12 International Conference on Intelligent Information Processing, IIP 2016, held in Melbourne, VIC, Australia, in October 2016. The 24 full papers and 3 short papers presented were carefully reviewed and selected from more than 40 submissions. They are organized in topical sections on machine learning, data mining, deep learning, social computing, semantic web and text processing, image understanding, and brain-machine collaboration.
Recently, data mining has developed rapidly and attracted a lot of attention. When data mining is used in the real world, privacy protection is an important problem. Over the past ten years, many researchers have studied this problem and...
ISBN: (print) 9789811068720; 9789811068713
Secure data transmission is one of the real issues faced on the Web. As the amount of data on the Web expands daily, the importance of data security is also increasing. Several techniques, such as cryptography, watermarking and steganography, are used to enhance the security of the data to be transmitted. This paper uses a novel steganographic algorithm in the spatial domain, based on the concept of pixel modulation, which diminishes the changes that occur in the stego image generated from the cover image. Experimental results and analysis of the observations show the effectiveness of the proposed algorithm. Different metrics, such as mean square error (MSE), peak signal-to-noise ratio (PSNR), bit-plane analysis and histogram analysis, have been used to show that the proposed algorithm gives better results than existing ones.
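For readers unfamiliar with the two distortion metrics named above, the following minimal sketch computes MSE and PSNR between a cover image and a stego image using their standard definitions. It assumes 8-bit grayscale images represented as lists of rows; the function names, the `max_value` parameter and the sample pixel values are illustrative and are not taken from the paper.

```python
import math

def mse(cover, stego):
    """Mean square error between two equally sized grayscale images."""
    if len(cover) != len(stego) or any(
        len(a) != len(b) for a, b in zip(cover, stego)
    ):
        raise ValueError("images must have the same dimensions")
    total = 0
    count = 0
    for row_c, row_s in zip(cover, stego):
        for c, s in zip(row_c, row_s):
            total += (c - s) ** 2
            count += 1
    return total / count

def psnr(cover, stego, max_value=255):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    err = mse(cover, stego)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)

# Hypothetical 2x2 cover image and a stego image differing by 1 in two pixels.
cover = [[100, 120], [140, 160]]
stego = [[101, 120], [140, 159]]
error = mse(cover, stego)     # 0.5
quality = psnr(cover, stego)  # roughly 51.1 dB
```

A lower MSE and a higher PSNR both indicate that the stego image stays closer to the cover image, which is the sense in which an embedding scheme "diminishes the changes".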