In the information age of the 21st century, large amounts of information are collected and applied. However, because the system environments used for data storage and computing are heterogeneous, how to mine these distributed data sources has become a valuable research topic that attracts increasing attention. In this paper, we first present the problem scenario and the main challenges of distributed data mining on multi-source heterogeneous data sets. We then survey related research and extract its main features across technology domains to show the current distributed solutions for different categories of data mining algorithms. Finally, we review the research in detail and discuss the challenges that remain in distributed data mining for multi-source heterogeneous data sets.
The past few years have witnessed increased interest in the potential use of Wireless Sensor Networks (WSNs) in a wide range of applications and it has become a hot research area. Owing to the advances and growth in w...
ISBN: (print) 9783319459912; 9783319459905
This paper presents an information system for multidimensional data analysis and data mining that identifies associative dependencies in multidimensional data and is implemented in the post-relational DBMS Caché environment. The system's modules perform the following tasks: physical-level design and provisioning of the object database, construction of multidimensional data structures for creating the database, and association rule mining over multidimensional data. The paper also covers the methods for constructing OLAP cubes and for mining association rules in them, as implemented in the information system.
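As a rough illustration of what association rule mining over multidimensional data involves, the sketch below computes support and confidence for pairwise rules. It does not use Caché or an OLAP cube: the function name `mine_rules`, the thresholds, and the `dimension=value` records are all assumptions made for the example, and the abstract does not say which mining algorithm the system uses.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(items):
        # Fraction of records that contain every item in `items`.
        return sum(1 for s in sets if items <= s) / n

    items = sorted({i for s in sets for i in s})
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical cube cells flattened into "dimension=value" items.
transactions = [
    {"region=North", "product=Laptop"},
    {"region=North", "product=Laptop"},
    {"region=North", "product=Phone"},
    {"region=South", "product=Phone"},
]
rules = mine_rules(transactions)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

Treating each cube cell as a set of `dimension=value` items is one simple way to reuse ordinary itemset mining on multidimensional data; a real system would probably push the counting into the cube's aggregates.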
ISBN: (print) 9789811038747; 9789811038730
Encapsulating information behind a mathematical barrier to prevent malicious access is a traditional approach that spans the history of information technology. Recent advances in security are not restricted to traditional symmetric and asymmetric cryptography; rather, many security algorithms have been proposed in the recent past, among which biometric-based security, steganography, and visual cryptography have gained prominent attention in the research community. In this paper, we propose a robust cryptographic scheme for the original message. First, each message byte, an ASCII character ranging from Space (ASCII 32) to Tilde (ASCII 126), is represented as an object with flat texture in a binary image, drawn as an n × n geometrically shaped object in an image of size N × N. A chaotic arrangement pattern is then created using a prime number encrypted with the Advanced Encryption Standard (AES). The sub-images are shuffled and joined as rows and columns to form a host covert (cipher) image that looks like a grid-structured image in which each sub-grid represents coded information. The performance of the proposed method is analyzed with empirical examples.
ISBN: (print) 9781538632215
Because of the imbalanced distribution of business data, missing user features, and many other factors, directly applying big data techniques to real business data tends to deviate from the business goals. It is difficult to model insurance business data with classification algorithms such as logistic regression and SVM. This paper applies a heuristic bootstrap sampling approach combined with an ensemble learning algorithm to large-scale insurance business data mining, and proposes an ensemble random forest algorithm that uses the parallel computing capability and memory-cache mechanism optimized by Spark. We collected insurance business data from China Life Insurance Company and used the proposed algorithm to analyze potential customers. Experimental results show that the ensemble random forest algorithm outperforms SVM and other classification algorithms in both performance and accuracy on imbalanced data.
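The abstract does not describe the sampling heuristic itself, so the sketch below shows only one common way to pair bootstrap sampling with an ensemble on imbalanced data: each bag draws the same number of rows from every class, and the ensemble combines its members by majority vote. The names `balanced_bootstrap` and `majority_vote` and the 95/5 class split are assumptions for illustration, and plain Python stands in for Spark.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, n_bags, rng):
    """Return n_bags lists of row indices, each with equal class counts."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Every class contributes as many rows as the rarest class has.
    size = min(len(rows) for rows in by_class.values())
    bags = []
    for _ in range(n_bags):
        bag = []
        for rows in by_class.values():
            bag.extend(rng.choices(rows, k=size))  # sampling with replacement
        bags.append(bag)
    return bags


def majority_vote(predictions):
    """Combine the class predictions of the ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical labels: 95 non-buyers (0) and 5 buyers (1).
labels = [0] * 95 + [1] * 5
bags = balanced_bootstrap(labels, n_bags=10, rng=random.Random(0))
print(Counter(labels[i] for i in bags[0]))
```

Each bag would train one member of the ensemble (for example one tree or one forest), so no single member sees the raw 95:5 skew. The bags are independent of each other, which is what makes this kind of scheme easy to parallelize.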
ISBN: (digital) 9789811031564
ISBN: (print) 9789811031564; 9789811031557
Diabetes mellitus is caused by disorders of metabolism and is one of the most common diseases in the world today, and its prevalence is growing. This paper presents the Threshold Based Clustering Algorithm (TBCA) and applies it to medical data received from practitioners. The medical data consist of various attributes. TBCA is formulated to identify the attributes with the greatest impact on diabetes mellitus to support further decisions. TBCA's primary focus is the computation of threshold values that improve the accuracy of the clustering results.
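The abstract gives no details of how TBCA computes or applies its thresholds. To make the general idea of clustering with a distance threshold concrete, here is a generic leader-style sketch, which should not be read as TBCA itself. The function name `threshold_cluster`, the fixed threshold of 1.0, and the two-attribute points are made-up assumptions.

```python
import math


def threshold_cluster(points, threshold):
    """Put each point in the nearest cluster within `threshold`, else open a new one."""
    centroids, members = [], []
    for p in points:
        best, best_d = None, threshold
        for k, c in enumerate(centroids):
            d = math.dist(p, c)
            if d <= best_d:
                best, best_d = k, d
        if best is None:
            # No centroid is close enough: the point starts its own cluster.
            centroids.append(tuple(p))
            members.append([p])
        else:
            members[best].append(p)
            m = len(members[best])
            # Recompute the centroid as the mean of the cluster's points.
            centroids[best] = tuple(
                sum(q[i] for q in members[best]) / m for i in range(len(p))
            )
    return members


# Made-up, already scaled attribute pairs (not real medical data).
points = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9), (1.1, 1.1)]
clusters = threshold_cluster(points, threshold=1.0)
print([len(c) for c in clusters])
```

The number of clusters is not fixed in advance; it follows entirely from the threshold, which is why the choice of threshold dominates the quality of the result.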
ISBN: (print) 9783319563572; 9783319563565
Mobility prediction is one of the important issues in mobile computing systems. The movement logs of mobile users in a mobile computing environment are stored in the Home Location Register (HLR). The generated movement logs are used for mining mobility patterns. The discovered location patterns can be used by the application server to provide various location-based services to mobile users. Several papers have addressed methods for mining the mobility data of mobile users in cellular communication networks. In this paper, we propose a method that decreases the time needed to compute the mobility patterns.
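To show what mining mobility patterns from movement logs can look like in its simplest form, the sketch below counts cell-to-cell transitions and predicts the most frequent next cell. It is a first-order baseline chosen for illustration, not the method proposed in the paper, and it makes no attempt to reproduce the claimed speed-up. The cell identifiers, function names, and `min_support` value are assumptions.

```python
from collections import Counter, defaultdict


def transition_counts(logs):
    """Count how often each cell is followed by each other cell."""
    counts = defaultdict(Counter)
    for path in logs:
        for here, nxt in zip(path, path[1:]):
            counts[here][nxt] += 1
    return counts


def predict_next(counts, cell, min_support=2):
    """Return the most frequent successor of `cell`, or None if too rare."""
    if cell not in counts:
        return None
    nxt, seen = counts[cell].most_common(1)[0]
    return nxt if seen >= min_support else None


# Hypothetical movement logs: each list is one user's sequence of cells.
logs = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"], ["A", "F"]]
counts = transition_counts(logs)
print(predict_next(counts, "B"))
```

The `min_support` cut-off keeps the predictor from reporting a pattern that was seen only once, which is the same role a minimum support plays in frequent pattern mining.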
ISBN: (print) 9783319529417; 9783319529400
A Geographical Information System (GIS) stores several types of data collected from several sources in varied formats. Geo-databases therefore accumulate, day by day, a huge volume of data from satellite images and mobile sensors such as GPS. These data include, on the one hand, spatial features and geographical data and, on the other hand, trajectories traced by moving objects over some period of time. Merging these types of data produces semantic trajectory data. Enriching trajectories with semantic geographical information facilitates querying, analysis, and mining of moving-object data. Applying mining techniques to semantic trajectories therefore continues to prove successful in discovering useful and non-trivial behavioral patterns of moving objects. The objective of this paper is to give an overview of semantic trajectory knowledge discovery and spatial data mining approaches for geographic information systems. Based on an analysis of the literature, this paper proposes a multi-layer system architecture for raw trajectory construction, trajectory enrichment, and semantic trajectory mining.
ISBN: (print) 9781467385947
Steganography, the study of invisible communication, deals with ways of hiding the existence of communicated data in unsuspected digital media so that it remains confidential. The objectives of steganographic methods are high capacity, imperceptibility, and robustness. In this paper, a new data hiding scheme based on magic square blocks is proposed to obtain better image quality and higher embedding capacity, while scrambling the secret image using a magic square provides security. In the proposed method, a secret digit is embedded into each cover pixel pair with the help of a reference matrix consisting of connected magic square blocks. Experimental results on 6 cover images show that the new scheme significantly enhances security compared with other spatial-domain approaches while preserving higher visual quality of the stego images.
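The abstract does not specify the order of the magic square or how the blocks are connected, so the sketch below makes a simplifying assumption: the reference matrix is the 3 × 3 Lo Shu magic square tiled over all pixel-value pairs, which lets each pair carry one base-9 digit. The functions `ref`, `embed`, and `extract` are hypothetical names, and the scrambling of the secret image is not covered.

```python
LO_SHU = ((4, 9, 2), (3, 5, 7), (8, 1, 6))  # classic 3x3 magic square


def ref(p, q):
    """Reference-matrix entry for pixel values (p, q): a base-9 digit 0..8."""
    return LO_SHU[p % 3][q % 3] - 1


def embed(p, q, digit):
    """Return the pair nearest to (p, q) in a 3x3 window whose entry is `digit`."""
    # Slide the window so that it stays inside the pixel range 0..255.
    p0 = min(max(p - 1, 0), 253)
    q0 = min(max(q - 1, 0), 253)
    candidates = [
        (a, b)
        for a in range(p0, p0 + 3)
        for b in range(q0, q0 + 3)
        if ref(a, b) == digit
    ]
    return min(candidates, key=lambda c: (c[0] - p) ** 2 + (c[1] - q) ** 2)


def extract(p, q):
    """Recover the hidden digit from a stego pixel pair."""
    return ref(p, q)


stego = embed(100, 37, 0)
print(stego, extract(*stego))
```

Because any three consecutive values cover every residue modulo 3, each 3 × 3 window contains all nine digits exactly once, so a matching pair always exists. Under this tiling each pixel changes by at most 1, or by at most 2 at the range boundaries where the window has to slide.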
This article, written by Special Publications Editor Adam Wilson, contains highlights of paper SPE 181024, “Recovery-Factor Prediction for Deepwater Gulf of Mexico Oil Fields by Integration of Dimensionless Numbers With Data-Mining Techniques,” by Priyank Srivastava and Xingru Wu, SPE, University of Oklahoma, and Amin Amirlatifi, SPE, Mississippi State University, prepared for the 2016 SPE Intelligent Energy International Conference and Exhibition, Aberdeen, 6–8 September. The paper has not been peer reviewed. Using attributes from a database of 395 deepwater Gulf of Mexico oil fields, a set of dimensionless numbers is calculated that helps in scaling attributes for all the oil fields. On the basis of these dimensionless numbers, various data-mining techniques are used to classify the oil fields. Subsequently, partial-least-squares (PLS) regression is used to relate the dimensionless numbers to the recovery factor. This study shows that dimensionless numbers, together with data-mining techniques, can predict field behavior in terms of recovery factor for sparse data. The digitization of information and the rise of inexpensive sensor technologies have ushered in a new era of computing in which acquired data are used to reveal hidden patterns and trends. This method of computing is very efficient in solving inverse problems where the parameters affecting system characteristics are not completely known. Hydrocarbon reservoirs provide a classic case of a natural system where engineers have limited control over the design of the system they work with; thus, they have to rely on indirect measurements to determine properties of the reservoir and use these properties to predict future trends. Performance prediction is usually accomplished with either analytical material-balance equations or numerical reservoir simulation. However, both methods use a bottom-up workflow, which suffers from a drawback: the need for accurate representation of subsurface geo