In the privacy preservation of association rules, sensitivity analysis should be reported after items have been quantified in terms of their occurrence. Traditional methodologies for preserving the confidentiality of association rules rely on assumptions when safeguarding susceptible information rather than on recognizing insightful items. It is therefore time to go one step further and remove such assumptions from the protection of sensitive information, especially in XML association rule mining. We focus on this central and highly researched area, where XML association rules are generated without addressing the disclosure risks involved in the mining process. We describe how susceptible items can be identified through a supervised learning technique so that confidential information can be hidden. These susceptible items show a high dependency on other items, which is measured in terms of statistical significance with a Bayesian network. We propose two methodologies, one based on the probabilistic occurrence of items and one on the mode of items. All of this information is modeled as the PPDM (Privacy Preservation in Data Mining) model for XML association rules (XARs). The PPDM model is helpful for sharing market information among competitors with a lower chance of creating a monopoly. Finally, the PPDM model computes the sensitivity of items with high accuracy and opens new directions for academia in the standardization of such NP-hard problems.
Data mining is concerned with the extraction of interesting patterns or knowledge from huge amounts of data. Generally, data mining tasks are either predictive or descriptive. Classification falls under predictive induction, while clustering and association rule mining fall under descriptive induction. Subgroup discovery is a task at the intersection of supervised learning and descriptive induction. In subgroup discovery, we want to uncover individual patterns in data with respect to a given property of interest: subgroups that cover a large population and are statistically different. The main application areas of subgroup discovery are exploration and descriptive induction, where the user wants an overview of the dependencies between a target and many explanatory variables. Many techniques have been proposed for discovering subgroups, and some of them are based on classification. However, none of these techniques uses Bayesian networks to generate subgroups. Our contribution is a technique for subgroup discovery in which the subgroups are generated using Bayesian networks.
Many applications, such as video surveillance, telecommunication, weather forecasting and sensor networks, use high volumes of data of different types. Analyzing data in such different forms effectively and efficiently is a challenging task. The analysis of such large data gives rise to a number of new computational challenges, not only because of the growing number of data objects but also because of the growing number of attributes. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed by an efficient dimensionality reduction method. In this paper, we propose combining k-means clustering with principal component analysis (PCA) for attribute reduction. We first apply PCA to obtain a reduced set of uncorrelated attributes, corresponding to the largest eigenvalues in the dataset, with minimum loss of information. We then apply k-means to the PCA-reduced dataset to discover the discriminative features that are most adequate for classification. This combination of clustering with feature reduction yields a minimal set of attributes that retains a suitably high accuracy in representing the original features. We used the Greengram agricultural data set. We found that the clustering result is the same after reducing the attributes using PCA.
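The two-stage pipeline described in this abstract (PCA first, then k-means on the reduced data) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it keeps only the first principal component (found by power iteration), uses a simple deterministic k-means in one dimension, and runs on made-up toy rows rather than the Greengram data set. All function names are hypothetical.

```python
import math

def center(rows):
    """Subtract the column means so that each attribute has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iterations=200):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    Assumption: the start vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def kmeans_1d(values, k=2, iterations=50):
    """Plain k-means on scalars, with centroids initialised evenly over the range."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in values]
        for c in range(k):
            members = [x for x, l in zip(values, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

# Toy data (hypothetical): two well-separated groups with three attributes each.
rows = [
    [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.8, 1.0, 1.1],
    [5.0, 5.1, 4.9], [5.2, 4.9, 5.0], [4.8, 5.0, 5.1],
]
centered = center(rows)
component = top_component(covariance(centered))
# Reduce three attributes to one by projecting onto the first component.
projected = [sum(a * b for a, b in zip(r, component)) for r in centered]
labels = kmeans_1d(projected, k=2)
```

On this toy data the first three rows and the last three rows fall into different clusters, which mirrors the abstract's observation that clustering can survive attribute reduction. A real analysis would keep several components and use a tested numerical library.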
Ridesharing offers a great opportunity to reduce energy consumption and the emission of harmful gases, and to let people share travel costs with others. Most current ridesharing systems simply provide a number of candidates for users to choose from, and time-consuming negotiation often discourages people from ridesharing. We propose a novel approach that assigns users to ridesharing groups according to their routes and payments. Given a driver, our goal is to find the group of passengers who will pay the driver the most. Under the payment scheme, passengers who share rides on the same route share the expense equally with the driver. To respond promptly in an online system, our approach aims for a near-optimal group, in which the available seats along the driver's route are occupied by as many passengers as possible. Compared with previous methods, the experimental results show that our approach incurs a small overhead but obtains answers of good quality, measured by the driver's saving, under various parameter settings.
MicroRNAs (miRNAs) can regulate hundreds of target genes and play a pivotal role in a broad range of biological processes. However, relatively little is known about how these highly connected miRNA-target networks are remodelled in the context of various diseases. Here we examine the dynamic alteration of context-specific miRNA regulation to determine whether modified miRNA regulation of specific biological processes is a useful source of information for predicting cancer prognosis. A new concept, context-specific miRNA activity (CoMi activity), is introduced to describe the statistical difference between the expression levels of a miRNA's target genes and its non-target genes within a given gene set (context).
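The abstract defines CoMi activity only as a "statistical difference" between target and non-target expression within a gene set, without naming the statistic. The sketch below therefore makes an explicit assumption: it uses Welch's t-statistic as one plausible choice. The function name, the gene identifiers and the expression values are all hypothetical.

```python
import math
from statistics import mean, variance

def comi_activity_sketch(expression, gene_set, targets):
    """Welch t-statistic of target vs. non-target expression inside a gene set.

    Assumption: Welch's t is a stand-in; the source does not name the statistic.
    A negative value means the targets are, on average, expressed at a lower
    level than the non-targets in this context.
    """
    target_vals = [expression[g] for g in gene_set if g in expression and g in targets]
    other_vals = [expression[g] for g in gene_set if g in expression and g not in targets]
    if len(target_vals) < 2 or len(other_vals) < 2:
        raise ValueError("need at least two genes in each group")
    std_err = math.sqrt(variance(target_vals) / len(target_vals)
                        + variance(other_vals) / len(other_vals))
    if std_err == 0:
        raise ValueError("both groups have zero variance")
    return (mean(target_vals) - mean(other_vals)) / std_err

# Hypothetical expression levels for one sample.
expression = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 5.0, "g5": 6.0, "g6": 7.0}
gene_set = ["g1", "g2", "g3", "g4", "g5", "g6"]   # the context
targets = {"g1", "g2", "g3"}                      # predicted targets of one miRNA
score = comi_activity_sketch(expression, gene_set, targets)
```

Here the targets are expressed at lower levels than the non-targets, so the score is negative, which would be consistent with the miRNA repressing its targets in this context.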
Web applications are growing at an enormous rate, and the number of their users is growing exponentially. Evolutionary changes in technology have made it possible to capture the essence of users' interactions with web applications through the web server log file, which is saved as a text (.txt) file. Because the web log contains a large amount of "irrelevant information", the original log file cannot be used directly in the web usage mining (WUM) procedure. Preprocessing of the web log file is therefore imperative. Proper analysis of the web log file helps to manage web sites effectively from both the administrators' and the users' perspectives. Web log preprocessing is a necessary first step that improves the quality and efficiency of the later steps of WUM. A number of techniques are available at the preprocessing level of WUM, such as data cleaning, data filtering, and data integration. In this paper, we survey these preprocessing techniques to identify open issues and to show how WUM preprocessing can be improved for pattern mining and analysis.
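As a rough illustration of the data cleaning step mentioned above, the following sketch parses log lines and drops entries that are commonly treated as irrelevant for usage mining. It assumes the Common Log Format, and the filtering rules (skip static resources, non-GET requests and non-2xx responses) are illustrative choices rather than rules taken from the survey. The sample lines are invented.

```python
import re

# Assumption: entries follow the Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical list of static-resource suffixes to discard.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def clean_log(lines):
    """Return parsed records for successful GET requests to non-static pages."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed entry
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue  # embedded resource, not a page view
        if match.group("method") != "GET":
            continue
        if not match.group("status").startswith("2"):
            continue  # failed or redirected request
        kept.append({
            "host": match.group("host"),
            "time": match.group("time"),
            "path": match.group("path"),
        })
    return kept

sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 0',
    'malformed line',
]
records = clean_log(sample)
```

Only the request for `/index.html` survives the cleaning in this sample. User and session identification, which usually follow cleaning, are left out of the sketch.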
Certificateless public key cryptography (CLPKC) eliminates certificate management in traditional public key infrastructure and solves the problem of the key escrow in identity-based cryptography. Certificateless signa...
Privacy Preserving in Data Mining (PPDM) is a process by which certain sensitive information is hidden during data mining without precise access to the original dataset. The majority of the techniques proposed in the literature for hiding sensitive information are based on the Support and Confidence measures of association rules, which suffer from limitations. In this paper, we propose a novel architecture that adopts other standard statistical measures, instead of the conventional Support and Confidence framework, to generate association rules. Specifically, a weighting mechanism based on central tendency is introduced. As an experimental evaluation, the proposed architecture is tested on UCI datasets to hide sensitive association rules, and its performance is compared with that of the existing technique. The new architecture generates no ghost rules and completely avoids failures in hiding sensitive association rules. We demonstrate that Support and Confidence are not the only measures for hiding sensitive association rules. This research aims to contribute to data mining areas where privacy preservation is a concern.
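For reference, the conventional Support and Confidence measures that this abstract argues against can be computed as in the short sketch below. It shows only the standard baseline definitions; the central-tendency weighting proposed in the paper is not specified in the abstract and is not reproduced here. The function names and the toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, both) / base

# Toy market-basket data (hypothetical).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
rule_support = support(transactions, {"bread", "milk"})          # 2 of 4 transactions
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 with bread
```

Hiding techniques built on these measures typically modify transactions until a sensitive rule drops below a support or confidence threshold, which is the setting whose limitations the abstract refers to.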
In Web Usage Mining (WUM), web session clustering plays a key role in classifying web visitors on the basis of their click history and a similarity measure. Swarm-based web session clustering helps to manage web resources effectively in many ways, such as web personalization, schema modification, website modification and web server performance. In this paper, we propose a framework for web session clustering at the preprocessing level of web usage mining. The framework covers the data preprocessing steps that prepare the web log data and convert the categorical web log data into numerical data. A session vector is obtained so that an appropriate similarity measure and swarm optimization can be applied to cluster the web log data. The hierarchical cluster-based approach will enhance existing web session techniques by providing more structured information about user sessions.
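The conversion from categorical log data to a numerical session vector can be sketched as follows. This is a minimal illustration under stated assumptions: sessions are encoded as binary page-visit vectors and compared with cosine similarity, neither of which is confirmed by the abstract as the framework's actual encoding or measure. The swarm optimization step is not shown, and the page paths are invented.

```python
import math

def build_vocabulary(sessions):
    """Sorted list of all distinct pages seen in any session."""
    return sorted({page for session in sessions for page in session})

def session_vector(session, vocabulary):
    """Binary vector: 1.0 if the page was visited in the session, else 0.0."""
    visited = set(session)
    return [1.0 if page in visited else 0.0 for page in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sessions, each a list of requested pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/blog"],
]
vocabulary = build_vocabulary(sessions)
vectors = [session_vector(s, vocabulary) for s in sessions]
similar = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
```

The first two sessions share two pages and score close to 1, while the third shares none and scores 0. A clustering algorithm, swarm-based or otherwise, would then operate on these vectors or on the resulting similarity matrix.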