ISBN: 9781450323765 (print)
In this demonstration, we will showcase Myria, our novel cloud service for big data management and analytics designed to improve productivity. Myria's goal is for users to simply upload their data and for the system to help them be self-sufficient data science experts on their data - self-serve analytics. From a web browser, Myria users can upload data, author efficient queries to process and explore the data, and debug correctness and performance issues. Myria queries are executed on a scalable, parallel cluster that uses both state-of-the-art and novel methods for distributed query processing. Our interactive demonstration will guide visitors through an exploration of several key Myria features by interfacing with the live system to analyze big datasets over the web.
ISBN: 9781479909131 (print)
A key issue for telecom management is the delay in comprehending network conditions. This delay harms precise network management because the amount of managed data required for performance management has been increasing due to emerging technologies such as virtualization, which tailors dedicated network services for an enterprise. The growth in managed data is accelerating, and in the future the current management approach will hit a fatal limitation. Timely monitoring of failures is therefore increasingly valuable. To tackle this issue, a decentralized performance management system is assumed, and real-time performance monitoring implemented inside distributed network devices is proposed. The demonstration results show that burst traffic was successfully detected at 250 millisecond intervals and that the failure detection resolution is significantly improved compared with the conventional centralized management approach.
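The value of fine-grained monitoring can be illustrated with a minimal sketch. Only the 250 ms window comes from the abstract; the function name `detect_bursts`, the thresholds, and the sample arrival times are assumptions made up for illustration and do not describe the paper's implementation.

```python
from collections import Counter

def detect_bursts(timestamps_ms, window_ms=250, threshold=5):
    """Return start times (ms) of windows whose packet count exceeds threshold.

    Hypothetical helper: buckets integer arrival times into fixed windows.
    """
    counts = Counter((t // window_ms) * window_ms for t in timestamps_ms)
    return sorted(start for start, n in counts.items() if n > threshold)

# Illustrative data: steady background traffic plus a short burst near 500 ms.
background = list(range(0, 1000, 100))       # one packet every 100 ms
burst = [500 + i for i in range(0, 40, 5)]   # 8 packets within 40 ms
arrivals = background + burst

# Same rate limit expressed per 250 ms window and per 1000 ms window.
fine = detect_bursts(arrivals, window_ms=250, threshold=5)
coarse = detect_bursts(arrivals, window_ms=1000, threshold=20)
```

With these made-up numbers the 250 ms windows flag the burst, while the one-second window averages it away and reports nothing.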
ISBN: 9781479954582 (print)
Instant message systems adopt a distributed network architecture, which consists of core servers, park servers and terminal users. The servers need to maintain data consistency and mutual backup. Firstly, this paper discusses JDBC fundamentals, including the JDBC interface technology, the JDBC architecture, the JDBC drivers and the JDBC connection pool. Secondly, it describes the core classes and the JDBC middleware implementation. To solve transmission problems among heterogeneous data sources, this paper proposes the operation mode and the construction method of a JDBC-based middleware for accessing heterogeneous distributed databases.
ISBN: 9783319029313; 9783319029306 (print)
Publishing data about individuals without revealing sensitive information about them is an important problem. Distributed data mining applications use sensitive data from distributed databases held by different parties. This comes into direct conflict with an individual's need and right to privacy. It is thus of great importance to develop adequate security techniques for protecting the privacy of individual values used for data mining. Here, we study how to maintain privacy in distributed data mining. That is, we study how two (or more) parties can find frequent itemsets in a distributed database without revealing each party's portion of the data to the other. In this paper, we consider a privacy-preserving naïve Bayes classifier for horizontally partitioned distributed data and propose a data mining privacy by decomposition (DMPD) method that uses a genetic algorithm to search for the optimal feature set partitioning under classification accuracy and k-anonymity constraints.
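The k-anonymity constraint mentioned above can be made concrete with a small check. This is a sketch of the standard definition only (every combination of quasi-identifier values must be shared by at least k records); it is not the DMPD method, and the function name `is_k_anonymous`, the column names, and the records are invented for illustration.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Made-up, already generalized records.
records = [
    {"age": "30-39", "zip": "981**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "981**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "982**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "982**", "diagnosis": "diabetes"},
]

two_anon = is_k_anonymous(records, ["age", "zip"], 2)
three_anon = is_k_anonymous(records, ["age", "zip"], 3)
```

Here each (age, zip) group contains exactly two records, so the table satisfies 2-anonymity but not 3-anonymity.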
ISBN: 9781479950690 (print)
Social media is an increasingly popular method for people to share information and interact with each other. Analysis of social media data has the potential to provide useful insights in a wide range of domains including social science, advertising and policing. Social media information is produced in real time, so analysis that can give insights into events as they occur can be particularly valuable. Similarly, analytics platforms providing low-latency query responses can improve the user experience for ad hoc data exploration on historic data sets. However, the rate at which new data is generated makes it a real challenge to design a system that can meet both of these needs. This paper describes the design and evaluation of such a system. Firstly, it describes how a meta-analysis of the types of questions that were being asked of Twitter data led to the identification of a small set of queries that could be used to answer the majority of them. Secondly, it describes the design of a scalable platform for answering these and other queries. The architecture is cloud-based and combines continuous query and NoSQL database technology. Evaluation results show that the system can scale to process queries on streaming data arriving at the rate of the full Twitter firehose. Experiments show that queries on large repositories of stored historic data can also be answered with low latency. Finally, we present the results of queries that combine both streaming and historic data.
ISBN: 9781450326278 (print)
Main Memory Database Systems (MMDBs) have been studied since the 1980s [3,4], when memory was quite costly ($1500 per MByte in 1984). We can now buy memory for about $10 per GByte. An advantage of MMDBs is that serial execution of a non-distributed transaction on a uniprocessor from start to finish saves the work of disk I/O, locking, latching and deadlock handling [7]. The 2013 Bulletin on Data Engineering [11] had eight articles on recent MMDBs, and only three mentioned distributed transactions. Implementing fast, serializable, distributed transactions on an MMDB is difficult, since communication delays typically leave some CPUs idle and reduce total throughput. We began a project in Fall 2011 to improve the distributed transactional performance of the open-source MMDB VoltDB [14], which was based on an earlier academic prototype MMDB, H-Store [1]. We developed a low-overhead concurrency method that executes consecutive Prepares with delayed Commits on each node (CPU) and takes Write locks but not Read locks to detect conflicts. We developed an Ordered Escrow Method, a variant of Escrow [14], to greatly speed up transactions with incremental updates. We named our VoltDB modification CVoltDB (C for Concurrency) and proved that it supports replica consistency and serializability. Full TPC-B and TPC-C benchmarks demonstrate greatly improved performance due to the new features in CVoltDB.
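For readers unfamiliar with escrow-style concurrency, the following minimal sketch shows the classical idea for incremental updates: a transaction reserves its decrement up front, and the reservation is granted only if the field stays within its bound even in the worst case. This is not the Ordered Escrow Method of the abstract, whose details are not given here; the class `EscrowField` and its method names are hypothetical.

```python
class EscrowField:
    """Classical escrow bookkeeping for one numeric field with a lower bound."""

    def __init__(self, value, minimum=0):
        self.value = value        # committed value
        self.minimum = minimum    # integrity constraint: value >= minimum
        self.pending = {}         # transaction id -> uncommitted net delta

    def _worst_case(self):
        # Lowest value reachable if every pending net decrement commits
        # and every pending net increment aborts.
        return self.value + sum(d for d in self.pending.values() if d < 0)

    def request(self, txn, delta):
        """Reserve delta for txn; refuse if the bound could be violated."""
        if delta < 0 and self._worst_case() + delta < self.minimum:
            return False
        self.pending[txn] = self.pending.get(txn, 0) + delta
        return True

    def commit(self, txn):
        self.value += self.pending.pop(txn, 0)

    def abort(self, txn):
        self.pending.pop(txn, None)


stock = EscrowField(10)
a = stock.request("T1", -6)   # granted: worst case is 4
b = stock.request("T2", -6)   # refused: worst case would be -2
c = stock.request("T3", -4)   # granted: worst case is 0
stock.abort("T1")             # T1's reservation is released
d = stock.request("T2", -6)   # granted now: worst case is 0
stock.commit("T3")
stock.commit("T2")
```

Because each granted request is safe regardless of how the other pending transactions end, T1 and T3 can update the same field concurrently without either holding a lock on it until commit.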
ISBN: 9781509000777 (print)
The main goal of data mining is to extract useful information from large amounts of data. However, data is often collected by several different sites. Among data mining techniques, association rule mining is widely applied to find interesting relationships among attributes. In this paper we use the concept of a distributed database: when a centralized database is divided into a distributed database environment, it may be partitioned in different ways, such as horizontally, vertically, or in mixed mode. The paper presents privacy-preserving data mining algorithms that operate over a vertically partitioned database using the concepts of distribution privacy preservation and that also reduce time and space complexity with zero percent data leakage.
ISBN: 9781424450213 (print)
High-performance, low-cost PC hardware and high-speed LAN/WAN technologies make distributed database (DDB) systems an attractive research area. Since dynamic programming is not feasible for optimizing queries in a DDB, we propose a GA-based query optimizer and compare its performance to random and optimal algorithms. We analyzed a set of possible GA parameters and determined that the two-point truncate technique using GA gives the best results. New mutation and crossover operators have also been defined and experimentally analyzed. We performed experiments on a synthetic database with replicated relations, but no horizontal or vertical fragmentation. Network links are assumed to be gigabit Ethernet. Comparisons with optimal results found by exhaustive search show that our new GA formulation is only 20% off the optimal results, and we have achieved a 50% improvement over a previous GA-based algorithm.
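The abstract does not define its operators, so the sketch below only illustrates two textbook GA building blocks that the phrase "two-point truncate" may refer to: two-point crossover and truncation selection. That reading is an assumption. The chromosome encoding (one execution site per query operation), the toy cost function, and all names are likewise hypothetical and are not the paper's formulation.

```python
def two_point_crossover(parent_a, parent_b, i, j):
    """Swap the slice [i:j) between two equal-length chromosomes."""
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

def truncation_select(population, cost, keep):
    """Keep the `keep` lowest-cost chromosomes."""
    return sorted(population, key=cost)[:keep]

def site_changes(chromosome):
    """Toy cost: number of times consecutive operations run on different sites."""
    return sum(x != y for x, y in zip(chromosome, chromosome[1:]))

# Hypothetical plans: each gene is the site that executes one operation.
plan_a = [0, 0, 0, 0, 0]
plan_b = [1, 1, 1, 1, 1]
child_a, child_b = two_point_crossover(plan_a, plan_b, 1, 3)
survivors = truncation_select([child_a, plan_a, child_b, plan_b], site_changes, 2)
```

A real optimizer would replace `site_changes` with a cost model covering transfer and processing costs, and would pick the crossover points at random.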
ISBN: 9781424437566 (print)
This paper examines the underlying features of the distributed database architecture. Understanding the tasks of a distributed database management system will lead us to a successful design. The design will improve scalability, accessibility and flexibility while accessing various types of data. Developing a successful distributed database system requires addressing the security issues that may arise and possibly compromise the access control and the integrity of the system. We propose solutions for several security aspects that pertain to a distributed database system, such as multilevel access control, confidentiality, reliability, integrity and recovery.
ISBN: 9781424429271 (print)
Multilevel security requirements introduce a new dimension to traditional database schedulers, as they cause covert channels. To prevent covert channels, a scheduler for a multilevel secure database should ensure that transactions at a low security level are never delayed by high security level transactions in the event of a data conflict. As a result, a high security level transaction may be subjected to an indefinite delay if it is forced to abort repeatedly, which makes the secure scheduler unfair towards high security level transactions [12]. This paper proposes a secure database scheduler based on both optimistic and locking techniques (SO2PL) for multilevel secure distributed database systems. The proposed database scheduler is free from covert channels without starving the high security level transactions. Through a simulation study, we evaluate the performance of SO2PL and compare it with the S2PL scheduler.