Computing systems have undergone a fundamental transformation, from the single-processor devices at the turn of the century to today's ubiquitous networked devices and warehouse-scale computing via the cloud, and parallelism has become ubiquitous at many levels. At the micro level, parallelism is being explored from the underlying circuits, to pipelining and instruction-level parallelism on multi-core or many-core chips, as well as within a machine. At the macro level, parallelism is being promoted from multiple machines on a rack, to many racks in a data center, to the globally shared infrastructure of the Internet. With the push of big data, we are entering a new era of parallel computing driven by novel and groundbreaking research innovation on elastic parallelism and scalability. In this paper, we give an overview of computing infrastructure for big data processing, focusing on the architectural, storage and networking challenges of supporting big data processing. We briefly discuss emerging computing infrastructure and technologies that are promising for improving data parallelism and task parallelism and for encouraging vertical and horizontal computation parallelism.
ISBN: 9781595936493 (print)
This tutorial presents the definition, the models and the techniques of location privacy from the data privacy perspective. By reviewing and revising the state-of-the-art research in the data privacy area, the presenter describes the essential concepts, the alternative models, and the suite of techniques for providing location privacy in mobile and ubiquitous data management systems. The tutorial consists of two main components. First, we introduce location privacy threats, give an overview of the state-of-the-art research in data privacy, and analyze the applicability of existing data privacy techniques to location privacy problems. Second, we present the various location privacy models and techniques that are effective in either the privacy-policy-based framework or the location-anonymization-based framework. The discussion addresses a number of important issues in both data privacy and location privacy research, including the trade-offs between location utility and location privacy, the need for a careful combination of policy-based location privacy mechanisms and location-anonymization-based privacy schemes, and the set of safeguards for secure transmission, use and storage of location information that reduce the risk of unauthorized disclosure. The tutorial is designed to be self-contained and gives the essential background for anyone interested in learning about the concepts and models of location privacy, and about the principles and techniques for designing and developing a secure and customizable architecture for privacy-preserving data management in mobile and pervasive information systems. It is accessible to data management administrators, developers of mobile location-based services, and graduate students and researchers who are interested in data management in mobile information systems, pervasive computing, and data privacy. Copyright 2007 VLDB Endowment, ACM.
ISBN: 9789639799769 (print)
Within collaborative computing, computer-mediated communications are evolving rapidly thanks to the development of new technologies. Facilitating the awareness and discovery of users in communications networks is a key requirement for the success of these collaborative systems. Besides the need for location awareness, the emergence of heterogeneous wireless environments, where users can roam freely, is making Location Management (LM) an increasingly important topic for network operators. In this paper, we use a general model for LM signaling costs to obtain analytical expressions for their optimization. These expressions are applicable to different LM algorithms and scenarios, contributing towards the development of a standardized performance evaluation technique and delivering guidelines for the optimum design of Location Areas (LAs). We also illustrate how changes in the different parameters involved in the LM costs affect the optimum number of cells per LA and the value of the optimum LM costs.
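The abstract does not state its cost model, so the following is only a toy sketch of the kind of trade-off it describes. It assumes a hypothetical total signaling cost in which the location update cost falls with the square root of the number of cells per LA (fewer boundary crossings) and the paging cost grows linearly with it. The function names and the coefficients `update_coeff` and `paging_coeff` are illustrative assumptions, not the paper's notation.

```python
def total_cost(n_cells: float, update_coeff: float, paging_coeff: float) -> float:
    """Assumed toy model: update cost ~ a / sqrt(N), paging cost ~ b * N."""
    return update_coeff / n_cells ** 0.5 + paging_coeff * n_cells


def optimal_cells(update_coeff: float, paging_coeff: float) -> float:
    """Closed-form minimizer of the toy model.

    Setting d/dN [a / sqrt(N) + b * N] = 0 gives N* = (a / (2 * b)) ** (2 / 3).
    """
    return (update_coeff / (2.0 * paging_coeff)) ** (2.0 / 3.0)


if __name__ == "__main__":
    a, b = 16.0, 1.0  # hypothetical coefficients, chosen only for illustration
    n_star = optimal_cells(a, b)
    print(f"optimum cells per LA: {n_star:.2f}")
    print(f"optimum cost: {total_cost(n_star, a, b):.2f}")
    # Raising the paging coefficient shrinks the optimum LA size.
    print(f"optimum with doubled paging cost: {optimal_cells(a, 2 * b):.2f}")
```

Under these assumptions, increasing the paging coefficient moves the optimum towards smaller LAs, and increasing the update coefficient moves it towards larger ones.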
Summary form only given. As data scientists, we live in interesting times. Data has been the fastest-growing phenomenon on the Internet for the last decade. Big data analytics has the potential to reveal deep insights hidden in big data that exceeds the processing capacity of existing systems, such as peer influence among customers, revealed by analyzing shoppers' transactions and their social and geographical data. In the past 40 years, data was primarily used to record and report business activities and scientific events; in the next 40 years, data will also be used to derive new insights, to influence business decisions and to accelerate scientific discovery. The key challenge is to provide the right platforms and tools to make reasoning about big data easy and simple. In this keynote talk, I will explore reuse opportunities and challenges from multiple dimensions towards delivering big data analytics as a service. I will illustrate by example the importance and challenges of utilizing programmable algorithm abstractions for many seemingly domain-dependent data analytics tasks. Another reuse opportunity is to exploit unconventional data structures and big data processing constructs to simplify and speed up big data processing.
A huge number of entities and their relationships are posted on the Web. These entities and their relationship networks support many activities. In this paper, we focus on the task of extracting academic entity networks from homepages. Homepages usually contain many entities, such as persons, conferences/journals and organizations, together with their relationships. However, homepages do not follow a unified layout format: they often contain similar information but differ greatly in layout and style, which makes it impossible to handle them all with a single set of rules. We therefore propose an integrated approach to automatically extract data from unstructured text. The main idea is to adopt the most suitable technique for each kind of entity, so the approach is self-adaptive. First, the approach decomposes web pages into text units, and a classifier then determines the type of each unit. Once the unit types are known, different technologies are chosen to deal with them. For example, edit distance and an inverted index are used to identify names, and Conditional Random Fields are considered the best solution for extracting publication entries. The results show that LineX achieves high performance in extracting entities from web pages in the academic community.
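As a rough illustration of the name-identification step mentioned above, the sketch below matches a candidate string against a list of known names using Levenshtein edit distance. It is not the authors' implementation: the helper `match_name`, the threshold `max_distance`, and the sample names are hypothetical, and the inverted index the abstract also mentions is omitted for brevity.

```python
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ch_a == ch_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def match_name(candidate: str, known_names: Sequence[str],
               max_distance: int = 2) -> Optional[str]:
    """Return the closest known name, or None if nothing is close enough.

    max_distance is an assumed tolerance, not a value from the paper.
    """
    if not known_names:
        return None
    scored = [(edit_distance(candidate.lower(), name.lower()), name)
              for name in known_names]
    distance, best = min(scored)
    return best if distance <= max_distance else None


if __name__ == "__main__":
    names = ["Jane Smith", "John Doe"]  # hypothetical dictionary of known names
    print(match_name("Jane Smiht", names))
    print(match_name("Alice Brown", names))
```

In a larger pipeline, an inverted index over name tokens would typically narrow the list of candidates before the pairwise distance computation, since comparing every text unit against every known name scales poorly.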
NoSQL systems have become vital components for delivering big data services in the Cloud. However, existing NoSQL systems rely on experienced administrators to configure and tune a wide range of configurable parameters in order to achieve high performance. In this paper, we present a policy-driven configuration management system for NoSQL systems, called PCM. PCM can identify workload-sensitive configuration parameters and capture the tuned parameters for different workloads as configuration policies. PCM can also be used to analyze the range of configuration parameters that may affect the runtime performance of NoSQL systems under read and write workloads. The configuration optimization recommended by PCM enables NoSQL systems such as HBase to run much more efficiently than under the default settings, both on an individual worker node and across an entire cluster in the Cloud. Our experimental results show that HBase under the PCM configuration outperforms the default configuration and some simple configurations on a range of workloads, offering significantly higher throughput.