ISBN: 9781665438193 (print)
Encrypted database systems and searchable encryption schemes still leak critical information (e.g., access patterns) and require a choice between privacy and efficiency. We show that using ORAM schemes as a black box is not a panacea and that optimizations are still possible by improving the data structures. We design an ORAM-based secure database that is built from the ground up: we replicate the typical data structure of a database system using different optimized ORAM constructions and derive a new solution for oblivious searches on databases. Our construction has a lower bandwidth overhead than state-of-the-art ORAM constructions by moving client-side computations to a proxy with an intermediate (rigorously defined) level of trust, instantiated as a server-side isolated execution environment. We formally prove the security of our construction and show that its access patterns depend only on public information. We also provide an implementation compatible with SQL databases (PostgreSQL). Our system is 1.2 to 4 times faster than state-of-the-art ORAM-based solutions.
ISBN: 0818608935 (print)
A novel, high-performance subsystem for information retrieval called JAS is introduced. JAS sustains Gbyte/s comparison rates due to its 'early-out' CMOS substring search processor. The complexity of each JAS unit is independent of the complexity of the query. This independence is achieved by utilizing a decoupled instruction set. The JAS architecture is described, and its performance is evaluated.
ISBN: 1424403073 (print)
This short paper describes the cooperative caching architecture of pCFS [5], a shared disk cluster file system (CFS) which aims to achieve high performance in a broad spectrum of I/O intensive applications ranging from computational access to large data sets to video streaming and databases, and includes an extended API for parallel I/O access. pCFS is targeted at small to medium sized clusters where data is stored in Fibre Channel shared devices on a Storage Area Network (SAN) and exploits two interconnect fabrics: a SAN to access on-disk data, and a LAN, used both for the exchange of control information (related to locking and cache management) and for cooperative caching dataflow.
ISBN: 0818607629 (print)
The following topics are dealt with: access methods; distributed operating systems and databases; database design and implementation; performance evaluation; architectural support for database management; evaluating recursive queries; file structures; parallel processing database systems; object-based systems; performance in distributed systems; improving concurrency in distributed systems; fault tolerance and correctness; knowledge representation; resiliency in distributed systems; fault-tolerant storage systems; data modeling; historical databases; extending the relational model; CAD/CAM systems; and query processing. 85 papers are published in the present proceedings.
ISBN: 0769522815 (print)
New biological experimental techniques are continuing to generate large amounts of data using DNA, RNA, human genome and protein sequences. The quantity and quality of data from these experiments make analyses of their results very time-consuming, expensive and impractical. Searching DNA and protein databases using sequence comparison algorithms has become one of the most powerful techniques to better understand the functionality of a particular DNA, RNA, genome, or protein sequence. This paper presents a technique to effectively combine fine- and coarse-grained parallelism using general-purpose processors for sequence homology database searches. The results show that the classic Smith-Waterman sequence alignment algorithm achieves superlinear performance with proper scheduling and multi-level parallel computing at no additional cost.
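As a rough illustration of the kind of search this abstract describes, the sketch below scores a query against each database sequence with a basic Smith-Waterman local alignment and distributes the per-sequence work across a worker pool. It is only a minimal sketch and not the paper's implementation: the function names, the scoring values (match 2, mismatch -1, linear gap -1), and the use of a thread pool for the coarse-grained level are all assumptions, and the fine-grained parallelism mentioned in the abstract is not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query, database, workers=4):
    """Score the query against every database sequence, best score first.

    Coarse-grained level (assumed here): each database sequence is an
    independent task. Threads keep the sketch portable; CPU-bound Python
    code would need processes or a native kernel for a real speedup.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: smith_waterman_score(query, s), database))
    return sorted(zip(database, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for sequence, score in rank_database("ACGT", ["TTTT", "ACGT", "ACGA"]):
        print(sequence, score)
```

Because every database sequence is scored independently, the outer loop is the natural unit to hand to separate workers, while the inner matrix fill is where fine-grained techniques would apply.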
This paper describes a new approach to the scheduling problem that assigns tasks of a parallel program described as a task graph onto parallel machines. The approach handles interprocessor communication and heterogeneity, based on using both the theoretical results developed so far and a lookahead scheduling strategy. The experimental results on randomly generated task graphs demonstrate the effectiveness of this scheduling heuristic.
ISBN: 9781728174457 (print)
While large-scale scientific experiments and simulations produce massive amounts of data, only a small fraction of that data contains useful information. Efficient querying on such volumes of data to extract that information increases the productivity of the scientific discovery process. Although querying has been explored extensively in relational databases, research and adoption of querying tools for scientific data that is stored in parallel file systems on high performance computing (HPC) systems are still in their infancy. In this paper, we introduce a parallel query service, called PDC-Query, for an object data management system (ODMS) on HPC systems. It operates on partitioned objects in parallel, and provides several optimization strategies for fast query evaluation. The ODMS paradigm for HPC systems is promising in reducing the burden on users in data management and in moving data transparently across the deep memory hierarchy in modern HPC systems. We propose a 'global histogram'-based approach to accelerate query evaluation, through selectivity estimation and reducing the amount of data that needs to be loaded from storage and processed. We compare querying performance and demonstrate the efficiency and scalability of the different approaches PDC-Query supports, including using global histograms, bitmap indexes, sorting, and full scan, in performing various queries on top of a plasma physics dataset with 125 billion particles and an astronomy dataset with 25 million objects.
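The histogram idea in this abstract can be illustrated with a small sketch: each partition keeps bin counts over shared bin edges, a range query sums the counts of the overlapping bins to bound how many values can match, and partitions whose bound is zero are never loaded. This is a hypothetical illustration under simple assumptions (one numeric attribute, fixed shared bin edges, half-open ranges); the function names are invented here and do not reflect PDC-Query's actual interface.

```python
from bisect import bisect_right


def build_histogram(values, edges):
    """Count values per bin; bin i covers the half-open range [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):  # values outside the edge range are ignored
            counts[i] += 1
    return counts


def estimate_hits(counts, edges, lo, hi):
    """Upper bound on the number of values in [lo, hi), from bin counts alone."""
    return sum(
        counts[i]
        for i in range(len(counts))
        if edges[i] < hi and edges[i + 1] > lo  # bin overlaps the query range
    )


def range_query(partitions, histograms, edges, lo, hi):
    """Return matching values and the number of partitions actually scanned."""
    hits, loaded = [], 0
    for part, counts in zip(partitions, histograms):
        if estimate_hits(counts, edges, lo, hi) == 0:
            continue  # the histogram proves this partition has no matches
        loaded += 1
        hits.extend(v for v in part if lo <= v < hi)
    return hits, loaded


if __name__ == "__main__":
    edges = [0, 10, 20, 30]
    partitions = [[1, 2, 3], [12, 15, 25], [27, 29]]
    histograms = [build_histogram(p, edges) for p in partitions]
    print(range_query(partitions, histograms, edges, 10, 20))
```

The same bin counts serve both purposes named in the abstract: the summed counts give a selectivity estimate, and a zero sum lets a partition be skipped without reading it from storage.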
ISBN: 0818684038 (print)
Results on how to place a limited number of resources in two-dimensional torus-based parallel systems are described. The resources are placed so that every non-resource node is within a given distance d from some resource node. It is proved that the proposed methods are optimal in terms of reducing the maximum distance between the resource and the non-resource nodes. Simulation results show that the proposed methods are superior to the existing methods in terms of the average message latency.
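The covering condition stated in this abstract (every node within distance d of some resource) can be checked directly for any candidate placement. The sketch below does only that check; it does not reproduce the paper's placement methods. It assumes hop-count distance with wraparound in both dimensions, and the function names are made up for illustration.

```python
from itertools import product


def torus_distance(a, b, rows, cols):
    """Hop count between nodes a and b on a rows x cols torus with wraparound."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def max_distance_to_resource(rows, cols, resources):
    """Largest distance from any node to its nearest resource node."""
    return max(
        min(torus_distance(node, r, rows, cols) for r in resources)
        for node in product(range(rows), range(cols))
    )


def covers(rows, cols, resources, d):
    """True if every node is within distance d of some resource node."""
    return max_distance_to_resource(rows, cols, resources) <= d


if __name__ == "__main__":
    # One resource on a 4 x 4 torus leaves some node 4 hops away;
    # adding a second resource at (2, 2) brings every node within 2 hops.
    print(max_distance_to_resource(4, 4, [(0, 0)]))
    print(max_distance_to_resource(4, 4, [(0, 0), (2, 2)]))
```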
ISBN: 0769523811 (print)
Grid-based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language- and platform-independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid.
ISBN: 0818681187 (print)
Heterogeneity is becoming quite common in distributed parallel computing systems, both in processor architectures and in communication networks. Different types of networks have different performance characteristics, while different types of messages may have different communication requirements. In this work, we analyze two techniques for exploiting these heterogeneous characteristics and requirements to reduce the communication overhead of parallel application programs executed on distributed computing systems. The performance based path selection (PBPS) technique selects the best (lowest latency) network among several for each individual message, while the second technique aggregates multiple networks into a single virtual network. We present a general approach for applying and evaluating these techniques to a distributed computing system with multiple interprocessor communication networks. We also generate performance curves for a cluster of IBM workstations interconnected with Ethernet, ATM, and Fibre Channel networks. As we show with several of the NAS benchmarks, these curves can be used to estimate the potential improvement in communication performance that can be obtained with these techniques, given some simple communication characteristics of an application program.
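The per-message selection described for PBPS can be sketched with a simple cost model. The example below assumes that latency is a fixed startup cost plus message size divided by bandwidth; that model, the `Network` type, and the two example networks with their numbers are hypothetical placeholders, not measurements or definitions from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str
    startup_s: float      # fixed per-message cost in seconds (assumed model)
    bandwidth_bps: float  # sustained bandwidth in bytes per second (assumed model)

    def latency(self, size_bytes):
        """Estimated time to deliver one message of the given size."""
        return self.startup_s + size_bytes / self.bandwidth_bps


def select_network(networks, size_bytes):
    """Pick the network with the lowest estimated latency for this message."""
    return min(networks, key=lambda n: n.latency(size_bytes))


if __name__ == "__main__":
    # Hypothetical networks: low startup cost versus high bandwidth.
    networks = [
        Network("low-startup", startup_s=0.001, bandwidth_bps=1e6),
        Network("high-bandwidth", startup_s=0.005, bandwidth_bps=1e7),
    ]
    for size in (100, 1_000_000):
        print(size, select_network(networks, size).name)
```

Under this model small messages favor the network with the lower startup cost and large messages favor the one with higher bandwidth, which is why the choice is made per message.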