ISBN: 9781538627044 (print)
Similarity-oriented services serve as a foundation for a wide range of data analytics applications such as machine learning, targeted advertising, and real-time decision making. Both industry and academia strive for efficient and scalable similarity discovery and querying techniques to handle massive, complex data records in the real world. In addition to performance, data security and privacy have become an indispensable criterion for quality of service because of the growing number of data breaches. To address this serious concern, in this paper, we propose and implement "EncSIM", an encrypted and scalable similarity search service. The architecture of EncSIM enables parallel query processing over distributed, encrypted data records. To reduce client overhead, EncSIM resorts to a variant of the state-of-the-art similarity search algorithm, called all-pairs locality-sensitive hashing (LSH). We describe a novel encrypted index construction for EncSIM based on searchable encryption to guarantee the security of the service while preserving the performance benefits of all-pairs LSH. Moreover, EncSIM supports data record addition with a strong security notion. Extensive evaluations on a Redis cluster demonstrate low client cost, linear scalability, and satisfactory query performance of EncSIM.
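For readers unfamiliar with locality-sensitive hashing, the following is a minimal plaintext sketch of the bucketing idea that the abstract builds on. It is not EncSIM's encrypted index and not the all-pairs LSH variant: it uses random-hyperplane signatures, a common LSH family, and every name in it (make_hyperplanes, lsh_signature, build_index, query) is a hypothetical helper introduced only for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes in dim dimensions (illustrative choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def build_index(records, hyperplanes):
    """Group record ids into buckets keyed by their LSH signature."""
    index = defaultdict(list)
    for record_id, vec in records.items():
        index[lsh_signature(vec, hyperplanes)].append(record_id)
    return index


def query(index, vec, hyperplanes):
    """Return the candidate ids that share the query's bucket."""
    return index.get(lsh_signature(vec, hyperplanes), [])
```

Vectors pointing in similar directions tend to share a signature, so a query only inspects one bucket instead of every record. An encrypted service would additionally have to hide the bucket keys and contents from the server, which this sketch does not attempt.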
This paper presents an operational semantics of the repetitive model of computation, which is the basis for the repetitive structure modeling (RSM) package defined in the standard UML Marte profile. It also deals with...
Pairwise Sequence Alignment is a basic operation in Bioinformatics that is performed thousands of times on a daily basis. The exact methods proposed in the literature have quadratic time complexity. For this reason, ...
ISBN: 9781509028252 (print)
Due to the rising demand for large-scale data processing, there is a growing interest in both batch processing, where large volumes of data are processed offline, and stream processing, where large quantities of streaming data are processed online. The dichotomy between these vastly different computing paradigms has led to the development of substantially different methodologies and systems. As an increasing number of applications require both stream and batch processing, there is a need to bridge this gap and offer support for both paradigms. We introduce a new direction for the unification of stream and batch processing, which, contrary to other proposed approaches, uses a stream processing platform as its foundation and supports batch processing on top. Our proof-of-concept implementation of such a middleware layer, called Cyclone, offers the widely popular MapReduce programming model and translates MapReduce jobs for execution on the underlying streaming platform. Cyclone not only achieves a tight integration of batch and stream processing; our evaluation also shows significant performance gains, in particular for sequential and iterative jobs, which naturally arise in many applications.
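The idea of running a MapReduce job on top of a stream can be sketched in a few lines. This is an illustrative toy under stated assumptions, not Cyclone's API or translation scheme: records are consumed one at a time as a stream operator would see them, and the end of the stream stands in for the batch boundary. The function names (run_mapreduce_on_stream, word_count_map, word_count_reduce) are hypothetical.

```python
from collections import defaultdict


def run_mapreduce_on_stream(stream, map_fn, reduce_fn):
    """Apply map_fn to each arriving record, then reduce per key at end of stream."""
    groups = defaultdict(list)
    for record in stream:  # records arrive one at a time
        for key, value in map_fn(record):
            groups[key].append(value)
    # End of stream plays the role of the batch boundary.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(key, values):
    """Sum the counts collected for one word."""
    return sum(values)
```

For example, `run_mapreduce_on_stream(iter(["a b a", "B c"]), word_count_map, word_count_reduce)` counts words across both lines. A real streaming platform would distribute the per-key state across operators rather than hold it in one dictionary.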
Next-generation e-Science applications will require the ability to transfer information at high data rates between distributed computing centers and data repositories. To support such applications, lambda grid network...
ISBN: 0818656026 (print)
Various massively parallel computers have emerged in recent years. Each of them has distinct architectural properties that challenge the computer scientist to develop algorithms and software appropriate to that specific architecture. This is at least as difficult a problem as software reuse in the sequential computer case. One approach to addressing this problem is to design parallel computation models which abstract the architectural details into several generic parameters, which we call resource metrics. Typical resource metrics include the number of processors, communication latency, synchronization, bandwidth, block transfer capability, memory access method, and network topology hierarchy. We review the various parallel and distributed computation models and compare the different resource metrics chosen by different computation models.
ISBN: 9781728109121 (print)
State Machine Replication (SMR) is a fundamental fault-tolerant technique for distributed systems. SMR traditionally requires sequential execution of commands at each replica node, so as to guarantee strong consistency among replicas. To achieve high performance in large-scale cloud datacenters, SMR has been parallelized by employing multiple threads at each replica. In this paper, we propose SMR-X, a novel parallel SMR scheme, which realizes flexible mapping of commands for parallel execution at each replica. The mapping between clients' requests and worker threads is dynamically adjusted according to the load level of the worker threads. Therefore, the workloads of different threads can be well balanced and high system throughput can be achieved. The major challenge in our work lies in the inconsistency problem caused by dynamic changes in the request-thread mapping. To cope with this, we design mechanisms to synchronize the mapping function, so that strong consistency among replicas can be guaranteed. The correctness of the proposed scheme is rigorously proved and its performance is evaluated via simulations. Simulation results show that SMR-X can achieve better load balance and lower access latency than existing parallel SMR schemes.
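A load-aware request-to-thread mapping can be illustrated with a small sketch. It is an assumption-laden toy, not the SMR-X mechanism: the class name LoadAwareMapper, the threshold rule, and the cost parameter are all hypothetical. The one property it is meant to show is that the remapping rule is deterministic, so replicas that see the same command sequence derive the same mapping; how SMR-X actually synchronizes the mapping function is not described in the abstract.

```python
class LoadAwareMapper:
    """Toy deterministic mapping of clients to worker threads by load."""

    def __init__(self, num_threads, threshold=2):
        self.load = [0] * num_threads  # accumulated cost per worker thread
        self.mapping = {}              # client id -> worker thread index
        self.threshold = threshold     # imbalance that triggers a remap (assumed rule)

    def assign(self, client_id, cost=1):
        # Least-loaded thread; ties go to the lowest index so the choice is deterministic.
        lightest = min(range(len(self.load)), key=lambda t: (self.load[t], t))
        current = self.mapping.get(client_id)
        # Map new clients, and remap existing ones whose thread is too far ahead.
        if current is None or self.load[current] - self.load[lightest] > self.threshold:
            current = lightest
            self.mapping[client_id] = current
        self.load[current] += cost
        return current
```

Two mapper instances fed the same sequence of `(client_id, cost)` pairs return the same thread for every command, which is the minimum a replicated scheme needs before any explicit synchronization is added.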
ISBN: 9781538626672 (print)
The size of the data sets being collected and analyzed in the data science field is growing rapidly, making traditional big data processing solutions prohibitively expensive. When the data sets are too large, distributed techniques become unavoidable even for simple, embarrassingly parallel jobs. However, distributed computing is still inaccessible to a large number of users. For example, many average users still struggle with complex cluster management and configuration tools [24], even just to sum up a group of numbers in a large data file. In this paper, we present BDViewer, a web-based big data processing and visualization tool. BDViewer uses JavaScript plugins to enable users to view, process, and visualize their large data files through a web browser alone. By clicking a button, users can open a large data file online and view its contents immediately, no matter how large the file is. On the back end, BDViewer is built on a virtual private cloud system. Users' operations in a web browser are converted into map-reduce jobs and MPI tasks that are executed on the cloud. At the end of this paper, we report experiments that demonstrate BDViewer's effectiveness and ease of use.
ISBN: 9781509024537 (print)
Large-scale graph analytics has gained attention during the past few years. As the world becomes more connected through new technologies and applications such as social networks, Web portals, mobile devices, and the Internet of Things, a huge amount of data is created and stored every day in the form of graphs consisting of billions of vertices and edges. Many graph processing frameworks have been developed to process these large graphs since Google introduced its graph processing framework, Pregel, in 2010. Meanwhile, cloud computing, a new computing paradigm that overcomes restrictions of traditional computing through novel technological and economic solutions such as distributed computing, elasticity, and pay-as-you-go models, has improved service delivery. In this paper, we present iGiraph, a cost-efficient Pregel-like graph processing framework for processing large-scale graphs on public clouds. iGiraph uses a new dynamic re-partitioning approach based on messaging patterns to minimize the cost of resource utilization on public clouds. We also present experimental results on the performance and cost effects of our method and compare them with the basic Giraph framework. Our results validate that iGiraph remarkably decreases the cost and improves the performance by scaling the number of workers dynamically.
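The Pregel-style, vertex-centric model mentioned above can be sketched on a single machine. This is a minimal illustration of supersteps and message passing, not the Giraph or iGiraph API, and it does not show re-partitioning. The function pregel_max is a hypothetical name, and the sketch assumes every neighbour also appears as a key in the graph dictionary.

```python
def pregel_max(graph, values):
    """Propagate the maximum vertex value through the graph in supersteps.

    graph:  vertex -> list of out-neighbours (every neighbour must be a key)
    values: vertex -> initial numeric value
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    supersteps = 1
    # Keep running supersteps while any vertex still has incoming messages.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            # A vertex only updates and re-sends when it learns a larger value;
            # otherwise it stays silent, which is how the computation halts.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in graph[v]:
                    outbox[n].append(values[v])
        inbox = outbox
        supersteps += 1
    return values, supersteps
```

The number of messages exchanged per superstep shrinks as vertices converge. Changing message volume of this kind is what a messaging-pattern-based re-partitioning scheme could exploit to release workers, although the abstract does not spell out how iGiraph does so.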
ISBN: 9780780397538 (print)
The DARPA Communicator program has fuelled the design and development of impressive human language technology applications. Its distributed framework has offered numerous benefits to the research community, including reduced prototype development time, sharing of components across sites, and provision of a standard evaluation platform. It has also enabled development of client-server applications with complex inter-process communication between modules. However, this latter feature, though beneficial, introduces complexities which reduce overall system robustness to failure. In addition, the ability to handle multiple users and multiple applications from a common interface is not innately supported. In this paper, we describe our enhancements to the original Communicator architecture to address robustness issues and to support multiple users and multiple applications. These enhancements have been evaluated in a series of experiments, which showed a 7.2% improvement in the robustness of the system. These enhancements are available in our public domain toolkit.