ISBN:
(Print) 9783030185763; 9783030185756
Similar queries keep springing up in real query logs, yet few RDF systems address this problem. In this paper, we propose Leon, a distributed RDF system that can also deal with the multi-query problem. First, we apply a characteristic-set-based partitioning scheme. This scheme (i) supports fully parallel processing of joins within characteristic sets; (ii) minimizes data communication by applying direct transmission of intermediate results instead of broadcasting. Then, Leon revisits the classical problem of multi-query optimization (MQO) in the context of RDF/SPARQL. In light of the NP-hardness of multi-query optimization for SPARQL, we propose a heuristic algorithm that partitions the input batch of queries into groups and discovers the common sub-queries of multiple SPARQL queries. Our MQO algorithm incorporates a subtle cost model to generate execution plans. Our experiments with synthetic and real datasets verify that: (i) Leon's startup overhead is low; (ii) Leon consistently outperforms centralized RDF engines by 1-2 orders of magnitude and is competitive with state-of-the-art distributed RDF engines; (iii) our MQO approach consistently demonstrates a 10x speedup over the baseline method.
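The grouping step of the MQO heuristic can be pictured with a small sketch, assuming queries are modeled as sets of triple-pattern strings; the similarity threshold, the greedy assignment and all names below are illustrative and are not Leon's actual algorithm or cost model.

```python
# Hypothetical sketch of grouping a batch of SPARQL queries for MQO.
# A query is modeled as a frozenset of triple-pattern strings; the real
# system uses characteristic sets and a cost model, not reproduced here.

def jaccard(a, b):
    """Similarity of two sets of triple patterns."""
    return len(a & b) / len(a | b) if a or b else 0.0

def group_queries(queries, threshold=0.3):
    """Greedily partition queries into groups that share triple patterns.

    Returns (common_subquery, members) pairs, where common_subquery is
    the intersection of all members' patterns.
    """
    groups = []  # each entry: [common_patterns, [query, ...]]
    for q in queries:
        best, best_sim = None, threshold
        for g in groups:
            sim = jaccard(g[0], q)
            if sim > best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append([set(q), [q]])
        else:
            best[0] &= q          # shrink the shared sub-query
            best[1].append(q)
    return [(frozenset(c), members) for c, members in groups]

if __name__ == "__main__":
    q1 = frozenset({"?x :knows ?y", "?y :worksAt ?z"})
    q2 = frozenset({"?x :knows ?y", "?y :livesIn ?c"})
    q3 = frozenset({"?a :type :Movie"})
    for common, members in group_queries([q1, q2, q3]):
        print(sorted(common), "<- shared by", len(members), "queries")
```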
ISBN:
(Print) 9783540634409
The proposed reactive real-time programming system is a new approach to implementing complex distributed heterogeneous real-time applications. It is based on the notion of distributed multi-agent systems. The whole control task is decomposed top-down into small execution units, called agents, which communicate by sending and executing contracts and are specified in a hardware-independent language based on states and guarded commands. At compile time the agents are distributed to specified targets: PCs, microcontrollers, programmable logic controllers and even programmable logic devices are supported. The system automatically translates each agent to the particular target code and realizes the communication, including a bidding protocol, between agents either on the same processor or within a network. Due to strictly cyclic processing of the agents, exact response times can be guaranteed. Zero-delay agents can be implemented in hardware.
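A minimal sketch of the guarded-command execution model described above, assuming agents reduce to (guard, command) pairs evaluated once per strictly cyclic scheduling round; the class and scheduler below are illustrative and do not reproduce the contract/bidding machinery or the target-code generation.

```python
# Hypothetical sketch of guarded-command agents driven by a strict cycle.
# The original system compiles such agents to PCs, microcontrollers, PLCs
# or hardware; only the semantics are mimicked here.

class Agent:
    def __init__(self, name, rules):
        self.name = name
        self.rules = rules  # list of (guard, command) callables

    def step(self, state):
        # Evaluate every guard; fire the commands whose guards hold.
        for guard, command in self.rules:
            if guard(state):
                command(state)

def run_cycles(agents, state, cycles):
    """Strictly cyclic scheduler: every agent runs exactly once per cycle,
    so the worst-case response time is bounded by one cycle length."""
    for _ in range(cycles):
        for agent in agents:
            agent.step(state)

if __name__ == "__main__":
    state = {"temperature": 80, "fan": False}
    cooler = Agent("cooler", [
        (lambda s: s["temperature"] > 75, lambda s: s.update(fan=True)),
        (lambda s: s["temperature"] <= 60, lambda s: s.update(fan=False)),
    ])
    run_cycles([cooler], state, cycles=3)
    print(state)  # fan turned on because the first guard held
```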
The Sparse Matrix-Vector product (SpMV) is a key operation in engineering and scientific computing. Methods for efficiently implementing it in parallel are critical to the performance of many applications. Modern Grap...
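For reference, a minimal SpMV over the standard Compressed Sparse Row (CSR) layout; this is the textbook sequential formulation, unrelated to the parallel implementation the entry above refers to.

```python
# Minimal CSR sparse matrix-vector product, for reference only.
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for A stored in Compressed Sparse Row format."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form
values  = np.array([1.0, 2.0, 3.0])
col_idx = np.array([0, 2, 1])
row_ptr = np.array([0, 2, 3])
print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 1.0, 1.0])))  # [3. 3.]
```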
ISBN:
(Print) 9781538622933
Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also due to the availability of large-scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. The heuristics include loop perforation, data caching, incomplete graph coloring and synchronization, and we evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with low impact on accuracy for both algorithms. We expect the proposed approximate techniques to enable scalable graph analytics on data of importance to several scientific applications and their subsequent adoption to scale similar graph algorithms.
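A minimal sketch of one of the named heuristics, loop perforation, applied to PageRank: each iteration updates only a sampled fraction of vertices. The sampling rate, graph representation and parameter names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of loop perforation applied to PageRank: only a
# sampled fraction of vertices is updated per iteration, trading accuracy
# for speed. Dangling-node handling is omitted for brevity.
import random

def pagerank_perforated(graph, damping=0.85, iters=20, keep=0.7, seed=0):
    """graph: dict node -> list of out-neighbours."""
    rng = random.Random(seed)
    nodes = list(graph)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        # Loop perforation: skip (1 - keep) of the per-vertex updates.
        sample = [v for v in nodes if rng.random() < keep]
        incoming = {v: 0.0 for v in sample}
        for u in nodes:
            out = graph[u]
            if not out:
                continue
            share = rank[u] / len(out)
            for v in out:
                if v in incoming:
                    incoming[v] += share
        for v in sample:
            rank[v] = (1 - damping) / len(nodes) + damping * incoming[v]
    return rank

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank_perforated(g))
```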
The join operation is the most costly operation in relational database management systems. Distributed and parallel processing can effectively speed up the join operation. In this paper, we describe a number of highly...
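As a point of reference only (the entry above is truncated and its algorithms are not shown), a hash-partitioned equi-join illustrates why the join parallelizes well: each pair of hash partitions can be built and probed independently on a different node or core. All names below are illustrative.

```python
# Illustrative hash-partitioned equi-join: both relations are split by a
# hash of the join key so each partition pair can be joined independently.
def hash_partition(rows, key, n_parts):
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def join_partition(r_part, s_part, key):
    table = {}
    for r in r_part:                       # build phase
        table.setdefault(r[key], []).append(r)
    return [{**r, **s} for s in s_part     # probe phase
            for r in table.get(s[key], [])]

def parallel_hash_join(R, S, key, n_parts=4):
    r_parts = hash_partition(R, key, n_parts)
    s_parts = hash_partition(S, key, n_parts)
    out = []
    for rp, sp in zip(r_parts, s_parts):   # each pair is independent work
        out.extend(join_partition(rp, sp, key))
    return out

R = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
S = [{"id": 1, "city": "rome"}, {"id": 3, "city": "oslo"}]
print(parallel_hash_join(R, S, "id"))
```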
ISBN:
(Print) 9781479980062
Numerous problems in science and engineering involve discretizing the problem domain as a regular structured grid and make use of domain decomposition techniques to obtain solutions faster using high-performance computing. However, load imbalance among the various processing nodes can cause severe degradation in application performance. This problem is exacerbated when the computational workload is non-uniform and the processing nodes have varying computational capabilities. In this paper, we present novel local search algorithms for regular partitioning of a structured mesh to heterogeneous compute nodes in a distributed setting. The algorithms seek to assign larger workloads to processing nodes with higher computational capabilities while maintaining the regular structure of the mesh, in order to achieve a better load balance. We also propose a distributed-memory (MPI) parallelization architecture that can be used to achieve a parallel implementation of scientific modeling software requiring structured grids on heterogeneous processing resources involving CPUs and GPUs. Our implementation can make use of the available CPU cores and multiple GPUs of the underlying platform simultaneously. Empirical evaluation on real-world flood modeling domains on a heterogeneous architecture comprising multicore CPUs and GPUs suggests that the proposed partitioning approach can provide a performance improvement of up to 8x over naive uniform partitioning.
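A toy sketch of the heterogeneity-aware idea, assuming a 1-D decomposition of grid rows and a per-node speed factor: start from equal slabs and shift slab boundaries while the maximum normalised load decreases. This is an illustration only, not the paper's local search algorithms.

```python
# Hypothetical boundary-shifting local search for assigning grid rows to
# nodes of unequal speed; minimises the makespan max(slab_work / speed).
def makespan(row_work, cuts, speeds):
    """Max normalised load over slabs defined by cut indices."""
    bounds = [0] + cuts + [len(row_work)]
    return max(sum(row_work[bounds[i]:bounds[i + 1]]) / speeds[i]
               for i in range(len(speeds)))

def local_search(row_work, speeds, iters=1000):
    # Start with equal-sized slabs, then move one boundary by one row
    # whenever that reduces the makespan.
    n, p = len(row_work), len(speeds)
    cuts = [n * (i + 1) // p for i in range(p - 1)]
    best = makespan(row_work, cuts, speeds)
    for _ in range(iters):
        improved = False
        for i in range(p - 1):
            for delta in (-1, 1):
                trial = cuts[:]
                trial[i] += delta
                lo = trial[i - 1] if i else 0
                hi = trial[i + 1] if i + 1 < p - 1 else n
                if not (lo < trial[i] < hi):
                    continue
                m = makespan(row_work, trial, speeds)
                if m < best:
                    cuts, best, improved = trial, m, True
        if not improved:
            break
    return cuts, best

row_work = [3, 3, 3, 3, 10, 10, 2, 2]   # non-uniform work per grid row
speeds = [1.0, 4.0]                      # e.g. a CPU node vs. a GPU node
print(local_search(row_work, speeds))    # larger slab goes to the GPU node
```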
ISBN:
(Print) 9783030576752; 9783030576745
The rapid growth in edge computing devices as part of the Internet of Things (IoT) allows real-time access to time-series data from thousands of sensors. Such observations are often queried to optimize the health of the infrastructure. Recently, edge storage systems allow us to retain data on the edge rather than moving it centrally to the cloud. However, such systems do not support flexible querying over data spread across tens to hundreds of devices. There is also a lack of distributed time-series databases that can run on edge devices. Here, we propose TorqueDB, a distributed query engine over time-series data that operates on edge and fog resources. TorqueDB leverages our prior work on ElfStore, a distributed edge-local file store, and InfluxDB, a time-series database, to enable temporal queries to be decomposed and executed across multiple fog and edge devices. Interestingly, we move data into InfluxDB on demand while retaining the durable data within ElfStore for use by other applications. We also design a cost model that maximizes parallel movement and execution of the queries across resources, and utilizes caching. Our experiments on a real edge, fog and cloud deployment show that TorqueDB performs comparably to InfluxDB on a cloud VM for a smart city query workload, but without the associated monetary costs.
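A rough sketch of the query decomposition idea, assuming a catalogue mapping devices to the time ranges they hold; the catalogue, local_query stub and thread pool are stand-ins and do not reflect TorqueDB's API, cost model or caching.

```python
# Hypothetical decomposition of a temporal range query across edge/fog
# devices, with per-device sub-queries issued in parallel and merged.
from concurrent.futures import ThreadPoolExecutor

# Which device holds which time range of a sensor stream (illustrative).
CATALOGUE = {
    "edge-1": (0, 100),
    "edge-2": (100, 200),
    "fog-1":  (200, 300),
}

def local_query(device, start, end):
    # Stand-in for querying the device-local time-series store (e.g. a
    # database instance loaded on demand from the edge file store).
    return [(device, t) for t in range(start, end, 25)]

def run_query(start, end):
    """Split [start, end) by device coverage and query devices in parallel."""
    plan = []
    for device, (lo, hi) in CATALOGUE.items():
        s, e = max(start, lo), min(end, hi)
        if s < e:
            plan.append((device, s, e))
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda p: local_query(*p), plan)
    return sorted((row for part in parts for row in part), key=lambda r: r[1])

print(run_query(50, 250))
```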
ISBN:
(Print) 9780819492791
Thanks to recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications the processing of such images requires high-performance computing techniques in order to deliver timely responses, e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphics Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging task of processing large amounts of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to subsequently compute the geometric transformation that aligns the data. Feature extraction and matching are among the most computationally demanding operations in the processing chain; thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES) exploiting parallel architectures and GPUs are presented. The innovative aspects of the implementation are (i) its effectiveness on a large variety of unorganized and complex datasets, (ii) its capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and commented on.
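A minimal NumPy sketch of the matching step, using brute-force nearest-neighbour search with Lowe's ratio test; the descriptor type, ratio value and synthetic data are illustrative and unrelated to the GPU-based LARES implementation.

```python
# Illustrative brute-force descriptor matching with a ratio test, written
# in NumPy for clarity; production systems run this step on GPUs.
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs where desc_a[i] matches desc_b[j]."""
    # Pairwise Euclidean distances between all descriptors.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i in range(d.shape[0]):
        order = np.argsort(d[i])
        best, second = order[0], order[1]
        if d[i, best] < ratio * d[i, second]:   # ratio test rejects ambiguity
            matches.append((i, int(best)))
    return matches

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 32))
b = np.vstack([a[2] + 0.01, rng.normal(size=(4, 32))])  # one true match
print(match_descriptors(a, b))   # expect (2, 0) among the results
```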
ISBN:
(Print) 9781665420105
Programmable switches allow offloading specific processing tasks into the network and promise multi-Tbit/s throughput. One major goal when moving computation into the network is typically to reduce the volume of network traffic and thus improve overall performance. In this manner, programmable switches are increasingly used, both in research and in industry applications, for various scenarios, including statistics gathering, in-network consensus protocols, and more. However, currently available programmable switches suffer from several practical limitations. One important restriction is the limited amount of available memory, making them unsuitable for stateful operations such as hash joins in distributed databases. In previous work, an FPGA-based In-Network Hash Join accelerator was presented, initially using DDR-DRAM to hold the state. In a later iteration, the hash table was moved to on-chip HBM-DRAM to improve performance even further. However, while very fast, the size of the joins in this setup was limited by the relatively small amount of available HBM. In this work, we heterogeneously combine DDR-DRAM and HBM memories to support both larger joins and benefit from the far faster and more parallel HBM accesses. In this manner, we are able to improve performance by a factor of 3x compared to the previous HBM-based work. We also introduce additional configuration parameters, supporting a more flexible adaptation of the underlying hardware architecture to the different join operations required by a concrete use case.
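A software-only model of the tiered-memory idea, assuming buckets are pinned to a small fast tier until it fills and spill to a larger slow tier afterwards; capacities and the placement policy are illustrative and do not describe the FPGA design.

```python
# Conceptual model of a hash table spanning two memory tiers: a small fast
# tier (standing in for HBM) and a large slow tier (standing in for DDR).
class TieredHashTable:
    def __init__(self, n_buckets, fast_buckets):
        self.n_buckets = n_buckets
        self.fast = {}   # buckets resident in the fast, small memory
        self.slow = {}   # buckets resident in the slow, large memory
        self.fast_capacity = fast_buckets

    def _tier(self, bucket_id):
        # Keep a bucket in the fast tier while there is room; spill new
        # buckets to the slow tier once the fast tier is full.
        if bucket_id in self.fast or len(self.fast) < self.fast_capacity:
            return self.fast
        return self.slow

    def insert(self, key, value):
        b = hash(key) % self.n_buckets
        self._tier(b).setdefault(b, []).append((key, value))

    def probe(self, key):
        b = hash(key) % self.n_buckets
        rows = self.fast.get(b, []) + self.slow.get(b, [])
        return [v for k, v in rows if k == key]

table = TieredHashTable(n_buckets=8, fast_buckets=2)
for k in range(20):
    table.insert(k, f"row-{k}")          # build side
print(table.probe(5), table.probe(13))   # probe side
```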
This paper presents the results of parallel simulations of mixed networks using the ns-3 simulator. The simulated networks consist of an equal number of simple (leaf) nodes and router nodes, which form a small-world netw...