ISBN: 9780769546759 (print)
Over the past 15 years, data warehousing and OLAP technologies have matured to the point where they have become a cornerstone of the decision-making process in organizations of all sizes. With the underlying databases growing enormously in size, parallel DBMSs have become a popular target platform. Perhaps the most "obvious" approach to scalable warehousing is to combine a small collection of conventional relational DBMSs into a loosely connected parallel DBMS. Such systems, however, benefit little, if at all, from advances in OLAP indexing, storage, compression, modeling, or query optimization. In the current paper, we discuss a parallel analytics server that has been designed from the ground up as a high-performance OLAP query engine. Moreover, its indexing and query processing model directly exploits an OLAP-specific algebra that enables performance optimizations beyond the reach of simple relational DBMS clusters. Taken together, the server provides class-leading query performance with the scalability of shared-nothing databases and, perhaps most importantly, achieves this balance with a modest physical architecture.
ISBN: 9780769546766 (print)
In this paper we present a helper threading scheme used to efficiently parallelize Kruskal's Minimum Spanning Forest algorithm. This algorithm is known for exhibiting inherently sequential characteristics. More specifically, the strict order in which the algorithm checks the edges of a given graph is the main reason behind the lack of explicit parallelism. Our proposed scheme attempts to overcome the imposed restrictions and improve the performance of the algorithm. The results show that, for a wide range of graphs of varying structure, size and density, the parallelization of Kruskal's algorithm is feasible. Observed speedups reach up to 5.5 for 8 running threads, revealing the potential of our approach.
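To make the sequential bottleneck concrete, the following is a minimal sketch of the conventional single-threaded Kruskal algorithm with a union-find structure. It is only the baseline the abstract refers to; the helper threading scheme itself is not described in the abstract and is not reproduced here. The function name `kruskal_msf` and the example graph are illustrative assumptions.

```python
def kruskal_msf(num_vertices, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest.

    `edges` is a list of (weight, u, v) tuples with 0 <= u, v < num_vertices.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    # Edges are examined in strictly ascending weight order. Whether an
    # edge is accepted depends on every lighter edge already processed,
    # which is why the loop has no explicit parallelism.
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            # The edge joins two components, so it belongs to the forest.
            parent[ru] = rv
            total += w
            chosen.append((w, u, v))
        # Otherwise the edge would close a cycle and is discarded.
    return total, chosen


# Hypothetical example: two components, {0, 1, 2} and {3, 4}.
example_edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 3, 4)]
weight, forest = kruskal_msf(5, example_edges)
```

In this example the edge of weight 4 is rejected because vertices 0 and 2 are already connected through the two lighter edges, which illustrates the dependence on processing order.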
ISBN: 9781538637906 (print)
A significant disadvantage of fully homomorphic encryption is the long time needed to process encrypted data, due to its complex and CPU-intensive arithmetic techniques. In this paper, a communication protocol is developed to ensure authenticity, integrity and privacy of measurement data across a distributed measuring system. The fully homomorphic encryption library LibScarab was extended by integer arithmetic, comparisons, decisions and multithreading to secure data processing. Furthermore, it enhances 32 and 64-bit arithmetic operations, improving them by a higher factor. This extension is integrated into a cloud computing architecture establishing privacy by design. The resulting parallelized algorithm solved the time constraint issues for smart meter gateway tariffs. Several tests were performed, fulfilling tariff specifications, where preserving privacy of accumulated data was necessary. It was concluded that this extension of the fully homomorphic encryption library meets the requirements of real-world applications.
In this paper, we propose a new class of interconnection networks called recursive hierarchical swapped networks (RHSN) for general-purpose parallel processing. The node degrees of RHSNs can vary from a small number to as large as required, depending on recursive and hierarchical composition parameters and the nucleus graph chosen. The diameter of an RHSN can be asymptotically optimal within a small constant factor. We present efficient routing, semigroup computation, ascend/descend, matrix-matrix multiplication, and emulation algorithms, thus proving the versatility of RHSNs. In particular, on suitably constructed RHSNs, matrix multiplication can be performed faster than the DNS algorithm on a hypercube. Furthermore, ascend/descend algorithms, semigroup computation, and parallel prefix computation can be done using algorithms with asymptotically fewer communication steps than on a hypercube.
ISBN: 9781538637906 (print)
In recent years, chaos-based image ciphers have been widely studied and a growing number of schemes based on the permutation-substitution architecture have been proposed. To better meet the challenge of real-time secure image communication applications, this paper suggests a new image encryption scheme using a parallel substitution. In the permutation stage, the Arnold cat map is employed to shuffle the pixel positions so as to erase the strong relationship between adjacent pixels. In the substitution stage, the scrambled image is first decomposed into eight bit-planes, which are then mixed in parallel with keystreams generated by the chaotic logistic map. Theoretically, the parallel substitution strategy runs eight times faster than the serial strategy on an 8-thread processor, as the volume of data processed by each substitution unit is 1/8 of that of the input image. Experimental results show that the proposed parallel scheme runs more than five times faster than the serial scheme. An extensive security analysis is carried out, demonstrating the satisfactory security of the proposed scheme.
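The two stages described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' scheme: the standard Arnold cat map and logistic map are used, but the way keystream bits are derived (thresholding at 0.5), the XOR mixing, the seeds, and the parameter `r = 3.99` are all assumptions, and the planes are processed in a plain loop rather than on eight threads.

```python
def arnold_cat(img):
    """Permute an N x N image: pixel (x, y) moves to ((x + y) % N, (x + 2y) % N)."""
    n = len(img)
    out = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            out[(x + y) % n][(x + 2 * y) % n] = img[x][y]
    return out


def logistic_bits(x0, r, count):
    """Return `count` keystream bits from logistic-map iterates x <- r*x*(1-x).

    Thresholding at 0.5 is an assumed bit-extraction rule.
    """
    bits, x = [], x0
    for _ in range(count):
        x = r * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits


def substitute(img, seeds, r=3.99):
    """XOR each of the eight bit-planes with its own keystream.

    The planes do not depend on each other, so each iteration of the
    outer loop could be handed to a separate thread.
    """
    n = len(img)
    out = [[0] * n for _ in range(n)]
    for plane in range(8):
        ks = logistic_bits(seeds[plane], r, n * n)
        for i in range(n):
            for j in range(n):
                bit = (img[i][j] >> plane) & 1
                out[i][j] |= (bit ^ ks[i * n + j]) << plane
    return out


# Hypothetical 4 x 4 test image and per-plane seeds in (0, 1).
SEEDS = [0.11 + 0.1 * k for k in range(8)]
image = [[(17 * i + 31 * j) % 256 for j in range(4)] for i in range(4)]
scrambled = arnold_cat(image)
cipher = substitute(scrambled, SEEDS)
```

Because XOR is its own inverse, applying `substitute` again with the same seeds recovers the scrambled image. Each plane touches one bit per pixel, which is the source of the 1/8 data volume per substitution unit mentioned in the abstract.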
ISBN: 9781509036820 (print)
Although the OpenMP 4.0 standard has been available since 2013, support for GPUs has been absent up until very recently, with only a handful of experimental compilers available. In this work we evaluate the performance of Cray's new NVIDIA GPU-targeting implementation of OpenMP 4.0, with the mini-apps TeaLeaf, CloverLeaf and BUDE. We successfully port each of the applications, using a simple and consistent design throughout, and achieve performance on an NVIDIA K20X that is comparable to Cray's OpenACC in all cases. BUDE, a compute-bound code, required 2.2x the runtime of an equivalently optimised CUDA code, which we believe is caused by an inflated frequency of control flow operations and less efficient arithmetic optimisation. Impressively, both TeaLeaf and CloverLeaf, memory-bandwidth-bound codes, only required 1.3x the runtime of hand-optimised CUDA implementations. Overall, we find that OpenMP 4.0 is a highly usable open standard capable of performant heterogeneous execution, making it a promising option for scientific application developers.
ISBN: 0818656026 (print)
This paper presents the results of running some benchmarks from the Genesis suite on the Transtech Paramid. The benchmarks use the PARMACS parallel processing standard, and are based on applications in the fields of general relativity, molecular dynamics and QCD. The Paramid is a distributed-memory parallel computer, using up to 64 Intel i860-XP processors. The results demonstrate good parallel performance, and the ability of the machine to run standard portable software.
One of the fundamental goals of parallel computing is to develop a framework that will support portable and efficient application programs. The Bulk-Synchronous Parallel (BSP) model was proposed to help achieve this goal. The BSP model is intended to be a 'unifying model': it addresses both software and hardware issues by allowing theoretical analysis to coexist with practical physical implementations. For several years the BSP model has been supported mainly by theoretical results. Recent experiments, however, have begun to demonstrate the practicality of the model for real architectures running real applications. The goal of this paper is to describe the methodology used to construct an efficient BSP library on the BBN Butterfly GP1000. Our results are relevant for BSP library implementations on shared-memory systems in general and for NUMA (nonuniform-memory-access) machines in particular.
ISBN: 0769511538 (print)
The application of parallel and distributed systems to multi-agent environments has attracted recent attention. Multi-agent systems are a particular type of distributed artificial intelligence system. This paper presents an approach to learning in parallel and distributed systems. A variant of the job assignment problem is chosen as an evaluation task. This is an NP-hard problem, which is relevant to many industrial application domains. Experimental results show the effectiveness of the proposed approach.
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
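As a small worked calculation, the reported speedup can be converted into parallel efficiency using the usual definition, efficiency = speedup / number of processors. The helper name `parallel_efficiency` is an illustrative assumption; the figures are the ones quoted in the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


# 24-fold speedup on 64 processors, as reported for JOVE.
jove_efficiency = parallel_efficiency(24, 64)  # 0.375
```

Under this definition, a 24-fold speedup on 64 processors corresponds to an efficiency of 37.5%.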