ISBN (print): 9781509036547
The #SAT problem, that is, counting the number of solutions of a propositional formula, extends the well-known SAT problem into the realm of probabilistic reasoning. However, its higher computational complexity and the lack of fast solvers still limit its applicability to real-world problems. In this work we present our distributed parallel #SAT solver dCountAntom, which utilizes both local shared-memory parallelism and distributed (cluster computing) parallelism. Although highly parallel solvers are known in SAT solving, such techniques have never been applied to the #SAT problem. Furthermore, we introduce a solve progress indicator which helps the user to assess whether the presented problem is likely solvable within a reasonable time. Our analysis shows a high accuracy of the estimated progress. Our experiments with up to 256 CPU cores working in parallel yield large speedups across different benchmarks derived from real-world problems: with the maximum number of available cores, dCountAntom solved problems on average 141 times faster than a single-core implementation.
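To make the counting task concrete, here is a minimal brute-force sketch in Python. It is not dCountAntom's algorithm, which the abstract does not describe; the function names, the example formula, and the idea of splitting on a prefix of variables are assumptions chosen only to illustrate that model counts of disjoint subproblems add up, which is what makes the problem amenable to parallel work. The last line relates the reported figures: a 141-fold speedup on 256 cores corresponds to a parallel efficiency of roughly 55%.

```python
from itertools import product


def count_models(clauses, num_vars, fixed=()):
    """Count assignments of variables 1..num_vars that satisfy every clause.

    clauses: list of lists of non-zero ints in DIMACS style
             (3 means x3, -3 means "not x3").
    fixed:   truth values already chosen for variables 1..len(fixed).
    """
    free = num_vars - len(fixed)
    count = 0
    for rest in product((False, True), repeat=free):
        assignment = tuple(fixed) + rest
        # A clause is satisfied if at least one of its literals is true.
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count


def count_models_split(clauses, num_vars, split_vars):
    """Sum the counts of 2**split_vars disjoint subproblems.

    Each subproblem fixes the first split_vars variables to one prefix.
    The subproblems are independent, so each could be given to a separate
    worker (hypothetical work split, not taken from the paper).
    """
    return sum(count_models(clauses, num_vars, prefix)
               for prefix in product((False, True), repeat=split_vars))


# Hypothetical example formula: (x1 or x2) and (not x1 or x3)
CLAUSES = [[1, 2], [-1, 3]]
TOTAL = count_models(CLAUSES, 3)
SPLIT_TOTAL = count_models_split(CLAUSES, 3, 2)

# Parallel efficiency implied by the reported numbers: speedup / cores.
PARALLEL_EFFICIENCY = 141 / 256
```

Both counting functions enumerate all assignments, so they are only usable for tiny formulas; practical #SAT solvers rely on far more sophisticated search and caching.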
Data outsourcing allows data owners to keep their data in public clouds, which do not ensure the privacy of data and computations. One fundamental and useful framework for processing data in a distributed fashion is M...
The aim of this paper is to present a new distributed computing middleware for High Performance Computing (HPC) based cloud micro-services. The great challenge is to maintain the scalability and efficiency of a massively parallel and distributed computational system as the volume of big data processed by its applications grows substantially. In addition, the proposed middleware implements a new cooperative micro-services teamwork model for massively parallel and distributed computing. This model consists of distributed micro-services acting as Micro-service Virtual Processing Units (MsVPUs), with an integrated load balancing service and an AMQP communication protocol that provide HPC. The paper presents the proposed distributed computational scheme and its integrated middleware, accompanied by some experimental results.
A parallel module for applications based on overlapping grids has been devised and implemented in JASMIN (J Parallel Adaptive Structured Mesh Applications INfrastructure). In this module, a patch-based data structure, a grid mapping method and a unified communication schedule have been designed and adopted to overcome the communication bottleneck that is common in parallel computing on overlapping grids. A grid mapping method library has been designed to make the module adaptable to all kinds of structured grids, and an interpolator library has also been designed to collect interpolators. Meanwhile, by encapsulating parallel computing strategies, such as distributed storage and data communication, and by providing standard interfaces, this module helps users implement parallel computing on overlapping grids conveniently. According to our test results, applications based on this module run efficiently on thousands of processors, which demonstrates the module's satisfactory parallel performance.
For optical remote sensing images, an effective method to reduce or eliminate the impact of clouds is important. With big data input and real-time processing demands, efficient parallelization strategies are essential for high performance computing on multi-core systems. This paper proposes an efficient high performance parallel computing framework for cloud filtering and smoothing. Two parallel algorithms for cloud filtering that incorporate spatial smoothing, solved by two-dimensional dynamic programming, are implemented, compared and benchmarked. The experiments were carried out on an NVIDIA GPU accelerator, with evaluations of approximation, parallelism and performance. The test results show significant performance improvements with high accuracy compared with a sequential CPU implementation, and the approach can be applied to other multi-core systems.
ISBN (print): 9781509053827
Applications typically exhibit extremely different performance characteristics depending on the accelerator. The back propagation neural network (BPNN) has been parallelized on different platforms. However, it has not yet been explored thoroughly on speculative multicore architectures. This paper presents a study of parallelizing BPNN on a speculative multicore architecture, including its speculative execution model, hardware design and programming model. The implementation was analyzed with seven well-known benchmark data sets. Furthermore, the paper examines trade-offs among several important design factors in upcoming speculative multicore architectures. The experimental results show that: (1) the BPNN performs well on the speculative multicore platform, achieving speedups (17.7x to 57.4x) similar to those of graphics processors (GPUs) while providing friendlier programmability; (2) the computing resources of 64 cores can be used efficiently, and 4k is the proper speculative buffer capacity in the model.
ISBN (print): 9781479984909
Despite the increasing popularity of shared-memory systems, there is a lack of tools for providing fault tolerance support to shared-memory applications. Checkpointing is one of the most popular fault tolerance techniques. However, the cost of checkpointing in terms of computing time, network utilization or storage resources can limit its practical use. This work proposes different techniques for optimizing the I/O cost of checkpointing shared-memory parallel applications. The proposals are extensively evaluated using the OpenMP NAS Parallel Benchmarks. Results show a significant decrease in the checkpointing overhead.
Graphs are increasingly being used as the data structure of choice to represent interactions between heterogeneous entities. Graph path querying is a primary operation in the network graph space, for both real-time querying and inferential analysis. The rate and volume of interconnected data being generated warrant efficient distributed solutions to manage and query network graphs in a scalable fashion. Existing distributed solutions have proposed several optimization techniques, including intelligent joins and partial evaluations, to process path queries. However, the former relies on comprehensive indices while the latter involves extensive driver-side processing to combine the partial results, neither of which is efficient for processing large graphs. In this paper, we propose a novel distributed graph path query processing system using the Apache Spark framework.
ISBN (print): 9780769557854
The problem of deepening memory hierarchy towards exascale is becoming serious for applications such as those based on stencil kernels, as it is difficult to satisfy both high memory bandwidth and capacity requirements simultaneously. This is evident even today, where problem sizes of stencil-based applications on GPU supercomputers are limited by the aggregated capacity of GPU device memory. There have been locality improvement techniques such as temporal blocking to enhance performance, but integrating those techniques into existing stencil applications requires substantially higher programming cost, especially for complex applications, and as a result they are not typically utilized. We alleviate this problem with a run-time GPU-MPI process oversubscription library we call HHRT, which automates data movement across the memory hierarchy, and a systematic methodology to convert and optimize the code to accommodate temporal blocking. The proposed methodology has been shown to significantly ease the adaptation of real applications, such as a whole-city airflow simulator embodying more than 12,000 lines of code; with careful tuning, we successfully maintain up to 85% performance even with problems whose footprint is four times larger than GPU device memory capacity, and scale to hundreds of GPUs on the TSUBAME2.5 supercomputer.
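The idea behind temporal blocking can be sketched in a few lines of Python. This is an illustration only and does not use HHRT or reproduce the authors' methodology: the 3-point averaging stencil, the function names, the grid values and the block size are all assumptions. The sketch shows the overlapped (redundant-computation) variant: each tile is loaded once together with a halo as wide as the number of time steps, advanced through all steps locally, and only its interior is written back, so that slow memory is touched once per tile instead of once per time step.

```python
def step(u):
    """One Jacobi sweep of a 3-point averaging stencil; end points stay fixed."""
    if len(u) < 3:
        return list(u)
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def naive(u, steps):
    """Reference version: sweep the whole grid once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u, steps, block):
    """Advance the grid by `steps` time steps one tile at a time.

    Each tile is extended by a halo of `steps` cells per side (clamped at the
    domain boundary). Errors caused by freezing the halo edge move inward by
    one cell per step, so after `steps` sweeps they have not reached the tile
    interior, which is the only part copied back.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(start - steps, 0)
        hi = min(end + steps, n)
        tile = u[lo:hi]  # single load of the tile plus halo
        for _ in range(steps):
            tile = step(tile)
        out[start:end] = tile[start - lo:end - lo]
    return out


# Hypothetical input grid and parameters for a quick comparison.
GRID = [float((i * i) % 7) for i in range(40)]
REFERENCE = naive(GRID, 4)
BLOCKED = temporally_blocked(GRID, 4, 8)
```

The price of this variant is the redundant work done in the halos, which grows with the number of fused time steps; choosing the block size and step count is a tuning decision.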
ISBN (print): 9781467398046
In the Cache Coherent Non-Uniform Memory Access (cc-NUMA) architecture, remote memory access has lower bandwidth and higher latency than local memory access. Especially on cc-NUMA platforms where computing nodes are connected by a network, the latency and bandwidth of the network are much worse than those of the HyperTransport (HT) and PCI-Express (PCI-E) buses. In order to enhance the performance of applications, a Hybrid Parallel Framework for Computation-intensive Applications (HPFCA) was proposed. Task distribution, data storage, multicore parallelism and kernel optimization were discussed in the HPFCA. An "MPI+OpenMP/Pthreads" mechanism was used for multi-node platforms: MPI was used for distributed memory parallelism, and "OpenMP/Pthreads" was used for shared memory parallelism. Moreover, GEMM and FFT, representative computation-intensive applications on the Godson-3B, were studied. According to the HPFCA, the parallel algorithms of GEMM and FFT were optimized. Finally, experimental results demonstrated that HPFCA could bring ideal performance on the Godson-3B.