ISBN: 9780889868649 (print)
This paper introduces invasive computing, a new paradigm for programming parallel architectures. The goals are to enable the development and execution of resource-aware programs that can dynamically allocate and free new resources in phases with more parallelism. To allocate more resources, applications use the invade operation, and to free them, the retreat operation. The research is conducted within the framework of the Transregional Collaborative Research Centre 89 funded by the German Science Foundation.
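As a rough illustration of the invade and retreat operations named in the abstract, the following minimal sketch models a shared pool of processing resources. The class `ResourcePool` and its method signatures are hypothetical and are not taken from the paper; only the operation names invade and retreat come from the abstract.

```python
import threading


class ResourcePool:
    """Hypothetical pool of processing resources (not the paper's API)."""

    def __init__(self, total):
        self._free = total
        self._lock = threading.Lock()

    def invade(self, wanted):
        """Claim up to `wanted` free resources and return how many were granted."""
        with self._lock:
            granted = min(wanted, self._free)
            self._free -= granted
            return granted

    def retreat(self, count):
        """Give `count` previously claimed resources back to the pool."""
        with self._lock:
            self._free += count

    def free(self):
        """Return the number of resources that are currently unclaimed."""
        with self._lock:
            return self._free


pool = ResourcePool(total=8)
claim = pool.invade(6)    # a highly parallel phase asks for 6 resources
second = pool.invade(4)   # a competing request only gets what is left
pool.retreat(claim)       # the phase ends and its resources are released
```

In this sketch a request is granted partially when the pool cannot satisfy it in full, so the caller has to adapt its degree of parallelism to the value returned by `invade`.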
ISBN: 9780769539393 (print)
This paper argues that algorithmic skeletons are a suitable programming model for multi-core architectures. The high-level abstractions offered by algorithmic skeletons provide a simple way for non-parallel programmers to address parallel programming. Previous algorithmic skeleton frameworks and libraries have addressed distributed computing environments such as clusters and grids. This paper proposes a parallel skeleton library, Skandium, and concludes, after an experimental evaluation, that algorithmic skeletons are an effective methodology to program multi-core architectures.
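To make the idea of an algorithmic skeleton concrete, here is a minimal sketch of a map skeleton: the programmer supplies sequential split, worker, and merge functions, and the skeleton owns the parallel structure. The names `map_skeleton` and `split_in_chunks` are assumptions made for illustration; this is not Skandium's API, which is a Java library.

```python
from concurrent.futures import ThreadPoolExecutor


def map_skeleton(split, worker, merge, max_workers=4):
    """Build a function that splits its input, runs `worker` on each part
    concurrently, and merges the partial results."""
    def run(data):
        parts = split(data)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the same order as `parts`.
            partial = list(pool.map(worker, parts))
        return merge(partial)
    return run


def split_in_chunks(values, size=3):
    """Cut a list into consecutive chunks of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


sum_of_squares = map_skeleton(
    split=split_in_chunks,
    worker=lambda chunk: sum(x * x for x in chunk),
    merge=sum,
)
```

The caller never touches threads directly: `sum_of_squares(list(range(10)))` looks like an ordinary function call. The thread pool only illustrates the structure; in standard CPython, CPU-bound workers would need a process pool to run truly in parallel.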
We present a novel dynamic on-the-fly race detection mechanism called parallel Nondeterminator to check for determinacy races during the parallel execution of a program with Spawn-Sync parallelism. The parallel Nondeterminator provides provable correctness and efficiency. Let D denote the maximum depth of the recursion in the parallel program. The worst-case slowdown in execution incurred for each spawn operation is O(D), the overhead for each sync operation is O(1), and the time required to monitor any shared memory access is O(log D). Moreover, we have implemented the parallel Nondeterminator in Cilk, a parallel language developed at MIT. Both theoretical and experimental results give strong evidence for the efficiency of our algorithm.
ISBN: 9781509001910 (print)
This paper presents a harmony-search-based parallel optimization algorithm to minimize voltage deviations in three-phase unbalanced electrical distribution systems and to maximize active power outputs of distributed energy resources (DRs). The main contribution is to reduce the adverse impacts on the voltage profile as photovoltaic (PV) output or electric vehicle (EV) charging changes throughout the day. The IEEE 123-bus distribution test system is modified by adding DRs and EVs under different load profiles. The simulation results show that, by using parallel computing techniques, heuristic methods may be used as an alternative optimization tool in the operation of electrical power distribution systems.
The maximum flow problem is a combinatorial problem of significant importance in a wide variety of research and commercial applications. It has been extensively studied and implemented over the past 40 years. The push...
ISBN: 9780889867048 (print)
Currently, clusters of shared memory symmetric multiprocessors (SMPs) are one of the most common parallel computing systems, and some existing environments have between 8 and 32 processors per node. Examples of such environments include some supercomputers: DataStar p655 (P655 and P655m) and P690 at the San Diego Supercomputing Center, and Seaborg and Bassi at the DOE National Energy Research Scientific Computing Center. In this paper, we quantify the performance gap resulting from using different numbers of processors per node for application execution (for which we use the term processor partitioning), and conduct detailed performance experiments to identify the major application characteristics that affect processor partitioning. We use the STREAM memory benchmarks and Intel's MPI benchmarks to explore the performance impact of different application characteristics. The results are then used to explain the performance results of processor partitioning using three NAS parallel application benchmarks. The experimental results indicate that processor partitioning can have a significant impact on the performance of a parallel scientific application, as determined by its communication and memory requirements.
ISBN: 9781467345651; 9780769549033 (print)
Analysis of neural signals such as the electroencephalogram (EEG) is one of the key technologies in detecting and diagnosing various brain disorders. As neural signals are non-stationary and non-linear in nature, it was almost impossible to understand their true physical dynamics until the recent advent of the Ensemble Empirical Mode Decomposition (EEMD) algorithm. Neural signal processing with EEMD is highly compute-intensive due to the high complexity of the EEMD algorithm. It is also data-intensive because (1) EEG signals contain massive data sets and (2) EEMD has to introduce a large number of trials in processing to ensure precision. The MapReduce programming model is a promising parallel computing paradigm for data-intensive computing. To increase the efficiency and performance of neural signal analysis, this research develops parallel EEMD neural signal processing with MapReduce. In this paper, we implement the parallel EEMD with Hadoop in a modern cyberinfrastructure. Test results and performance evaluation show that parallel EEMD can significantly improve the performance of neural signal processing.
In this paper, efficient and portable shared-memory-based parallel computation models for the string matching problem are presented and their performance is analyzed. To exploit the parallelism in the computation models, a parallel broadcasting method, which is a dataflow scheme, is applied. The models are therefore time and space efficient, since they are based on the dataflow mechanism. Several computation models are designed and tested to examine the aspects that affect parallel programming performance, such as granularity, communication, and I/O. For the implementation, Java threads, which provide built-in support for portable parallel programming in a shared memory environment, are used. Experimental results demonstrate that the computation models are practical, portable, and scalable parallel solutions to the problem, and the comparative testing reveals facts about the relationship between theory and practice.
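One common way to parallelize string matching on shared memory is to cut the text into chunks and search them concurrently, letting each search window overlap the next chunk so that matches straddling a boundary are not lost. The sketch below illustrates that chunking scheme with a Python thread pool. It is an assumption made for illustration, not the paper's broadcasting dataflow models or its Java threads implementation, and the names `find_in_chunk` and `parallel_find` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(text, pattern, start, end):
    """Return every match position p with start <= p < end."""
    positions = []
    # Extend the window past `end` so a match that begins inside this chunk
    # but finishes in the next one is still found, exactly once.
    limit = min(len(text), end + len(pattern) - 1)
    pos = text.find(pattern, start, limit)
    while pos != -1:
        positions.append(pos)
        pos = text.find(pattern, pos + 1, limit)
    return positions


def parallel_find(text, pattern, workers=4):
    """Find all (possibly overlapping) occurrences of `pattern` in `text`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    chunk = max(1, -(-len(text) // workers))  # ceiling division
    bounds = [(s, min(s + chunk, len(text))) for s in range(0, len(text), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: find_in_chunk(text, pattern, b[0], b[1]), bounds)
        # Chunks are processed in order, so the merged list is already sorted.
        return [p for part in parts for p in part]
```

The chunk size plays the role of granularity here: fewer, larger chunks reduce scheduling overhead, while more, smaller chunks expose more parallelism.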
The Voronoi diagram of a set of n sites in the plane has applications in many diverse areas such as Geographical Information Systems (GIS), Visualization, Robotics, Computer Graphics, and Computer Aided Design (CAD). Unfortunately, its construction is very costly for large input sizes, and fast parallel algorithms for the problem are desired. There are no known parallel algorithms for the problem that are optimal with respect to the time-processor product. The best sequential algorithms work in O(n lg n) time. In this paper, an O(lg^3 n) time parallel algorithm for constructing the Voronoi diagram of a set of n sites in the plane is presented on a hypercube with n processors. Our technique parallelizes the well-known, seemingly inherently sequential technique of Shamos and Hoey, and makes use of a number of special properties of the dividing polygonal chain and of Batcher's bitonic sort.
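The abstract names Batcher's bitonic sort as a building block. The following sequential sketch of the bitonic sorting network shows why it suits a hypercube: every compare-exchange pairs index i with i XOR j, where j is a power of two, so the two indices differ in exactly one bit, which is the hypercube neighbor relation. This is a generic textbook formulation, not the paper's code, and it covers only the sorting step, not the Voronoi construction.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with Batcher's bitonic network."""
    a = list(values)
    n = len(a)
    if (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between the compared elements
            # On n processors, every iteration of this loop could run at once.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

With one element per processor, the inner `for` loop collapses into a single parallel compare-exchange step, and the two outer loops contribute O(lg^2 n) such steps in total.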