ISBN (print): 9780889866386
In this paper, we discuss the parallelization of a high-level computer vision application in medical imaging, namely multi-scale active shape description of MR (magnetic resonance) brain images of epileptic patients using active contour models, on a cluster of workstations. The paper gives a comparative study and analysis of three different parallel implementation approaches, each based on a corresponding parallel computing pattern: Temporal Multiplexing, Pipeline, and Composite Pipeline. The cluster-based parallel implementations have shown encouraging results.
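The Pipeline pattern named above can be sketched in Python as follows. This is a minimal single-machine illustration, not the authors' cluster implementation: each stage runs in its own thread and passes results to the next stage through a queue, whereas on a cluster of workstations each stage would typically run on a separate node. The stage functions in the usage line are hypothetical placeholders for the image-processing steps, and because of Python's GIL the sketch shows the structure of the pattern rather than its speedup.

```python
import queue
import threading
from typing import Callable, Iterable, List

_SENTINEL = object()  # end-of-stream marker passed down the pipeline


def run_pipeline(items: Iterable, stages: List[Callable]) -> list:
    """Push items through `stages`, one thread per stage, linked by FIFO queues.

    While stage k works on item i, stage k-1 can already start item i+1;
    this overlap is what the Pipeline pattern exploits. Every stage has a
    single worker and the queues are FIFO, so output order matches input order.
    Error handling is omitted: an exception inside a stage would stall the run.
    """
    links = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
        while True:
            item = inbox.get()
            if item is _SENTINEL:
                outbox.put(_SENTINEL)  # propagate shutdown downstream
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, links[k], links[k + 1]))
        for k, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        links[0].put(item)
    links[0].put(_SENTINEL)

    results = []
    while (out := links[-1].get()) is not _SENTINEL:
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for per-image steps (e.g. smoothing, contour fitting).
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))  # [2, 4, 6, 8, 10]
```

Temporal Multiplexing would instead hand whole images to independent workers; the Pipeline variant above keeps every stage busy on a different image at the same time.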
ISBN (print): 9780889866386
Distributed testing is often hard to implement because of the difficulty of handling heterogeneous environments, complex configurations, synchronization, error probing, result maintenance, and automation. This paper describes a practical testing framework that permits automated distributed testing of distributed systems and applications. The framework extends the capabilities of JUnit to support test execution over heterogeneous environments and complex configurations.
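The core idea of running one test against several environment configurations concurrently, and collecting every outcome instead of stopping at the first failure, can be pictured with the Python sketch below. It is not the paper's JUnit extension: `run_on_configs`, `CaseResult`, and the configuration dictionaries are hypothetical, and a local thread pool stands in for test agents on separate hosts.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CaseResult:
    config: str      # name of the environment configuration
    passed: bool
    error: str = ""  # "ExceptionType: message" when the test failed


def run_on_configs(test: Callable[[Dict], None],
                   configs: Dict[str, Dict]) -> List[CaseResult]:
    """Run `test` once per configuration, concurrently (hypothetical helper).

    Failures are recorded rather than propagated, so one broken environment
    does not hide the results from the others. Results keep the input order.
    """
    def run_one(name: str, cfg: Dict) -> CaseResult:
        try:
            test(cfg)
            return CaseResult(name, True)
        except Exception as exc:  # includes AssertionError from failed checks
            return CaseResult(name, False, f"{type(exc).__name__}: {exc}")

    with ThreadPoolExecutor(max_workers=max(1, len(configs))) as pool:
        futures = [pool.submit(run_one, name, cfg) for name, cfg in configs.items()]
        return [f.result() for f in futures]


if __name__ == "__main__":
    def check_platform(cfg: Dict) -> None:
        assert cfg["os"] != "solaris", "unsupported platform"

    for r in run_on_configs(check_platform, {"a": {"os": "linux"}, "b": {"os": "solaris"}}):
        print(r)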
ISBN (print): 9780889867048
We are developing a task-parallel script language named MegaScript for mega-scale parallel processing. MegaScript regards sequential and parallel programs as tasks and controls them for massively parallel execution. Although MegaScript programs require optimizations and extensions specific to the application and the computing environment, providing them by modifying the runtime system or the task programs greatly reduces portability and reusability. To satisfy these conflicting requirements, we propose a user-level dynamic extension scheme named Adapter. In this scheme, the user defines customization code and hooks it to a specific event. The runtime system calls back the code locally when the event occurs, so the user can extend or optimize system behavior without modifying the runtime or the task programs. Our evaluation shows that both the overhead and the programming cost of the scheme are small enough for practical use.
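The event-hook idea behind Adapter can be illustrated with a toy runtime in Python. The class, method, and event names (`Runtime`, `hook`, `task_start`, `task_end`) are hypothetical and are not MegaScript's actual interface; the sketch only shows user code, registered for an event, being called back by the runtime without any edit to the runtime or to the task itself.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Runtime:
    """Toy task runtime with user-level event hooks (hypothetical API)."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[..., Any]]] = defaultdict(list)

    def hook(self, event: str, fn: Callable[..., Any]) -> None:
        """Register user customization code to run whenever `event` fires."""
        self._hooks[event].append(fn)

    def _emit(self, event: str, **info: Any) -> None:
        # Call back every hook registered for this event, locally and in order.
        for fn in self._hooks[event]:
            fn(**info)

    def run_task(self, name: str, task: Callable[[], Any]) -> Any:
        """Run an unmodified task, firing events around it."""
        self._emit("task_start", name=name)
        result = task()
        self._emit("task_end", name=name, result=result)
        return result


if __name__ == "__main__":
    rt = Runtime()
    log = []
    # The extension (here: simple result logging) lives entirely in user code.
    rt.hook("task_end", lambda name, result: log.append((name, result)))
    rt.run_task("sum", lambda: sum(range(4)))
    print(log)  # [('sum', 6)]
```

Because hooks are keyed by event name, an application-specific optimization can be added or removed by registering a different callback, which mirrors the portability argument made above.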
ISBN (print): 9780889867048
The Speculative Locking (SL) protocol is a concurrency control protocol that allows conflicting transactions to execute in parallel through multilevel lending and versioning. The SL protocol shows performance improvements over the standard two-phase locking (2PL) protocol, but relies on several assumptions that make it unsuitable for real-world scenarios. In this paper, we propose an adaptive speculative locking (ASL) protocol that improves the performance of real-time distributed database systems by augmenting the SL protocol with four features: support for distributed real-time database systems; simultaneous multi-threading or page execution; control of transaction execution through transaction queue management; and restriction of system memory through the use of virtual memory. The simulation results demonstrate the superiority of the ASL protocol over the SL protocols through reduced data contention caused by finite memory and an overall increase in transaction throughput.
ISBN (print): 9780889866386
The memory consistency model is crucial to the performance of shared-memory multiprocessors, and current architectures adopt several different models. In this paper, using graph algorithms for illustrative purposes, we consider the impact of the memory model on the implementation and performance of parallel algorithms on shared-memory multiprocessors. We show that the implementation of PRAM algorithms is largely "oblivious" of the underlying memory model and performs well on relaxed models. More importantly, we show that different memory models can favor drastically different algorithm designs.
ISBN (print): 9780889867048
Multi-cluster schedulers can dramatically improve average job turnaround time by making use of fragmented node resources available throughout the grid. By carefully mapping jobs across potentially many clusters, jobs that would otherwise wait in the queue for local cluster resources can begin execution much earlier, thereby improving system utilization and reducing average queue waiting time. Recent research in this area leverages user-provided estimates of job communication characteristics to partition a job effectively across cluster boundaries. In this paper, we address the impact of inaccuracies in these estimates on overall system performance. Furthermore, we demonstrate that multi-site job scheduling techniques benefit from these estimates even when they are considerably inaccurate.
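To make the role of user-provided communication estimates concrete, the sketch below compares starting a job immediately across clusters with waiting for local resources. The linear slowdown model, the `inter_cluster_penalty` factor, and every name here are hypothetical assumptions for illustration, not the scheduler evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    runtime: float        # estimated run time on a single cluster (seconds)
    comm_fraction: float  # user-estimated share of run time spent communicating (0..1)


def multisite_runtime(job: Job, inter_cluster_penalty: float) -> float:
    """Hypothetical slowdown model: only the communication share of the run
    time is stretched by the slower inter-cluster links."""
    return job.runtime * (1 + job.comm_fraction * (inter_cluster_penalty - 1))


def choose_placement(job: Job, local_wait: float, inter_cluster_penalty: float) -> str:
    """Return "multi-site" if spanning clusters now is estimated to finish
    before waiting `local_wait` seconds and then running locally."""
    local_finish = local_wait + job.runtime
    multi_finish = multisite_runtime(job, inter_cluster_penalty)
    return "multi-site" if multi_finish < local_finish else "local"


if __name__ == "__main__":
    # 140 s multi-site vs. 50 s wait + 100 s local run: co-allocation wins.
    print(choose_placement(Job(100.0, 0.2), local_wait=50.0, inter_cluster_penalty=3.0))
    # A communication-heavy job (200 s multi-site) is better off waiting.
    print(choose_placement(Job(100.0, 0.5), local_wait=50.0, inter_cluster_penalty=3.0))
```

Under this toy model, an estimate of 0.2 for a job that actually communicates half the time would send it multi-site (200 s) when waiting locally (150 s) would have been faster, which is the kind of inaccuracy whose overall effect the paper measures.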
ISBN (print): 9780889867048
Parallel applications are notoriously difficult to debug for performance. Automatic performance analysis techniques, such as those used by Kojak and KappaPI, are promising in alleviating the difficulty of discovering performance inefficiencies in parallel applications. However, as we show in this paper, the results produced by these tools can be misleading and sometimes outright incorrect. The reason is that overhead caused by a performance inefficiency originating at one point in the program can causally propagate and manifest itself at other points. Current techniques perform a flat analysis, i.e., they do not account for causal propagation. In this paper, we present a causal analysis method with which current analysis techniques can be retrofitted to account for the causal propagation of overhead, yielding a more accurate description of performance bottlenecks. We also show various advantages this technique offers for improving the effectiveness of automatic performance analysis. Here we tackle only overhead related to communication operations in MPI parallel applications; in general, however, our technique can be applied to overhead unrelated to communication and to any parallel programming paradigm.
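The difference between flat and causal attribution can be seen in a toy model of a message chain, sketched in Python below. This is a hypothetical illustration, not the paper's algorithm: each process computes, blocks in a receive from its predecessor, then forwards a message, so a single slow process makes every downstream process wait. Communication latency is assumed to be zero.

```python
from typing import Dict, List, Tuple


def chain_wait_analysis(compute: List[float]) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Toy message chain P0 -> P1 -> ... -> P(n-1) (hypothetical model).

    Process i computes for compute[i] seconds, then blocks in a receive from
    P(i-1) (P0 has no predecessor), then immediately sends to P(i+1).
    Message latency is assumed to be zero.

    Returns (flat, causal):
      flat[i]    wait observed at P_i's receive, as a flat analysis reports it
      causal[k]  total downstream wait charged to P_k, the process whose late
                 computation set the message's departure time
    """
    flat: Dict[int, float] = {}
    causal: Dict[int, float] = {}
    send_time = compute[0]  # departure time of the message travelling down the chain
    origin = 0              # process whose computation determined send_time
    for i in range(1, len(compute)):
        wait = max(0.0, send_time - compute[i])  # message arrives after P_i is ready
        flat[i] = wait
        if wait > 0:
            # P_i's wait is inherited from upstream: charge it to the root cause.
            causal[origin] = causal.get(origin, 0.0) + wait
        else:
            origin = i  # P_i finished computing last, so it now sets the pace
        send_time = max(send_time, compute[i])
    return flat, causal


if __name__ == "__main__":
    flat, causal = chain_wait_analysis([5.0, 1.0, 1.0])
    print(flat)    # {1: 4.0, 2: 4.0}
    print(causal)  # {0: 8.0}
```

With compute times [5.0, 1.0, 1.0], a flat analysis reports 4 s of waiting at both P1 and P2 (and might blame P1 as a late sender), while the causal view charges all 8 s to P0's extra computation, which is where a fix would actually help.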
ISBN (print): 9780889866386
This paper deals with a novel communication timing control for wireless networks and with the radio interference problem. The communication timing control is based on the mutual synchronization of coupled phase oscillator dynamics with stochastic adaptation. Through local and fully distributed interactions, the coupled phase dynamics self-organize collision-free communication timing. In wireless communication, interference waves can cause unexpected collisions. Therefore, we propose a more effective timing control that selects the interacting nodes according to received signal strength.
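The repulsive phase coupling behind collision-free timing, combined with choosing interaction partners by received signal strength (RSSI), can be sketched as below. The model, the RSSI threshold, and the parameter names are assumptions for illustration: the sketch uses deterministic sine coupling integrated with Euler steps and omits the stochastic adaptation described in the paper. Each node's phase marks when it transmits, so pushing phases apart spreads transmissions in time.

```python
import math
from typing import List, Sequence


def desync_step(phases: List[float], rssi: Sequence[Sequence[float]],
                threshold: float, omega: float, k: float, dt: float) -> List[float]:
    """One Euler step of repulsively coupled phase oscillators (hypothetical model).

    d(theta_i)/dt = omega + k * sum_j sin(theta_i - theta_j), summed only over
    nodes j with rssi[i][j] >= threshold. The positive sign makes the coupling
    repulsive, so neighbouring phases (transmission times) drift apart.
    """
    n = len(phases)
    new = []
    for i in range(n):
        coupling = sum(math.sin(phases[i] - phases[j])
                       for j in range(n)
                       if j != i and rssi[i][j] >= threshold)
        new.append((phases[i] + dt * (omega + k * coupling)) % (2 * math.pi))
    return new


def min_phase_gap(phases: List[float]) -> float:
    """Smallest circular distance between adjacent phases (0..2*pi)."""
    s = sorted(phases)
    gaps = [b - a for a, b in zip(s, s[1:])]
    gaps.append(2 * math.pi - s[-1] + s[0])  # wrap-around gap
    return min(gaps)


if __name__ == "__main__":
    phases = [0.0, 0.1, 0.3]     # three nodes that would nearly collide
    rssi = [[0.0, -50.0, -50.0],
            [-50.0, 0.0, -50.0],
            [-50.0, -50.0, 0.0]]  # hypothetical dBm values; diagonal unused
    for _ in range(5000):
        phases = desync_step(phases, rssi, threshold=-70.0, omega=1.0, k=1.0, dt=0.01)
    print(round(min_phase_gap(phases), 3))  # approaches 2*pi/3 ~ 2.094
```

For three mutually audible nodes this coupling drives the phases toward equal spacing of 2π/3, i.e. three non-overlapping transmission slots; a node whose RSSI from a neighbour falls below the threshold simply ignores that neighbour.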
ISBN (print): 9780889867048
Clusters of shared-memory symmetric multiprocessors (SMPs) are currently among the most common parallel computing systems, and some existing environments have between 8 and 32 processors per node. Examples include several supercomputers: DataStar p655 (P655 and P655m) and P690 at the San Diego Supercomputer Center, and Seaborg and Bassi at the DOE National Energy Research Scientific Computing Center. In this paper, we quantify the performance gap that results from using different numbers of processors per node for application execution (which we term processor partitioning), and conduct detailed performance experiments to identify the major application characteristics that affect processor partitioning. We use the STREAM memory benchmarks and Intel's MPI Benchmarks to explore the performance impact of different application characteristics. The results are then used to explain the performance of processor partitioning for three NAS Parallel Benchmark applications. The experimental results indicate that processor partitioning can have a significant impact on the performance of a parallel scientific application, depending on its communication and memory requirements.
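Processor partitioning, as defined above, amounts to choosing how many of a job's processes share each SMP node. The hypothetical helper below only enumerates the uniform placements available for a given process count and node size; each choice trades memory bandwidth per process (fewer processes per node) against keeping more communication inside a node's shared memory (more processes per node).

```python
from typing import List, Tuple


def processor_partitions(total_procs: int, procs_per_node_max: int) -> List[Tuple[int, int]]:
    """Uniform placements of `total_procs` processes on identical SMP nodes.

    Returns (processes per node, nodes used) pairs where every node runs the
    same number of processes and no node exceeds `procs_per_node_max`
    (hypothetical helper; real schedulers may also allow uneven placements).
    """
    return [(p, total_procs // p)
            for p in range(1, procs_per_node_max + 1)
            if total_procs % p == 0]


if __name__ == "__main__":
    # 64 processes on 8-way nodes: from one process per node up to a full node.
    for per_node, nodes in processor_partitions(64, 8):
        print(f"{per_node:2d} processes/node on {nodes:2d} nodes")
```

Running the same benchmark at each of these placements, with the total process count held fixed, isolates the per-node effect that the experiments above quantify.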
ISBN (print): 9780889866386
This paper studies Strassen's matrix multiplication algorithm by implementing it in a variety of ways: sequentially, as a workflow, and in parallel. All the methods show better performance than well-known scientific libraries for medium to large matrices. The sequential recursive program is implemented and compared with ATLAS's DGEMM subroutine. A workflow program in the NetSolve system and two parallel programs based on MPI and ScaLAPACK are also implemented. By analyzing the time complexity and memory requirements of each method, we provide insight into how to use Strassen's algorithm to speed up matrix multiplication with existing high-performance tools and libraries.
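For reference, the recursive scheme being benchmarked can be sketched in plain Python. This is a textbook version of Strassen's algorithm, not the authors' code: it assumes square matrices whose size is a power of two and falls back to ordinary multiplication at or below a hypothetical `cutoff`, which plays the role a tuned DGEMM call would play in a real implementation. Seven recursive products replace eight, giving O(n^log2(7)) ≈ O(n^2.807) arithmetic at the cost of extra temporary matrices, which is why memory requirements enter the analysis.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def naive_mul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary O(n^3) product, used as the base case."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def split(m: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Return the quadrants (M11, M12, M21, M22) of an even-sized matrix."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a: Matrix, b: Matrix, cutoff: int = 2) -> Matrix:
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    if n <= cutoff:
        return naive_mul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of the eight of block multiplication.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)  # M1 + M4 - M5 + M7
    c12 = add(m3, m5)                    # M3 + M5
    c21 = add(m2, m4)                    # M2 + M4
    c22 = add(sub(add(m1, m3), m2), m6)  # M1 - M2 + M3 + M6
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


if __name__ == "__main__":
    print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1))  # [[19, 22], [43, 50]]
```

The cutoff is the main tuning knob: below it, the extra additions and temporaries of Strassen's recursion cost more than they save, so production codes switch to an optimized conventional kernel.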