Multicore architectures are an important contribution to computing technology since they are capable of providing more processing power with a better cost-benefit ratio than single-core processors. Cores execute instructions independently but share critical resources such as L2 cache memory and data channels. Clusters built from multicore architectures or multiprocessor chips (MPCs) give rise to a hierarchical memory environment, and parallel applications should take advantage of this memory hierarchy to achieve high performance. This paper presents a performance analysis of a synthetic application on a multicore cluster and introduces a preliminary architecture model that considers communication through both shared memory and data channels and its impact on application performance.
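To make the two-level model concrete, the following minimal C sketch contrasts communication cost through shared memory with cost through the network using a simple latency-plus-bandwidth model. All constants, and the function comm_time itself, are illustrative assumptions rather than values or code from the paper.

/* Sketch of a two-level communication-cost model for a multicore
 * cluster: cores on the same node communicate through shared memory
 * (L2 cache), cores on different nodes through the network.
 * All constants are illustrative assumptions, not measured values. */
#include <stdio.h>

#define SHM_LATENCY   0.5e-6   /* assumed intra-node startup cost (s)   */
#define SHM_BANDWIDTH 4.0e9    /* assumed shared-memory bandwidth (B/s) */
#define NET_LATENCY   50e-6    /* assumed inter-node startup cost (s)   */
#define NET_BANDWIDTH 1.0e9    /* assumed network bandwidth (B/s)       */

/* Classic latency + size/bandwidth model, selected per channel. */
static double comm_time(size_t bytes, int same_node)
{
    if (same_node)
        return SHM_LATENCY + (double)bytes / SHM_BANDWIDTH;
    return NET_LATENCY + (double)bytes / NET_BANDWIDTH;
}

int main(void)
{
    size_t msg = 1 << 20;  /* 1 MiB message */
    printf("intra-node: %.2f us\n", comm_time(msg, 1) * 1e6);
    printf("inter-node: %.2f us\n", comm_time(msg, 0) * 1e6);
    return 0;
}

Even this crude model reproduces the qualitative point of the abstract: for small messages the startup cost dominates and the hierarchy gap is largest, so placement of communicating processes matters.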
The International Parallel and Distributed Processing Symposium (IPDPS) 2008 panel entitled "How to Avoid Making the Same Mistakes All Over Again: What the Parallel-Processing Community Has (Failed) to Offer the Multi/Many-Core Generation" sought to provoke discussion on current and recent computer science education in relation to the emergence of fundamentally parallel multi/many-core systems. Is today's/tomorrow's/yesterday's computer science graduate equipped to deal with the challenges of parallel software development for such systems? Are mistakes from the past being unnecessarily repeated? What are the fundamental contributions of the parallel-processing research community to the current state of affairs that are possibly being ignored? What new challenges have not been addressed in past parallel-processing research? What should computer-science education in parallel processing look like? Should it be taught at all? To the extent that there was consensus among the panelists, they agreed on the premise for the panel, namely that there is a mismatch in computer-science education concerning parallelism, and that there may be reasons to be concerned. They agreed on stressing the importance of (a) applications as a driving factor in research and education, (b) parallel algorithms, and (c) focusing on the ease of parallel programming and not exclusively on parallel performance, and cited, for instance, heterogeneous parallelism and power awareness as new issues for the multi-core generation. The panelists were Hideharu Amano (Keio University), John Gustafson (Clearspeed Technologies), Keshav Pingali (University of Texas at Austin), Vivek Sarkar (Rice University), Uzi Vishkin (University of Maryland), and Katherine Yelick (University of California at Berkeley). The panel was organized and moderated by the author. (c) 2009 Elsevier Inc. All rights reserved.
ISBN (print): 9781424437511
This paper presents many typical problems that are encountered when executing large-scale scientific applications over distributed architectures. The causes and effects of these problems are explained, and a solution for some classes of scientific applications is proposed. This solution combines the asynchronous iteration model with JACEP2P-V2, a fully decentralized and fault-tolerant platform dedicated to executing parallel asynchronous applications over volatile distributed architectures. We explain in detail how our approach deals with each of these problems. We then present two large-scale numerical experiments that demonstrate the efficiency and robustness of our approach.
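The asynchronous iteration model underlying this approach can be illustrated on a single node. The hedged C/OpenMP sketch below performs a chaotic Jacobi relaxation in which threads sweep their unknowns without barriers, reading whatever neighbor values happen to be visible; the problem data are assumed, this is only a shared-memory analogue of the distributed scheme, and a production version would need atomics for strictly defined behavior.

/* Sketch of the asynchronous iteration model: threads relax their own
 * unknowns repeatedly, reading whatever neighbor values happen to be
 * current, with no barrier between sweeps. Convergence holds here
 * because the system (4 on the diagonal, -1 off-diagonal) is strictly
 * diagonally dominant. Unsynchronized reads are the point of the model;
 * strict C would require relaxed atomics, omitted for brevity. */
#include <stdio.h>
#include <math.h>
#include <omp.h>

#define N 512
#define SWEEPS 2000

int main(void)
{
    static double x[N];            /* shared iterate, updated in place */
    double b[N];
    for (int i = 0; i < N; i++) { x[i] = 0.0; b[i] = 1.0; }

    #pragma omp parallel
    {
        for (int sweep = 0; sweep < SWEEPS; sweep++) {
            #pragma omp for nowait          /* no barrier: async sweeps */
            for (int i = 0; i < N; i++) {
                double left  = (i > 0)     ? x[i - 1] : 0.0;
                double right = (i < N - 1) ? x[i + 1] : 0.0;
                x[i] = (b[i] + left + right) / 4.0;  /* may use stale data */
            }
        }
    }

    double r = 0.0;                 /* residual check after the fact */
    for (int i = 0; i < N; i++) {
        double left  = (i > 0)     ? x[i - 1] : 0.0;
        double right = (i < N - 1) ? x[i + 1] : 0.0;
        r += fabs(4.0 * x[i] - left - right - b[i]);
    }
    printf("residual = %g\n", r);
    return 0;
}

The appeal for volatile platforms is visible even in this toy: because no sweep waits for any other, a slow or temporarily absent participant delays no one, which is exactly the property the abstract exploits across unreliable distributed nodes.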
ISBN (print): 9781424437511
Scheduling multiple applications on heterogeneous multi-clusters is challenging, as the different applications have to compete for resources. A scheduler thus has to ensure a fair distribution of resources among the applications and prevent harmful selfish behaviors while still trying to minimize their respective completion times. In this paper we consider mixed-parallel applications, represented by graphs whose nodes are data-parallel tasks, that are scheduled in two steps: allocation and mapping. We investigate several strategies to constrain the amount of resources the scheduler can allocate to each application and evaluate them over a wide range of scenarios.
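As a rough illustration of the allocation step, the C sketch below caps each application's processor allotment under three assumed policies (unconstrained, equal share, work-proportional). These policies and the function allocation_caps are hypothetical stand-ins, not the strategies evaluated in the paper.

/* Sketch of allocation-constraint strategies for mixed-parallel
 * applications: before mapping, the scheduler caps how many of the
 * P processors each application may receive. The three policies
 * below are illustrative assumptions, not the paper's exact set. */
#include <stdio.h>

enum policy { UNCONSTRAINED, EQUAL_SHARE, WORK_PROPORTIONAL };

/* caps[i] = maximum processors application i may be allocated */
static void allocation_caps(enum policy pol, int P, int napps,
                            const double *work, int *caps)
{
    double total = 0.0;
    for (int i = 0; i < napps; i++) total += work[i];
    for (int i = 0; i < napps; i++) {
        switch (pol) {
        case UNCONSTRAINED:     caps[i] = P; break;
        case EQUAL_SHARE:       caps[i] = P / napps; break;
        case WORK_PROPORTIONAL: caps[i] = (int)(P * work[i] / total); break;
        }
        if (caps[i] < 1) caps[i] = 1;   /* every application can progress */
    }
}

int main(void)
{
    double work[3] = { 10.0, 30.0, 60.0 };
    int caps[3];
    allocation_caps(WORK_PROPORTIONAL, 64, 3, work, caps);
    for (int i = 0; i < 3; i++)
        printf("app %d: at most %d procs\n", i, caps[i]);
    return 0;
}

The tension the abstract describes shows up directly in the choice of policy: a tight cap protects the other applications from a selfish one, while a loose cap lets a single application finish sooner.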
Genomic alignments, as a means to uncover evolutionary relationships among organisms, are a fundamental tool in computational biology. There is considerable recent interest in using the Cell Broadband Engine, a heterogeneous multicore chip that provides high performance, for biological applications. However, work in genomic alignments so far has been limited to computing optimal alignment scores using quadratic space for the basic global/local alignment problem. In this paper, we present a comprehensive study of developing alignment algorithms on the Cell, exploiting its thread- and data-level parallelism features. First, we develop a parallel implementation on the Cell that computes optimal alignments and adopts Hirschberg's linear-space technique. The former is essential, as merely computing optimal alignment scores is not useful, while the latter is needed to permit alignments of longer sequences. We then present Cell implementations of two advanced alignment techniques: spliced alignments and syntenic alignments. Spliced alignments are useful in aligning mRNA sequences with corresponding genomic sequences to uncover the gene structure. Syntenic alignments are used to discover conserved exons and other sequences between long genomic sequences from different organisms. We present experimental results for these three types of alignments on 16 Synergistic Processing Elements of the IBM QS20 dual-Cell blade system.
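The space-saving idea behind Hirschberg's technique can be shown compactly: the optimal global alignment score needs only two dynamic-programming rows. The C sketch below computes a Needleman-Wunsch score in linear space with an assumed match/mismatch/gap scheme; recovering the alignment itself takes Hirschberg's full divide-and-conquer recursion, which is omitted here, as is all Cell-specific parallelization.

/* Needleman-Wunsch optimal global alignment score in O(n) space:
 * only the previous and current DP rows are kept, instead of the
 * full O(mn) table. Scoring constants are illustrative assumptions.
 * Assumes strlen(t) <= MAXLEN. */
#include <stdio.h>
#include <string.h>

#define MATCH     2
#define MISMATCH -1
#define GAP      -2
#define MAXLEN 1024

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

static int align_score(const char *s, const char *t)
{
    size_t m = strlen(s), n = strlen(t);
    static int prev[MAXLEN + 1], curr[MAXLEN + 1];

    for (size_t j = 0; j <= n; j++) prev[j] = (int)j * GAP;
    for (size_t i = 1; i <= m; i++) {
        curr[0] = (int)i * GAP;
        for (size_t j = 1; j <= n; j++) {
            int sub = (s[i-1] == t[j-1]) ? MATCH : MISMATCH;
            curr[j] = max3(prev[j-1] + sub,  /* substitute         */
                           prev[j]   + GAP,  /* gap in t           */
                           curr[j-1] + GAP); /* gap in s           */
        }
        memcpy(prev, curr, (n + 1) * sizeof *curr);
    }
    return prev[n];
}

int main(void)
{
    printf("score = %d\n", align_score("ACACACTA", "AGCACACA"));
    return 0;
}

For sequences of hundreds of kilobases, the quadratic table is infeasible on the Cell's small per-SPE local store, which is why the abstract treats linear space as a prerequisite for aligning longer sequences.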
ISBN (print): 9781424437511
Automating the execution of applications in grid computing environments is a complicated task due to the heterogeneity of computing resources, resource usage policies, and application requirements. Applications differ in memory usage, performance, scalability, and storage usage. Having knowledge of this information can aid in matching jobs to resources and in selecting appropriate configuration parameters, such as the number of processors to run on and the memory requirements for those resources. This paper presents an application memory usage model that can be used to aid in selecting appropriate job configurations for different resources. The model can represent how memory scales with the number of processors, the memory usage of different types of processes, and changes in memory usage during execution. It builds on a previously developed information model used for describing resources, resource usage policies, and limited information on applications. An analysis of the memory usage model illustrating its use towards automating job execution in grid computing environments is also presented.
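One plausible form for such a memory usage model is sketched below in C: per-process memory splits into a part replicated on every process, a part that shrinks as data is partitioned across more processes, and a part that grows with process count (e.g., communication buffers). The model form, coefficients, and helper names are assumptions for illustration, not the paper's model.

/* Sketch of a memory-scaling model and its use for job configuration:
 * pick the smallest process count whose per-process footprint fits the
 * target node. All coefficients are illustrative assumptions. */
#include <stdio.h>

/* Estimated memory (MB) per process when running on p processes. */
static double mem_per_process(double replicated_mb, double data_mb,
                              double per_peer_mb, int p)
{
    return replicated_mb + data_mb / p + per_peer_mb * p;
}

/* Smallest process count whose per-process footprint fits a node. */
static int min_procs_that_fit(double replicated_mb, double data_mb,
                              double per_peer_mb, double node_mb,
                              int max_p)
{
    for (int p = 1; p <= max_p; p++)
        if (mem_per_process(replicated_mb, data_mb, per_peer_mb, p) <= node_mb)
            return p;
    return -1;  /* does not fit at any allowed size */
}

int main(void)
{
    /* 200 MB replicated, 64 GB of partitioned data, 1.5 MB per peer,
     * 2 GB of memory available per process slot. */
    int p = min_procs_that_fit(200.0, 64000.0, 1.5, 2000.0, 1024);
    printf("minimum feasible process count: %d\n", p);
    return 0;
}

A scheduler with such a model can reject configurations that would thrash or be killed for exceeding memory limits, which is exactly the kind of automated configuration selection the abstract targets.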
ISBN (print): 9781424437511
Peer-to-peer (P2P) applications have recently attracted a large number of Internet users. Traditional P2P systems, however, suffer from inefficiency due to a lack of information from the underlay, i.e., the physical network. Although there is a plethora of research on underlay awareness, this aspect of P2P systems is still not clearly structured. In this paper, we provide a taxonomic survey that outlines the different steps for achieving underlay awareness. The main contribution of this paper is presenting a clear picture of what underlay awareness is and how it can be used to build next-generation P2P systems. Impacts of underlay awareness and open research issues are also discussed.
ISBN (print): 9781424437511
Recognition and mining (RM) applications are an emerging class of computing workloads that will be commonly executed on future multi-core and many-core computing platforms. The explosive growth of input data and the use of more sophisticated algorithms in RM applications will ensure, for the foreseeable future, a significant gap between the computational needs of RM applications and the capabilities of rapidly evolving multi- or many-core platforms. To address this gap, we propose a new parallel programming model that inherently embodies the notion of best-effort computing, wherein the underlying parallel computing environment is not expected to be perfect. The proposed best-effort programming model leverages three key characteristics of RM applications: (1) the input data is noisy and often contains significant redundancy, (2) computations performed on the input data are statistical in nature, and (3) some degree of imprecision in the output is acceptable. As a specific instance of the best-effort parallel programming model, we describe an "iterative-convergence" parallel template, which is used by a significant class of RM applications. We show how best-effort computing can be used not only to reduce the computational workload but also to eliminate dependencies between computations and further increase parallelism. Our experiments on an 8-core machine demonstrate speed-ups of 3.5X and 4.3X for the K-means and GLVQ algorithms, respectively, over a conventional parallel implementation. We also show that there is almost no material impact on the accuracy of results obtained from best-effort implementations in the application context of image segmentation using K-means and eye detection in images using GLVQ.
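A minimal sketch of the iterative-convergence, best-effort idea, applied here to 1-D K-means in C: points whose cluster assignment has stayed stable for a few iterations are dropped from further distance computations. The stability threshold, the data, and this particular dropping rule are illustrative assumptions, not necessarily the paper's exact mechanism.

/* Best-effort K-means sketch (1-D): skip points whose assignment has
 * been unchanged for STABLE_LIMIT consecutive iterations, trading a
 * small amount of precision for less work. Data are assumed. */
#include <stdio.h>
#include <math.h>

#define NPTS 8
#define K 2
#define ITERS 20
#define STABLE_LIMIT 3   /* drop a point after 3 unchanged assignments */

int main(void)
{
    double pts[NPTS] = { 1.0, 1.2, 0.8, 1.1, 9.0, 9.3, 8.7, 9.1 };
    double cent[K] = { 0.0, 5.0 };
    int assign[NPTS] = { -1, -1, -1, -1, -1, -1, -1, -1 };
    int stable[NPTS] = { 0 };

    for (int it = 0; it < ITERS; it++) {
        for (int i = 0; i < NPTS; i++) {
            if (stable[i] >= STABLE_LIMIT) continue;   /* best-effort skip */
            int best = 0;
            for (int c = 1; c < K; c++)
                if (fabs(pts[i] - cent[c]) < fabs(pts[i] - cent[best]))
                    best = c;
            stable[i] = (best == assign[i]) ? stable[i] + 1 : 0;
            assign[i] = best;
        }
        double sum[K] = { 0 }; int cnt[K] = { 0 };   /* recompute centroids */
        for (int i = 0; i < NPTS; i++) { sum[assign[i]] += pts[i]; cnt[assign[i]]++; }
        for (int c = 0; c < K; c++)
            if (cnt[c]) cent[c] = sum[c] / cnt[c];
    }
    printf("centroids: %.2f %.2f\n", cent[0], cent[1]);
    return 0;
}

Dropping converged points illustrates both claimed benefits: less arithmetic per iteration, and fewer cross-iteration dependencies on those points, which is what opens up the extra parallelism the abstract reports.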
Multi-core architectures can deliver high processing power if the multiple levels of parallelism they expose are exploited. However, it is non-trivial to orchestrate the allocation of computational and memory resources. ...
ISBN (print): 9781424437511
Node churn can have a severe impact on the performance of P2P applications. In this paper, we consider the design of reliable P2P networks that can provide predictable performance. We exploit the experimental finding that the age of a node can be a reliable predictor of a longer residual lifetime to develop mechanisms that organize the network around these more reliable nodes. We propose two protocols, TrebleCast and TrebleCast*, to implement reliable overlay networks. These protocols dynamically create reliable layers of peers by moving nodes with higher expected lifetimes to the center of the overlay. These more reliable layers can then be called upon to deliver predictable performance in the presence of churn.
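The age-based organization can be sketched simply: rank peers by observed age and promote the oldest fraction to the reliable core layer. The C sketch below shows only this ranking step; the layer fraction and peer data are assumed, and the actual TrebleCast protocols are considerably more involved.

/* Sketch of age-based layering: since node age predicts residual
 * lifetime, sort peers by age and place the oldest half in the core
 * layer of the overlay. Fractions and ages are assumed values. */
#include <stdio.h>
#include <stdlib.h>

struct peer { int id; double age_hours; };

static int by_age_desc(const void *a, const void *b)
{
    double d = ((const struct peer *)b)->age_hours
             - ((const struct peer *)a)->age_hours;
    return (d > 0) - (d < 0);
}

int main(void)
{
    struct peer peers[] = {
        { 1, 0.2 }, { 2, 40.0 }, { 3, 5.0 }, { 4, 120.0 }, { 5, 1.5 },
    };
    int n = 5, core_size = n / 2;   /* oldest half forms the core layer */

    qsort(peers, n, sizeof peers[0], by_age_desc);
    for (int i = 0; i < n; i++)
        printf("peer %d (age %.1f h) -> %s\n", peers[i].id,
               peers[i].age_hours, i < core_size ? "core" : "edge");
    return 0;
}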