Scalability of applications on distributed shared-memory (DSM) multiprocessors is limited by communication overheads. At some point, using more processors to increase parallelism yields diminishing returns or even degrades performance. When increasing concurrency is futile, we propose an additional mode of execution, called slipstream mode, that instead enlists extra processors to assist parallel tasks by reducing perceived overheads. We consider DSM multiprocessors built from dual-processor chip multiprocessor (CMP) nodes with a shared L2 cache. A task is allocated on one processor of each CMP node. The other processor of each node executes a reduced version of the same task. The reduced version skips shared-memory stores and synchronization, running ahead of the true task. Even with the skipped operations, the reduced task makes accurate forward progress and generates an accurate reference stream, because branches and addresses depend primarily on private data. Slipstream execution mode yields two benefits. First, the reduced task prefetches data on behalf of the true task. Second, reduced tasks provide a detailed picture of future reference behavior, enabling a number of optimizations aimed at accelerating coherence events, e.g., self-invalidation. For multiprocessor systems with up to 16 CMP nodes, slipstream mode outperforms running one or two conventional tasks per CMP in 7 out of 9 parallel scientific benchmarks. Slipstream mode is 12-19% faster with prefetching only and up to 29% faster with self-invalidation enabled.
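The prefetching benefit described above can be illustrated with a toy simulation (a minimal sketch, not the paper's hardware mechanism; the LRU cache model, the address stream, and the lookahead depth are all assumptions). The reduced task touches the same address stream a few references ahead of the true task, so the true task's loads tend to hit in the shared cache:

```python
from collections import OrderedDict

def simulate(addresses, cache_size, slipstream=False, lookahead=8):
    """Count cache hits seen by the 'true task' over a reference
    stream, optionally warmed by a slipstream 'reduced task' that
    runs `lookahead` references ahead (loads only; stores and
    synchronization are skipped, as in slipstream mode)."""
    cache = OrderedDict()  # LRU cache of block addresses

    def touch(addr):
        hit = addr in cache
        cache[addr] = True
        cache.move_to_end(addr)          # mark most recently used
        if len(cache) > cache_size:
            cache.popitem(last=False)    # evict least recently used
        return hit

    hits = 0
    for i, addr in enumerate(addresses):
        if slipstream:
            # Reduced task prefetches the next `lookahead` references.
            for j in range(i + 1, min(i + 1 + lookahead, len(addresses))):
                touch(addresses[j])
        if touch(addr):                  # true task's own reference
            hits += 1
    return hits

# A stream whose reuse distance exceeds the cache: the baseline
# misses everywhere, while the slipstream-warmed run mostly hits.
stream = list(range(64)) * 2
base = simulate(stream, cache_size=16, slipstream=False)
slip = simulate(stream, cache_size=16, slipstream=True)
```

With this stream the baseline scores zero hits (every block is evicted before reuse), while the slipstream run hits on nearly every reference, since each address was prefetched within the lookahead window.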
ISBN (print): 0769516866
The dQUOB system's conceptualization of data streams as a database, and its SQL interface to data streams, is an intuitive way for users to think about their data needs in a large-scale application containing hundreds if not thousands of data streams. Experience with dQUOB has shown the need for more aggressive memory management to achieve the scalability we desire. This paper addresses the problem with a two-fold solution. The first is replacement of the existing First Come First Served (FCFS) scheduling algorithm with an Earliest Job First (EJF) algorithm, which we demonstrate to yield better average service time. The second is an introspection algorithm that sets and adapts the sizes of join windows in response to knowledge acquired at runtime about event rates. In addition to the potential for significant improvements in memory utilization, the algorithm presented here also provides a means by which the user can reason about join window sizes. Wide-area measurements demonstrate the adaptive capability required by the introspection technique.
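The abstract does not spell out the EJF policy, so as an illustrative stand-in the sketch below compares FCFS with a shortest-job-first queue discipline, a standard example of how reordering pending jobs lowers average waiting time; the job arrival and service times are made up:

```python
import heapq

def avg_wait(jobs, policy):
    """jobs: list of (arrival_time, service_time) tuples.
    Returns the average waiting time (start - arrival) of a
    single-server queue under 'fcfs' or 'sjf' ordering."""
    pending, waits = [], []
    jobs = sorted(jobs)                  # by arrival time
    clock, i = 0, 0
    while i < len(jobs) or pending:
        if not pending and clock < jobs[i][0]:
            clock = jobs[i][0]           # server idles until next arrival
        while i < len(jobs) and jobs[i][0] <= clock:
            arr, svc = jobs[i]
            key = arr if policy == "fcfs" else svc  # sjf: shortest first
            heapq.heappush(pending, (key, arr, svc))
            i += 1
        _, arr, svc = heapq.heappop(pending)
        waits.append(clock - arr)        # time spent waiting in queue
        clock += svc                     # serve the job to completion
    return sum(waits) / len(waits)

jobs = [(0, 5), (1, 8), (2, 1)]          # hypothetical workload
fcfs = avg_wait(jobs, "fcfs")            # 5.0
sjf = avg_wait(jobs, "sjf")              # 8/3
```

Here FCFS makes the short job wait behind the long one, while the reordered policy serves it first, cutting the average wait from 5.0 to about 2.67 time units.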
ISBN (print): 0769518400
This paper describes our distributed architectural simulator of shared-memory multiprocessors, named Shaman. The simulator runs on a PC cluster that consists of multiple front-end nodes, which simulate the instruction-level behavior of a target multiprocessor in parallel, and a back-end node, which simulates the target memory system. The front-end also simulates the logical behavior of the shared memory using a software DSM technique and generates memory references to drive the back-end. A remarkable feature of our simulator is reference filtering, which reduces the amount of references transferred from the front-end to the back-end by utilizing the DSM mechanism and coherent cache simulation on the front-end. This technique and our sophisticated DSM implementation discussed in this paper give the Shaman simulator extraordinary performance. We achieved 335 million and 392 million simulated clock cycles per second for LU decomposition and FFT in the SPLASH-2 kernel benchmarks, respectively, when we used 16 front-end nodes to simulate a 16-way target SMP.
This paper presents a hierarchical parallel MPEG-2 decoder for playing ultra-high-resolution videos on PC cluster based tiled display systems. To maximize parallelism while minimizing the communication requirements fo...
The following topics are dealt with: signal processing and image processing; network interfaces; scheduling; financial applications, data mining, databases and logic programming; compilation; distributed systems; performance and benchmarks; distributed systems and middleware; routing; numerical algorithms and applications; communication protocols; scheduling and load balancing; industrial applications; tools and run-time support; computer architecture; scheduling and task allocation; numerical and out-of-core algorithms; algorithms and theory; and task allocation and synchronization.
Distributed database systems need commit processing so that transactions executing on them still preserve the ACID properties. With the advance of main-memory database systems, made possible by the dropping price and increasing capacity of RAM and CPUs, database processing speed has increased by an order of magnitude. However, distributed commit processing is still very slow, since disk logging has to precede the transaction commit, whereas database access incurs no disk access at all in the case of main-memory databases. In this paper, we re-evaluate the various distributed commit protocols and come up with a single-phase distributed commit protocol suitable for distributed main-memory database systems. Our simulation study confirms that the new protocol greatly reduces the time it takes to commit distributed transactions without any consistency problem.
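The latency argument can be made concrete with a back-of-the-envelope model (a toy calculation, not the paper's protocol; the round-trip and log-force costs are made-up figures). Classic two-phase commit pays one coordinator-cohort round trip plus one synchronous log force per phase, whereas a single-phase protocol for main-memory databases pays one round trip and avoids the disk force:

```python
def commit_latency(rtt_ms, log_force_ms, phases):
    """Coordinator-observed commit latency, in milliseconds:
    each phase costs one round trip to the cohorts plus one
    synchronous log force before the cohorts can reply."""
    return phases * (rtt_ms + log_force_ms)

# Two-phase commit with disk logging vs. a hypothetical single-phase
# protocol where the log force is eliminated (e.g. the log record is
# kept in main memory or replicated to another node's RAM).
two_pc = commit_latency(rtt_ms=0.5, log_force_ms=10.0, phases=2)  # 21.0 ms
one_pc = commit_latency(rtt_ms=0.5, log_force_ms=0.0, phases=1)  # 0.5 ms
```

Under these assumed costs the disk forces dominate, which is why dropping a phase and its log write shrinks commit latency so sharply for main-memory databases.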