ISBN (print): 0769509363
Efficient communication when fetching remote data is critical to achieving high performance in distributed shared-memory multiprocessors (DSM). Message-passing techniques are used in many modern communication systems, and routers are essential building blocks of these systems. Hence, this paper focuses on the design of routers for 2-ary n-cube networks. Based on a simple deadlock-free algorithm, we analyze the influence of the router structure; specifically, the parameters considered were the clock frequency and the number of pipeline stages of the router. The performance evaluation for DSM applications shows significant gains from segmented router designs. In our evaluations, the execution time of some applications improves by up to 12%, even though the base latency of the router increases by 40%.
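The abstract above does not name its "simple deadlock-free algorithm". A standard example for 2-ary n-cubes (hypercubes) is dimension-order or "e-cube" routing, sketched here purely as an assumption: a message corrects the differing address bits in a fixed order, lowest dimension first. The function name `ecube_route` is hypothetical.

```python
def ecube_route(src: int, dst: int, n: int) -> list[int]:
    """Return the node sequence from src to dst in a 2-ary n-cube (hypercube).

    Dimension-order ("e-cube") routing: differing address bits are corrected
    from the lowest dimension to the highest. Because every message crosses
    dimensions in the same fixed order, channel dependencies cannot form a
    cycle, which is what makes the scheme deadlock free.
    """
    assert 0 <= src < 2**n and 0 <= dst < 2**n
    path = [src]
    node = src
    for dim in range(n):
        if (node ^ dst) >> dim & 1:  # bit `dim` still differs from the destination
            node ^= 1 << dim         # traverse the link in dimension `dim`
            path.append(node)
    return path


print(ecube_route(0b000, 0b101, 3))  # [0, 1, 5]
```

The path length always equals the Hamming distance between source and destination, so the routing is also minimal; the router pipeline studied in the paper determines how quickly each of those hops is taken.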
In this paper, we study the problem of positioning copies of shared data structures to reduce power consumption in real-time systems. Power-constrained real-time systems are of increasing importance in defense, space,...
ISBN (print): 354067442X
Following the long tradition of this well-established event, the 5th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2000) provides a forum for researchers and developers from both academia and industry to meet and discuss the newest approaches and results in this active research area. It is again held in conjunction with IPDPS (formerly known as IPPS/SPDP), one of the premier events in the area of parallel and distributed processing.
ISBN (print): 0769506933
This paper presents the design and evaluation of high-radix parallel dividers for high-speed signal and data processing applications. The presented divider designs are based on the unified high-radix division algorithm proposed by the authors. By prescaling the operands and converting each partial remainder into a partially non-redundant representation, the quotient digit can be obtained directly from the integer part of the partial remainder without using quotient-digit selection tables. Performance evaluation shows that the proposed radix-4 and radix-8 divider architectures achieve faster computation with less hardware complexity than their binary counterparts. This paper also presents the experimental fabrication of the radix-4 divider in 0.35 µm CMOS technology.
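The table-free digit selection described above can be illustrated with a generic prescaled radix-4 digit-recurrence divider. This is a minimal sketch, not the authors' unified algorithm: it assumes the divisor is normalized to [1/2, 1), computes the prescaling factor directly (hardware would read it from a small table), and keeps the partial remainder in exact non-redundant arithmetic instead of a partially non-redundant form. Once the prescaled divisor is within 1/128 of 1, rounding the shifted partial remainder to the nearest integer already yields a valid digit in {-3, ..., 3}. The names `prescale` and `radix4_divide` are hypothetical.

```python
from fractions import Fraction
import math

RADIX = 4
MAX_DIGIT = RADIX - 1        # redundant quotient digit set {-3, ..., 3}
RHO = Fraction(5, 8)         # bound kept on |partial remainder|


def prescale(dividend: Fraction, divisor: Fraction, bits: int = 6):
    """Multiply both operands by M ~ 1/divisor so the scaled divisor is close to 1.

    Assumption: M is computed directly here; a hardware divider would read it
    from a small table indexed by the leading divisor bits.
    """
    m = Fraction(round(Fraction(2**bits) / divisor), 2**bits)
    return dividend * m, divisor * m


def radix4_divide(dividend, divisor, steps: int = 12) -> Fraction:
    """Approximate dividend/divisor for divisor in [1/2, 1) and |dividend| < 2*divisor."""
    x, d = prescale(Fraction(dividend), Fraction(divisor))
    w = x / RADIX                      # pre-shift so that |w| <= RHO
    assert abs(w) <= RHO
    q = Fraction(0)
    for j in range(1, steps + 1):
        t = RADIX * w
        digit = math.floor(t + Fraction(1, 2))  # rounded integer part: no selection table
        assert abs(digit) <= MAX_DIGIT
        w = t - digit * d                       # next partial remainder
        q += Fraction(digit, RADIX**j)
    return q * RADIX                   # undo the pre-shift
```

`Fraction` keeps the arithmetic exact, so the example checks the recurrence itself; after 12 steps the result is within 4**-10 of the true quotient.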
HeteroSort load balances and sorts within static or dynamic networks using a conceptual torus mesh. We ported HeteroSort to a 16-node Beowulf cluster with a central switch architecture. By capturing global system know...
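The HeteroSort abstract above is truncated and does not give its mapping, so the following is only an assumption-labelled sketch of what a "conceptual torus mesh" over a 16-node switched cluster could look like: ranks 0 to 15 are laid out as a hypothetical 4x4 grid with wraparound, and each node's logical neighbours are computed arithmetically, independent of the physical central switch. The helper `torus_neighbors` is hypothetical.

```python
def torus_neighbors(rank: int, rows: int = 4, cols: int = 4) -> dict[str, int]:
    """Map a node rank onto a conceptual rows x cols torus and return its four
    wraparound neighbours (north, south, west, east)."""
    assert 0 <= rank < rows * cols
    r, c = divmod(rank, cols)  # logical row and column of this node
    return {
        "north": ((r - 1) % rows) * cols + c,
        "south": ((r + 1) % rows) * cols + c,
        "west": r * cols + (c - 1) % cols,
        "east": r * cols + (c + 1) % cols,
    }
```

On a central-switch cluster every pair of nodes is one hop apart physically, so the torus is purely a logical structure that fixes which nodes exchange data or load with which.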
ISBN (print): 354067442X
Computer simulations of realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. The process of obtaining numerical solutions to the governing PDEs involves solving large sparse linear or eigen systems over the unstructured meshes that model the underlying physical objects. These systems are often solved iteratively, where the sparse matrix-vector multiply (SPMV) is the most expensive operation within each iteration. In this paper, we focus on the efficiency of SPMV using various ordering/partitioning algorithms. We examine different implementations using three leading programming paradigms and architectures. Results show that ordering greatly improves performance, and that cache reuse can be more important than reducing communication. However, a multithreaded implementation indicates that ordering and partitioning are not required on the Tera MTA to obtain an efficient and scalable SPMV.
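The abstract does not state a storage format; compressed sparse row (CSR) is a common choice and is assumed in this minimal SPMV sketch. It shows where ordering matters: the vector `x` is read indirectly through the column indices, so renumbering the mesh so that neighbouring rows touch nearby columns improves cache reuse. The function name `csr_spmv` is hypothetical.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    indptr[row]..indptr[row + 1] delimits the nonzeros of `row` inside
    `indices` (column numbers) and `data` (values).
    """
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Indirect access to x: the matrix ordering decides how scattered these reads are.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# 3x3 tridiagonal matrix [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4, -1, -1, 4, -1, -1, 4]
print(csr_spmv(indptr, indices, data, [1, 1, 1]))  # [3.0, 2.0, 3.0]
```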
ISBN (print): 354067442X
In this paper, we have constructed a large-scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application, and optimized its execution environment. The default parameters of the TCP retransmission mechanism cannot provide good performance for the data mining application, since many collisions occur during all-to-all multicasting in the large-scale PC cluster. Using TCP retransmission parameters set according to the proposed parameter optimization, a reasonably good performance improvement is achieved for parallel data mining on 100 PCs. Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. We have investigated the feasibility of using available memory on remote nodes as a swap area when working nodes need to swap out their real memory contents. According to the experimental results on our PC cluster, the proposed method is expected to be considerably better than using hard disks as a swapping device.
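The abstract does not say which retransmission parameters were tuned. For context, this sketch follows the standard RFC 6298 retransmission-timeout estimator, whose lower bound, upper bound, and exponential backoff are typical tunables; the `min_rto` and `max_rto` constructor arguments are illustrative assumptions, not the paper's settings. With the RFC's 1-second floor, every packet lost in an all-to-all burst stalls its sender for at least a second, which is one plausible reason default settings hurt on a fast cluster network.

```python
class RtoEstimator:
    """Retransmission timeout (RTO) estimator following RFC 6298."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4

    def __init__(self, min_rto=1.0, max_rto=60.0, granularity=0.0):
        self.min_rto, self.max_rto, self.g = min_rto, max_rto, granularity
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any RTT measurement (RFC 6298, section 2.1)

    def on_rtt_sample(self, r):
        if self.srtt is None:  # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:  # RTTVAR is updated with the old SRTT, then SRTT itself
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self._set(self.srtt + max(self.g, self.K * self.rttvar))

    def on_timeout(self):
        self._set(self.rto * 2)  # exponential backoff after a retransmission timeout

    def _set(self, value):
        self.rto = min(self.max_rto, max(self.min_rto, value))  # clamp to [min_rto, max_rto]
```

For a 0.1 s RTT sample the computed value is 0.3 s, but the default floor raises the timeout to 1 s; lowering `min_rto` to 0.2 s keeps it at 0.3 s.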
We describe MW - a software framework that allows users to quickly and easily parallelize scientific computations using the master-worker paradigm on the computational grid. MW provides both a 'top level' interface to application software and a 'bottom level' interface to existing grid computing toolkits. Both interfaces are briefly described. We conclude with a case study, where the necessary Grid services are provided by the Condor high-throughput computing system, and the MW-enabled application code is used to solve a combinatorial optimization problem of unprecedented complexity.
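The master-worker paradigm that MW targets can be shown in miniature. This is not MW's interface (MW is a framework built on grid toolkits such as Condor); it is a generic sketch in which local threads stand in for grid workers, and the function name `master_worker` is hypothetical. A grid framework must also cope with workers that join, leave, or fail mid-task; this sketch does not.

```python
import queue
import threading


def master_worker(tasks, work, n_workers=4):
    """Generic master-worker loop: the master enqueues tasks, workers pull them,
    and the master gathers the results in task order."""
    tasks = list(tasks)
    todo = queue.Queue()
    done = queue.Queue()
    for i, task in enumerate(tasks):
        todo.put((i, task))

    def worker():
        while True:
            try:
                i, task = todo.get_nowait()
            except queue.Empty:
                return  # no work left: this worker exits
            done.put((i, work(task)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so draining `done` collects every result.
    results = [None] * len(tasks)
    while not done.empty():
        i, value = done.get()
        results[i] = value
    return results


print(master_worker(range(5), lambda v: v * v))  # [0, 1, 4, 9, 16]
```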
ISBN (print): 3540679561
The design of applications for Computational Grids relies partly on communication paradigms. In most Grid experiments, message passing has been the main paradigm, either to let several processes of a single parallel application exchange data or to allow several applications to communicate with each other. In this article, we advocate a modern approach to programming a Grid, based on distributed objects, namely parallel CORBA objects. We focus our attention on the handling of distributed data within parallel CORBA objects. We show some performance results obtained using a NEC Cenju-4 parallel machine connected to a PC cluster.
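Handling distributed data in a parallel object means deciding which member process owns which part of a sequence argument. The abstract does not describe the distribution used, so this is only a hypothetical sketch of a common choice, a block distribution in which each of `parts` processes receives one contiguous slice and slice sizes differ by at most one element. It is not a CORBA API, and `block_distribution` is a hypothetical name.

```python
def block_distribution(length: int, parts: int) -> list[range]:
    """Split indices 0..length-1 into `parts` contiguous blocks whose sizes
    differ by at most one (the first `length % parts` blocks get one extra)."""
    assert parts > 0 and length >= 0
    base, extra = divmod(length, parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


print(block_distribution(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```

When a client and a parallel object use different process counts (for example, a PC cluster calling into a parallel machine), both sides can compute such block boundaries to work out which slices must be exchanged between which processes.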
Dynamic, distributed, real-time systems control an environment that varies widely without any time-invariant statistical or deterministic characteristic, are spread across multiple loosely-coupled computers, and must ...