Clusters of workstations have become a cost-effective means of performing scientific computations. However, the large network latencies, resource sharing, and heterogeneity found in networks of clusters and Grids can impede the performance of applications not specifically tailored for such environments. A typical example is the traditional fine-grain implementation of Krylov-like iterative methods, a central component in many scientific applications. To exploit the potential of these environments, advances in networking technology must be complemented by advances in parallel algorithm design. In this paper, we present an algorithmic technique that increases the granularity of parallel block iterative methods by inducing additional work during the preconditioning (inexact solution) phase of the iteration. During this phase, each vector in the block is preconditioned by a different subgroup of processors, yielding a much coarser granularity. The rest of the method accounts for a small portion of the total time and is still implemented in fine grain. We call this combination of fine- and coarse-grain parallelism multigrain. We apply this idea to the block Jacobi-Davidson eigensolver and present experimental data showing a significant reduction of latency effects on networks of clusters of roughly equal capacity and size. We conclude with a discussion of how multigrain can be applied dynamically, based on runtime network performance monitoring.
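The core of the multigrain idea described above is that, during the preconditioning phase, the full processor set is partitioned into subgroups, one per block vector, so that communication in the dominant phase stays within a subgroup. A minimal sketch of such a partition (the function name and round-robin policy are illustrative assumptions, not taken from the paper):

```python
# Hypothetical sketch: split a fine-grain processor group into coarse-grain
# subgroups, one per block vector, for the preconditioning phase.

def make_subgroups(num_procs, block_size):
    """Split processor ranks 0..num_procs-1 into block_size subgroups.

    Each subgroup preconditions one vector of the block, so communication
    during the (dominant) preconditioning phase stays within a subgroup.
    """
    ranks = list(range(num_procs))
    # Round-robin distribution keeps subgroup sizes within one of each other.
    return [ranks[i::block_size] for i in range(block_size)]

# Example: 16 processors and a block of 4 vectors give 4 subgroups of 4.
groups = make_subgroups(16, 4)
assert len(groups) == 4 and all(len(g) == 4 for g in groups)
```

The remaining fine-grain phases (orthogonalization, projections) would still run across all ranks; only the preconditioning step uses the subgroup decomposition.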
Despite the large I/O capabilities of modern cluster architectures with local disks on each node, most applications are unable to fully exploit them. This is especially problematic for data-intensive applications, which often suffer from low I/O performance. As one solution to this problem, a Distribution I/O Management (DIOM) system has been developed to manage a transparent distribution of data across cluster nodes and then allow applications to access this data purely from local disks. To be effective, however, this distribution process requires semantic information about both the application and the input data. This work therefore extends DIOM to include independent specifications for both data formats and application I/O patterns, thereby decoupling them. The work is driven by an application from nuclear medical imaging, the reconstruction of PET images, for which DIOM has proven to be an adequate solution, enabling truly scalable I/O and thereby improving overall application performance.
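The decoupling the abstract describes can be pictured as two independent inputs, a data-format specification and an application I/O-pattern specification, combined into a per-node distribution plan. The sketch below is purely illustrative; the function and field names are assumptions and not the actual DIOM specification syntax:

```python
# Illustrative DIOM-style decoupling: the data-format spec and the
# application's I/O-pattern spec are independent inputs to the plan.

def distribution_plan(format_spec, pattern_spec, num_nodes):
    """Map each record to the local disk of the node that will read it.

    format_spec:  describes the data layout (here just a record size).
    pattern_spec: callable (node, num_nodes) -> record indices that node reads.
    Returns {node: [(byte_offset, length), ...]} for the distribution step.
    """
    record_size = format_spec["record_size"]
    plan = {}
    for node in range(num_nodes):
        records = pattern_spec(node, num_nodes)
        plan[node] = [(r * record_size, record_size) for r in records]
    return plan

# Example pattern: block distribution of 8 records over 2 nodes.
plan = distribution_plan({"record_size": 4096},
                         lambda n, p: range(n * 4, (n + 1) * 4), 2)
```

Because the pattern is a separate specification, the same data format can serve a block, cyclic, or application-specific access pattern without changing the format description.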
We propose a hybrid parallelism-independent scheduling method, performed predominantly at compile time, which generates machine code efficiently executable on any number of workstations or PCs in a cluster computing environment. Our scheduling algorithm, called the Dynamical Level Parallelism-Independent Scheduling (DLPIS) algorithm, is applicable to distributed computer systems because, in addition to task scheduling, we perform message communication scheduling. It provides an explicit task synchronization mechanism that guides task allocation and resolves data dependencies at run time with reduced overhead. Furthermore, we provide a mechanism that allows the machine code to adapt itself to the degree of parallelism of the system at run time. Our scheduling method therefore supports a variable number of processors in the users' computing systems, as well as the adaptive parallelism that may arise in distributed computing systems due to computer or link failure.
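A generic level-based list scheduler conveys the flavor of scheduling that remains valid for any processor count; note this is a hedged sketch of the general technique, not the authors' exact DLPIS algorithm:

```python
# Hedged sketch of level-based list scheduling (not the exact DLPIS method):
# tasks are assigned level by level to the least-loaded processor, so the
# same logic adapts to whatever number of processors is available at run time.

def schedule(tasks, deps, costs, num_procs):
    """tasks: ids; deps: {task: set of predecessors}; costs: {task: time}."""
    level = {}

    def lvl(t):
        # A task's level is one more than its deepest predecessor's level.
        if t not in level:
            level[t] = 1 + max((lvl(p) for p in deps.get(t, ())), default=-1)
        return level[t]

    load = [0.0] * num_procs
    assignment = {}
    for t in sorted(tasks, key=lvl):          # predecessors come first
        p = min(range(num_procs), key=lambda i: load[i])  # least-loaded proc
        assignment[t] = p
        load[p] += costs[t]
    return assignment
```

With `num_procs` supplied at run time, the same task graph yields a valid schedule for 2, 8, or 100 machines, which is the parallelism-independence property the abstract emphasizes.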
ISBN: (Print) 9780769516868
Despite the large I/O capabilities of modern cluster architectures with local disks on each node, most applications are unable to fully exploit them. This is especially problematic for data-intensive applications, which often suffer from low I/O performance. As one solution to this problem, a Distribution I/O Management (DIOM) system has been developed to manage a transparent distribution of data across cluster nodes and then allow applications to access this data purely from local disks. To be effective, however, this distribution process requires semantic information about both the application and the input data. This work therefore extends DIOM to include independent specifications for both data formats and application I/O patterns, thereby decoupling them. The work is driven by an application from nuclear medical imaging, the reconstruction of PET images, for which DIOM has proven to be an adequate solution, enabling truly scalable I/O and thereby improving overall application performance.
In this paper, we develop a multithreaded algorithm for pricing simple options and implement it on an 8-node SMP machine using the Cilk parallel programming language developed at MIT. The algorithm dynamically creates many threads to exploit parallelism and relies on the Cilk runtime system to distribute the computational load. We present both analytical and experimental results, and our results explain how Cilk can be used effectively to exploit parallelism in the given problem. The analytical results show that our algorithm has very high average parallelism, making Cilk a well-suited paradigm for its implementation. We conclude from our implementation results that the size of the threads, the number of threads created, the load balancer, and the cost of spawning a thread are parameters that must be considered when designing the algorithm on the Cilk platform.
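A standard way to expose this kind of parallelism for option pricing is a recursive valuation on a binomial tree, where the two child valuations at each node are independent and could each be `spawn`ed as a Cilk thread. The sketch below (serial Python, with illustrative parameter values not taken from the paper) shows the recursive structure:

```python
import functools

# Hedged sketch: Cilk-style recursive binomial valuation of a European call.
# In Cilk, the two recursive child calls would be spawned as threads; here
# they run serially. Parameters u, d, R are illustrative assumptions.

def binomial_call(S, K, n, u=1.2, d=0.8, R=1.05):
    """Price a European call on an n-step binomial tree by recursion.

    S: spot price, K: strike, u/d: up/down factors, R: per-step growth.
    """
    p = (R - d) / (u - d)  # risk-neutral up-move probability

    @functools.lru_cache(maxsize=None)  # recombining tree: memoize nodes
    def value(step, ups):
        price = S * (u ** ups) * (d ** (step - ups))
        if step == n:
            return max(price - K, 0.0)  # payoff at expiry
        # The two child valuations are independent -> spawnable in Cilk.
        return (p * value(step + 1, ups + 1)
                + (1 - p) * value(step + 1, ups)) / R
    return value(0, 0)
```

The memoization mirrors the recombining tree; without it the naive recursion would create exponentially many threads, which relates directly to the thread-count and spawn-cost trade-offs the abstract mentions.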
In this paper we focus on the temporary return of data values that are incorrect for given transactional semantics and could have catastrophic effects, similar to those in parallel discrete event simulation. In many applications using on-line transaction processing (OLTP) environments, for instance, it is best to delay the response to a transaction's read request until it is known, or at least unlikely, that no write message from an older update transaction will make the response incorrect. Examples of such applications are those where aberrant behavior is too costly, and those in which precommitted data are visible to some reactive entity. In light of the avoidance of risk in this approach, we propose a risk-free multiversion temporally correct (RFMVTC) concurrency control algorithm. We discuss the algorithm and its implementation, and report on performance results from simulation models run on a cluster of workstations.
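The risk-free rule can be illustrated with a small multiversion store in which a read at timestamp t is deferred while any uncommitted writer older than t exists for the item. This is a hedged sketch of the general idea only; the class and method names are assumptions and not the RFMVTC algorithm's actual structure:

```python
import bisect

# Illustrative multiversion store with risk-free reads: a read is deferred
# (returns None) while an older uncommitted writer could still produce a
# version the read should have seen.

class RiskFreeMVStore:
    def __init__(self):
        self.versions = {}   # key -> sorted list of (commit_ts, value)
        self.pending = {}    # key -> {writer_ts: value}, uncommitted writes

    def begin_write(self, key, value, ts):
        self.pending.setdefault(key, {})[ts] = value

    def commit(self, key, ts):
        value = self.pending[key].pop(ts)
        bisect.insort(self.versions.setdefault(key, []), (ts, value))

    def read(self, key, ts):
        # Risk-free rule: defer while an older write is still uncommitted.
        if any(w < ts for w in self.pending.get(key, {})):
            return None
        vs = [v for t, v in self.versions.get(key, []) if t <= ts]
        return vs[-1] if vs else None
```

In a real system the deferred read would be queued and retried; returning `None` here simply stands in for "not yet safe to answer".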
Distributed object-oriented computing allows efficient use of the Network Of Workstations (NOW) paradigm. However, the underlying middlewares used to develop and deploy such applications do not provide developers with...
ISBN: (Print) 0769512968
One of the primary tasks in mining distributed textual data is feature extraction. The widespread digitization of information has created a wealth of data that requires novel approaches to feature extraction in a distributed environment. We propose a massively parallel model for feature extraction that employs unused cycles on networks of PCs/workstations in a highly distributed environment. We have developed an analytical model of the time and communication complexity of the feature extraction process in this environment, based on feature extraction algorithms developed in our textual data mining research with HDDI(TM) [1] [18] [20]. We show that speedups linear in the number of processors are achievable for applications involving reduction operations, based on a novel, parallel pipelined model of execution. We are in the process of validating our analytical model with empirical observations based on the extraction of features from a large number of pages on the World Wide Web.
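The pipelined model of execution mentioned above can be sketched as a chain of concurrent stages connected by queues, so that different items occupy different stages at the same time. This is a generic illustration under assumed names, not the paper's actual HDDI implementation:

```python
import queue
import threading

# Hedged sketch of pipelined execution: each stage runs in its own thread,
# consuming from its input queue and feeding the next stage, so P balanced
# stages can approach P-fold throughput on a stream of items.

def pipeline_map(items, stages):
    """Run `stages` (a list of one-argument functions) as a thread pipeline."""
    qs = [queue.Queue() for _ in range(len(stages) + 1)]
    SENTINEL = object()  # end-of-stream marker passed down the pipeline

    def worker(fn, qin, qout):
        while True:
            x = qin.get()
            if x is SENTINEL:
                qout.put(SENTINEL)
                return
            qout.put(fn(x))

    threads = [threading.Thread(target=worker, args=(f, qs[i], qs[i + 1]))
               for i, f in enumerate(stages)]
    for t in threads:
        t.start()
    for x in items:
        qs[0].put(x)
    qs[0].put(SENTINEL)
    out = []
    while True:
        x = qs[-1].get()
        if x is SENTINEL:
            break
        out.append(x)
    for t in threads:
        t.join()
    return out
```

A feature-extraction pipeline would put stages such as tokenization, stemming, and frequency reduction in place of the toy functions; the queue structure is what lets throughput scale with the number of stages.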
The formalism of well-behaved timed languages was proposed in [7] as a tool for modeling practical real-time applications. We use this formalism for modeling aspects from the areas of real-time database systems and ad...
The paper presents an approach to using algorithmic skeletons for adding data parallelism to an image processing library. The method is used for parallelizing image processing applications composed of low-level image ...