ISBN: 9780769536422 (print)
The barrier construct in an OpenMP program is a directive used to eliminate races before execution continues: each thread waits until all other threads of the team have reached the barrier region. Data dependence analysis is a technique for determining whether two statements can be run in parallel. This paper presents two ways to optimize the barriers of an OpenMP program. The first is to remove redundant barriers using data dependence information: if the statements on either side of a barrier have no data dependence between them, the barrier can be safely removed. The second is to reduce the cost of the barrier. The paper gives an implementation of another form of parallelism, DOACROSS, and of a new form of OpenMP barrier, the region barrier, both of which synchronize by busy-waiting. Experimental results show that the performance of the optimized OpenMP program is improved.
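The redundancy test described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes each statement is summarized only by the sets of variable names it reads and writes (the hypothetical `Stmt` type below), whereas a real compiler pass would also analyze array subscripts and which thread touches which element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Hypothetical summary of a statement: the variables it reads and writes."""
    reads: frozenset
    writes: frozenset


def depends(a: Stmt, b: Stmt) -> bool:
    """True if a and b share a flow, anti, or output dependence."""
    flow = a.writes & b.reads      # a writes what b later reads
    anti = a.reads & b.writes      # a reads what b later overwrites
    output = a.writes & b.writes   # both write the same variable
    return bool(flow or anti or output)


def barrier_is_redundant(before, after) -> bool:
    """Assumed criterion: the barrier is removable when no statement before it
    has a dependence with any statement after it."""
    return not any(depends(a, b) for a in before for b in after)


if __name__ == "__main__":
    before = [Stmt(frozenset({"a"}), frozenset({"b"}))]
    independent = [Stmt(frozenset({"c"}), frozenset({"d"}))]
    dependent = [Stmt(frozenset({"b"}), frozenset({"e"}))]
    print(barrier_is_redundant(before, independent))
    print(barrier_is_redundant(before, dependent))
```

The check is deliberately conservative: any shared written variable counts as a dependence, so the sketch may keep a barrier that a finer analysis could remove, but under its own assumptions it never removes one that is needed.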
ISBN: 9783642032745 (print)
The performance of metadata processing in large distributed file systems currently presents larger challenges than the scaling of data throughput. The paper presents a novel, distributed benchmark called DMetabench for measuring the performance of metadata operations (e.g. file creation). DMetabench runs in environments with potentially thousands of nodes and allows an assessment of the scalability of metadata operations. Additionally, precise run-time performance data is preserved, which allows for a better understanding of performance artifacts. Validation results from production file systems at the Leibniz Supercomputing Centre (LRZ) are provided and discussed. Possible applications of knowledge about metadata performance scaling include the choice of an optimal parallelization strategy for metadata-intensive workloads in a specific runtime environment.
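To make "measuring the performance of metadata operations" concrete, the following single-process sketch times empty-file creation in one directory. It is not DMetabench and shares no code or interface with it; the function name `time_file_creates` and the file naming scheme are assumptions made for illustration, and a distributed benchmark would run many such workers across nodes and record timestamps during the run.

```python
import os
import tempfile
import time


def time_file_creates(directory: str, count: int) -> float:
    """Create `count` empty files in `directory` and return creations per second."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:06d}")
        # Mode "x" fails if the file exists, so every iteration is a real create.
        with open(path, "x"):
            pass
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rate = time_file_creates(tmp, 1000)
        print(f"{rate:.0f} file creations per second")
```

The measured rate depends heavily on the file system and on caching, which is one reason a benchmark of this kind preserves detailed run-time data rather than a single average.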
ISBN: 9783642032745 (print)
The paper investigates and compares skeleton-based Eden implementations of different FFT algorithms on workstation clusters with distributed memory. Our experiments show that the basic divide-and-conquer versions suffer from an inherent input distribution and result collection problem. Advanced approaches, such as calculating the FFT using a parallel map-and-transpose skeleton, provide more flexibility to overcome these problems. Assuming distributed access to the input data and reorganising the computation to return results in a distributed way improves the parallel runtime behaviour.
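The divide-and-conquer structure referred to above can be seen in a sequential radix-2 FFT. The sketch below is plain Python rather than Eden and uses no skeletons; it only shows where the input is split and where partial results are combined, which are the steps the abstract identifies as the distribution and collection problem.

```python
import cmath


def fft(xs):
    """Recursive radix-2 decimation-in-time FFT; len(xs) must be a power of two."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(xs[0])]
    # Divide: the two half-size transforms are independent of each other,
    # so a divide-and-conquer skeleton could evaluate them in parallel.
    even = fft(xs[0::2])
    odd = fft(xs[1::2])
    # Combine: every output element needs data from both halves.
    out = [0j] * n
    half = n // 2
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


if __name__ == "__main__":
    print(fft([1, 0, 0, 0]))
```

For a unit impulse such as `[1, 0, 0, 0]` the transform is the all-ones spectrum, which makes a convenient sanity check.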
ISBN: 9780769536422 (print)
In distributed object-oriented databases (DOODB), objects are distributed across different sites on a communication network. Class fragmentation, which divides a class into several fragments, is needed to improve performance and to reduce repeated and duplicated data transmission. Vertical class fragmentation is proposed to reflect the characteristics of object-oriented databases, such as methods, inheritance and composite objects. In this paper, we define an objective function for allocation that accounts for system costs, including storage, query processing and communication, and implement it using a genetic algorithm.
ISBN: 9783642032745 (print)
Results on the theory and applications of optimal lattice cubature formulas are described. An approximate integration program based on lattice formulas is considered. It has sufficiently high precision for complicated domains with smooth boundaries in dimensions up to 10, and it parallelizes with high efficiency.
ISBN: 9783642032745 (print)
The implementations of parallel algorithms for solving partial differential equations (PDEs) in heat transfer problems are based on high performance computing using a distributed memory architecture. In this paper, the parallel algorithms use the finite difference method to solve multidimensional heat transfer problems for semiconductor components and polymer composite materials. Parallel Virtual Machine (PVM) and the C language on the Linux operating system are the platform used to run the parallel algorithms. This research focuses on the Red-Black Gauss-Seidel (RBGS) iterative method. Parallel performance is evaluated in terms of speedup, efficiency, effectiveness, temporal performance and communication cost.
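A minimal sequential sketch of the Red-Black Gauss-Seidel idea follows. It assumes the simplest model problem, the steady-state 2-D Laplace equation on a uniform grid with fixed boundary values, and it is written in Python rather than the PVM environment the abstract describes; the function names and tolerance are illustrative choices, not the paper's.

```python
def rbgs_sweep(u):
    """One red-black Gauss-Seidel sweep over the interior of grid u, in place."""
    rows, cols = len(u), len(u[0])
    for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if (i + j) % 2 == colour:
                    # Each point depends only on neighbours of the other colour,
                    # so all points of one colour could be updated in parallel.
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])


def solve(u, tol=1e-8, max_sweeps=10_000):
    """Sweep until the largest pointwise change drops below tol; return sweep count."""
    for sweep in range(1, max_sweeps + 1):
        old = [row[:] for row in u]
        rbgs_sweep(u)
        change = max(abs(u[i][j] - old[i][j])
                     for i in range(len(u)) for j in range(len(u[0])))
        if change < tol:
            return sweep
    return max_sweeps


if __name__ == "__main__":
    n = 9
    # Boundary held at 1.0, interior starts at 0.0.
    grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]
    print(solve(grid), grid[n // 2][n // 2])
```

The two-colour ordering is what makes the method attractive on distributed memory: within one colour there are no dependences, so each process can update its own subgrid and exchange only boundary values between the red and black half-sweeps.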
ISBN: 9783642032745 (print)
Simulation modeling is now applied to a wide range of problems, and some simulations require significant computing resources and time. To decrease overall simulation time, a model can be converted to a distributed system and executed on a computer network. The goal of this project is to create a library that enables clear and rapid development of parallel discrete event models in AnyLogic. The library is aimed at professionals in computer simulation and helps reduce the amount of code. The project includes research on different synchronization algorithms. In this paper we present techniques that can be used to create distributed models, and we compare a single-threaded model with a distributed model implementing an optimistic algorithm. The comparison shows a significant improvement in wall-clock time, achieved by separating the model into independent submodels with minimal communication.
ISBN: 9783642032745 (print)
We present a variety of possible parallelization approaches for a real-world case study using several modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in computer tomography. We describe how this algorithm can be parallelized for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, distributed-memory clusters, graphics processors and the Cell processor, and finally how various architectures can be accessed in a distributed Grid environment. The main contribution of the paper, besides the parallelization approaches, is their systematic comparison regarding four important criteria: performance, programming comfort, accessibility, and cost-effectiveness. We report results of experiments on particular parallel machines of different architectures that confirm the findings of our systematic comparison.
ISBN: 9781424452910 (print)
We consider a parallel computing environment where m organizations each provide machines and several jobs to be executed. While cooperation among organizations is required to minimize the global makespan, each organization primarily expects faster completion of its own jobs and is thus not necessarily cooperative. To handle such situations, we formulate the alpha-cooperative multi-organization scheduling problem (alpha-MOSP), where alpha >= 1 is a parameter representing the degree of cooperativeness. alpha-MOSP minimizes the makespan under the cooperation constraint that no organization allows the completion time of its own jobs to exceed alpha times the completion time it would achieve by executing those jobs itself. In this paper, we aim to reveal the relation between the makespan and the degree of cooperativeness. First, we investigate the relation between alpha and the quality of the global makespan. For alpha = 1 (i.e., each organization never sacrifices its completion time), we show an instance where the cooperation constraint degrades the optimal makespan by a factor of m. In contrast, for alpha > 1, we can construct an algorithm that transforms any unconstrained schedule into one satisfying the cooperation constraint. This algorithm bounds the degradation ratio by alpha/(alpha - 1), which implies that weak cooperation improves the makespan dramatically. Second, we study the complexity of alpha-MOSP. We show its strong NP-hardness and its inapproximability for any approximation factor less than max{(alpha + 1)/alpha, 3/2}. We also show the hardness of transformation: even if an optimal schedule under no cooperation constraint is given, no polynomial-time algorithm finds an optimal schedule for alpha-MOSP. This result is a witness to the nonexistence of general polynomial-time transformation algorithms that preserve the approximation ratio.
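The cooperation constraint and the degradation bound stated in this abstract can be written down directly. The sketch below is only a checker, not the paper's transformation algorithm; it assumes each organization is represented by a key in two hypothetical dictionaries, one holding the completion time when the organization runs its jobs alone and one holding its completion time in the shared schedule.

```python
def satisfies_cooperation(local_completion, shared_completion, alpha):
    """True if every organization finishes within alpha times its stand-alone time."""
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    return all(shared_completion[org] <= alpha * local_completion[org]
               for org in local_completion)


def degradation_bound(alpha):
    """Bound alpha / (alpha - 1) quoted in the abstract; defined only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the bound alpha/(alpha - 1) requires alpha > 1")
    return alpha / (alpha - 1)


if __name__ == "__main__":
    local = {"A": 10.0, "B": 4.0}    # hypothetical stand-alone completion times
    shared = {"A": 12.0, "B": 6.0}   # hypothetical completion times when cooperating
    print(satisfies_cooperation(local, shared, alpha=1.5))
    for alpha in (1.1, 1.5, 2.0, 4.0):
        print(alpha, degradation_bound(alpha))
```

Evaluating the bound for a few values shows the behaviour the abstract calls dramatic: it falls quickly as alpha moves away from 1 and approaches 1 as alpha grows.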
ISBN: 9780769536422 (print)
Congestion control algorithms in existing reliable multicast protocols are mainly derived from the end-to-end model, which has high resource requirements and sometimes suppresses packet sending too much. Many-to-many reliable multicast requires efficient congestion control over a one-to-many model. Many-to-many multicast in a local area network (LAN) is an important mechanism in distributed simulation. In this paper, a congestion control algorithm based on loss trend is proposed for many-to-many reliable multicast. It predicts the future packet loss of receivers by analysing historic loss and buffer variation, and then controls congestion by adjusting the sending rate in advance. The algorithm targets reliable multicast in a LAN. Its main idea is to lower the probability of multicast packet loss so that the nodes can afford the cost of packet recovery. It alleviates congestion by regulating the sending rate so as to reduce the probability of packet loss. Experimental results indicate that the algorithm can maintain high throughput for many-to-many reliable multicast with relatively real-time performance.
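The abstract does not give the prediction formula, so the following is only one plausible reading of "loss trend", under loudly stated assumptions: the trend is taken to be the least-squares slope of recent loss-ratio samples, buffer variation is ignored, and the sender backs off multiplicatively when the slope is positive. The names `loss_trend` and `next_rate` and the parameters `step`, `min_rate` and `max_rate` are hypothetical.

```python
def loss_trend(history):
    """Least-squares slope of loss-ratio samples taken at equal intervals."""
    n = len(history)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(history))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def next_rate(rate, history, step=0.1, min_rate=1.0, max_rate=1000.0):
    """Assumed policy: lower the rate before loss grows, raise it otherwise."""
    if loss_trend(history) > 0:
        rate *= 1 - step   # loss is rising: slow down in advance
    else:
        rate *= 1 + step   # loss is flat or falling: probe for more throughput
    return max(min_rate, min(max_rate, rate))


if __name__ == "__main__":
    print(next_rate(100.0, [0.00, 0.01, 0.03]))
    print(next_rate(100.0, [0.03, 0.01, 0.00]))
```

Acting on the slope rather than on the current loss level is what lets a sender react "in advance"; a real protocol would also need to aggregate reports from many receivers, which this sketch leaves out.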