ISBN: 9780769542515 (print)
This paper describes a novel approach to the parallel simulation of complex multi-agent systems, based on actors and the Java middleware Terracotta. The approach aims to exploit the computing power of modern multi-core machines. Terracotta was chosen because it allows the JVM to be clustered transparently. The paper discusses design and implementation aspects of the approach, and demonstrates the achievable execution performance through the parallel simulation of a scalable multi-agent system based on the predator/prey model.
ISBN: 9781538643686 (print)
A myriad of problems in science and engineering involve the solution of sparse triangular linear systems. They arise frequently as part of direct and iterative solvers for linear systems and eigenvalue problems, and hence can be considered a key building block of sparse numerical linear algebra. This is why, since the early days, their parallel solution has been exhaustively studied, and efficient implementations of this kernel can be found for almost every hardware platform. In the GPU context, the most widespread implementation of this kernel is the one distributed in the NVIDIA CUSPARSE library, which relies on a preprocessing stage to aggregate the unknowns of the triangular system into level sets. This determines an execution schedule for the solution of the system, where the level sets have to be processed sequentially while the unknowns that belong to one level set can be solved in parallel. One of the disadvantages of the CUSPARSE implementation is that this preprocessing stage is often extremely slow in comparison to the runtime of the solving phase. In this work, we present a parallel GPU algorithm that computes the same level sets as CUSPARSE but takes significantly less runtime. Our experiments on a set of matrices from the SuiteSparse collection show acceleration factors of up to 44x. Additionally, we provide a routine capable of solving a triangular linear system in the same pass used to calculate the level sets, yielding important performance benefits.
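The notion of level sets used above can be illustrated with a minimal sequential sketch. It is not the GPU algorithm of the paper and not the CUSPARSE routine; it only shows what the preprocessing stage computes. The function name `level_sets` and the CSR inputs `indptr` and `indices` are assumptions made for this illustration.

```python
def level_sets(indptr, indices):
    """Group the unknowns of a sparse lower-triangular system into level sets.

    The matrix is assumed to be stored in CSR form: indptr holds the row
    pointers and indices the column indices. Unknown i depends on every
    unknown j < i whose column holds a nonzero in row i.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        # Off-diagonal nonzeros of row i are the dependencies of unknown i.
        deps = [j for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        # An unknown sits one level above its deepest dependency;
        # unknowns without dependencies land in level 0.
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


# 4x4 example: rows 0 and 1 are diagonal only, row 2 depends on
# unknowns 0 and 1, and row 3 depends on unknown 2.
example = level_sets([0, 1, 2, 5, 7], [0, 1, 0, 1, 2, 2, 3])
```

In this example, unknowns 0 and 1 form the first level and can be solved in parallel, while unknowns 2 and 3 each form a level of their own and must wait for the preceding level.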
ISBN: 0818626720 (print)
Replication is a common method for increasing the availability of data in a distributed environment. Our interest is in the application of replication techniques in the domain of parallel processing. This paper explores the issues concerning degree of replication and granularity in the context of a distributed and highly available Linda tuple space. In particular, we study the performance effects of varying the number of replicas and the granularities of replication and concurrency control. Traditionally, when using replication in databases, the granularity of replication and that of concurrency control have been the same (at the file level, for example). This is not an inherent requirement, however. In this paper we show by detailed simulation of a replicated Linda tuple space that it is useful to separate the two granularities, and that this separation is an important design issue, especially in parallel processing systems.
ISBN: 0769523811 (print)
Grid-based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language- and platform-independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid.
ISBN: 0818608935 (print)
Skew in the distribution of values taken by an attribute is identified as a major factor that can affect the performance of parallel architectures for relational joins. The effect of skew on the performance of two parallel architectures is evaluated using analytic models. In one architecture, called database machine (DBMC), data as well as processing power are distributed, while in the other architecture, called single processor parallel input/output (SPPI), data is distributed but the processing power is concentrated in one processor. The two architectures are compared in terms of the ratio of MIPS (millions of instructions per second) used by DBMC and SPPI to deliver the same throughput and response time. In addition, the horizontal growth potential of DBMC is evaluated in terms of the maximum speedup achievable by DBMC relative to SPPI response time. The MIPS ratio as well as the speedup are found to be very sensitive to the amount of skew. These results suggest that careful thought should be given to parallelizing database applications and to the design of algorithms and query optimizers for parallel architectures.
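Why skew limits speedup can be seen in a small sketch. It is not the analytic model of the paper; it assumes, purely for illustration, that tuples are assigned to processors by their join attribute value modulo the processor count and that the response time is governed by the most heavily loaded processor. The function name `partition_speedup` and its parameters are hypothetical.

```python
from collections import Counter


def partition_speedup(values, processors):
    """Upper bound on speedup when tuples are partitioned by attribute value.

    values is a non-empty list of integer join attribute values, one per
    tuple. All tuples sharing a value go to the same processor, so the
    busiest processor bounds the achievable speedup.
    """
    load = [0] * processors
    for value, count in Counter(values).items():
        load[value % processors] += count
    return len(values) / max(load)


# 1000 tuples with distinct values spread evenly over 4 processors.
uniform = partition_speedup(list(range(1000)), 4)

# 1000 tuples where half of them share the value 0.
skewed = partition_speedup([0] * 500 + list(range(1, 501)), 4)
```

Under these assumptions the uniform case reaches the ideal bound of 4.0, whereas in the skewed case one processor holds 625 of the 1000 tuples and the bound drops to 1.6.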
ISBN: 9781538655559 (print)
Scientific applications are increasingly complex and domain specific, and the underlying architectures of the parallel and distributed systems on which they are executed also continue to grow in complexity. As these high performance parallel and distributed computing applications and environments continue to grow both in complexity and computing power, there is an increasing financial cost associated with both the acquisition and maintenance of those systems. Therefore, the ability to model the performance of these applications and systems before and during their development and deployment, to guide cost-effective decisions about their resources and configurations, is highly important to the designers of those applications and systems. Performance Evaluation Process Algebra (PEPA) is a modeling language and framework for modeling parallel and distributed computing and communication applications and systems, and numerous examples are present in the literature where PEPA has been utilized to model these systems for evaluating or predicting their performance using various metrics, including throughput, utilization, and robustness. Since its development, the PEPA modeling framework has been expanded to model biological systems and networks (Bio-PEPA), and massive (on the order of 10^129 components) homogeneous systems with Grouped PEPA (GPEPA). PEPA and its derivatives are implemented in a variety of ways, ranging from plug-ins integrated with the Eclipse integrated development environment to standalone command-line based interpreters, each with its own unique and often challenging installation and configuration requirements. To help other researchers utilize these frameworks more easily and to facilitate increased and robust reproducibility across end-user platforms, we present and make available containerized versions of a number of these PEPA frameworks. We have validated the functionality of these containers by testing them with models available f
ISBN: 0769506348 (print)
In order to obtain efficiency, current practice in distributed software systems design often suffers from a lack of abstraction. An object-oriented design technique based on UML notations and a special type of high-level Petri nets is used to demonstrate how designs can be kept sufficiently abstract to be platform independent and reusable but still support design alternatives and their evaluation with respect to availability and principle system performance.
ISBN: 9780769534718 (print)
Much has been said about processing data efficiently in parallel database servers, and some data warehouse applications must process on the order of tens to hundreds of gigabytes efficiently. Yet, there is no effective approach targeted at using non-dedicated low-cost platforms efficiently in this context. Imagine putting together 10 or 1000 commodity PCs and setting up a data crunching platform for large database-resident data with acceptable performance. There are significant inter-related data layout and processing challenges when the computational, storage and network hardware are heterogeneous and slow. We propose how to place, replicate and load-balance the data efficiently in this context. This work innovates in several respects: it is practically as fast as full mirroring without its overhead, and it explores schema, chunk-wise placement, replication and load-balanced processing to be faster and more flexible than previous efforts. Our findings are complemented by an evaluation using TPC-H performance benchmark queries.
In this paper we present an efficient algorithm, called Dynamic Critical Path Scheduling (DCPS), for compile-time scheduling and clustering of parallel programs onto parallel processing systems with distributed memory. DCPS is superior to several other algorithms from the literature in terms of computational complexity, processor consumption and solution quality. DCPS has a time complexity of O(e + v log v), as opposed to the O((e + v) log v) of the DSC algorithm, which is the best known algorithm. Experimental results demonstrate the superiority of DCPS over the DSC algorithm.
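The gap between the two complexity bounds can be made concrete with a short sketch. It evaluates only the growth terms inside the O-notation, taking v as the number of task-graph nodes and e as the number of edges (an interpretation assumed here), and it ignores the hidden constants, so it says nothing about measured runtimes. The function names `dcps_bound` and `dsc_bound` are hypothetical.

```python
import math


def dcps_bound(v, e):
    """Growth term of O(e + v log v): only the nodes pay the log factor."""
    return e + v * math.log2(v)


def dsc_bound(v, e):
    """Growth term of O((e + v) log v): edges pay the log factor as well."""
    return (e + v) * math.log2(v)


# A graph with 1024 nodes and 100000 edges, chosen only for illustration.
ratio = dsc_bound(1024, 100000) / dcps_bound(1024, 100000)
```

For this graph the ratio is roughly 9, close to log2(1024) = 10, because the edge term dominates both bounds. For sparse graphs with e close to v the two bounds differ by at most a small constant factor.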
ISBN: 9781509036820 (print)
The domains of parallel and distributed computing have been converging continuously, to the degree that state-of-the-art server computer systems incorporate characteristics from both domains: they comprise a hierarchy of enclosures, where each enclosure houses multiple processor sockets and each socket in turn contains multiple memory controllers. A global address space and cache coherency are facilitated using multiple layers of fast interconnection technologies, even across enclosures. The growing popularity of such systems creates a need for efficient mappings of cardinal algorithms onto such hierarchical architectures. However, the growing complexity of such systems and the inconsistencies between the implementation strategies of different hardware vendors make it increasingly hard to find efficient mapping strategies that are universally valid. In this paper, we present scalable optimization and mapping strategies in a case study of the popular Scale-Invariant Feature Transform (SIFT) computer vision algorithm. Our approaches are evaluated using a state-of-the-art hierarchical Non-Uniform Memory Access (NUMA) system with 240 physical cores and 12 terabytes of memory, apportioned across 16 NUMA nodes (sockets). SIFT is particularly interesting since the algorithm utilizes a variety of common data access patterns, thus allowing us to discuss the scaling properties of optimization strategies from the distributed and parallel computing domains and their applicability to emerging server systems.