ISBN: 9783540680673 (print)
In this paper we introduce and evaluate two prefetching techniques to improve the performance of Java applications executed on the grid. These techniques are evaluated experimentally in two grid environments by running test applications on two different grid deployment configurations. Our testbed is SUMA/G, a grid platform specifically targeted at executing Java bytecode on Globus grids. The experimental results show that these techniques can be effective in improving the performance of applications run on the grid, especially compute-intensive scientific applications.
ISBN: 9781665473156 (print)
Web applications rely on database systems to store and manage data. Existing major databases use lock-based concurrency control to coordinate transactions. However, lock-based transaction processing can be expensive and result in unnecessary deadlocks because the database systems are oblivious to application semantics, such as access patterns. This paper examines an issue commonly seen in web applications: inefficient associated accesses. Such accesses are problematic because they cause database systems to abort concurrent transactions frequently, degrading application performance. To address this issue, we build Railyzer, a tool that automatically analyzes web applications' database access patterns and provides possible optimization strategies. First, we establish a criterion to differentiate associated accesses from other accesses. Then, based on a language-agnostic analysis of the application, we create a method for identifying associated accesses. Finally, we use heuristics to locate inefficient ones and suggest fixes. We discovered 83 potential optimizations across six open-source applications. Some of these optimizations improve application throughput by up to 70%.
ISBN: 0818680679 (print)
Decision support systems use On-Line Analytical Processing (OLAP) to analyze data by posing complex queries that require different views of the data. Traditionally, a relational approach (ROLAP) has been taken to build such systems. More recently, multi-dimensional database techniques (MOLAP) have been applied to decision-support applications. Data is stored in multidimensional arrays, which is a natural way to express the multi-dimensionality of the enterprise and is better suited for analysis. Precomputed aggregate calculations in a data cube can provide efficient query processing for OLAP applications. In this paper we present algorithms and results for in-memory data cube construction on distributed memory machines.
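To make the idea of precomputed aggregates concrete, the following is a minimal sequential sketch in Python, not the distributed-memory algorithm the paper presents. It sums one measure for every subset of the dimensions (every group-by), which is what a data cube stores. The function name `build_data_cube` and the sample `sales` records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_data_cube(rows, dims, measure):
    """Return {group_by_dims: {key_tuple: summed_measure}} for every subset of dims.

    Naive approach: one pass over the rows per group-by. Real cube builders
    reuse finer aggregates to compute coarser ones; this sketch does not.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Hypothetical sample data: units sold per product and region.
sales = [
    {"product": "pen", "region": "east", "units": 3},
    {"product": "pen", "region": "west", "units": 5},
    {"product": "ink", "region": "east", "units": 2},
]
cube = build_data_cube(sales, ("product", "region"), "units")
```

Once the cube is built, a query such as "total units per product" becomes a dictionary lookup (`cube[("product",)]`) rather than a scan of the raw rows, which is the efficiency argument the abstract makes.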
ISBN: 9780889866379 (print)
Graphics processing units (GPUs) are starting to play an increasingly important role in non-graphical applications that are highly parallelisable. With the latest graphics cards boasting a theoretical 165 GFLOPS and 54 GB/s of memory bandwidth spread across 48 ALUs, it is easy to see why. The GPU architecture is particularly suited to the parallel stream processing paradigm of low data dependency, a high data-to-instruction ratio, and predictable memory access patterns. One largely ignored, yet key, bottleneck for this type of processing on GPUs is the download and readback transfer performance to and from the graphics card. Existing tools provide great developer assistance in many areas of GPU application development, but provide very limited assistance in gaining the best bi-directional data transfer performance. In this paper, we discuss these limitations and present new investigative tools that allow developers of general-purpose GPU applications to explore the complex array of configuration states that affect both download and readback performance.
ISBN: 9781538608623 (print)
Reproducibility of the execution of scientific applications on parallel and distributed systems is of growing interest, as it underlies the trustworthiness of experiments and of the conclusions derived from them. Dynamic loop scheduling (DLS) techniques are an effective approach to improving the performance of scientific applications via load balancing. These techniques address algorithmic and systemic sources of load imbalance by dynamically assigning tasks to processing elements. The DLS techniques have demonstrated their effectiveness when applied in real applications. Complementing native experiments, simulation is a powerful tool for studying the behavior of parallel and distributed applications. This work is a comprehensive reproducibility study of experiments using DLS techniques published in earlier literature, carried out to verify the implementation of these techniques in SimGrid-MSG. Earlier work showed that detailed information about the experiments to be reproduced is essential for a successful reproduction. This work concentrates on reproducing experiments with variable application behaviour and a high degree of parallelism in both applications and systems. It is shown that reproducing measurements of applications with high variance in task execution times is challenging, albeit feasible and useful. The success of the present reproducibility study indicates that the implementation of the DLS techniques in SimGrid-MSG is verified for the considered applications and systems. Thus, it enables well-founded future research using the DLS techniques in simulation.
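As an illustration of what "dynamically assigning tasks to processing elements" can look like, here is a minimal Python sketch of guided self-scheduling, one well-known DLS technique. The abstract does not say which techniques were studied, so its inclusion here is an assumption, and the function name is hypothetical. Each time a worker asks for work, it receives the remaining iterations divided by the number of workers, rounded up, so chunks start large and shrink toward the end of the loop.

```python
import math


def guided_self_scheduling_chunks(num_tasks, num_workers):
    """Return the sequence of chunk sizes handed out under guided self-scheduling.

    Each request receives ceil(remaining / num_workers) tasks. Large early
    chunks keep scheduling overhead low; small late chunks even out the load.
    """
    chunks = []
    remaining = num_tasks
    while remaining > 0:
        chunk = math.ceil(remaining / num_workers)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


# 10 loop iterations shared by 2 workers.
chunk_sizes = guided_self_scheduling_chunks(10, 2)
```

A simulator only needs this chunk-size rule plus a model of task execution times to replay a scheduling experiment, which is why high variance in task times makes reproduction harder.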
ISBN: 0769511651 (print)
Testing is a difficult and time-consuming part of the software development cycle because an error may occur in an unexpected way at an unexpected spot. Testing and debugging parallel and distributed software is much more difficult than testing and debugging sequential software, because errors are usually reproducible in sequential programs but may not be reproducible in parallel and distributed programs. In addition, parallel and distributed programs introduce new types of errors and anomalies, such as race conditions and deadlocks, that do not exist in sequential software. In this paper I present a survey and a taxonomy of existing approaches for detecting race conditions and deadlocks in parallel and distributed programs. These approaches can be classified into two main classes: static analysis techniques and dynamic analysis techniques. I have further subdivided the static analysis techniques into three subgroups: concurrency analysis methods, data-flow analysis methods, and formal proof methods. The main problems of the best-known approaches are briefly discussed. The paper concludes with tables summarizing the comparison between the surveyed approaches.
ISBN: 1892512459 (print)
In many cases, an elliptic system of partial differential equations (PDEs) has to be solved in order to compute a given flow problem. For domain decomposition, the multi-block grid approach is the one mainly used. Many flows are unsteady, so calculating path lines is a common way of exploring the flow field. However, computing path lines is more complicated if the underlying grid geometry changes over time. We use a fragmented multi-block dataset as the basis for a parallel approach to computing path lines. We describe our enhancements of VTK, the underlying toolkit for scientific visualization, which supports neither multi-block nor time-dependent datasets. Our extensions include the handling of unsteady datasets as well as adaptive step-size control and time-position interpolation. Finally, we present the results of our efforts to speed up Computational Fluid Dynamics (CFD) explorations in Virtual Environments.
This special issue includes extended versions of selected papers presented at the fourth Parallel Architectures and Bioinspired Algorithms Workshop, held in Galveston Island (TX, USA) on October 14, 2011 in conjunction with Parallel Architectures and Compilation Techniques (PACT). This workshop follows the success of the three previous workshops held in conjunction with PACT 2008 in Toronto, Canada [1], PACT 2009 in Raleigh, USA [2], and PACT 2010 in Vienna, Austria [3], and of the two previous Workshops on Parallel Bioinspired Algorithms [4] held in Oslo, 2005 (together with the IEEE International Conference on Parallel Processing (ICPP)) [5] and London, 2007 (together with the Association for Computing Machinery (ACM) Genetic and Evolutionary Computation Conference (GECCO) 2007) [6]. This series of workshops has shown that fields such as parallel computer architectures, parallel and distributed computing, and bioinspired algorithms, which might seem quite different at first glance, share transversal elements that enrich them.
ISBN: 9781665497473 (print)
Coarse-Grained Reconfigurable Architectures (CGRAs) emerged about 30 years ago. The very first CGRAs were programmed manually. Fortunately, compilation approaches soon appeared to automate the mapping process. Numerous surveys of these architectures exist. Other surveys also gather the tools and methods, but none of them focuses exclusively on the mapping process. This paper focuses solely on automated methods and techniques for mapping applications onto CGRAs and covers the last two decades of research. It aims to provide the terminology, the problem formulation, and a classification of existing methods. The paper ends with research challenges and trends for the future.
ISBN: 1601320841 (print)
This paper describes two parallel algorithms for the 2D knapsack (or cutting-stock) problem, which is the optimal packing of multiples of n rectangular objects into a knapsack of size L × W, where the objects are obtainable only with guillotine-type (side-to-side) cuts. Here, we describe and analyze this problem for parallel computing with GPUs. These algorithms solve this NP class problem with a parallel algorithm that runs in O(W(n + L + W)) time using L processors, where LW for a 2D knapsack problem with a capacity of L × W. The new multiple IS version using LW processors and max(L,M) ISs runs in O(n + L + W) given practical hardware considerations. Both of these results are cost optimal with respect to the best sequential implementation. Moreover, an efficient GPGPU algorithm for this well-known problem should give insight into how parallel computing using graphics processing units compares to other parallel models such as PRAM.
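For readers unfamiliar with the problem, the following Python sketch shows a standard sequential dynamic program for the unbounded 2D guillotine knapsack. It is not the paper's GPU algorithm. It assumes integer dimensions, unlimited copies of each piece, and no rotation; the function name and the sample pieces are hypothetical. For every sub-rectangle it takes the best of three options: the most valuable single piece that fits, the best vertical guillotine cut, and the best horizontal guillotine cut.

```python
def guillotine_knapsack(L, W, pieces):
    """Best total value obtainable from an L x W sheet using guillotine cuts.

    pieces: list of (length, width, value) tuples with integer sizes.
    best[x][y] holds the optimum for an x-by-y sub-rectangle.
    """
    best = [[0] * (W + 1) for _ in range(L + 1)]
    for x in range(1, L + 1):
        for y in range(1, W + 1):
            value = 0
            # Option 1: keep the rectangle whole and take one piece that fits.
            for length, width, piece_value in pieces:
                if length <= x and width <= y:
                    value = max(value, piece_value)
            # Option 2: a vertical cut (by symmetry, only up to the midpoint).
            for cut in range(1, x // 2 + 1):
                value = max(value, best[cut][y] + best[x - cut][y])
            # Option 3: a horizontal cut (again only up to the midpoint).
            for cut in range(1, y // 2 + 1):
                value = max(value, best[x][cut] + best[x][y - cut])
            best[x][y] = value
    return best[L][W]


# Hypothetical pieces: a 2x1 piece worth 3 and a 1x1 piece worth 1.
result = guillotine_knapsack(2, 2, [(2, 1, 3), (1, 1, 1)])
```

Each of the L·W table cells costs O(n + L + W) work, so this sequential version runs in O(LW(n + L + W)) time, which is consistent with the abstract's O(W(n + L + W)) bound on L processors being cost optimal.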