Protein-protein interactions are closely correlated with surface shape: besides the large number of structural amino acids that compose the core, a small number of surface amino acids define a protein's functionality. This study concerns the development of a tool that, starting from the 3D atomic coordinates of a protein as retrieved from the Protein Data Bank (PDB), models the macromolecular surface implicitly, an approach more suitable for this kind of analysis than a parametric one. The marching cubes algorithm is used to process the volumetric description of the protein, obtaining a precise representation of the corresponding surface. Because of the large amount of data involved in studying whole protein families, the algorithm is implemented in parallel on a computer cluster to improve its performance. The parallel version of marching cubes is developed in ASSIST, a high-level structured parallel programming system, obtaining near-optimal performance for the computational activities and acceptable performance when I/O is included.
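The data-parallel decomposition this abstract describes can be sketched as follows. This is an illustrative sketch, not the paper's ASSIST code: the volumetric grid is split into z-slabs and each slab's surface-crossing cells are found independently; the triangle-table extraction step of marching cubes is omitted, and the scalar field is a hypothetical synthetic sphere.

```python
from concurrent.futures import ThreadPoolExecutor

ISO = 0.5  # hypothetical iso-surface threshold

def make_field(n):
    # Synthetic scalar field: a solid sphere of radius n/4 in an n^3 grid.
    c = (n - 1) / 2.0
    r = n / 4.0
    return [[[1.0 if (x-c)**2 + (y-c)**2 + (z-c)**2 <= r*r else 0.0
              for z in range(n)] for y in range(n)] for x in range(n)]

def surface_cells(field, z_lo, z_hi):
    # Scan the cells whose lower corner lies in [z_lo, z_hi): a cell is on
    # the surface iff its 8 corner samples straddle the iso-value, which is
    # exactly the condition for marching cubes to emit triangles there.
    n = len(field)
    found = []
    for x in range(n - 1):
        for y in range(n - 1):
            for z in range(z_lo, min(z_hi, n - 1)):
                corners = [field[x+dx][y+dy][z+dz]
                           for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
                if min(corners) < ISO <= max(corners):
                    found.append((x, y, z))
    return found

def parallel_surface_cells(field, workers=4):
    # Slab decomposition along z: slabs are disjoint in their lower-corner
    # range, so each cell is owned by exactly one task and the tasks run
    # independently -- the same structure a cluster implementation exploits.
    n = len(field)
    step = (n + workers - 1) // workers
    slabs = [(z, z + step) for z in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda s: surface_cells(field, *s), slabs)
    return [cell for part in parts for cell in part]
```

The parallel scan returns the same cell set as a single serial scan, since the slabs partition the cell range exactly.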
The development of high-performance computing (HPC) programs is crucial to progress in many fields of scientific endeavor. We have run initial studies of the productivity of HPC developers and of techniques for improving that productivity, which have not previously been the subject of significant study. Because of key differences between development for HPC and for more conventional software engineering applications, this work has required the tailoring of experimental designs and protocols. A major contribution of our work is to begin to quantify the code development process in a specialized area that has previously not been extensively studied. Specifically, we present an analysis of the domain of high-performance computing for the aspects that would impact experimental design; show how those aspects are reflected in experimental design for this specific area; and demonstrate how we are using such experimental designs to build up a body of knowledge specific to the domain. Results to date build confidence in our approach by showing that there are no significant differences across studies comparing subjects with similar experience tackling similar problems, while there are significant differences in performance and effort among the different parallel models applied.
In this paper we describe an experiment in building grid-aware software components as basic blocks for high-performance, adaptive grid computations. Within the *** project we are working on enhancing the coordination-language approach of the ASSIST parallel programming environment to support high-performance software components that can react to changes occurring on dynamic execution platforms. We are currently evaluating expressiveness and performance, in part by integrating the ASSIST compilation tools with various component frameworks. In the paper we discuss results for a graphical application made up of four CCM components, each produced by automatically encapsulating a parallel program. We evaluate performance behaviour by comparing the composite application with the equivalent ASSIST one on three different testbeds: homogeneous and heterogeneous clusters, and a small grid over a WAN. We also report first results on an application manager component that applies a simple adaptive policy, reacting to pipeline imbalance by varying the amount of computational resources employed.
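The adaptive policy mentioned at the end can be illustrated with a toy model (purely illustrative; the actual ASSIST/CCM manager is not shown): a pipeline's throughput is limited by its slowest stage, so a manager reacting to imbalance can repeatedly grant an extra worker to the stage with the highest effective per-item service time.

```python
def balance_pipeline(service_times, budget):
    # Greedy manager policy sketch: each stage starts with one worker;
    # a stage's effective service time is its per-item time divided by its
    # worker count, and each new worker goes to the current bottleneck.
    workers = [1] * len(service_times)
    for _ in range(budget):
        eff = [t / w for t, w in zip(service_times, workers)]
        workers[eff.index(max(eff))] += 1
    return workers
```

For example, a three-stage pipeline with per-item times [4, 1, 1] and three spare workers ends up with all three granted to the first stage, bringing its effective time down to that of the others.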
This article presents the design of a high-level parallel composition, or CPAN (after its Spanish acronym), that implements a parallelization of the branch-and-bound algorithmic design technique and uses it to solve the travelling salesman problem (TSP), within a methodological infrastructure made up of an environment of parallel objects, an approach to structured parallel programming, and the object-orientation paradigm. A CPAN is defined as the composition of a set of parallel objects of three types: a manager object, the stages, and the collector objects. Following this idea, the branch-and-bound design technique is implemented as an algorithmic parallel pattern of communication among processes, based on the CPAN model. The CPAN branch and bound is thus added as a new pattern to the class library proposed in Rossainz et al. (2004), which initially comprised the CPANs farm, pipe and treeDV, representing, respectively, the communication patterns farm, pipeline and binary tree, the last of these implementing the design technique known as divide and conquer. As the programming environment for deriving the proposed CPAN, we use C++ and the POSIX standard for thread programming.
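The branch-and-bound technique the CPAN encapsulates can be sketched serially in a few lines (Python here rather than the paper's C++/POSIX threads, and without the manager/stages/collector parallel structure): partial tours sit in a priority queue ordered by an optimistic lower bound, and any branch whose bound already meets or exceeds the best complete tour is pruned.

```python
import heapq

def tsp_branch_and_bound(dist):
    # dist[i][j]: symmetric cost matrix; the tour starts and ends at city 0.
    n = len(dist)
    # Cheapest edge leaving each city: a simple admissible lower bound on
    # the cost of every edge the partial tour still has to traverse.
    cheapest = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    best_cost, best_tour = float("inf"), None
    heap = [(sum(cheapest), 0, [0], frozenset([0]))]
    while heap:
        bound, cost, path, seen = heapq.heappop(heap)
        if bound >= best_cost:
            continue  # prune: even the optimistic estimate is too expensive
        if len(path) == n:
            total = cost + dist[path[-1]][0]   # close the tour
            if total < best_cost:
                best_cost, best_tour = total, path + [0]
            continue
        for city in range(n):
            if city in seen:
                continue
            new_cost = cost + dist[path[-1]][city]
            new_seen = seen | {city}
            todo = [c for c in range(n) if c not in new_seen]
            # The tour must still leave `city` and every unvisited city once.
            new_bound = new_cost + cheapest[city] + sum(cheapest[c] for c in todo)
            if new_bound < best_cost:
                heapq.heappush(heap, (new_bound, new_cost, path + [city], new_seen))
    return best_cost, best_tour
```

Because the bound never overestimates the true completion cost, pruning on it cannot discard the optimal tour; the CPAN version distributes the expansion of queue entries across stage objects instead of processing them one at a time.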
The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and to compare characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared-memory programming paradigm and offers the programmer control over distribution and locality, which are important for optimizing performance on scalable architectures. In this paper, we describe and compare two different parallelization strategies for the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. The performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. With GA, however, these benefits extend to distributed-memory systems, as demonstrated on a Linux cluster with Myrinet.
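The kernel at the heart of the benchmark is the conjugate gradient iteration itself. Below is a minimal dense, serial Python sketch of that iteration; the actual benchmark applies it to a large random sparse matrix, and the matrix-vector product is the communication-heavy step that GA or MPI must distribute.

```python
def conjugate_gradient(A, b, iters=25, tol=1e-10):
    # Solve A x = b for a symmetric positive-definite A.
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]           # residual b - A x (x = 0 initially)
    p = r[:]           # search direction
    rs = dot(r, r)
    for _ in range(iters):
        Ap = matvec(A, p)                              # the distributed step
        alpha = rs / dot(p, Ap)                        # optimal step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

In exact arithmetic CG converges in at most n iterations, so a 2x2 system like A = [[4, 1], [1, 3]], b = [1, 2] reaches its solution (1/11, 7/11) almost immediately.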
ISBN (print): 1920682236
Feijen and van Gasteren have shown how to use the theory of Owicki and Gries to design concurrent programs; however, the lack of a formal theory of progress has meant that these designs are driven entirely by safety requirements. Proofs of progress requirements are made post hoc to the derivation and are operational in nature. In this paper, we describe the use of an extended theory of Owicki and Gries in concurrent program design. The extended theory incorporates a logic of progress, which provides the opportunity to develop a program in a manner that gives proper consideration to progress requirements. Dekker's algorithm for two-process mutual exclusion is chosen to illustrate the use of the extended theory.
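For reference, Dekker's algorithm itself can be run as a small Python sketch (illustrative only: CPython's global interpreter lock makes each bytecode effectively sequentially consistent, which is what lets this busy-waiting version work; on weakly ordered hardware the flag and turn accesses would need memory fences).

```python
import threading

# Shared state for Dekker's two-process mutual exclusion.
wants = [False, False]  # wants[i]: process i intends to enter
turn = [0]              # who may insist when both want in
counter = [0]           # the shared resource the critical section updates

def dekker_worker(me, iterations):
    other = 1 - me
    for _ in range(iterations):
        # -- entry protocol --
        wants[me] = True
        while wants[other]:            # contention: the other also wants in
            if turn[0] != me:          # not our turn: back off and wait
                wants[me] = False
                while turn[0] != me:
                    pass               # busy-wait for our turn
                wants[me] = True
        # -- critical section (a non-atomic read-modify-write) --
        counter[0] += 1
        # -- exit protocol --
        turn[0] = other
        wants[me] = False

def run_dekker(iterations=500):
    ts = [threading.Thread(target=dekker_worker, args=(i, iterations))
          for i in (0, 1)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter[0]
```

The safety property is that both increments of `counter[0]` never interleave; the progress property the extended theory targets is that a process wanting to enter eventually does, which here rests on `turn` alternating.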
The authors reported on a first parallel implementation of a recent algorithm to factor positive dimensional solution sets of polynomial systems. As the algorithm uses homotopy continuation, a good speedup of the path tracking jobs was observed. However, for solution sets of high degree, the overhead of managing different homotopies and large lists of solutions exposes the limits of the master/servant parallel programming paradigm for this type of problem. A probabilistic complexity study suggests modifications to the method, which will also improve the serial version of the original algorithm.
The paper is devoted to the analysis of a strategy for distributing computation on heterogeneous parallel systems. Under this strategy, the processes of a parallel program are distributed over the processors according to the processors' performance, and data are distributed evenly among the processes. The paper presents an algorithm that computes the optimal number of processes and their distribution over the processors so as to minimize the execution time of an application. Processor performance is modelled as a function of the number of processes running on the processor and the amount of data processed by it.
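The optimization problem can be illustrated with a small brute-force sketch (the paper's actual algorithm and performance model are not reproduced here; the speed functions below are hypothetical): data is split evenly among processes, so a processor running k of the m processes handles a k/m share of the work, and we search over process counts for the assignment minimizing the parallel time.

```python
from itertools import product

def best_distribution(total_work, perf, max_procs):
    # perf[p](k): aggregate speed of processor p when it runs k processes
    # (typically growing up to the core count, then flat or degrading).
    # Execution time is the maximum over processors of share / speed.
    best_time, best_counts = float("inf"), None
    n = len(perf)
    for counts in product(range(max_procs + 1), repeat=n):
        m = sum(counts)
        if m == 0:
            continue
        time = 0.0
        for p, k in enumerate(counts):
            if k == 0:
                continue
            share = total_work * k / m      # even data split among processes
            time = max(time, share / perf[p](k))
        if time < best_time:
            best_time, best_counts = time, counts
    return best_counts, best_time
```

With a fast processor that saturates at two processes and a slow one that saturates at one, the search correctly gives the fast processor two thirds of the work rather than all of it.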
A semi-dynamic system is presented that is capable of predicting the performance of parallel programs at runtime. The functionality provided by the system allows efficient handling of the portability and irregularity of parallel programs. Two forms of parallelism are addressed: loop-level parallelism and task-level parallelism.
One of the most crucial phases in text recognition is thinning characters to a single-pixel notation. The measure of success for any thinning algorithm lies in its ability to retain the original character shape in the resulting unit-width skeleton. No universally agreed thinning algorithm exists for producing character skeletons across different languages, even though thinning is a pre-process for all subsequent phases of character recognition such as segmentation, feature extraction and classification. Written natural languages can be classified by their intrinsic properties as cursive or non-cursive. Thinning algorithms applied to cursive languages such as Arabic, Sindhi and Urdu face greater complexity because of their non-isolated character boundaries and complex character shapes. Such algorithms can readily be extended to parallel implementations: selecting certain pixel-arrangement grid templates over other pixel patterns when generating character skeletons is what exploits parallel programming, and the key to success is determining the right pixel-arrangement grids, which reduce the number of iterations required to evaluate each pixel for thinning or preservation. This paper presents an improved parallel thinning algorithm that can easily be extended to cursive and non-cursive languages alike by introducing a modified set of preservation rules via pixel-arrangement grid templates, making it both robust to noise and fast. Experimental results show its success on cursive languages such as Arabic, Sindhi and Urdu and on non-cursive languages such as English and Chinese, and even on numerals, suggesting a near-universal thinning algorithm.
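The paper's own preservation rules and grid templates are not reproduced here; as an illustration of the two-subiteration, template-driven parallel structure such algorithms share, here is a sketch of the classic Zhang-Suen parallel thinning pass, which marks all deletable boundary pixels in a subiteration before removing them at once.

```python
def zhang_suen_thinning(img):
    # img: list of rows of 0/1 ints; returns a thinned copy (the skeleton).
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel directly above.
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []  # mark first, delete after: the "parallel" part
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    P = neighbours(y, x)
                    B = sum(P)                          # black neighbours
                    A = sum(P[i] == 0 and P[(i + 1) % 8] == 1
                            for i in range(8))          # 0->1 transitions
                    if step == 0:
                        cond = P[0]*P[2]*P[4] == 0 and P[2]*P[4]*P[6] == 0
                    else:
                        cond = P[0]*P[2]*P[6] == 0 and P[0]*P[4]*P[6] == 0
                    if 2 <= B <= 6 and A == 1 and cond:
                        to_clear.append((y, x))
            for y, x in to_clear:
                img[y][x] = 0
                changed = True
    return img
```

Because every pixel in a subiteration is evaluated against the same unchanging image before any deletion happens, the per-pixel tests are independent and can be distributed, which is exactly the property the paper's grid templates are designed to exploit.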