Component-oriented programming has been applied to address the requirements of large-scale applications in computational science and engineering that have high performance computing (HPC) requirements. However, parallelism remains a challenging requirement in the design of CBHPC (Component-Based High Performance Computing) platforms. This paper presents strong evidence for the efficacy and efficiency of HPE (Hash Programming Environment), a CBHPC platform that provides full support for parallel programming, in the development, deployment, and execution of numerical simulation code on cluster computing platforms. (C) 2012 Elsevier Inc. All rights reserved.
This paper presents an object-oriented interface for parallel programming and an algorithm for automatic translation into parallel programs. The programming interface consists of a restricted subset of the object-oriented language C++. Parallelism is defined explicitly at the abstract level of object definitions and method invocations within a single C++ program. The translator algorithm first generates a machine-independent communication graph and then proceeds with the creation of the parallel programs, which are demonstrated for transputer systems running the HELIOS operating system. The necessary communication statements are generated automatically.
Parallel computing on interconnected workstations is becoming a viable and attractive proposition due to the rapid growth in the speeds of interconnection networks and processors. In workstation clusters, there is always a considerable amount of unused computing capacity available in the network. However, heterogeneity in architectures and operating systems, load variations on machines, variations in machine availability, and the failure susceptibility of networks and workstations complicate the situation for the programmer. In this context, new programming paradigms that reduce the burden of programming for distribution, load adaptability, heterogeneity, and fault tolerance gain importance. This paper identifies the issues involved in parallel computing on a network of workstations. The Anonymous Remote Computing (ARC) paradigm is proposed to address the issues specific to parallel programming on workstation systems. ARC differs from the conventional communicating-process model by treating a program as a single entity consisting of several loosely coupled remote instruction blocks rather than as a collection of processes. The ARC approach results in distribution transparency and heterogeneity transparency while providing fault tolerance and load adaptability to parallel programs on workstations. ARC is developed in a two-tiered architecture consisting of high-level language constructs and low-level ARC primitives. The paper describes an implementation of the ARC kernel supporting the ARC primitives.
The study of tumor growth biology with computer-based models is currently an area of active research. Different simulation techniques can be used to describe the complexity of real tumor behavior; among these, cellular-automata-based simulations provide an accurate graphical representation of tumor growth while keeping the implementation of the automata as computer programs simple. Several authors have recently published relevant proposals, based on the latter approach, that address the tumor growth representation problem by developing strategies for accelerating the simulation model. These strategies achieve computational performance for cellular-model representations through the appropriate selection of data types and the clever use of supporting data structures. However, as of today, multithreaded processing techniques and multicore processors have not been used to program cellular growth models with generality. This paper presents a new model that incorporates parallel programming for multi- and many-core processors and implements the synchronization required by the solution. The proposed parallel model has been tested using Java and C++ implementations on two different platforms: an Intel i5-4440 chipset and one 16-processor node of our university's cluster. The improvement resulting from the introduction of parallelism into the model is analyzed and compared with the standard sequential simulation model currently used by researchers in mathematical oncology.
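As a minimal sketch of the general approach this abstract describes (not the authors' actual model), a synchronous cellular-automaton generation can be parallelized by partitioning grid rows across threads and double-buffering the grid so no thread reads a cell another thread is writing; joining the threads serves as the end-of-generation barrier. The growth rule below is illustrative only.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// One synchronous automaton generation: each thread updates a band of rows,
// reading from `cur` and writing to `next` (double buffering avoids races).
// Toy growth rule (illustrative, not the paper's): an empty cell becomes
// occupied when any of its 4 neighbours is occupied.
void step(const std::vector<std::vector<int>>& cur,
          std::vector<std::vector<int>>& next, int nthreads) {
    const int n = static_cast<int>(cur.size());
    auto worker = [&](int r0, int r1) {
        for (int r = r0; r < r1; ++r)
            for (int c = 0; c < n; ++c) {
                int occ = cur[r][c];
                if (!occ) {
                    if (r > 0)     occ |= cur[r - 1][c];
                    if (r + 1 < n) occ |= cur[r + 1][c];
                    if (c > 0)     occ |= cur[r][c - 1];
                    if (c + 1 < n) occ |= cur[r][c + 1];
                }
                next[r][c] = occ;
            }
    };
    std::vector<std::thread> pool;
    const int band = (n + nthreads - 1) / nthreads;
    for (int r0 = 0; r0 < n; r0 += band)
        pool.emplace_back(worker, r0, std::min(r0 + band, n));
    for (auto& t : pool) t.join();  // join = end-of-generation barrier
}
```

Seeding a single occupied cell and alternating the `cur`/`next` buffers grows the colony by one ring per generation; the paper's synchronization requirements are more involved, but the buffer-swap-plus-barrier pattern is the core idea.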
Parallel programming has become increasingly popular in computer education over the past few years. Although parallel programs achieve short execution times and high throughput, learning how to write a well-structured, high-performance parallel program is still a challenge for most students. How to help students learn parallel programming well is one of the important tasks educators should address. This paper presents the learning of parallel programming using software refactoring methodologies and tools. Manual and automated refactoring are each introduced to show how they improve learning. With manual refactoring, students learn how to perform data or task decomposition and how to write well-structured parallel software via customized programs and some benchmarks from the JGF benchmark suite; with automated refactoring, students can transform the parallel parts quickly and then evaluate the performance of the parallel software. Two automated refactoring tools have been developed for educational purposes. Experiences from conducting the course are also shared. (C) 2017 Wiley Periodicals, Inc.
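The kind of manual data-decomposition refactoring the abstract mentions can be sketched as follows (a generic exercise, not the course's actual material or tools): a sequential reduction is restructured into per-chunk tasks whose partial results are combined.

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <vector>

// Sequential version a student would start from.
long sum_seq(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Manually refactored version: data decomposition into `ntasks` chunks,
// each summed in its own asynchronous task; partial sums are combined last.
long sum_par(const std::vector<int>& v, std::size_t ntasks) {
    const std::size_t chunk = (v.size() + ntasks - 1) / ntasks;
    std::vector<std::future<long>> parts;
    for (std::size_t i = 0; i < v.size(); i += chunk) {
        const std::size_t end = std::min(i + chunk, v.size());
        parts.push_back(std::async(std::launch::async, [&v, i, end] {
            return std::accumulate(v.begin() + i, v.begin() + end, 0L);
        }));
    }
    long total = 0;
    for (auto& p : parts) total += p.get();
    return total;
}
```

The refactored version preserves the sequential result exactly (integer addition is associative), which is what makes this transformation a good first exercise in task decomposition.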
This paper describes the design and implementation of a modular distributed architecture for distributed autonomous control of modular robot systems using parallel programming in industrial robotic manufacturing applications. The control system has an overall hierarchical structure and a parallel structure in its lower levels. The lower levels are composed of several autonomous units; each unit is equipped with a microprocessor-based controller and has its own control functions with sensors, actuators, and communication interfaces, acting as an intelligent autonomous sensing and actuating device. The operations of these autonomous actuators are integrated through a serial-bus communication network. An autonomous actuator is the basic unit for distributed motion control of a robotic mechanism. Because of their hardware and software modularity, they offer advantages such as reduced system cost, application flexibility, system reliability, and system extensibility. A microcontroller-based flexible and extensible architecture is proposed, and the features of the distributed microcontroller implementation are discussed. For a robotic mechanism with two degrees of freedom, a mobile robot with two coaxial independently driven wheels, position and velocity control algorithms that cooperatively follow a planned path are implemented, and the performance of the proposed architecture is experimentally evaluated. (C) 2003 Elsevier Science B.V. All rights reserved.
MILLIPEDE is a project aimed at developing a distributed shared memory environment for parallel programming. A major goal of this project is to support easy-to-grasp parallel programming languages that also make it straightforward to parallelize existing code. Other targets are forward compatibility and availability of both the user programs (hence the shared memory support and the C-like parallel language PARC) and the system itself (which is thus implemented at user level, using services exported by the operating system). Locality of memory references, which implies efficiency and speedups, is maintained by MILLIPEDE using page and thread migration, through which dynamic load balancing and weak memory are implemented. (C) 1997 by John Wiley & Sons, Ltd.
We present methods that can dramatically improve numerical consistency for parallel calculations across varying numbers of processors. By calculating global sums with enhanced precision techniques based on Kahan or Knuth summations, the consistency of the numerical results can be greatly improved with minimal memory and computational cost. This study assesses the value of the enhanced numerical consistency in the context of general finite difference or finite volume calculations. (C) 2011 Elsevier B.V. All rights reserved.
Concurrent programming is very often used to program massively parallel algorithms. Usually, an imperative programming language is used with a message-passing communication library such as the Message Passing Interface (MPI) or Parallel Virtual Machine (PVM). This approach is very general, since it makes it possible to define any parallel algorithm, including the details of its communication protocols. Nevertheless, this freedom does not come for free: the development of such programs is difficult because they may contain indeterminism and deadlocks, as confirmed by the high complexity of the related validation problems. Since the semantics of a concurrent program is in general very complex, the time required to run it (related to its operational semantics) is also difficult to determine, which hinders performance portability.
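The deadlock risk mentioned above can be illustrated with a tiny exchange pattern (a generic sketch using C++ futures as one-shot channels, not MPI itself): if both sides issue a blocking receive before their send, the message-passing analogue of two blocking `MPI_Recv` calls facing each other, neither can proceed; ordering one side send-first breaks the cycle.

```cpp
#include <future>
#include <thread>
#include <utility>

// Two 'processes' exchange one value each. A one-shot channel is modelled
// by a promise/future pair. The deadlocking variant would have BOTH threads
// call .get() (blocking receive) before .set_value() (send); here thread A
// sends first, so the exchange always completes.
std::pair<int, int> exchange() {
    std::promise<int> a_to_b, b_to_a;
    auto from_a = a_to_b.get_future();
    auto from_b = b_to_a.get_future();
    int got_by_a = 0, got_by_b = 0;
    std::thread A([&] {
        a_to_b.set_value(1);      // send first...
        got_by_a = from_b.get();  // ...then receive: no wait cycle possible
    });
    std::thread B([&] {
        got_by_b = from_a.get();          // receive from A
        b_to_a.set_value(got_by_b + 1);   // then reply
    });
    A.join();
    B.join();
    return {got_by_a, got_by_b};
}
```

Swapping the two statements in thread A reproduces the deadlock: both threads block in `get()` and the program hangs, which is one reason validation of such programs is so hard.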
The graphics processing unit (GPU) is an ideal solution for problems involving parallel data computations. A serial CPU-based program for the dynamic analysis of multi-body systems is rebuilt as a parallel program that exploits the GPU's advantages. We developed an analysis code named GMAP to investigate how the dynamic analysis algorithm for multi-body systems can be implemented with GPU parallel programming. The numerical accuracy of GMAP is compared with the commercial program MSC/ADAMS, and its numerical efficiency is compared with the sequential CPU-based program. Multiple pendulums with bodies and joints and a net-shape system with bodies and spring-dampers are employed in the computer simulations. The simulation results indicate that the accuracy of GMAP's solution matches that of ADAMS. For the net-shape system with 2370 spring-dampers, GMAP reduces the computation time by about 566.7 seconds (a 24.7% improvement). It is noted that the larger the system, the better the time efficiency.