The application of artificial neural networks (ANNs) in real-time embedded systems demands high-performance computers. Miniaturized massively parallel architectures are suitable computation platforms for this task. An important question that arises is how to establish an effective mapping from ANN algorithms to hardware. In this paper, we demonstrate how an effective mapping can be achieved with our programming environment in close combination with an optimized architecture design targeted at neurocomputing.
To fully exploit the advantages and potential of new parallel computer systems, it is necessary to design suitable parallel algorithms. In fact, to speed up computation, parallel algorithms suited to both the architectures and the new languages are needed. The appropriate implementation of these algorithms on parallel processors is crucial and requires new programming skills. Therefore, in order to exploit the strength of parallel systems, it has become increasingly important to define a parallel-programming methodology. In this paper, a methodology based on the computation graph for designing structured algorithms for parallel-logic programming is described. The proposed methodology aims both to improve system performance and to allow unskilled users to define and implement parallel-logic programs. As an application of the proposed methodology, an algorithm has been completely developed and its performance is reported. The parallel-logic language STRAND running on an Intel iPSC/2 hypercube system has been used.
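As a rough illustration of why a computation graph is a useful starting point for parallel design, the sketch below groups the nodes of a dependency graph into levels whose members do not depend on one another and could therefore run concurrently. It is written in Python rather than STRAND, and the function name `parallel_levels` and the dictionary-based graph encoding are assumptions made for this example; it is not the methodology of the paper.

```python
def parallel_levels(deps):
    """Group tasks into levels that can run concurrently.

    deps maps each task to the set of tasks it depends on
    (a hypothetical encoding chosen for this sketch).
    """
    # Collect every task, including ones that appear only as predecessors.
    nodes = set(deps)
    for preds in deps.values():
        nodes |= set(preds)

    # Copy the predecessor sets so the caller's data is not modified.
    remaining = {n: set(deps.get(n, ())) for n in nodes}

    levels = []
    while remaining:
        # Tasks with no unfinished predecessors are ready now.
        ready = sorted(n for n, preds in remaining.items() if not preds)
        if not ready:
            raise ValueError("cycle in computation graph")
        levels.append(ready)
        for n in ready:
            del remaining[n]
        for preds in remaining.values():
            preds.difference_update(ready)
    return levels


if __name__ == "__main__":
    graph = {"c": {"a", "b"}, "d": {"c"}}
    print(parallel_levels(graph))
```

The number of levels bounds the length of the critical path, and the widest level indicates how many processors the graph can keep busy at once.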
We propose some non-standard, yet straightforward and highly efficacious, alternative modes of data assignment which induce a significant reduction in communication volume, and hence in execution time, for stencil operations, i.e., local iterative updates, implemented within a data-parallel programming environment. Performance results obtained in the solution of two three-dimensional elliptic partial differential equations (PDEs) using iterative methods entailing such updates indicate that substantial performance increases can be realized using these alternative data assignment schemes.
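To make the link between data assignment and communication volume concrete, the following sketch counts the halo cells one process must receive per sweep of a 7-point stencil on a periodic cubic grid, for a given process layout. The specific assignment schemes of the paper are not given in the abstract, so the slab and block layouts compared here are only familiar textbook cases, and the function name `halo_cells_per_process` is an assumption made for this example.

```python
def halo_cells_per_process(n, px, py, pz):
    """Halo cells received by one process per 7-point stencil sweep.

    Assumes a periodic n x n x n grid split evenly over a
    px x py x pz process layout (an illustrative model only).
    """
    if n % px or n % py or n % pz:
        raise ValueError("grid size must be divisible by the process layout")

    # Local block dimensions owned by each process.
    lx, ly, lz = n // px, n // py, n // pz

    halo = 0
    # A dimension split across processes contributes two faces,
    # each with the area of the other two local dimensions.
    if px > 1:
        halo += 2 * ly * lz
    if py > 1:
        halo += 2 * lx * lz
    if pz > 1:
        halo += 2 * lx * ly
    return halo


if __name__ == "__main__":
    n = 64
    print("slabs  (64 x 1 x 1):", halo_cells_per_process(n, 64, 1, 1))
    print("blocks (4 x 4 x 4): ", halo_cells_per_process(n, 4, 4, 4))
```

With the same 64 processes, the slab layout exchanges 8192 cells per process per sweep, while the cubic block layout exchanges 1536, which shows how strongly the choice of assignment can affect communication.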
The transition from sequential object-oriented programming (OOP) to parallelism has been a focus of active research. Experimental languages that try to integrate objects and parallelism are often seriously compromised in their capability to provide inheritance for parallel objects. Even languages that permit some amalgamation of parallelism and inheritance tend to support only single-class inheritance. The purpose of this paper is to specify a strongly typed language framework for parallel object-oriented programming which provides easy-to-use multiple inheritance for parallel objects, including inheritance for synchronization code. The proposed approach to parallelism is based on "separate" methods which generate processes and provide rendezvous-type coordination: it succeeds in cases where known languages either fail to combine inheritance with parallelism or do so inefficiently and inconveniently.
Monte Carlo (MC) and molecular dynamics (MD) simulations are powerful tools for understanding the properties of systems of interacting electrons and phonons in a solid. When mobile electrons are studied, these simulations are limited to a few hundred particles. More powerful machines and algorithms must be used to address many of the most important issues in the field. We present results from using the p4 parallel programming system on a variety of parallel architectures to conduct MC and MD simulations.
Term Graph Rewriting Systems (TGRS) have been used extensively as an implementation vehicle for a number of often divergent programming paradigms, ranging from the traditional functional programming ones to the (concurrent) logic programming ones and various amalgamations of them, to (concurrent) object-oriented ones. More recently, the relationship between TGRS and process calculi (such as the π-calculus) as well as Linear Logic has also been explored. In this paper we describe our experience in using an intermediate Compiler Target Language (CTL) based on TGRS for mapping a variety of programming paradigms of the aforementioned types onto it. In the process, we highlight some of the issues which we feel any such intermediate representation should address and which effectively form a minimum set of features every CTL should possess.
The shared memory paradigm offers a well-known programming model for parallel systems, but conventional implementations perform poorly when it is used in large-grain or page-based systems. The main problems are (1) the transparent view at the system level, (2) the false sharing caused by placing several consistency units in the same transportation unit, and (3) the fact that high-level software implementations are not integrated within the system architecture. The first point is addressed by annotating programming objects and deriving a specific configuration of system functionalities. The second point is solved by GAME, the General and Autonomous Merging Environment, which allows a multiple-reader, multiple-writer approach. The third point is addressed by three implementation models of GAME. A hardware-based implementation and even a software-based implementation are able to hide the cost of the local activities needed to perform GAME behind the network latency.
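The idea of letting several writers update one transportation unit and merging the results can be illustrated with the twin-and-diff technique known from software distributed shared memory systems. The abstract does not describe GAME's actual merging mechanism, so the sketch below is only a generic illustration of the multiple-writer idea; the names `make_diff` and `merge_diffs` are assumptions made for this example.

```python
def make_diff(twin, modified):
    """Record the bytes a writer changed relative to its pristine copy (twin)."""
    if len(twin) != len(modified):
        raise ValueError("page copies must have the same size")
    return {i: new for i, (old, new) in enumerate(zip(twin, modified)) if old != new}


def merge_diffs(base, diffs):
    """Apply the diffs of several writers to the base page.

    Writers that touch disjoint bytes (false sharing) merge without conflict;
    if two diffs touch the same byte, the later diff in the list wins.
    """
    page = bytearray(base)
    for diff in diffs:
        for index, value in diff.items():
            page[index] = value
    return bytes(page)


if __name__ == "__main__":
    base = bytes(8)  # one shared 8-byte page, initially all zeros

    # Two writers modify different variables that share the page.
    copy_a = bytearray(base)
    copy_a[0] = 1
    copy_b = bytearray(base)
    copy_b[7] = 2

    merged = merge_diffs(base, [make_diff(base, copy_a), make_diff(base, copy_b)])
    print(merged)
```

Because each writer ships only the bytes it changed, two processes that falsely share a page no longer have to bounce the whole page back and forth.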
Many present-day microprocessors have fine grain parallelism, be it in the form of a pipeline, of multiple functional units, or replicated processors. The efficient use of such architectures depends on the capability ...
The paper describes methods and tools for debugging parallel programs by visualization and animation of the execution behavior of the programs. Based on an evaluation and classification of existing visualization environments, the visualization and animation tool VISTOP (VISualization TOol for Parallel Systems) was developed as part of the integrated tool environment TOPSYS (TOols for Parallel SYStems) for programming distributed-memory multiprocessors. VISTOP supports the interactive online visualization of message-passing programs based on various views, in particular a concurrency view based on a process graph for detecting synchronization and communication bugs.
In this paper, we present an abstract model of the RAPID-2 SIMD architecture. RAPID-2 is a massively parallel add-on board for PCs. It implements a "paginated set-associative" model of architecture and has systolic capabilities. The L1 language implements the abstract model. L1 is a co-specification language for the programming and micro-programming of RAPID-2. It is derived from C. In order to check their semantics, L1 programs can be emulated in a C++ environment. In the near future, they should be compiled into C application programs and the corresponding microprograms.