Neural network hardware using a time-shared bus and an integer representation architecture has already been fabricated and reported from the design viewpoint. However, no performance evaluation of the hardware has yet been presented. In this work, the computation speed, scalability, and learning accuracy of the hardware are evaluated theoretically and experimentally using the Back Propagation (BP) algorithm. In addition, a mirror-weight assignment technique is proposed for high-speed computation in BP. NETTalk, an English-pronunciation reasoning task, has been chosen as the target application for BP. The experiment uses recently developed neuro-hardware based on the above architecture together with its parallel programming language; an outline of the language is given along with BP programming. Mirror-weight assignment allows a maximum speed of 55.0 MCUPS (Million Connections Updated Per Second) using 256 neurons in the hidden layer (in NETTalk, the numbers of neurons in the input and output layers are fixed at 203 and 26, respectively). In addition, if scalability is defined as a function of the number of neurons in the hidden layer, the machine retains a high scalability of 0.5 if this maximum speed is required. No degradation in learning accuracy occurs when results computed on the neuro-hardware are compared with those obtained on a floating-point representation architecture (a workstation). The experiment indicates that the present integer-representation design of the neuro-hardware is sufficient for NETTalk. Performance has also been evaluated theoretically. For evaluation purposes, it is assumed that most of the execution time is taken up by bus cycles; on the basis of this assumption, an analytical model of computation speed and scalability is proposed. Analytical predictions agreed well with experimental results.
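As a back-of-the-envelope check on the 55.0 MCUPS figure, the sketch below counts the weights of the 203-256-26 NETTalk network and converts the rate into training patterns per second. It assumes that bias weights are not counted and that every connection is updated exactly once per training pattern (on-line BP); both are assumptions of this sketch, not statements from the abstract.

```python
# NETTalk layer sizes from the abstract; bias weights are ignored (assumption).
N_INPUT, N_HIDDEN, N_OUTPUT = 203, 256, 26
RATE_MCUPS = 55.0  # reported peak speed with mirror-weight assignment


def connections(n_in, n_hid, n_out):
    """Number of weights in a fully connected n_in-n_hid-n_out network (no biases)."""
    return n_in * n_hid + n_hid * n_out


def patterns_per_second(n_conn, mcups):
    """Patterns per second, assuming each weight is updated once per pattern."""
    return mcups * 1e6 / n_conn


n_conn = connections(N_INPUT, N_HIDDEN, N_OUTPUT)
pps = patterns_per_second(n_conn, RATE_MCUPS)
print(n_conn)          # 58624
print(round(pps, 1))   # 938.2
print(round(1e3 / pps, 3), "ms per pattern")  # 1.066 ms per pattern
```

Under these assumptions, roughly 1.07 ms of machine time is spent per training pattern at peak speed; counting bias weights would raise the connection count slightly and lower the pattern rate accordingly.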
We propose probabilistic guards and analyze their performance. To reduce the total task division cost, probabilistic guards can probabilistically prevent thief workers from stealing small tasks from victim workers. In this study, we have implemented probabilistic guards on a work-stealing framework called Tascell. If a victim uses a probabilistic guard that rejects steal attempts with a non-zero probability and there is no upper limit on the number of repeated attempts, a thief may make an unbounded number of rejected steal attempts before it succeeds. We measured the actual numbers of repeated attempts until success and evaluated the performance of probabilistic guards with various upper limits. In this paper, we also propose virtual probabilistic guards, which act as probabilistic guards without repeating probabilistically prevented steal attempts. Virtual probabilistic guards exhibit superior performance compared to probabilistic guards. Our evaluation is based on a parallelized "highly serial" force calculation in a shared memory environment and five Tascell programs in a distributed memory environment. (C) 2018 Elsevier B.V. All rights reserved.
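To see why the upper limit matters, the sketch below models a single thief repeatedly attacking a guarded victim. It assumes (hypothetically) that each steal attempt is rejected independently with probability `p_reject` and that, once `limit` consecutive rejections have occurred, the next attempt is let through. Under these assumptions the number of attempts A satisfies E[A] = (1 - p^(L+1)) / (1 - p), which grows toward 1/(1 - p) as the limit L is removed. The function names are illustrative, not Tascell's API, and the sketch does not model virtual probabilistic guards, whose mechanism the abstract does not describe.

```python
import random


def attempts_until_success(p_reject, limit, rng):
    """Steal attempts one thief makes against a guarded victim.

    Assumed semantics (hypothetical): each attempt is rejected with
    probability p_reject, but after `limit` consecutive rejections the
    guard accepts the next attempt, so at most limit + 1 attempts occur.
    """
    attempts = 1
    rejections = 0
    while rejections < limit and rng.random() < p_reject:
        rejections += 1
        attempts += 1
    return attempts


def expected_attempts(p_reject, limit):
    """Closed form: E[A] = sum_{j=0}^{limit} p^j = (1 - p^(limit+1)) / (1 - p)."""
    if p_reject == 1.0:
        return limit + 1  # every attempt up to the limit is rejected
    return (1 - p_reject ** (limit + 1)) / (1 - p_reject)


rng = random.Random(42)  # fixed seed for a reproducible estimate
trials = 100_000
mean = sum(attempts_until_success(0.5, 3, rng) for _ in range(trials)) / trials
print(expected_attempts(0.5, 3))  # 1.875
print(round(mean, 2))             # close to 1.875
```

With p_reject = 1 and no limit, the thief would never succeed, which is the unbounded-repetition case the abstract warns about.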
The distributed computer system described in this paper is a set of computer nodes interconnected in an interconnection network via packet-switching *** nodes communicate with each other by means of message-passing protocols. This paper presents the implementation of rendezvous facilities as high-level primitives, provided by a parallel programming language, to support interprocess communication and synchronisation.
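The abstract does not say how the rendezvous primitives are built, but the general idea of layering a rendezvous on message passing can be sketched as follows. This is a minimal Python illustration, assuming an Ada-style entry call/accept pair; the `Entry` class and its `call`/`accept` methods are hypothetical names, and threads with `queue.Queue` stand in for the nodes and the packet-switched network.

```python
import queue
import threading


class Entry:
    """Ada-style rendezvous entry built only from message passing (sketch)."""

    def __init__(self):
        self._requests = queue.Queue()  # channel carrying call messages

    def call(self, *args):
        # Caller side: send the arguments plus a private reply channel,
        # then block until the acceptor has finished the rendezvous body.
        reply = queue.Queue(maxsize=1)
        self._requests.put((args, reply))
        return reply.get()

    def accept(self, body):
        # Acceptor side: block until a caller arrives, run the body while
        # the caller is still suspended, then release it with the result.
        args, reply = self._requests.get()
        reply.put(body(*args))


def server(entry, n_calls):
    for _ in range(n_calls):
        entry.accept(lambda x, y: x + y)


add = Entry()
t = threading.Thread(target=server, args=(add, 2))
t.start()
r1 = add.call(2, 3)
r2 = add.call(10, 20)
t.join()
print(r1, r2)  # 5 30
```

Both sides block until the other arrives, which is the defining property of a rendezvous; the per-call reply queue keeps concurrent callers from receiving each other's results.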
ISBN: (Print) 9781467351652; 9780769549149
The execution of parallel applications using grid computing requires an environment that enables them to be executed, managed, scheduled, and monitored. The execution environment must provide a processing model, consisting of programming and execution models, with the objective of appropriately exploiting grid computing characteristics. This paper proposes a parallel processing model based on shared variables for grid computing, consisting of an execution model that is appropriate for the grid and a programming model based on the CPAR parallel language. The environment is designed to execute parallel applications in grid computing, with all the characteristics present in grid computing kept transparent to users. The results show that this environment is an efficient solution for the execution of parallel applications.
This paper describes the design and implementation of an Efficient Architecture for Running THreads (EARTH) runtime system for a multi-processor/multi-node cluster. The EARTH model was designed to support the efficient execution of parallel (multi-threaded) programs with irregular fine-grain parallelism on off-the-shelf computers. Implementing an EARTH runtime system requires an explicitly threaded runtime system. For portability, we built this runtime system on top of Pthreads under Linux and used sockets for inter-node communication. Moreover, to make the best use of the resources available on a cluster of symmetric multi-processors (SMPs), this implementation enables the overlapping of communication and computation. We used Threaded-C, a language designed to implement the programming model supported by the EARTH architecture. This language allows the expression of various levels of parallelism and provides the primitives needed to manage the required communication and synchronization. The Threaded-C programming language supports irregular fine-grain parallelism through a two-level hierarchy of threads and fibers. It also provides various synchronization and communication constructs that reflect the nature of EARTH's fibers (non-preemptive execution with data-driven scheduling), as well as the extensive use of split-phase transactions on EARTH to execute long-latency operations. Copyright (C) 2003 John Wiley & Sons, Ltd.
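The combination of non-preemptive fibers, data-driven scheduling, and split-phase transactions can be illustrated with a small single-node simulation. This is a sketch under stated assumptions, not the EARTH runtime or Threaded-C: it assumes a counter-based synchronization slot that makes a fiber ready once a given number of signals has arrived, and the names `Scheduler`, `SyncSlot`, and `split_phase_get` are hypothetical.

```python
from collections import deque


class Scheduler:
    """Toy single-node scheduler: a ready queue of fibers plus a queue of
    in-flight split-phase operations (hypothetical names throughout)."""

    def __init__(self):
        self.ready = deque()    # fibers whose inputs have all arrived
        self.pending = deque()  # issued but not yet completed remote reads

    def split_phase_get(self, source, key, dest, slot):
        # Phase 1: issue the request and return at once; the issuing fiber
        # keeps running instead of waiting for the long-latency reply.
        self.pending.append((source, key, dest, slot))

    def run(self):
        while self.ready or self.pending:
            if self.ready:
                fiber = self.ready.popleft()
                fiber()  # non-preemptive: the fiber runs to completion
            else:
                source, key, dest, slot = self.pending.popleft()
                dest[key] = source[key]  # Phase 2: the data arrives...
                slot.signal()            # ...and the waiting fiber is notified


class SyncSlot:
    """Counter that makes `fiber` ready after `count` signals (data-driven)."""

    def __init__(self, count, fiber, scheduler):
        self.count = count
        self.fiber = fiber
        self.scheduler = scheduler

    def signal(self):
        self.count -= 1
        if self.count == 0:
            self.scheduler.ready.append(self.fiber)


# Usage: a consumer fiber fires only after two remote values have arrived.
sched = Scheduler()
remote = {"a": 3, "b": 4}  # stands in for another node's memory
frame = {}                 # local frame that receives the values
result = []


def consumer():
    result.append(frame["a"] + frame["b"])


slot = SyncSlot(2, consumer, sched)


def producer():
    sched.split_phase_get(remote, "a", frame, slot)
    sched.split_phase_get(remote, "b", frame, slot)


sched.ready.append(producer)
sched.run()
print(result)  # [7]
```

For simplicity, this sketch completes pending operations only when no fiber is ready; a real runtime would let communication proceed concurrently with computation, which is the overlap the abstract describes.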
NestStep is a parallel programming language for the BSP (bulk-synchronous parallel) programming model. In this article we describe the concept of distributed shared arrays in NestStep and its implementation on top of MPI. In particular, we present a novel method for runtime scheduling of irregular, direct remote accesses to sections of distributed shared arrays. Our method, which is fully parallelized, uses conventional two-sided message passing and thus avoids the overhead of a standard implementation of direct remote memory access based on one-sided communication. The main prerequisite is that the given program is structured in a BSP-compliant way. Copyright (C) 2004 John Wiley & Sons, Ltd.
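The sketch below is a simplified, sequential simulation of why a BSP-structured program can serve irregular remote reads with two-sided messages: requests are binned by owner, exchanged at the superstep boundary, and answered with one reply per requester. It is not NestStep's actual scheduling method. It assumes a block distribution of the array, and the function name `bsp_remote_read` is hypothetical; the nested lists stand in for MPI send/receive pairs.

```python
def bsp_remote_read(global_array, p, requests):
    """Serve irregular reads of a block-distributed array in one superstep
    using only two-sided (send/receive) exchanges, simulated sequentially.

    global_array: the logical shared array, split here into p blocks.
    requests[src]: global indices that process src wants to read.
    Returns replies[src]: {index: value} as received by process src.
    """
    n = len(global_array)
    block = -(-n // p)  # ceil(n / p): elements owned per process
    local = [global_array[r * block:(r + 1) * block] for r in range(p)]

    # Step 1 (local computation): each process bins its requests by owner.
    outbox = [[[] for _ in range(p)] for _ in range(p)]  # outbox[src][dst]
    for src in range(p):
        for i in requests[src]:
            outbox[src][i // block].append(i)

    # Step 2 (superstep boundary): request lists travel src -> owner. In this
    # sketch, the BSP structure is what lets each owner post matching receives.
    inbox = [[outbox[src][dst] for src in range(p)] for dst in range(p)]

    # Step 3: each owner answers each requester with the values it asked for.
    replies = [{} for _ in range(p)]
    for dst in range(p):
        for src in range(p):
            for i in inbox[dst][src]:
                replies[src][i] = local[dst][i - dst * block]
    return replies


replies = bsp_remote_read(list(range(10, 20)), 3, [[9, 1], [0], [5, 8]])
print(replies)  # [{1: 11, 9: 19}, {0: 10}, {5: 15, 8: 18}]
```

Aggregating requests per owner means each process pair exchanges at most one request message and one reply message per superstep, regardless of how irregular the access pattern is.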
The principles of hardware compilation could be set to rewrite the rule book of silicon design. This article describes Handel-C, a parallel programming language that provides programmers with a route to FPGA-based VLSI design. It does so not by offering them a familiar programming environment combined with access to the parallel constructs familiar to the hardware designer, but by expressing the parallelism at a totally different level of abstraction: rather than describing the hardware, as an HDL does, it describes the computation.