ISBN (print): 9781479928941
Exposed-datapath architectures yield small, low-power processors that trade instruction word length for aggressive compile-time scheduling and a high degree of instruction-level parallelism. In this paper, we present a general-purpose parallel accelerator consisting of a main processor and eight symmetric clusters, all in a single core. Use of a lightweight and memory-efficient application programming interface allows for the first high-performance program executing both sequential and data-parallel code on the same TTA processor. We use the processor for LDPC encoding, a popular method of forward error correction. Demonstrating the flexibility of software-defined radio, we benchmark the processor with two programs, one which can handle almost any sort of LDPC code, and another which is optimized for a specific standard. We achieve a throughput of 5 Mb/s with the flexible program and 92 Mb/s with the standard-specific one, while consuming only 95 mW at a clock frequency of 1175 MHz.
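The abstract does not show the encoding step itself, so here is a minimal sketch of systematic linear block encoding over GF(2), the operation underlying LDPC encoding: the codeword is the message followed by parity bits computed from a parity matrix P. The tiny 4x3 matrix P below is an arbitrary toy assumption, not a code from any standard benchmarked in the paper, and the sketch is plain C++ rather than code for the TTA processor described above.

```cpp
#include <array>
#include <cstdio>

// Toy systematic encoder over GF(2): codeword = [message | parity], with
// parity computed from an illustrative 4x3 parity matrix P (an assumption,
// not an LDPC code from any standard).
constexpr int K = 4, M = 3;  // message bits, parity bits
constexpr std::array<std::array<int, M>, K> P = {{
    {1, 1, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 1}
}};

std::array<int, K + M> encode(const std::array<int, K>& msg) {
    std::array<int, K + M> cw{};
    for (int i = 0; i < K; ++i) cw[i] = msg[i];                  // systematic part
    for (int j = 0; j < M; ++j) {                                // parity part
        int parity = 0;
        for (int i = 0; i < K; ++i) parity ^= msg[i] & P[i][j];  // GF(2) dot product
        cw[K + j] = parity;
    }
    return cw;
}

int main() {
    for (int b : encode({1, 0, 1, 1})) std::printf("%d", b);
    std::printf("\n");
}
```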
With the increased complexity of applications, parallel computing has proved to be an alternative to supercomputing in solving large problems. However, developing parallel applications is more difficult than sequential programming. Visual technologies can be employed to aid the multi-dimensional tasks of parallel programming. This paper presents a case study of the use of an integrated visual programming environment in creating parallel applications. Techniques for the hierarchical construction of parallel programs are presented, together with an evaluation of the performance of the generated code for a matrix multiplication application.
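The abstract reports the performance of generated parallel code for matrix multiplication but does not show it; the following is a hand-written C++ sketch of one common decomposition, a row-partitioned multiply where each thread handles a strided subset of rows so writes never conflict. The matrix sizes and thread count are illustrative assumptions, not the paper's configuration.

```cpp
#include <thread>
#include <vector>

// Row-partitioned parallel matrix multiply C = A * B. Each thread handles a
// strided subset of rows, so no two threads ever write the same element.
using Matrix = std::vector<std::vector<double>>;

void matmul_parallel(const Matrix& A, const Matrix& B, Matrix& C,
                     unsigned num_threads = 4) {
    const std::size_t n = A.size(), m = B[0].size(), k = B.size();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < n; i += num_threads)  // strided rows
                for (std::size_t j = 0; j < m; ++j) {
                    double sum = 0.0;
                    for (std::size_t p = 0; p < k; ++p) sum += A[i][p] * B[p][j];
                    C[i][j] = sum;
                }
        });
    for (auto& w : workers) w.join();
}

int main() {
    Matrix A(256, std::vector<double>(256, 1.0));
    Matrix B(256, std::vector<double>(256, 2.0));
    Matrix C(256, std::vector<double>(256, 0.0));  // preallocated result
    matmul_parallel(A, B, C);
    return C[0][0] == 512.0 ? 0 : 1;               // 256 * (1.0 * 2.0)
}
```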
ISBN (digital): 9781728196756
ISBN (print): 9781728196763
Parallel programming is an excellent way to speed up computation: execution is divided across the available threads so that independent work proceeds simultaneously. OpenMP, available for C, C++, and Fortran, is one of the most popular frameworks for shared-memory multiprocessing. This work applies parallel programming to two classification cases, one using logistic regression and one using artificial neural networks; the main advantage is the resulting gain in speed. The first case study predicts diabetic outcomes from a dataset of patient health records using logistic regression, and explores a framework for parallelizing function execution to speed up the processing of large samples. The second case study addresses a workflow for training and executing deep learning models, using parallelization to process a large influx of time-series data and diagnose the presence of anomalies. The logistic regression case achieved a 62.5% reduction in execution time, while the neural network case achieved a 4.5-fold nominal reduction in training time and a 71.1% reduction in execution time over a 1-8 thread range.
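As a hedged illustration of the logistic regression case, the sketch below parallelizes batch prediction with OpenMP: each sample's prediction is independent, so a parallel for divides the samples across threads, which is the kind of speedup source the abstract reports. The weights, bias, and feature values are placeholders, not the paper's diabetes dataset or trained model.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// OpenMP-parallel batch prediction with a trained logistic regression model.
// Each sample is independent, so the loop divides cleanly across threads.
// Weights and features below are placeholders, not the paper's diabetes data.
std::vector<int> predict(const std::vector<std::vector<double>>& X,
                         const std::vector<double>& w, double bias) {
    std::vector<int> labels(X.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < static_cast<long>(X.size()); ++i) {
        double z = bias;
        for (std::size_t j = 0; j < w.size(); ++j) z += w[j] * X[i][j];
        labels[i] = 1.0 / (1.0 + std::exp(-z)) >= 0.5 ? 1 : 0;  // sigmoid cutoff
    }
    return labels;
}

int main() {  // compile with -fopenmp for multithreaded execution
    std::vector<std::vector<double>> X = {{0.5, 1.2}, {-2.0, 0.3}};
    for (int y : predict(X, {1.0, -0.5}, 0.1)) std::printf("%d ", y);
    std::printf("\n");
}
```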
Some parallel programming techniques in the intensional functional language uLucid are presented. Programs in uLucid exhibit a kind of implicit parallelism, context parallelism, in which the processes that evaluate independent expressions at different contexts execute in parallel. Communication among processes that evaluate mutually dependent expressions at different contexts is expressed explicitly using context switching operators. The function of context switching operators is twofold. From the problem-solving point of view, they are part of the solution described by the program; that is, they have pure mathematical meanings. From the operational point of view, they are communication operators for parallel processes; that is, they have parallel operational meanings. It is shown that one can explicitly express control information about parallelism, communication among processes, and process-to-processor mappings in uLucid programs by defining context spaces and the associated context switching operators that specify an abstract parallel architecture.
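uLucid's context parallelism has no direct C++ counterpart, but as a loose analogy under that caveat, the sketch below evaluates two independent subexpressions concurrently and then joins their results at an explicit synchronization point, roughly the role a context switching operator plays; fib is just an illustrative workload.

```cpp
#include <cstdio>
#include <future>

// Loose C++ analogue of context parallelism: two subexpressions with no
// mutual dependency are evaluated concurrently; the .get() calls are the
// explicit point where their results meet.
long fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

int main() {
    auto a = std::async(std::launch::async, fib, 30);  // independent evaluation
    auto b = std::async(std::launch::async, fib, 32);  // independent evaluation
    std::printf("%ld\n", a.get() + b.get());           // synchronization point
}
```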
Summary form only given. The Programming Language Research Group at Sun Microsystems Laboratories seeks to apply lessons learned from the Java (TM) programming language to the next generation of programming languages. The Java language supports platform-independent parallel programming with explicit multithreading and explicit locks. As part of the DARPA program for High Productivity Computing Systems, we are developing Fortress, a language intended to support large-scale scientific computation. One of the design principles is that parallelism be encouraged everywhere; for example, it is intentionally just a little bit harder to write a sequential loop than a parallel loop. Another is to have fairly rich mechanisms for encapsulation and abstraction; the idea is to have a fairly complicated language for library writers that enables them to write libraries presenting a relatively simple set of interfaces to the application programmer. We discuss ideas for using a rich polymorphic type system to organize multithreading and data distribution on large parallel machines. The net result is similar in some ways to the data distribution facilities of other languages such as HPF and Chapel, but more open-ended, because in Fortress these facilities are defined by user-replaceable libraries rather than wired into the compiler.
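Fortress syntax is not shown in the abstract, so the design principle of making parallel loops the easy default is illustrated here, inverted, with C++17 execution policies: the programmer chooses seq or par per loop, whereas Fortress would make the parallel form the default. This is an analogy, not Fortress code.

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// C++ inverts Fortress's default: the programmer opts in to parallelism per
// loop with an execution policy, whereas Fortress makes the parallel form
// the path of least resistance.
int main() {
    std::vector<double> v(1'000'000);
    std::iota(v.begin(), v.end(), 0.0);

    // Explicitly sequential traversal.
    std::for_each(std::execution::seq, v.begin(), v.end(),
                  [](double& x) { x = x * x; });

    // Parallel traversal, legal because iterations are independent.
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](double& x) { x += 1.0; });
}
```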
To improve the parallel efficiency (PE) of the ordered-subsets expectation-maximization (OSEM) algorithm for 3D PET image reconstruction, we implemented the algorithm with 1) an OpenMP and 2) a hybrid message passing interface (MPI)-OpenMP model on the basis of a standard MPI implementation. The motivation was to reduce the inter-processor data exchange time, which was the dominant PE-limiting factor of the MPI model when a large number of processors was used. The OpenMP model used a fine-grained approach and showed significant speedup only up to 2-3 processors, for both the true shared-memory and the single system image (SSI) distributed shared-memory architectures. The hybrid MPI-OpenMP model achieved a consistent improvement of ~10% in speedup factor on large numbers of parallel processors compared to the pure MPI approach. As clusters of larger symmetric multiprocessor (SMP) machines continue to become more cost-effective, we expect this hybrid MPI-OpenMP approach to be increasingly valuable for accelerating 3D PET reconstructions and other applications with similar computational characteristics.
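A minimal sketch of the hybrid MPI-OpenMP pattern the abstract describes, applied to a generic reduction rather than OSEM reconstruction: MPI partitions data across processes, OpenMP threads share each process's partition, and one collective exchange replaces finer-grained inter-processor communication. The data and sizes are placeholders.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

// Hybrid MPI-OpenMP reduction: MPI splits data across processes, OpenMP
// threads share each process's partition, and a single MPI_Allreduce
// replaces finer-grained inter-processor exchange.
int main(int argc, char** argv) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> chunk(1 << 20, 1.0);  // this rank's share of the data
    double local = 0.0;

    #pragma omp parallel for reduction(+ : local)  // threads within the rank
    for (long i = 0; i < static_cast<long>(chunk.size()); ++i)
        local += chunk[i];

    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum = %f\n", global);
    MPI_Finalize();
}
```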
NASA Technical Reports Server (NTRS) 19870009555: Concurrent Extensions to the Fortran Language for Parallel Programming of Computational Fluid Dynamics Algorithms, by NASA Technical Reports Server (NTRS).
Rank modulation is a technique for representing stored information in an ordered set of flash memory cells by a permutation that reflects the ranking of their voltage levels. In this paper, we consider two figures of merit that can be used to compare parallel programming algorithms for rank modulation. These two criteria represent different tradeoffs between the programming speed and the lifetime of flash memory cells. In the first scenario, we want to find the minimum number of programming rounds required to increase a specified cell-level vector ℓ₀ to a cell-level vector corresponding to a target rank permutation τ, with no restriction on the maximum allowable cell level. We derive lower and upper bounds on this number, denoted by t₁*(τ, ℓ₀). In the second scenario, we seek an efficient programming strategy to achieve a cell-level vector ℓ(τ) consistent with the target permutation τ, such that the maximum cell level after programming is minimized. Equivalently, this strategy maximizes the number of information update cycles supported by the device before a block erasure is required. We derive upper bounds on the minimum number of programming rounds required to achieve the cell-level vector ℓ(τ), denoted by t₂*(τ, ℓ₀), and propose a programming algorithm whose resultant number of programming rounds is close to t₂*(τ, ℓ₀).
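For the second scenario, a hypothetical greedy (an illustration, not the paper's algorithm or its bounds) builds a level vector ℓ(τ) consistent with τ while keeping the maximum level small: walk the cells from lowest to highest target rank and raise each cell just above the one ranked beneath it, assuming cell levels can only increase and that τ lists cells from lowest rank to highest.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical greedy (illustration only): given initial levels l0 and a
// target permutation tau listing cells from lowest to highest rank, raise
// each cell just above the previously placed one. Levels can only increase,
// so level[cell] = max(own level, floor + 1).
std::vector<int> target_levels(const std::vector<int>& l0,
                               const std::vector<int>& tau) {
    std::vector<int> level = l0;
    int floor_level = -1;                       // level of the cell ranked below
    for (int cell : tau) {
        level[cell] = std::max(level[cell], floor_level + 1);
        floor_level = level[cell];
    }
    return level;                               // strictly increasing along tau
}

int main() {
    // Cell 0 should rank lowest, then cell 1, then cell 2.
    for (int v : target_levels({2, 0, 1}, {0, 1, 2})) std::printf("%d ", v);
    std::printf("\n");                          // prints "2 3 4"
}
```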
Because of its superior performance and cost-effectiveness, parallel computing will become the future standard, provided we have the appropriate programming models, tools, and compilers needed to make parallel computers widely usable. The dominant programming style is procedural, given in the form of either the memory-sharing or the message-passing paradigm. The advantages and disadvantages of these models and their supporting architectures are discussed, as well as the tools by which parallel programming is made machine-independent. Further improvements can be expected from very high level coordination languages. A general breakthrough of parallel computing, however, will only come with parallelizing compilers that enable users to program applications in the conventional sequential style. The state of the art of parallelizing compilers is outlined, and it is shown how they will be supported by higher-level programming models and multi-threaded architectures.
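To make the parallelizing-compiler goal concrete: the programmer writes the conventional sequential loop below, and an automatic parallelizer, after proving the iterations independent, effectively produces the annotated version; the OpenMP pragma stands in for whatever code the compiler would actually emit.

```cpp
#include <cstdio>
#include <vector>

// The sequential loop the programmer writes in conventional style.
void saxpy_seq(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// The equivalent a parallelizing compiler could emit after proving the
// iterations independent (the OpenMP pragma stands in for generated code).
void saxpy_par(int n, float a, const float* x, float* y) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main() {
    std::vector<float> x(8, 1.0f), y(8, 2.0f);
    saxpy_par(8, 3.0f, x.data(), y.data());
    std::printf("%f\n", y[0]);  // 3*1 + 2 = 5
}
```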