This paper gives a brief description of recent O.R. activity in China. It consists of four parts: mathematical programming; queueing theory and Markov decision processes; reliability theory; simulation. Emphasis is placed on the current situation of practical O.R.
This paper provides the vision of the Barcelona Supercomputing Center towards exascale computing. We believe that it is key to have unified views of future computer systems, looking at the good ideas, developments, and practices from the past and applying them at the scalability levels we want to consider. The programming model is Alexander's sword with which to cut the Gordian knot of exascale systems based on massive multicore architectures. The implementation of the programming model should decouple the way programs are written by the user (parallelism, address spaces, etc.) from the way they are executed by the runtime (execution vehicles, memory containers, malleability and load balancing, fault tolerance, etc.) on a specific target architecture. At the application level, it will be crucial to ensure that porting guarantees applications' survival for some decades, or their clean upgrade to the foreseeable explosion of hardware platforms. Performance tools and analysis practices are in their infancy with regard to providing the required exascale support. BSC would like to contribute this vision and its ongoing efforts to the holistic exascale initiative.
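A minimal illustrative sketch of the decoupling idea described above, not BSC's actual programming model or runtime: the user only marks work as tasks, while a separate (hypothetical) Runtime object owns the execution vehicles and could change the worker count or placement without touching the user's code. All names below are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def task(fn):
    """Mark a function as a task; scheduling is left entirely to the runtime."""
    fn.is_task = True
    return fn


@task
def stage(chunk):
    """Example task body: a small piece of work on one chunk of data."""
    return sum(x * x for x in chunk)


class Runtime:
    """Hypothetical runtime: owns the execution vehicles (here, a thread pool)."""

    def __init__(self, workers=4):
        # Worker count, placement, and load balancing are runtime decisions;
        # nothing in the user's task code refers to them.
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def run(self, fn, inputs):
        futures = [self.pool.submit(fn, c) for c in inputs]
        return [f.result() for f in futures]


if __name__ == "__main__":
    data = [list(range(i, i + 1000)) for i in range(0, 8000, 1000)]
    print(sum(Runtime(workers=4).run(stage, data)))
```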
An introduction is presented in which the editor discusses various reports within the issue on topics including Message Passing Interface (MPI), parallel input/output (I/O), and parallel programming.
For decades, the RPC abstraction has been known to be fraught with serious problems related to partial failure, latency, and concurrency. Still, many developers continue to use RPC—some are even developing and open-sourcing new RPC systems—
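A small illustrative sketch of why the problems named above make a remote call harder than a local one; it uses no specific RPC framework, and every function below is invented for illustration. The simulated remote call can time out after the work may already have happened, so the caller is forced to handle retries and an ambiguous final state.

```python
import random
import time


def remote_add(a, b, timeout=0.05):
    """Simulated RPC: latency is variable, and a timeout leaves the outcome unknown."""
    latency = random.uniform(0.0, 0.1)
    if latency > timeout:
        raise TimeoutError("no reply within %.2fs; did the call happen?" % timeout)
    time.sleep(latency)
    return a + b


def call_with_retries(fn, *args, attempts=3):
    """What a local '+' never needs: explicit retry and failure handling."""
    for i in range(attempts):
        try:
            return fn(*args)
        except TimeoutError as exc:
            print("attempt %d failed: %s" % (i + 1, exc))
    raise RuntimeError("remote call failed; the result is unknown (partial failure)")


if __name__ == "__main__":
    print(call_with_retries(remote_add, 2, 3))
```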
The evolutionary path of microprocessor design includes both multicore and many-core architectures. Harnessing the most computing throughput from these architectures requires concurrent or parallel execution of instructions. The authors describe the challenges facing the industry as parallel-computing platforms become even more widely available.
Within the computing continuum, SBCs (single-board computers) are essential in the Edge and Fog, with many featuring multiple processing cores and GPU accelerators. In this way, parallel computing plays a crucial role in enabling the full computational potential of SBCs. However, selecting the best-suited solution in this context is inherently complex due to the intricate interplay between PPI (parallel programming interface) strategies, SBC architectural characteristics, and application characteristics and constraints. To our knowledge, no solution presents a combined discussion of these three aspects. To tackle this problem, this article aims to provide a benchmark of the best-suited PPIs given a set of hardware and application characteristics and requirements. Compared to existing benchmarks, we introduce new metrics, additional applications, various parallelism interfaces, and extra hardware devices. Therefore, our contributions are the methodology to benchmark parallelism on SBCs and the characterization of the best-performing PPIs and parallelism strategies for given situations. We are confident that parallel computing will be mainstream in edge and fog computing; thus, our solution provides the first insights into which kind of application and parallel programming interface is best suited to a particular SBC hardware platform.
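A simplified sketch of the kind of measurement such a benchmarking methodology rests on, not the paper's actual harness or metrics: one stand-in kernel is timed at several worker counts, and time and speedup are recorded per configuration. The kernel, job sizes, and worker counts below are assumptions chosen for illustration.

```python
from multiprocessing import Pool
import time


def kernel(n):
    # Stand-in for a real application kernel from a benchmark suite.
    return sum(i * i for i in range(n))


def measure(workers, jobs):
    """Return the wall-clock time of running all jobs with the given worker count."""
    start = time.perf_counter()
    if workers == 1:
        for n in jobs:
            kernel(n)
    else:
        with Pool(workers) as pool:
            pool.map(kernel, jobs)
    return time.perf_counter() - start


if __name__ == "__main__":
    jobs = [1_000_000] * 16
    baseline = measure(1, jobs)
    for w in (1, 2, 4):  # core counts typical of small single-board computers
        t = measure(w, jobs)
        print("workers=%d  time=%.2fs  speedup=%.2fx" % (w, t, baseline / t))
```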
Novel interconnect technologies offer solutions to on-chip communication scalability problems. This article outlines the prospects of wireless on-chip communication technologies pointing toward low-latency and energy-efficient broadcast even in large-scale chip multiprocessors. It also discusses the challenges and potential impact of adopting these technologies as key enablers of unconventional hardware architectures and algorithmic approaches to significantly improve the performance, energy efficiency, scalability, and programmability of many-core chips.
Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance, especially in hybrid systems using accelerators. Processor-cores on the same socket are able to communicate at lower latencies, and with higher bandwidths, than cores on different sockets either within the same node or between nodes. A key challenge is to efficiently use this communication hierarchy and hence optimize performance. We consider here the class of applications that contains wave-front processing. In these applications, data can only be processed after their upstream neighbors have been processed. Similar dependencies arise between processors, where communication is required to pass boundary data downstream and its cost is typically determined by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy, but at the cost of additional steps in the parallel computation and higher use of on-chip communications. This tradeoff is explored using a performance model. An implementation using the reverse-acceleration programming model on the petascale Roadrunner system demonstrates a 27% performance improvement at full system-scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in communication performance exists. (C) 2011 Elsevier B.V. All rights reserved.
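A minimal single-node sketch of the wave-front dependency pattern described above (not the hierarchical Roadrunner implementation): each cell of a 2-D grid depends on its upstream neighbours, so the sweep proceeds by anti-diagonals, and the cells within one anti-diagonal form the independent work that wave-front codes spread across processors. The grid size and stencil are illustrative assumptions.

```python
import numpy as np

n = 6
grid = np.zeros((n, n))
grid[0, :] = 1.0  # boundary data arriving from upstream neighbours
grid[:, 0] = 1.0

# Sweep by anti-diagonals: cell (i, j) needs (i-1, j) and (i, j-1) first.
for d in range(2, 2 * n - 1):
    # All cells on one anti-diagonal are independent of each other, which is
    # the parallelism a wave-front code distributes across processors.
    for i in range(max(1, d - n + 1), min(d, n)):
        j = d - i
        grid[i, j] = 0.5 * (grid[i - 1, j] + grid[i, j - 1])

print(grid)
```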
The latest improvements in programming languages and models have focused on simplicity and abstraction, leading Python to the top of the list of programming languages. However, there is still room for improvement when it comes to preventing users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module to automatically find an appropriate task-based parallelisation of affine loop nests and execute them in parallel on a distributed computing infrastructure. It is based on sequential programming and requires a single annotation (in the form of a Python decorator) so that anyone with intermediate-level programming skills can scale up an application to hundreds of cores. The evaluation demonstrates that AutoParallel goes one step further in easing the development of distributed applications. On the one hand, the programmability evaluation highlights the benefits of using a single Python decorator instead of manually annotating each task and its parameters or, even worse, having to develop the parallel code explicitly (e.g., using OpenMP or MPI). On the other hand, the performance evaluation demonstrates that AutoParallel is capable of automatically generating task-based workflows from sequential Python code while achieving the same performance as manually taskified versions of established state-of-the-art algorithms (i.e., Cholesky, LU, and QR decompositions). Finally, AutoParallel is also capable of automatically building data blocks to increase the tasks' granularity, freeing the user from creating the data chunks and re-designing the algorithm. For advanced users, we believe that this feature can be useful as a baseline for designing blocked algorithms.
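A hypothetical sketch of the programming style the abstract describes; the decorator below is a placeholder, not AutoParallel's actual API. The point is that the user writes a plain sequential affine loop nest and adds a single annotation, leaving taskification and distributed execution to the module.

```python
def autoparallel(fn):
    """Placeholder decorator: the real module would analyse the affine loop nest,
    derive task dependencies, and hand execution to a distributed runtime."""
    return fn


@autoparallel
def matmul(a, b, c, n):
    # Sequential affine loop nest: bounds and array accesses are linear in the
    # loop indices, which is what makes automatic taskification possible.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]


if __name__ == "__main__":
    n = 4
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    matmul(a, b, c, n)
    print(c[0][0])  # 8.0 for these inputs
```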
We describe here the design and performance of OdinMP/CCp, which is a portable compiler for C programs using the OpenMP directives for parallel processing with shared memory. OdinMP/CCp was written in Java for portability reasons; it takes a C program with OpenMP directives and produces a C program for POSIX threads. We describe some of the ideas behind the design of OdinMP/CCp and show some performance results achieved on an SGI Origin 2000 and a Sun E10000. Speedup measurements relative to a sequential version of the test programs show that OpenMP programs using OdinMP/CCp exhibit excellent performance on the Sun E10000 and reasonable performance on the Origin 2000. Copyright (C) 2000 John Wiley & Sons, Ltd.
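A conceptual stand-in, written in Python rather than the C/pthreads code OdinMP/CCp actually emits, for what lowering a parallel loop onto explicit threads involves: partitioning the iteration space, giving each thread a chunk and private storage, and joining at the loop's implicit barrier. All names and sizes are illustrative.

```python
import threading

N = 1_000_000
NUM_THREADS = 4
partial = [0] * NUM_THREADS  # per-thread results avoid a shared-variable race


def chunk_body(tid):
    # Each thread receives a contiguous slice of the original iteration space.
    lo = tid * N // NUM_THREADS
    hi = (tid + 1) * N // NUM_THREADS
    acc = 0
    for i in range(lo, hi):  # the original loop body
        acc += i * i
    partial[tid] = acc


threads = [threading.Thread(target=chunk_body, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:  # joining corresponds to the loop's implicit barrier
    t.join()
print(sum(partial))
```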