Many parameterized trials of a simulation are often required, and changing a simulation's parameters frequently changes only a fraction of the work. This paper proposes time scale combining (TSC), which combines multiple independent simulation trials by sharing common events and inter-processor communication messages across distinct simulations. As a result, some of the overhead experienced in parallel computation can be amortized across these simulations, reducing the total time required to execute all of them. Results from an experimental evaluation on both asynchronous and synchronous parallel architectures are presented.
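The abstract does not describe how TSC is implemented, so the following is only a loose sequential analogy, not the paper's method: events that do not depend on the varied parameter are executed once and reused by every trial. The event tuple layout, the function name `run_trials_shared`, and the toy workload are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical event description: (name, parameter names it depends on, work function).
Event = Tuple[str, Sequence[str], Callable[..., int]]

def run_trials_shared(trials: List[Dict[str, int]], events: List[Event]):
    """Run every trial over the same event list, executing each distinct event once.

    Returns (per-trial results, number of events actually executed).
    """
    cache: Dict[tuple, int] = {}
    executed = 0
    results = []
    for params in trials:
        total = 0
        for name, depends_on, work in events:
            args = tuple(params[k] for k in depends_on)
            key = (name, args)
            if key not in cache:          # event not yet seen in any trial
                cache[key] = work(*args)
                executed += 1
            total += cache[key]           # shared result reused across trials
        results.append(total)
    return results, executed

# Toy workload: only the "mix" event depends on the varied parameter "rate".
EVENTS: List[Event] = [
    ("setup", ("n",), lambda n: n * n),
    ("mix", ("n", "rate"), lambda n, rate: n * rate),
    ("report", ("n",), lambda n: n + 1),
]
TRIALS = [{"n": 4, "rate": r} for r in (1, 2, 3)]

results, executed = run_trials_shared(TRIALS, EVENTS)
```

In this toy case the three trials would need nine event executions if run separately; with sharing, only the parameter-dependent event is repeated.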
Numerous applications in science and engineering have a problem space that can be represented as a 2-dimensional grid. While some of these problems exhibit uniform computational requirements over all regions of the grid, others are non-uniform: that is, some regions of the grid have more data points than others. We introduce a new block decomposition method, Fair Binary Recursive Decomposition (FBRD), which is suitable for a collection of heterogeneous processors, and extend it to accommodate non-uniform problems (NUFBRD). Mathematical comparisons of the NUFBRD method and other common partitioning schemes are presented to show the expected performance level of this new decomposition technique.
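The abstract does not give the FBRD or NUFBRD algorithms, so the sketch below shows only a generic speed-weighted recursive bisection under assumed rules: the processor list is halved, the longer side of the block is cut, and the cut is placed where the share of data points best matches the share of processor speed. The function name `decompose`, the cut rule, and the example inputs are assumptions.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # half-open (row0, row1, col0, col1)

def decompose(weights: List[List[int]], speeds: List[float]) -> List[Tuple[int, Block]]:
    """Assign one rectangular block of the grid to each processor.

    weights[r][c] is the number of data points in cell (r, c);
    speeds[p] is the relative speed of processor p.
    """
    def block_weight(r0, r1, c0, c1):
        return sum(sum(row[c0:c1]) for row in weights[r0:r1])

    out: List[Tuple[int, Block]] = []

    def split(procs, r0, r1, c0, c1):
        if len(procs) == 1:
            out.append((procs[0], (r0, r1, c0, c1)))
            return
        if r1 - r0 == 1 and c1 - c0 == 1:
            raise ValueError("more processors than grid cells in this block")
        mid = len(procs) // 2
        left, right = procs[:mid], procs[mid:]
        # Fraction of the block's data points the left group should receive.
        share = sum(speeds[p] for p in left) / sum(speeds[p] for p in procs)
        target = share * block_weight(r0, r1, c0, c1)
        if r1 - r0 >= c1 - c0:
            # Cut across rows at the position closest to the target weight.
            cut = min(range(r0 + 1, r1),
                      key=lambda r: abs(block_weight(r0, r, c0, c1) - target))
            split(left, r0, cut, c0, c1)
            split(right, cut, r1, c0, c1)
        else:
            # Cut across columns at the position closest to the target weight.
            cut = min(range(c0 + 1, c1),
                      key=lambda c: abs(block_weight(r0, r1, c0, c) - target))
            split(left, r0, r1, c0, cut)
            split(right, r0, r1, cut, c1)

    split(list(range(len(speeds))), 0, len(weights), 0, len(weights[0]))
    return out

# Uniform 4x4 grid, one processor twice as fast as the other two.
uniform = [[1] * 4 for _ in range(4)]
blocks = decompose(uniform, [2.0, 1.0, 1.0])
```

With a uniform grid the faster processor receives half the cells and the two slower ones a quarter each; passing a grid with unequal cell counts exercises the non-uniform case.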
Recent advances in the power of parallel computers have made them attractive for solving large computational problems. Scalable parallel programs are particularly well suited to massively parallel processing (MPP) machines, since the number of computations can be increased to match the number of available processors. Performance tuning can be particularly difficult for these applications, since it must often be performed with a smaller problem size than the one targeted for eventual execution. This research develops a performance prediction methodology that addresses this problem through symbolic analysis of program source code. Algebraic manipulations can then be performed on the resulting analytical model to determine performance for scaled-up applications on different hardware architectures.
The proceedings contain 112 papers. Topics discussed include computer systems, telecommunication systems, parallel processing systems, information technology, real-time systems, digital signal processing, multimedia communications, fuzzy sets, neural networks, image processing, software engineering, logic circuits, pattern recognition, and concurrent engineering.
Performance debugging and prediction for parallel systems is a difficult problem. The difficulties in identifying performance bottlenecks stem from the need for an intimate understanding of the underlying architecture. Portability is recognized as an important requirement for parallel program development, but it makes the task of performance debugging even more difficult. In this paper, we present a simulation-based approach for performance prediction of portable parallel programs. We demonstrate it using Charm, a message-driven programming environment that provides program portability across a variety of shared- and distributed-memory MIMD parallel systems. The proposed approach makes it possible to use a single debugging environment for the development of portable parallel software. This environment can support both correctness and performance debugging, giving the developer valuable feedback for improving program performance.
We describe a parallel data communication scheme for processors using free-space optics. A matrix-addressed spatial light modulator (SLM) is used as an optical switch through which different modules in a processor communicate. The aim is to achieve better performance through improved communications, taking advantage of the inherent parallelism and non-interference of light. Through the use of microprogramming, parallel assembly programming is supported, producing more efficient microcode. The communication scheme is extended to a multiprocessor system, yielding an MIMD architecture with a reconfigurable network. The scheme is studied as a parallel universal host, capable of supporting different virtual architectures.
This paper presents a study of the workspace and kinematic properties of four different architectures of six-degree-of-freedom parallel mechanisms. For each architecture, the volume of the Cartesian workspace is compu...
The problem of bicriterion scheduling of jobs with identical processing times on uniform processors is considered. The first criterion is the minimization of either total or maximum cost; the second is the minimization of maximum cost with different cost functions. Polynomial-time algorithms are presented to determine all efficient solutions and the optimal solution for a given global criterion.
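The abstract does not state the bicriterion algorithms themselves. As background only, the sketch below illustrates the structure that makes this problem class tractable: with identical processing requirement p, the k-th job run on a processor of speed s finishes at k·p/s, so the n earliest such completion slots can be enumerated with a heap. The function name `earliest_slots` and the example values are assumptions, and the sketch covers only the single-criterion makespan view.

```python
import heapq
from fractions import Fraction
from typing import List, Tuple

def earliest_slots(n_jobs: int, p: int, speeds: List[int]) -> List[Tuple[Fraction, int]]:
    """Return the n_jobs earliest (completion_time, processor) slots.

    p is the common processing requirement and speeds[i] is the integer
    speed of processor i; exact fractions avoid floating-point ties.
    """
    # One candidate per processor: its first job finishes at p / speed.
    heap = [(Fraction(p, s), i, 1) for i, s in enumerate(speeds)]
    heapq.heapify(heap)
    slots = []
    for _ in range(n_jobs):
        t, i, k = heapq.heappop(heap)
        slots.append((t, i))
        # The next job on the same processor would finish at (k + 1) * p / speed.
        heapq.heappush(heap, (Fraction(p * (k + 1), speeds[i]), i, k + 1))
    return slots

# Five identical jobs of size 6 on processors with speeds 3, 2 and 1.
slots = earliest_slots(5, 6, [3, 2, 1])
makespan = slots[-1][0]
```

Because the jobs are interchangeable, any assignment of jobs to these slots yields the same makespan; cost-based criteria such as those in the abstract would then decide which job goes in which slot.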
In this paper, a basic architecture for efficient massively parallel processing is discussed. To construct general-purpose massively parallel processing systems, efficient and close interaction between processing elements is the most critical issue. We propose a complementary processor architecture with two different processing elements, each optimized for a different grain size (fine-grain and coarse-grain). The proposed architecture can exploit the high performance of coarse-grained RISC processors in combination with flexible fine-grained operations such as distributed shared memory, versatile synchronization, and message communication. After a detailed discussion, we describe the architecture of the prototype machine (JUMP-1).