VIZIR provides an integrated mechanism for on-line debugging, performance analysis, and data visualization of message-passing parallel applications. The current VIZIR includes: (1) a mechanism to multicast window-based commands, message-passing events, and program execution events among tasks; (2) the ability to select groups of tasks to interact with these events; (3) a means to report when these events cause errors in the tasks; and (4) ad hoc visualization of distributed arrays using existing visualizers.
FASTSPECT is an imaging system designed for dynamic 3-D SPECT imaging of the brain. The system is based on 24 stationary modular cameras with a pinhole aperture structure having between 24 and 150 pinholes. Each camera is composed of 4 PMTs and a NaI(Tl) crystal. The original system has been replaced with a new gantry designed for use in a clinical environment, and this opportunity was taken to add improved imaging capabilities. First, the aperture can now be rotated to acquire extra projections for static images. Second, high frame rates are obtainable because distributed processing, with one Inmos T805 Transputer with 16 MB of RAM assigned to each camera, reduces the dead time between frames. Third, photomultiplier signals are now digitized to 8 bits instead of the previously used 5 bits. This additional precision allows different position-estimation schemes, including neural networks. The improved frame-rate capabilities have opened consideration of research in such areas as first-pass ventricular SPECT. Phantom studies have shown that sufficient counts can be collected in 1/20th of a second to reconstruct a full 3-D ventricular image. The new distributed electronics allow rapid data handling, so cardiac studies at 20 frames/sec are feasible.
ISBN: 0818656026 (print)
This paper presents the results of running some benchmarks from the Genesis suite on the Transtech Paramid. The benchmarks use the PARMACS parallel processing standard and are based on applications in the fields of general relativity, molecular dynamics, and QCD. The Paramid is a distributed memory parallel computer using up to 64 Intel i860-XP processors. The results demonstrate good parallel performance and the ability of the machine to run standard portable software.
ISBN: 0818656026 (print)
In this paper, we present our experience and results obtained from executing shared memory application programs using fine-grain remote memory access communication and multithreading in the EM-4 multiprocessor. The EM-4 is a distributed memory multiprocessor which has a dataflow mechanism.
Most studies of processor scheduling in multiprogrammed parallel systems have ignored the I/O performed by applications. Recent studies have demonstrated that significant I/O operations are performed by a number of different classes of parallel applications. This paper focuses on some basic issues that underlie scheduling in multiprogrammed parallel environments running applications with I/O. Characterization of the I/O behavior of parallel applications is discussed first. Based on simulation models, this research then investigates the influence of these I/O characteristics on processor scheduling.
How can a user write a program to be portable and efficient across widely different parallel architectures, such as SIMD, MIMD, shared-memory, distributed memory, workstation clusters, etc.? The following issues are considered: what language should be used; how appropriate is one language for different applications; how efficient can a portable program be; and how will efficiency be achieved.
The utilization of networked, shared, heterogeneous workstations as an inexpensive parallel computational platform is an appealing idea. However, most performance models for parallel computation are oriented towards the use of tightly-coupled, dedicated, homogeneous processors. We develop and validate an analytic performance model for synchronous iterative algorithms executing on networked workstations. The model includes the effects of application load, background load, and processor heterogeneity. The model is validated using a pair of applications: a nonlinear optimization code and a discrete-event simulation.
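The abstract does not give the model's equations, so the following is only a toy sketch of how application load, background load, and processor heterogeneity can combine in a synchronous iterative computation. It assumes that each iteration ends when the slowest workstation finishes, and that a workstation's effective speed is its idle speed scaled by the CPU fraction left over from background jobs. The names `Workstation`, `iteration_time`, and `balanced_shares`, and all numbers, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Workstation:
    speed: float            # work units per second when idle (hypothetical)
    background_load: float  # fraction of CPU used by other jobs, in [0, 1)

    @property
    def effective_speed(self) -> float:
        # Assumption: background jobs simply take a fixed CPU fraction.
        return self.speed * (1.0 - self.background_load)


def iteration_time(work_shares, stations, comm_time):
    """Time of one synchronous iteration.

    All stations synchronize at the end of the iteration, so the slowest
    station determines the compute time; comm_time is added on top.
    """
    compute = [w / s.effective_speed for w, s in zip(work_shares, stations)]
    return max(compute) + comm_time


def balanced_shares(total_work, stations):
    """Split the work in proportion to each station's effective speed."""
    total_speed = sum(s.effective_speed for s in stations)
    return [total_work * s.effective_speed / total_speed for s in stations]


# Three heterogeneous, partly loaded workstations (made-up values).
stations = [
    Workstation(speed=1.0, background_load=0.0),
    Workstation(speed=2.0, background_load=0.5),
    Workstation(speed=1.0, background_load=0.5),
]

t_equal = iteration_time([100.0, 100.0, 100.0], stations, comm_time=1.0)
t_balanced = iteration_time(balanced_shares(300.0, stations), stations, comm_time=1.0)
print(f"equal split: {t_equal:.1f} s, balanced split: {t_balanced:.1f} s")
```

Under these assumptions an equal split is held back by the most heavily loaded workstation, while a split proportional to effective speed lets every workstation reach the synchronization point at the same time.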
We present our experience and results obtained from executing shared memory application programs using fine-grain remote memory access communication and multithreading in the EM-4 multiprocessor. The EM-4 is a distributed memory multiprocessor which has a dataflow mechanism. The dataflow mechanism enables a fine-grain communication packet sent through the network to invoke a thread of control dynamically with very small overhead, and it is extended to access remote memory in different processors. We hide the remote memory access latencies with multithreading. The benchmark results show that shared memory applications achieve reasonable speedup with four to eight threads in the EM-4 prototype. We found that aggressive multithreading can negatively affect the network interface and increase network contention. We also describe the EM-4 parallel programming language, called EM-C, which provides the notion of a global address space and parallel constructs for exploiting medium-grain parallelism to tolerate several remote operation latencies.
We propose a new family of trivalent network graphs with constant node degree 3 for the design of massively parallel systems. These graphs are shown to be regular, to have logarithmic diameter in the number of nodes, and to be maximally fault tolerant. We investigate different algebraic properties of these networks (including fault tolerance) and propose simple and optimal routing algorithms. We also show that the proposed graph belongs to the well-known family of Cayley graphs.
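The abstract does not define the proposed family, so the sketch below uses a stand-in to show what a degree-3 Cayley graph looks like in practice: the Cayley graph of the symmetric group S_n generated by one transposition, one n-cycle, and the inverse of that cycle. This generator set is an assumption chosen for illustration and is not claimed to be the paper's construction. The code builds the graph, checks that it is 3-regular, and measures its diameter by breadth-first search.

```python
from collections import deque
from itertools import permutations


def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def cayley_graph(n):
    """Cayley graph of S_n on three generators (illustrative choice, n >= 3).

    The generator set {swap, rot, rot^-1} is closed under inverses, so the
    graph is undirected, and it has three distinct elements, so every node
    has degree 3.
    """
    swap = (1, 0) + tuple(range(2, n))   # transposition of elements 0 and 1
    rot = tuple(range(1, n)) + (0,)      # cyclic shift i -> i + 1 (mod n)
    gens = sorted({swap, rot, inverse(rot)})
    return {g: [compose(g, s) for s in gens] for g in permutations(range(n))}


def bfs_distances(graph, start):
    """Hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


graph = cayley_graph(4)
identity = tuple(range(4))
dist = bfs_distances(graph, identity)

# Cayley graphs are vertex-transitive, so the eccentricity of any single
# node (here the identity) equals the diameter of the whole graph.
diameter = max(dist.values())
print(f"nodes={len(graph)}, degree=3, diameter={diameter}")
```

Because every node of a Cayley graph looks the same from the inside, one breadth-first search is enough to obtain the diameter, and routing can be described once, relative to the identity, and then reused for every source node.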