On parallel systems, jobs that request a large fraction of the maximum resources available on the system may incur poor wait times. This paper evaluates whether giving a reservation to every waiting job can improve the performance of large jobs without significantly degrading the performance of other jobs. Using a wide range of workloads, including workloads more recent than the SP2 workloads, and a more complete set of performance measures than in previous studies, we provide new observations on the potential benefits and problems of reservation policies that give all jobs a reservation.
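The abstract does not specify how reservations are computed. As a rough illustration only, the sketch below assumes a conservative-backfilling style policy: jobs are considered in queue order, and each receives the earliest start time at which enough processors are free for its whole requested runtime. The function names, the tuple layouts, and the example workload are all hypothetical and are not taken from the paper.

```python
def usage_at(reservations, t):
    """Processors in use at time t, given (start, end, procs) reservations."""
    return sum(p for s, e, p in reservations if s <= t < e)


def fits(reservations, start, runtime, procs, total):
    """True if `procs` processors are free during [start, start + runtime)."""
    # Usage only rises where another reservation starts, so checking the
    # start time plus every reservation start inside the window is enough.
    points = [start] + [s for s, _, _ in reservations if start < s < start + runtime]
    return all(usage_at(reservations, t) + procs <= total for t in points)


def reserve_all(total, running, waiting):
    """Give every waiting job a reservation, in queue order (assumed policy).

    running: list of (end_time, procs) for jobs already executing at time 0.
    waiting: list of (name, procs, runtime) in queue order.
    Returns a dict mapping job name to its reserved start time.
    """
    reservations = [(0, end, procs) for end, procs in running]
    starts = {}
    for name, procs, runtime in waiting:
        if procs > total:
            raise ValueError(f"job {name} requests more than {total} processors")
        # The earliest feasible start is time 0 or the end of some reservation.
        candidates = sorted({0} | {e for _, e, _ in reservations})
        start = next(t for t in candidates if fits(reservations, t, runtime, procs, total))
        reservations.append((start, start + runtime, procs))
        starts[name] = start
    return starts
```

For example, on a hypothetical 10-processor system with one running job holding 8 processors until time 5, `reserve_all(10, [(5, 8)], [("A", 6, 10), ("B", 2, 3), ("C", 4, 4)])` reserves A at time 5, lets the small job B start at time 0 without delaying A, and reserves C at time 5.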
As the complexity of chip designs increases, simulation time also increases. Unit-delay and variable-delay simulation takes the most simulation time in the IC design process; however, parallel processing performs inefficiently due to the large amount of synchronization. In this paper, techniques to reduce the number of synchronization points in synchronous designs are proposed, along with a partitioner that partitions designs along flip-flop boundaries so that these techniques can be employed on real designs.
This paper describes a general technique for identifying control flow errors in parallel programs, which can be automated in a compiler. The compiler builds a system of linear equations that describes the global control flow of the whole program. Solving these equations using standard techniques of linear algebra can locate a wide range of control flow bugs at compile time. This paper also describes an implementation of this control flow analysis technique in a prototype compiler for a well-known parallel programming language. In contrast to previous research in automated parallel program analysis, our technique is efficient for large programs and does not limit the range of language features.
A methodology is presented that allows distributed execution of systems on several microcontrollers and an FPGA (Field Programmable Gate Array). Using an FPGA, system performance can be increased significantly by means of parallel processing. The focus is on hybrid electronic systems, which contain both state-based and continuous model parts. To fulfill real-time requirements, a real-time operating system is used. To measure system performance, a method for analyzing timing behavior is presented; it provides a graphical representation of the tasks' execution time intervals and execution points in time, reveals idle times, and thus supports optimization of the task scheduling. Data exchange is realized with CAN (Controller Area Network).
Update methods are an important aspect of the burgeoning Artificial Life research area. Artificial Life models, like the Predator-Prey model, operate efficiently in a sequential implementation only while population numbers are low to moderate. We find that for large populations sequential implementations are too slow to extract meaningful measurement statistics. In this paper we discuss the parallelisation of sequential update methods for use in Artificial Life systems. We also discuss how parallel update algorithms affect data dependencies and what correctness means in parallel models.
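The data-dependency issue raised here can be seen in a toy example. The sketch below is not from the paper; it uses a hypothetical one-dimensional ring of cells with an illustrative rule (each cell copies its left neighbour) to contrast an in-place sequential sweep with a synchronous, double-buffered update of the kind that parallelises naturally.

```python
def step_sequential(cells):
    """In-place sweep: later cells read values already updated in this step."""
    cells = list(cells)
    for i in range(len(cells)):
        # cells[i - 1] wraps around to the last cell when i == 0.
        cells[i] = cells[i - 1]
    return cells


def step_synchronous(cells):
    """Double-buffered step: every cell reads only the previous generation.

    Each new value depends only on the old buffer, so the cells could be
    computed in any order or by independent workers.
    """
    old = list(cells)
    return [old[i - 1] for i in range(len(old))]
```

Starting from `[1, 0, 0, 0]`, the synchronous step yields `[0, 1, 0, 0]`, while the in-place sweep overwrites the only live cell before its neighbour reads it and yields `[0, 0, 0, 0]`. The two update methods therefore define different models, which is one reason correctness has to be stated relative to a chosen update semantics.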
This paper presents an XML-based data format for the parallel numerical integration package PARINT. As with many other numeric computation programs, PARINT accepts a long list of arguments for describing the user's problem, selecting the algorithm to be used, and specifying parallel run characteristics. Supporting XML input allows platform-independent creation and manipulation of input specifications and simplifies the addition of new integration algorithms. We discuss the purpose of each section in the proposed XML data format and describe how new sections can be added to the XML data structure in order to support new computing paradigms. We also explain how data are processed efficiently and give some application examples. The format can serve more generally for various software packages.
This paper presents the coordinated virtual partition (CVP) for Grid computing systems. The CVP is a way of regulating the resources supplied to different components of an application in unison according to an agreed relative proportion. This study shows that coordinated resource provisioning has several benefits, including (a) reducing the wait times experienced by an application and (b) improving the overall application performance by reducing the wait times. The CVP achieves these benefits by releasing resources from "fast" running application components so that the Grid can reallocate them to other applications.
We present a novel dynamic on-the-fly race detection mechanism called the parallel Nondeterminator to check for determinacy races during the parallel execution of a program with Spawn-Sync parallelism. The parallel Nondeterminator provides provable correctness and efficiency. Let D denote the maximum depth of the recursion in the parallel program. The worst-case slowdown in execution incurred for each spawn operation is O(D), the overhead for each sync operation is O(1), and the time required to monitor any shared memory access is O(log D). Moreover, we have implemented the parallel Nondeterminator in Cilk, a parallel language developed at MIT. Both theoretical and experimental results give strong evidence for the efficiency of our algorithm.
The SPACE RIP technique is one of the parallel imaging methods with the potential to revolutionize the field of fast MR imaging. The image reconstruction problem of SPACE RIP is a computation-intensive task that needs to be parallelized to further reduce the reconstruction time. In this paper, we analyze the algorithm and identify the program bottleneck to be parallelized. Loop-level parallelization is implemented with Pthreads, OpenMP and MPI. Furthermore, since the reconstruction uses Singular Value Decomposition (SVD) to solve the matrix pseudoinverse problem, we implemented the one-sided Jacobi parallel SVD on the state-of-the-art cellular computer architecture Cyclops64 to speed up the problem at the fine-grain level.
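For readers unfamiliar with the one-sided Jacobi method mentioned above, the following is a minimal serial sketch of the textbook (Hestenes) formulation, written in plain Python for clarity. It is not the Cyclops64 implementation described in the abstract; the function name, tolerance, and sweep limit are illustrative assumptions. The method repeatedly rotates pairs of columns until all columns are mutually orthogonal, at which point the column norms are the singular values.

```python
import math


def jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Singular values of a dense matrix (list of rows) via one-sided Jacobi."""
    m, n = len(a), len(a[0])
    # Work on a column-major copy so each rotation touches two columns only.
    cols = [[float(a[r][c]) for r in range(m)] for c in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = sum(x * x for x in cols[i])
                beta = sum(x * x for x in cols[j])
                gamma = sum(x * y for x, y in zip(cols[i], cols[j]))
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation that makes columns i and j orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for r in range(m):
                    x, y = cols[i][r], cols[j][r]
                    cols[i][r] = c * x - s * y
                    cols[j][r] = s * x + c * y
        if not rotated:
            break
    # Once the columns are orthogonal, their norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

Rotations on disjoint column pairs do not touch the same data, which is what makes the method attractive for fine-grain parallel execution; this sketch processes the pairs one at a time.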
In this paper, efficient and portable shared-memory parallel computation models for the string matching problem are presented and their performance is analyzed. To exploit the parallelism in the computation models, a parallel broadcasting method, which is a dataflow scheme, is applied. Because they are based on the dataflow mechanism, the models are time and space efficient. Several computation models are designed and tested to examine the aspects that affect parallel programming performance, such as granularity, communication, and I/O. For the implementation, Java threads, a built-in facility for portable parallel programming in shared memory environments, are used. Experimental results demonstrate that the computation models are practical, portable, and scalable parallel solutions to the problem, and the comparative testing reveals facts about the relationship between theory and practice.
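The abstract does not describe the computation models in detail, so the sketch below shows only the general idea of shared-memory parallel string matching: the text is split into chunks, each worker scans one chunk, and each scan window is extended by the pattern length minus one so that matches straddling a chunk boundary are not lost. It uses Python's thread pool rather than the Java threads of the paper, it does not implement the paper's broadcasting scheme, and the function names and the `workers` parameter are hypothetical. In CPython the global interpreter lock limits the real speedup, so this illustrates the decomposition rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(text, pattern, start, end):
    """Return indices of all matches that begin in [start, end)."""
    hits = []
    # Extend the window so a match starting at end - 1 still fits inside it.
    window_end = min(len(text), end + len(pattern) - 1)
    pos = text.find(pattern, start, window_end)
    while pos != -1:
        hits.append(pos)
        # Advance by one to keep overlapping matches.
        pos = text.find(pattern, pos + 1, window_end)
    return hits


def parallel_find(text, pattern, workers=4):
    """Find all (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    chunk = max(1, -(-len(text) // workers))  # ceiling division
    ranges = [(s, min(s + chunk, len(text))) for s in range(0, len(text), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: find_in_chunk(text, pattern, r[0], r[1]), ranges)
        # Chunks are disjoint and ordered, so concatenation is already sorted.
        return [pos for part in parts for pos in part]
```

Because each match is attributed to the chunk in which it begins, no position is reported twice even though the scan windows overlap. The chunk size is the granularity knob: smaller chunks balance load better but add scheduling overhead.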