A simulation model for a multiprogramming operating system has been devised and programmed in Simscript. Essential elements of the environment have been included such as job arrival rate, maximum number of jobs, the o...
ISBN: (print) 0769516599
Although multithreading can improve performance, it is a source of nondeterminism in application behavior. Existing approaches to replicating multithreaded applications either synchronize replicas at interrupt level, at the expense of performance, or use a nonpreemptive deterministic scheduler, at the expense of concurrency. This paper presents a loose synchronization algorithm for ensuring deterministic replica behavior while preserving concurrency. The algorithm synchronizes replica threads only on state updates by enforcing an equivalent order of mutex acquisitions across replicas.
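The idea of enforcing one order of mutex acquisitions across replicas can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `ReplayMutex` class, the `run_replica` helper, and the logical thread ids are hypothetical names introduced here, and the sketch assumes a single mutex and an acquisition order that has already been agreed on (for example, recorded on one replica). Threads still run concurrently outside the critical section and wait only when they touch shared state.

```python
import threading
from collections import Counter


class ReplayMutex:
    """Hypothetical mutex that grants acquisitions in a prescribed order.

    `order` is the agreed sequence of logical thread ids, assumed to be
    identical on every replica.
    """

    def __init__(self, order):
        self._order = list(order)
        self._next = 0          # index of the next acquisition to grant
        self._held = False
        self._cond = threading.Condition()

    def acquire(self, tid):
        with self._cond:
            # Block until the mutex is free and it is this thread's turn.
            self._cond.wait_for(
                lambda: not self._held
                and self._next < len(self._order)
                and self._order[self._next] == tid
            )
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._next += 1
            self._cond.notify_all()


def run_replica(order):
    """Run one replica; return the order in which state updates happened."""
    mutex = ReplayMutex(order)
    log = []

    def worker(tid, acquisitions):
        for _ in range(acquisitions):
            mutex.acquire(tid)
            log.append(tid)     # the shared-state update
            mutex.release()

    threads = [
        threading.Thread(target=worker, args=(tid, n))
        for tid, n in Counter(order).items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Running `run_replica` twice with the same `order` yields the same update log regardless of how the operating system schedules the threads, which is the property a replicated service needs.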
When one is concerned with maximizing overall program throughput, task mapping in a heterogeneous computing environment presents the problem of which computing unit(s) is best suited to perform each task. This paper explores the concept that finding a 'better' starting point for the static mapping process will provide a better opportunity for success. A starting point based on the centroid of the computation task graph, analogous to the center of mass in a gravitational system, is derived and used as the mapping starting point. Experimental comparisons are then made using the HP Greedy mapping technique while varying the starting point among the beginning, the centroid, and the end of the problem. Results show that the task centroid mapping technique does not increase the complexity of the mapping process but does result in improved overall program throughput.
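The abstract does not give the centroid formula, so the following Python sketch shows only one plausible reading of the center-of-mass analogy. It assumes that each task's "position" is its depth in the task graph (longest path from a source), that its "mass" is its computation cost, and that the starting task is the one whose depth is closest to the cost-weighted mean depth. The functions `task_levels` and `centroid_start` and the example graph are hypothetical.

```python
from collections import deque


def task_levels(tasks, edges):
    """Depth of each task: length of the longest path from any source."""
    succ = {t: [] for t in tasks}
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = {t: 0 for t in tasks}
    queue = deque(t for t in tasks if indeg[t] == 0)
    while queue:                      # Kahn's topological traversal
        u = queue.popleft()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return level


def centroid_start(costs, edges):
    """Return (centroid depth, task chosen as the mapping starting point)."""
    tasks = list(costs)
    level = task_levels(tasks, edges)
    total = sum(costs.values())
    # Cost-weighted mean depth, by analogy with a center of mass.
    centroid = sum(costs[t] * level[t] for t in tasks) / total
    # Closest task to the centroid; ties go to the more expensive task.
    start = min(tasks, key=lambda t: (abs(level[t] - centroid), -costs[t]))
    return centroid, start


costs = {"a": 1, "b": 2, "c": 2, "d": 10, "e": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
centroid, start = centroid_start(costs, edges)
```

In this example the heavy task `d` pulls the centroid toward depth 2, so a greedy mapper seeded this way would place `d` first instead of starting from the source `a` or the sink `e`.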
This study uses real system measurements to investigate the relationships between loop granularity, parallel loop distribution and barrier wait times, and their impact on the multiprogramming performance of loop paral...
This paper presents a study of the errors that arise when queueing network models with exponential service times are used to evaluate the performance of systems containing servers with non-exponential service times. E...
With the growing sophistication of computer system technology and the increase in EDP applications, the importance of computer system modeling and capacity planning is readily apparent. This paper presents a short tut...
Hardware prefetching and last-level cache (LLC) management are two independent mechanisms to mitigate the growing latency to memory. However, the interaction between LLC management and hardware prefetching has receive...
This paper presents the generic program approach to achieving portable high performance. This approach has three phases. In the first, a generic program, defining a family of semantically equivalent program variants, is written. In the second, the generic program is specialized to the variant that performs best on an abstract model of the target computer. In the third, this variant is translated to run on the target computer. The Parallel Memory Hierarchy (PMH) generic model is used to define the abstract models of target computers. Using this approach, a spectrum of solutions is possible. At one end of the spectrum, a simple generic program can be written, with roughly the same difficulty as writing a sequential program, that can be tuned automatically to achieve reasonably good performance on a wide variety of computers. This solution can be refined to give better performance. At the labor-intensive end of the spectrum, an application can be tuned so that it achieves the best possible performance on each of a collection of computers.
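The first two phases can be sketched in Python under strong simplifying assumptions. Here the "generic program" is a blocked matrix multiply whose `block` parameter spans a family of semantically equivalent variants, and the "abstract model" is a single cache of `cache_words` words with a rough traffic estimate. This toy model merely stands in for the PMH model, whose details the abstract does not give, and `blocked_matmul`, `modeled_traffic`, and `specialize` are hypothetical names. The third phase, translation to the target machine, is not shown.

```python
def blocked_matmul(a, b, block):
    """Generic program: every block size computes the same product."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        s = c[i][j]
                        for k in range(kk, min(kk + block, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c


def modeled_traffic(n, block, cache_words):
    """Assumed cost model: words moved between memory and one cache."""
    if 3 * block * block > cache_words:
        # Three blocks do not fit together, so assume no reuse at all.
        return 2 * n ** 3 + n * n
    return 2 * n ** 3 / block + n * n


def specialize(n, candidates, cache_words):
    """Phase two: pick the variant with the lowest modeled cost."""
    return min(candidates, key=lambda blk: modeled_traffic(n, blk, cache_words))


best_block = specialize(6, [1, 2, 4, 8], cache_words=64)
```

With a 64-word cache the model selects block size 4, the largest candidate for which three blocks fit in the cache at once. Changing `cache_words` changes the selected variant without touching the generic program, which is the portability argument in miniature.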
We introduce a LISP-like language whose parameter passing mechanism and control primitives allow for the creation and the synchronization of an arbitrary number of concurrent computations. The parameter passing mechan...
Methods of software performance monitoring and data reduction, recently implemented at the University of Wisconsin, have been successfully used to provide a meaningful set of information for management decision making...