ISBN: 088986392X (print)
In this paper, we introduce an easy-to-use, platform-independent task-parallel programming model called Lily Task (LilyTask), together with its implementations on SMP and cluster platforms. The LilyTask programming model directly reflects how people reason about parallelizing a problem. LilyTask supports both data and functional parallelism, as well as dynamic task creation and scheduling, which makes it well suited to irregular problems. Its performance is satisfactory.
The message passing interface (MPI) is a commonly used application programming interface for the development of portable parallel programs. It is easy, however, to create MPI programs that are prone to deadlock. It is desirable to be able to detect these deadlocks in running programs. It is further desirable to perform this deadlock detection in a distributed manner, without assuming the existence of shared memory for communication. A distributed deadlock detector has been developed that can find deadlocks with a very low overhead and minimal additional communication required among nodes. The detector makes use of the MPI profiling layer, allowing it to be added to a program at link time, requiring no change or recompilation of the user's code. The detector has also been tested on widely varying MPI implementations, demonstrating its portability.
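The abstract does not say which detection algorithm the detector uses. As a rough illustration only, a common way to reason about such deadlocks is a wait-for graph: each blocked rank points at the rank it is waiting on, and a cycle means no rank in it can proceed. The sketch below assumes a hypothetical, centrally available `waits_for` map with one outstanding blocking call per rank; a distributed detector like the one described would have to build this information from intercepted MPI calls without such a central view.

```python
def find_deadlock_cycle(waits_for):
    """Return the ranks forming a cycle in the wait-for graph, or None.

    waits_for maps each blocked rank to the single rank it is waiting on
    (for example, the peer of a blocking receive). Ranks that are absent
    from the map are assumed to be running.
    """
    for start in waits_for:
        seen = []
        rank = start
        # Follow the chain of blocked ranks until it ends or repeats.
        while rank in waits_for and rank not in seen:
            seen.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            # The chain came back to a rank already visited: a cycle.
            return seen[seen.index(rank):]
    return None


if __name__ == "__main__":
    # Ranks 0, 1 and 2 each wait on the next one in a ring.
    print(find_deadlock_cycle({0: 1, 1: 2, 2: 0}))
```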
ISBN: 088986392X (print)
On parallel computer systems that require communication and synchronization, the granularity of parallel execution and data locality are important for program performance. Tiling is the best-known program transformation for improving data locality and exposing coarse-grain parallelism. Conventional methods, however, first generate sequential tiled code and then transform that code further for parallel processing. We propose a tiled code generation method that combines these two phases, tiled code generation and parallelization. We show that the proposed method lends itself to optimizing DOALL parallel processing so as to minimize synchronization.
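To make the terms concrete, here is a minimal sketch of conventional rectangular tiling, not of the combined method the abstract proposes. It assumes a loop nest with no cross-iteration dependences, so every tile can run independently, which is the DOALL case; the function names and the loop body are illustrative.

```python
def tiles(n, m, tile):
    """Yield (i0, i1, j0, j1) bounds of the tiles covering an n x m iteration space."""
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # Clamp the upper bounds so partial tiles at the edges are handled.
            yield i0, min(i0 + tile, n), j0, min(j0 + tile, m)


def run_tile(a, bounds):
    """Execute the iterations of one tile; the loop body has no dependences."""
    i0, i1, j0, j1 = bounds
    for i in range(i0, i1):
        for j in range(j0, j1):
            a[i][j] = 2 * a[i][j] + 1


def tiled_update(a, tile):
    """Apply the update tile by tile instead of row by row."""
    n, m = len(a), len(a[0])
    # Tiles are independent here, so this outer loop is a DOALL loop and
    # could be distributed across processors without synchronization.
    for bounds in tiles(n, m, tile):
        run_tile(a, bounds)
    return a
```

Each tile touches a compact block of the array, which is where the locality benefit comes from, and the tile is also the unit of work handed to a processor.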
Program analysis is an important activity for evaluating and subsequently improving the quality of software. Many visualization tools offer more or less sophisticated functionality for this task. However, the visual capabilities of a tool are usually predefined by its developers' intentions, or are only marginally adaptable to the user's needs. In contrast, the VisWiz tool provides user-defined visualization for the analysis of parallel and distributed programs. By configuring the mapping of observed events and their relations in an XML configuration file, users can develop specialized graphical displays that better suit their expectations and improve program comprehension. Examples of VisWiz are given for debugging, performance tuning, and runtime monitoring of parallel and distributed programs.
ISBN: 088986392X (print)
Grid coteries are attractive in both message and space complexity, but unfortunately they are not non-dominated (ND). This paper introduces the transversal merge operation for constructing an ND coterie from a dominated one, and applies it to grid coteries to make them ND. The constructed ND grid coteries still preserve the favorable features of the original grid coteries. To demonstrate this, we evaluate their quorum sizes and design a mutual exclusion algorithm based on the ND grid coteries that allows a process to construct a quorum on the fly.
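For readers unfamiliar with grid quorums, the following sketch builds a plain (dominated) grid coterie, assuming the common row-plus-column construction in which a quorum is one full row and one full column of the grid. It then checks the intersection property that mutual exclusion relies on. The transversal merge operation itself is not reproduced here, since the abstract does not define it.

```python
from itertools import combinations


def grid_quorum(rows, cols, r, c):
    """Quorum made of every node in row r plus every node in column c."""
    row = {(r, j) for j in range(cols)}
    col = {(i, c) for i in range(rows)}
    return frozenset(row | col)


def grid_coterie(rows, cols):
    """All row-plus-column quorums of a rows x cols grid of nodes."""
    return {grid_quorum(rows, cols, r, c)
            for r in range(rows) for c in range(cols)}


def is_intersecting(quorums):
    """True if every pair of quorums shares at least one node."""
    return all(p & q for p, q in combinations(quorums, 2))
```

Under this construction each quorum has rows + cols - 1 members, so quorum size grows with the square root of the number of nodes on a square grid, which is the attraction the abstract refers to.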
This paper presents a concept called virtual clusters (VCs) for allocating resources to an application from a computing utility with a geographically distributed resource base. The VC creation process is modeled as a facility location problem, and an efficient heuristic is devised to solve it. We extend the model to add an "overload partition" to a VC so that demand surges can be handled efficiently. Extensive simulations have been conducted to examine the performance of VCs under different scenarios and to compare them with a fully dynamic scheme called the Service Grid. The results indicate that VCs are more cost-effective and robust than the Service Grid.
ISBN: 088986392X (print)
List-based scheduling is generally accepted as an attractive approach to static task scheduling in homogeneous environments, since it pairs low complexity with good results. This paper presents a static list-scheduling algorithm, the Critical Path Parent Trees (CPPT) algorithm. The CPPT algorithm divides the task graph into a set of unlisted parent-trees, each rooted at a critical-path node. Analysis and experiments have shown that the algorithm provides very low complexity and comparable results.
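As background, the sketch below shows generic list scheduling on identical processors, not the CPPT algorithm, whose parent-tree construction the abstract does not detail. It assumes positive task costs, no communication costs, and a priority equal to each task's bottom level (the longest path from the task to an exit node); all names are illustrative.

```python
from functools import lru_cache


def list_schedule(cost, succ, num_procs):
    """Schedule a task graph on num_procs identical processors.

    cost: {task: run time > 0}; succ: {task: [successor, ...]}.
    Returns {task: (processor, start, finish)}.
    """
    pred = {t: [] for t in cost}
    for t, children in succ.items():
        for child in children:
            pred[child].append(t)

    @lru_cache(maxsize=None)
    def bottom_level(t):
        # Longest path from t to an exit node, including t's own cost.
        return cost[t] + max((bottom_level(s) for s in succ.get(t, [])), default=0)

    # With positive costs a task always outranks its successors, so this
    # priority list is also a valid topological order.
    order = sorted(cost, key=lambda t: -bottom_level(t))

    proc_free = [0] * num_procs
    placed = {}
    for t in order:
        ready = max((placed[p][2] for p in pred[t]), default=0)
        # Pick the processor on which the task can start earliest.
        proc = min(range(num_procs), key=lambda k: max(proc_free[k], ready))
        start = max(proc_free[proc], ready)
        placed[t] = (proc, start, start + cost[t])
        proc_free[proc] = start + cost[t]
    return placed
```

The two phases visible here, building a priority list and then placing tasks one by one, are what keep the complexity of list scheduling low.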
ISBN: 088986392X (print)
In this paper, we show how the use of ontologies has helped us address some key issues facing ubiquitous computing environments: improving interoperability between entities, discovery and matchmaking, and context awareness. Ontologies establish a joint terminology between members of a community of interest and provide a basis for achieving semantic interoperability. We have integrated the use of ontologies into our Smart Spaces framework GAIA [10, 13]. We show that ontologies provide important capabilities for ubiquitous computing infrastructure and context-aware computing.
As the complexity of chip designs increases, simulation time also increases. Unit- and variable-delay simulation take the most simulation time in the IC design process; however, parallel processing performs inefficiently because of the large amount of synchronization required. In this paper, we propose techniques to reduce the number of synchronization points in synchronous designs, along with a partitioner that splits designs along flip-flop boundaries so that these techniques can be applied to real designs.