A practical implementation of high performance instruction level parallelarchitectures is constrained by the difficulty to build a large monolithic multi-ported register file (RF). A solution is to partition the RF i...
详细信息
A practical implementation of high performance instruction level parallelarchitectures is constrained by the difficulty to build a large monolithic multi-ported register file (RF). A solution is to partition the RF into smaller RFs while keeping the total number of registers and ports equal. This paper applies RF partitioning to transport triggered architectures (TTAs); these architectures are of the VLIW type. One may expect that partitioning increases the number of executed cycles because it constrains the number of ports per RF. It is shown that these performance losses are small; e.g. partitioning an RF with 24 registers and four read and four write ports into four RFs with 6 registers and one read and one write port gives a performance loss of only 5.8%. Partitioned RFs consume less area than monolithic RFs with the same number of ports and registers. Experiments show that, if the area saved by partitioning is spent on extra registers, partitioning does, on average, not reduce the performance; it may even result in a small performance gain.
One objective in establishing our NSF ILI funded parallel computation laboratory was to use closed, formal laboratory assignments to introduce parallelism throughout the core computer science curriculum. We discuss la...
Software pipelining is a technique that reforms the loop to improve execution time. Iterations are executed in overlapped fashion to increase parallelism. Modulo scheduling places each operation so that the schedule i...
详细信息
Software pipelining is a technique that reforms the loop to improve execution time. Iterations are executed in overlapped fashion to increase parallelism. Modulo scheduling places each operation so that the schedule is legal when replicated and offset by a target initiation interval. This process is repeated with larger initiation intervals until success is achieved. Kernel recognition methods schedule operations as rapidly as possible until a pattern is recognized. These two distinctly different methods have various strengths and weaknesses. This paper explores the benefits and draw-backs of each.
The problem of detecting the weak visibility of an n-vertex simple polygon P is that of finding whether P is weakly visible from one of its edges and (if it is) identifying every edge from which P is weakly visible. I...
详细信息
The problem of detecting the weak visibility of an n-vertex simple polygon P is that of finding whether P is weakly visible from one of its edges and (if it is) identifying every edge from which P is weakly visible. In this paper, we present an optimal parallel algorithm for solving this problem. Our algorithm runs in O (log n) time using O (n/log n) processors in the CREW PRAM computational model, and is very different from the sequential algorithms for this problem. Based on this algorithm, several other problems related to weak visibility can be optimally solved in parallel.
暂无评论