VLIW architectures use very wide instruction words in conjunction with high bandwidth to the instruction cache to achieve multiple instruction issue. This report uses the TINKER experimental testbed to examine instruc...
详细信息
ISBN:
(纸本)9780818676413
VLIW architectures use very wide instruction words in conjunction with high bandwidth to the instruction cache to achieve multiple instruction issue. This report uses the TINKER experimental testbed to examine instruction fetch and instruction cache mechanisms for VLIWs. A compressed instruction encoding for VLIWs is defined and a classification scheme for i-fetch hardware for such an encoding is introduced. Several interesting cache and i-fetch organizations are described and evaluated through trace-driven simulations. A new i-fetch mechanism using a silo cache is found to have the best performance.
We address the problem of prefetching and caching in a parallel I/O system and present a new algorithm for optimal parallel-disk scheduling. Traditional buffer management algorithms that minimize the number of I/O dis...
ISBN:
(纸本)9781581134094
We address the problem of prefetching and caching in a parallel I/O system and present a new algorithm for optimal parallel-disk scheduling. Traditional buffer management algorithms that minimize the number of I/O disk accesses, are substantially suboptimal in a parallel I/O system where multiple I/Os can proceed *** present a new algorithm SUPERVISOR for parallel-disk I/O scheduling. We show that in the off-line case, where apriori knowledge of all the requests is available, SUPERVISOR performs the minimum number of I/Os to service the given I/O requests. This is the first parallel I/O scheduling algorithm that is provably offline optimal. In the on-line case, we study SUPERVISOR in the context of global L-block lookahead, which gives the buffer management algorithm a lookahead consisting of L distinct requests. We show that the competitive ratio of SUPERVISOR, with global L-block lookahead, is Θ(M - L + D), when L ≤ M, and Θ(MD/L), when L > M, where the number of disks is D and buffer size is M.
The prevalence of software reference code motivates investigation into efficient implementations of software architectures on field-programmable devices. Modern FPGAs allow designers to generate multi-processor archit...
详细信息
The prevalence of software reference code motivates investigation into efficient implementations of software architectures on field-programmable devices. Modern FPGAs allow designers to generate multi-processor architectures that exactly match the processing needs of the algorithm. This paper describes an architecture supporting the single program multiple data model of parallel processing, and presents results taken from a parallel implementation of the JPEG2000 encoding algorithm and Mandelbrot set generation.
暂无评论