In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPU by carefully balancing usage of registers and shared-memory. Unlike earlier methods ...
详细信息
In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPU by carefully balancing usage of registers and shared-memory. Unlike earlier methods that rely on circular queues predominantly implemented using indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning across the multiple time steps in stencil computations so that circular queues can be implemented by both shared memory and registers effectively in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups up to 2.93X over methods that use circular queues implemented with shared-memory only.
Domain algebras are proposed as a tool for structuring compiler correctness proofs which are based on denotational semantics of the source and target language. The correctness of a compiler for a small imperative lang...
详细信息
This paper describes the principles underlying an efficient implementation of a lazy functional language, compiling to code for ordinary computers. It is based on combinator-like graph reduction: the user defined func...
详细信息
This paper describes the principles underlying an efficient implementation of a lazy functional language, compiling to code for ordinary computers. It is based on combinator-like graph reduction: the user defined functions are used as rewrite rules in the graph. Each function is compiled into an instruction sequence for an abstract graph reduction machine, called the G-machine, the code reduces a function application graph to its value. The G-machine instructions are then translated into target code. Speed improvements by almost two orders of magnitude over previous lazy evaluators have been measured;we provide some performance figures.
In recent years, deep neural network (DNN) has been frequently used for classification. In this study, iris flowers having 3 different types are classified by using DNN which are utilized the width and length of petal...
详细信息
Access control is a vital security mechanism in today's operating systems and the security policies dictating the security relevant behaviors is lengthy and complex for example in Security-Enhanced Linux (SELinux)...
详细信息
Mixin-based inheritance is an inheritance technique that has been shown to subsume a variety of different inheritance mechanisms. It is based directly upon an incremental modification model of inheritance. This paper ...
详细信息
Emerging multi-core processors promise to provide an exponentially increasing number of hardware threads with every generation. Applications will need to be highly concurrent to fully use the power of these processors...
详细信息
The potential of heterogeneous multicores, like the Cell BE, can only be exploited if the host and the accelerator cores are used in parallel and if the specific features of the cores are considered. Parallel programm...
详细信息
An important aspect of system support for mobile computing involves alleviating the issues related to programming the underlying distributed system. Our approach to dealing with these issues is by means of programming...
详细信息
暂无评论