ISBN: 9781595936288 (print)
Over the past few years, the powerful computation rates and high memory bandwidth of GPUs have attracted efforts to run raytracing on GPUs. Our work extends Foley et al.'s GPU k-d tree research. We port their kd-restart algorithm from multi-pass, using CPU load balancing, to single pass, using current GPUs' branching and looping abilities. We introduce three optimizations: a packetized formulation, a technique for restarting partially down the tree instead of at the root, and a small, fixed-size stack that is checked before resorting to restart. Our optimized implementation achieves 15-18 million primary rays per second and 16-27 million shadow rays per second on our test scenes. Our system also takes advantage of GPUs' strengths at rasterization and shading to offer a mode where rasterization replaces eye ray scene intersection, and primary hits and local shading are produced with standard Direct3D code. For 1024x1024 renderings of our scenes with shadows and Phong shading, we achieve 12-18 frames per second. Finally, we investigate the efficiency of our implementation relative to the computational resources of our GPUs and also compare it against conventional CPUs and the Cell processor, both of which have been shown to raytrace well.
The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered. Important issues in the architecture and programming environment of connection machines in general are discussed. These are contrasted with the same issues in MIMD multiprocessors and multicomputers.
Author: Hu, Nan (Nanyang Med Coll, Publ Teaching Dept, Nanyang 473000, Henan, Peoples R China; Wuhan Univ, Sch Publ Hlth, Wuhan 430000, Hubei, Peoples R China)
In recent years, personalized recommendation services have been applied to many areas of society, typically in fields such as e-commerce and short videos. In response to the serious performance problems of content recommendation on current online language education platforms, this paper designs a new online English education model that gives university students fuller, more three-dimensional training in English language learning. Based on the MU platform, this paper obtains data from the platform and uses crawler technology to sample and standardize the learning resources for online education. User information, such as explicit and implicit ratings of courses, is then selected as the main basis for training a user interest preference model. Next, a PRF algorithm combining data-parallel and task-parallel optimization is implemented on Apache Spark to improve data accuracy and the content recommendation method. Finally, the top-N recommendation rule is used to propose a dynamic evolutionary process that identifies students' preferences or learning habits from the results of previous data analysis, so as to make more accurate course content recommendations and give better learning content guidance for students' English learning. The online three-dimensional teaching model proposed in this paper focuses more on time-series research than traditional algorithms do, and can more accurately capture the dynamic changes in students' learning abilities.
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
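As a rough illustration of the data-parallel style the abstract describes, the SAXPY operator (y = alpha * x + y) can be written as a per-element kernel mapped over two input streams. The sketch below is plain Python, not Brook syntax; the names `saxpy_kernel` and `saxpy` are hypothetical and chosen only for this example.

```python
def saxpy_kernel(alpha, x, y):
    """Per-element kernel: one output depends only on one element of each input."""
    return alpha * x + y


def saxpy(alpha, xs, ys):
    """Apply the kernel independently to every pair of stream elements.

    Because no element depends on any other, a streaming processor is free
    to evaluate all elements in parallel; here we simply loop on the CPU.
    """
    if len(xs) != len(ys):
        raise ValueError("input streams must have the same length")
    return [saxpy_kernel(alpha, x, y) for x, y in zip(xs, ys)]
```

For example, `saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])` scales the first stream by 2 and adds the second stream element by element.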
ISBN: 9781595930613 (print)
The proliferation of biological sequence data has motivated the need for an extremely fast probabilistic sequence search. One method for performing this search involves evaluating the Viterbi probability of a hidden Markov model (HMM) of a desired sequence family for each sequence in a protein database. However, one of the difficulties with current implementations is the time required to search large databases. Many current and upcoming architectures offering large amounts of compute power are designed with data-parallel execution and streaming in mind. We present a streaming algorithm for evaluating an HMM's Viterbi probability and refine it for the specific HMM used in biological sequence search. We implement our streaming algorithm in the Brook language, allowing us to execute the algorithm on graphics processors. We demonstrate that this streaming algorithm on graphics processors can outperform available CPU implementations. We also demonstrate this implementation running on a 16-node graphics cluster.
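For readers unfamiliar with the quantity being computed, the following is a minimal sketch of the textbook Viterbi recurrence for a generic HMM, in log space. It is not the authors' streaming formulation or their Brook code, and it does not model the specific HMM used in biological sequence search; the function name and the dictionary-based model layout are assumptions made for this example, and the observation sequence is assumed to be non-empty.

```python
def viterbi_log_prob(obs, states, log_start, log_trans, log_emit):
    """Return the log-probability of the most likely state path for `obs`.

    log_start[s]     : log P(first state is s)
    log_trans[p][s]  : log P(next state is s | current state is p)
    log_emit[s][sym] : log P(emitting sym | state s)
    """
    # Initialise the score column with the first observation.
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}

    for symbol in obs[1:]:
        # Each new entry depends only on the previous column, so all states
        # can be updated independently of one another within a step.
        score = {
            s: max(score[p] + log_trans[p][s] for p in states)
            + log_emit[s][symbol]
            for s in states
        }

    # The Viterbi probability is the best score over all final states.
    return max(score.values())
```

Scoring a database then amounts to calling this function once per sequence; since the sequences are independent, that outer loop is also a natural candidate for parallel execution.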