ISBN: 9781424464425 (print)
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures generally involve highly irregular, fine-grain memory accesses that are typical of many computations on linked lists, trees, and graphs. While the current generation of GPUs provides substantial computational power and extremely high-bandwidth memory access, these devices may appear at first to be geared primarily toward streamed, highly data-parallel computations. In this paper, we introduce an optimized multithreaded GPU algorithm for prefix computations through a randomization process that reduces the problem to a large number of fine-grain computations. We map these fine-grain computations onto multithreaded GPUs in such a way that the processing cost per element is shown to be close to the best possible. Our experimental results show scalability for list sizes ranging from 1M nodes to 256M nodes, and significantly improve on the recently published parallel implementations of list ranking, including implementations on the Cell Processor, the MTA-8, and the NVIDIA GeForce 200 series. They also compare favorably to the performance of the best known CUDA algorithm for the scan operation on the Tesla C1060.
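The abstract does not spell out the algorithm, so the following is only a minimal sequential sketch of one common randomized approach to list prefix sums: pick random splitter nodes, scan each sublist independently, then combine the sublist totals. The function name `list_prefix_sums`, the array layout (`succ`, `values`), and the `num_splitters` parameter are assumptions made for illustration, and the sketch is written in Python rather than CUDA; it is not the authors' implementation.

```python
import random


def list_prefix_sums(succ, values, head, num_splitters=4, seed=0):
    """Inclusive prefix sums along a linked list stored in arrays.

    succ[i] is the index of node i's successor (-1 at the tail),
    values[i] is node i's value, and head is the index of the first node.
    Assumes the list is non-empty and acyclic.
    """
    n = len(succ)
    rng = random.Random(seed)

    # The head is always a splitter so that every node falls in some sublist.
    others = [i for i in range(n) if i != head]
    extra = max(0, min(num_splitters - 1, len(others)))
    splitters = [head] + rng.sample(others, extra)
    is_splitter = [False] * n
    for s in splitters:
        is_splitter[s] = True

    local = [0] * n      # prefix sum of each node within its own sublist
    owner = [-1] * n     # splitter that starts the node's sublist
    total = {}           # sum of all values in each sublist
    next_splitter = {}   # splitter that follows each sublist (-1 at the end)

    # Phase 1: walk each sublist. The walks are independent of one another,
    # which is what would be mapped to separate threads in a parallel setting.
    for s in splitters:
        running = 0
        node = s
        while True:
            running += values[node]
            local[node] = running
            owner[node] = s
            node = succ[node]
            if node == -1 or is_splitter[node]:
                break
        total[s] = running
        next_splitter[s] = node

    # Phase 2: exclusive scan over the sublist totals, in list order.
    offset = {}
    acc = 0
    s = head
    while s != -1:
        offset[s] = acc
        acc += total[s]
        s = next_splitter[s]

    # Phase 3: add each sublist's offset to its nodes' local prefix sums.
    return [local[i] + offset[owner[i]] for i in range(n)]
```

For a list stored as `succ = [3, -1, 0, 1]` with `head = 2`, the traversal order is node 2, node 0, node 3, node 1, and the result is indexed by node, not by position in the list. The answer does not depend on which splitters are drawn; the random choice only affects how evenly the work in phase 1 is divided.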
This paper describes our work on using GPUs to improve the performance of a homography-based visual servo system. We present a novel GPU-based implementation of the Efficient Second-order Minimization algorithm (GPU-ESM). By exploiting the parallel processing capability of a GPU, we obtain significant acceleration over the CPU counterpart. Currently our GPU-ESM algorithm can process a 360×360 pixel tracking area at 145 fps on an NVIDIA GTX295 board and Intel Core i7 920, approximately 30 times faster than a CPU implementation. This speedup substantially improves the real-time performance of our system. System reliability and stability are also greatly enhanced by a GPU-based Scale Invariant Feature Transform (SIFT) algorithm, which handles cases where ESM tracking fails, for example because of large image differences or occlusion. We present the details of translating the ESM algorithm from a CPU to a GPU implementation, together with novel optimizations, and describe the co-processing model of multiple GPUs and multiple CPU threads. The performance of our GPU-accelerated system is evaluated with experimental data.
Continuing improvements in CPU and GPU performance, as well as increasing multicore processor and cluster-based parallelism, demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are nontrivial to develop, and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic enough to support various types of data and visualization applications and at the same time work efficiently on a cluster with distributed graphics cards. In this paper, we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL, which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems, from large distributed visualization clusters and multiprocessor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present sample configurations and usage scenarios as well as scalability results.
We propose a geographical visualization to help operators of coastal surveillance systems and decision-making analysts gain insight into vessel movements. For a possibly unknown area, they want to know where significant maritime areas, such as highways and anchoring zones, are located. We show these features as an overlay on a map. As source data we use AIS data: many vessels are currently equipped with advanced GPS devices that frequently sample the state of the vessel and broadcast it. Our visualization is based on density fields that are derived from convolution of the dynamic vessel positions with a kernel. The density fields are shown as illuminated height maps. Combining two fields, one with a large and one with a small kernel, provides both overview and detail. A large kernel provides an overview of area usage, revealing vessel highways. Details of speed variations of individual vessels are shown with a small kernel, highlighting anchoring zones where multiple vessels stop. Beyond maritime applications, we expect this approach to be useful for the visualization of moving-object data in general.
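The idea of deriving a density field by convolving positions with a kernel can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a Gaussian kernel truncated at three standard deviations, treats the input as static sampled positions in grid coordinates, and ignores vessel speed and time. The function name `density_field` and the `sigma` parameter are hypothetical, and the abstract does not specify the kernel shape.

```python
import math


def density_field(points, width, height, sigma):
    """Accumulate a Gaussian kernel around each (x, y) sample on a grid.

    Returns a height-by-width list of lists, indexed as field[y][x].
    The kernel is cut off at a radius of 3 * sigma (an assumption).
    """
    radius = int(math.ceil(3 * sigma))
    field = [[0.0] * width for _ in range(height)]
    for px, py in points:
        cx, cy = int(round(px)), int(round(py))
        # Only visit the cells inside the truncated kernel footprint.
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (x - px) ** 2 + (y - py) ** 2
                field[y][x] += math.exp(-d2 / (2.0 * sigma * sigma))
    return field
```

Calling the function twice on the same samples, once with a large `sigma` and once with a small one, gives the two fields that the abstract describes combining: the large kernel spreads each sample widely so that frequently used corridors merge into ridges, while the small kernel keeps sharp peaks where many samples coincide.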
Medical volumetric imaging requires high fidelity, high performance rendering algorithms. We motivate and analyze new volumetric rendering algorithms that are suited to modern parallel processing architectures. First, we describe the three major categories of volume rendering algorithms and confirm through an imaging scientist-guided evaluation that ray-casting is the most acceptable. We describe a thread- and data-parallel implementation of ray-casting that makes it amenable to key architectural trends of three modern commodity parallel architectures: multi-core, GPU, and an upcoming many-core Intel (R) architecture code-named Larrabee. We achieve more than an order of magnitude performance improvement on a number of large 3D medical datasets. We further describe a data compression scheme that significantly reduces data-transfer overhead. This allows our approach to scale well to large numbers of Larrabee cores.
ISBN: 9781424444045 (print)
In this paper, we propose a stochastic simulation to model and analyze cellular signal transduction. The high number of objects in a simulation requires advanced visualization techniques: first to handle the large data sets, second to support human perception in the crowded environment, and third to provide an interactive exploration tool. To adjust the state of the cell to an external signal, a specific set of signaling molecules transports the information to the nucleus deep inside the cell. There, key molecules regulate gene expression. In contrast to continuous ODE models, we model all signaling molecules individually in a more realistic crowded and disordered environment. Beyond spatiotemporal concentration profiles, our data describes the process on a mesoscopic, molecular level, allowing a detailed view of intracellular events. In our proposed schematic visualization, individual molecules, their tracks, or reactions can be selected and brought into focus to highlight the signal transduction pathway. Segmentation, depth cues, and depth of field are applied to reduce the visual complexity. We also provide a virtual microscope to display images for comparison with wet lab experiments. The method is applied to distinguish different transport modes of MAPK (mitogen-activated protein kinase) signaling molecules in a cell. In addition, we simulate the diffusion of drug molecules through the extracellular space of a solid tumor and visualize the challenges in cancer-related therapeutic drug delivery.
ISBN: 9781424439355 (print)
Highly distributed systems such as Grids are used today for the execution of large-scale parallel applications. Analyzing the behavior of these applications is not trivial. The complexity arises from the event correlation among processes, from external influences such as time-sharing mechanisms and saturation of network links, and from the amount of data that records the application behavior. Almost all visualization tools for the analysis of parallel applications offer a space-time representation of the application behavior. This paper presents a novel technique that combines traces from grid applications with a treemap visualization of the data. With this combination, we dynamically create an annotated hierarchical structure that represents the application behavior for the selected time interval. The experiments in the grid show that our technique can readily be used for the analysis of large-scale parallel applications with thousands of processes.
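One way to picture combining traces with a treemap for a selected time interval is sketched below. It is only an illustration under several assumptions: trace events are hypothetical `(path, start, end)` tuples whose path names a site, a machine, and a process; all paths have the same depth; and the layout is the basic slice-and-dice treemap rather than whatever variant the authors use. The names `aggregate`, `weight`, and `slice_and_dice` are invented for this sketch.

```python
def aggregate(events, t0, t1):
    """Build a nested dict of time spent per path inside [t0, t1).

    Each event is (path, start, end), where path is a tuple of names
    from the root of the hierarchy down to a process (assumed layout).
    """
    tree = {}
    for path, start, end in events:
        overlap = min(end, t1) - max(start, t0)
        if overlap <= 0:
            continue  # event lies outside the selected interval
        node = tree
        for name in path[:-1]:
            node = node.setdefault(name, {})
        node[path[-1]] = node.get(path[-1], 0.0) + overlap
    return tree


def weight(node):
    """Total time under a node: a leaf is a number, an inner node a dict."""
    if isinstance(node, dict):
        return sum(weight(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, depth=0, path=(), out=None):
    """Assign each leaf a rectangle (x, y, w, h) with area proportional to its weight."""
    if out is None:
        out = {}
    if not isinstance(node, dict):
        out[path] = (x, y, w, h)
        return out
    total = weight(node)
    offset = 0.0
    for name, child in node.items():
        frac = weight(child) / total
        if depth % 2 == 0:
            # Even depth: split the rectangle left to right.
            slice_and_dice(child, x + offset * w, y, w * frac, h,
                           depth + 1, path + (name,), out)
        else:
            # Odd depth: split the rectangle top to bottom.
            slice_and_dice(child, x, y + offset * h, w, h * frac,
                           depth + 1, path + (name,), out)
        offset += frac
    return out
```

Changing `t0` and `t1` and rerunning `aggregate` followed by `slice_and_dice` rebuilds the hierarchy for the new interval, which mirrors the abstract's point that the structure is created dynamically for the selected time range.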
ISBN: 9781424452835 (print)
The proceedings contain 56 papers. The topics discussed include: interactive visual clustering of large collections of trajectories; proximity-based visualization of movement trace data; guided analysis of hurricane trends using statistical processes integrated with interactive parallel coordinates; finding comparable temporal categorical records: a similarity measure with an interactive visualization; a visual analytics system for radio frequency fingerprinting-based localization; combining automated analysis and visualization techniques for effective exploration of high-dimensional data; two-stage framework for visualization of clustered high dimensional data; model space visualization for multivariate linear trend discovery; parallel tag clouds to explore and analyze faceted text corpora; describing story evolution from dynamic information streams; VAST contest dataset use in education; evaluating visual analytics systems for investigative analysis: deriving design principles from a case study; and visual analysis of graphs with multiple connected components.
Time and streak surfaces are ideal tools to illustrate time-varying vector fields since they directly appeal to the intuition about coherently moving particles. However, efficient generation of high-quality time and streak surfaces for complex, large, and time-varying vector field data has been elusive due to the computational effort involved. In this work, we propose a novel algorithm for computing such surfaces. Our approach is based on a decoupling of surface advection and surface adaptation and yields improved efficiency over other surface tracking methods. It also allows us to leverage inherent parallelization opportunities in the surface advection, resulting in more rapid parallel computation. Moreover, our algorithm produces the entire evolution of a time or streak surface in a compact representation, allowing for interactive, high-quality rendering, visualization, and exploration of the evolving surface. Finally, we discuss a number of ways to improve surface depiction through advanced rendering and texturing while preserving interactivity, provide a number of examples for real-world datasets, and analyze the behavior of our algorithm on them.