Search Results - Inner Mongolia University Library

16th IEEE International Conference on Cluster Computing (CLUSTER)

Authors: Shirahata, Koichi; Sato, Hitoshi; Matsuoka, Satoshi (Tokyo Inst Technol, Tokyo, Japan; Japan Sci & Technol Agcy, CREST, Tokyo, Japan)

ISBN: (print) 9781479955480

GPUs can accelerate edge scan performance of graph processing applications; however, the capacity of device memory on GPUs limits the size of the graph that can be processed, whereas efficient techniques to handle GPU memory overflows, including overflow detection and performance analysis in large-scale systems, are not well investigated. To address the problem, we propose a MapReduce-based out-of-core GPU memory management technique for processing large-scale graph applications on heterogeneous GPU-based supercomputers. Our proposed technique automatically handles memory overflows from GPUs by dynamically dividing graph data into multiple chunks and overlaps CPU-GPU data transfer and computation on GPUs as much as possible. Our experimental results on TSUBAME2.5 using 1024 nodes (12288 CPU cores, 3072 GPUs) show that our GPU-based implementation performs 2.10x faster than running on CPU when graph data size does not fit on GPUs. We also study the performance characteristics of our proposed out-of-core GPU memory management technique, including the application's performance and the power efficiency of scale-up and scale-out approaches.

Keywords: Large-scale Graph Processing; GPGPU; MapReduce; out-of-core algorithms; Big Data Applications
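
The chunking idea in the abstract above can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation: the names `chunk_edges`, `out_degrees`, and `max_edges_per_chunk` are hypothetical, the per-chunk `Counter` merely stands in for a GPU kernel, and the overlap of CPU-GPU transfer with computation is not modeled. It only shows an edge list being split into fixed-size chunks, processed one chunk at a time, and reduced into a single result.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]


def chunk_edges(edges: Iterable[Edge], max_edges_per_chunk: int) -> Iterator[List[Edge]]:
    """Yield successive chunks holding at most max_edges_per_chunk edges each."""
    if max_edges_per_chunk < 1:
        raise ValueError("max_edges_per_chunk must be positive")
    chunk: List[Edge] = []
    for edge in edges:
        chunk.append(edge)
        if len(chunk) == max_edges_per_chunk:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter chunk
        yield chunk


def out_degrees(edges: Iterable[Edge], max_edges_per_chunk: int) -> Counter:
    """Count out-degrees chunk by chunk, then merge the partial counts."""
    total: Counter = Counter()
    for chunk in chunk_edges(edges, max_edges_per_chunk):
        # Map step on one chunk (assumed to fit in the limited memory).
        partial = Counter(src for src, _dst in chunk)
        # Reduce step: fold the partial result into the running total.
        total.update(partial)
    return total
```

The result does not depend on the chunk size, which is the property that lets the chunk size be chosen from the available memory alone.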

AN EFFICIENT OUT-OF-CORE VOLUME RENDERING METHOD BASED ON RAY CASTING AND GPU ACCELERATION

IEEE Youth Conference on Information, Computing and Telecommunication

Authors: Xue, Jian; Lue, Ke; Tian, Jie (Chinese Acad Sci, Grad Univ, Coll Comp & Commun Engn, Beijing 100864, Peoples R China; Chinese Acad Sci, Inst Automat, Key Lab Complex Syst & Intelligence Sci, Beijing 100864, Peoples R China)

ISBN: (print) 9781424450756

Volume rendering techniques have been used widely for high quality visualization of 3D data sets, especially in the field of biomedical image processing. However, when rendering very large (out-of-core) volume data sets, conventional in-core volume rendering algorithms cannot run efficiently because the entire input data cannot fit in the internal memory of a computer. To solve this problem, an efficient out-of-core volume rendering method based on volume ray casting and GPU acceleration, together with a new out-of-core framework for visualizing large volume data sets, is proposed in this paper. The new framework gives transparent and efficient access to the volume data set cached on the hard disk, while the new volume rendering method minimizes the number of times volume data is reloaded from the hard disk into internal memory and performs comparatively fast, high-quality volume rendering. The experimental results indicate that the new method and framework are effective and efficient for the visualization of out-of-core medical data sets.

Keywords: Biomedical image processing; scientific visualization; volume rendering; out-of-core algorithms

Out-of-core compression for gigantic polygon meshes

Annual Symposium of the ACM SIGGRAPH

Authors: Isenburg, M; Gumhold, S (Univ N Carolina, Chapel Hill, NC 27514, USA; Univ Tubingen, WSIGRIS, D-72074 Tubingen, Germany)

ISBN: (print) 9781581137095

Polygonal models acquired with emerging 3D scanning technology or from large scale CAD applications easily reach sizes of several gigabytes and do not fit in the address space of common 32-bit desktop PCs. In this paper we propose an out-of-core mesh compression technique that converts such gigantic meshes into a streamable, highly compressed representation. During decompression only a small portion of the mesh needs to be kept in memory at any time. As full connectivity information is available along the decompression boundaries, this provides seamless mesh access for incremental in-core processing on gigantic meshes. Decompression speeds are CPU-limited and exceed one million vertices and two million triangles per second on a 1.8 GHz Athlon processor. A novel external memory data structure provides our compression engine with transparent access to arbitrarily large meshes. This out-of-core mesh was designed to accommodate the access pattern of our region-growing based compressor, which, in return, performs mesh queries as seldom and as locally as possible by remembering previous queries as long as needed and by adapting its traversal slightly. The achieved compression rates are state-of-the-art.

Keywords: external memory data structures; mesh compression; out-of-core algorithms; streaming meshes; processing sequences

I/O Chunking and Latency Hiding Approach for Out-of-core Sorting Acceleration using GPU and Flash NVM

4th IEEE International Conference on Big Data (Big Data)

Authors: Sato, Hitoshi; Mizote, Ryo; Matsuoka, Satoshi; Ogawa, Hirotaka (Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki, Japan; Tokyo Inst Technol, Tokyo, Japan)

ISBN: (print) 9781467390057

We propose an out-of-core sorting acceleration technique, called xtr2sort, that deals with multi-level memory hierarchies of device memory (GPU), host memory (CPU), and semi-external non-volatile memory (Flash NVM) to leverage the high computational performance and memory bandwidth of GPUs, while offloading bandwidth-oblivious operations onto semi-external memory in order to significantly increase the memory capacity available for the sort data, well beyond that of the GPU as well as of the CPU. xtr2sort splits the input records into several chunks to fit in GPU device memory and overlaps (1) I/O operations between semi-external and host memory, (2) data transfers between host and device memory, and (3) sorting on the GPU device in an asynchronous manner for hiding latency. Experimental results show that xtr2sort can sort records up to 256 times larger than is possible with in-core GPU sorting and 16 times larger than is possible with in-core CPU sorting. xtr2sort is also 4.39 times faster than out-of-core CPU sorting using 72 threads on 204.8 giga records with int32_t, even though the input records could not fit in the host memory, let alone the GPU device memory. These results indicate that I/O chunking and latency hiding/overlapping maintain sorting performance, despite slow Flash NVM performance, by utilizing GPUs along with good algorithms. Such an approach is viable for accelerating future computing systems with deep memory hierarchies.

Keywords: Sorting; GPGPU; NVM; out-of-core algorithms; Big Data Applications
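
As a rough illustration of the chunk-then-merge structure described in the abstract above, the following sketch performs a classic external merge sort on the CPU. It is not xtr2sort: there is no GPU, no Flash NVM, and no asynchronous overlap of I/O, transfer, and sorting. The function names and the `chunk_size` parameter are hypothetical, and temporary files stand in for semi-external memory.

```python
import heapq
import os
import struct
import tempfile
from typing import Iterable, Iterator, List

_INT32 = struct.Struct("<i")  # little-endian 32-bit signed record


def _write_run(values: List[int], directory: str) -> str:
    """Write one sorted chunk (a "run") to its own temporary file."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for v in values:
            f.write(_INT32.pack(v))
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream the records of one run back from disk, one at a time."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(_INT32.size)
            if not raw:
                return
            yield _INT32.unpack(raw)[0]


def external_sort(records: Iterable[int], chunk_size: int) -> List[int]:
    """Sort int32 records while holding at most chunk_size of them in memory
    during the run-generation phase."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    with tempfile.TemporaryDirectory() as tmp:
        runs: List[str] = []
        chunk: List[int] = []
        for r in records:
            chunk.append(r)
            if len(chunk) == chunk_size:
                runs.append(_write_run(sorted(chunk), tmp))
                chunk = []
        if chunk:
            runs.append(_write_run(sorted(chunk), tmp))
        # k-way merge of the sorted runs. The result is collected into a list
        # only to keep the demo short; a real out-of-core sort would stream it.
        return list(heapq.merge(*(_read_run(p) for p in runs)))
```

For example, `external_sort([5, -1, 3, 3, 0], 2)` produces three runs on disk and merges them into `[-1, 0, 3, 3, 5]`.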

Terrain simplification simplified: A general framework for view-dependent out-of-core visualization

7th IEEE Symposium on Information Visualization (INFOVIS 2001)

Authors: Lindstrom, P; Pascucci, V (Lawrence Livermore Natl Lab, Livermore, CA 94551, USA)

This paper describes a general framework for out-of-core rendering and management of massive terrain surfaces. The two key components of this framework are: view-dependent refinement of the terrain mesh and a simple scheme for organizing the terrain data to improve coherence and reduce the number of paging events from external storage to main memory. Similar to several previously proposed methods for view-dependent refinement, we recursively subdivide a triangle mesh defined over regularly gridded data using longest-edge bisection. As part of this single, per-frame refinement pass, we perform triangle stripping, view frustum culling, and smooth blending of geometry using geomorphing. Meanwhile, our refinement framework supports a large class of error metrics, is highly competitive in terms of rendering performance, and is surprisingly simple to implement. Independent of our refinement algorithm, we also describe several data layout techniques for providing coherent access to the terrain data. By reordering the data in a manner that is more consistent with our recursive access pattern, we show that visualization of gigabyte-size data sets can be realized even on low-end, commodity PCs without the need for complicated and explicit data paging techniques. Rather, by virtue of dramatic improvements in multilevel cache coherence, we rely on the built-in paging mechanisms of the operating system to perform this task. The end result is a straightforward, simple-to-implement, pointerless indexing scheme that dramatically improves the data locality and paging performance over conventional matrix-based layouts.

Keywords: terrain visualization; surface simplification; view-dependent refinement; continuous levels of detail; edge bisection; error metrics; geomorphing; out-of-core algorithms; external memory paging; data layouts
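
To make the idea of a pointerless, locality-improving index concrete, here is a small sketch of a Z-order (Morton) index for a 2D grid. This is a swapped-in, well-known layout chosen for illustration, not the indexing scheme of the paper above, whose layouts follow the recursive refinement order. The function names are hypothetical, and coordinates are assumed to fit in 16 bits.

```python
def _part1by1(n: int) -> int:
    """Spread the low 16 bits of n so a zero bit separates each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(x: int, y: int) -> int:
    """Interleave the bits of x and y (x in even positions, y in odd ones).

    The index is computed from the coordinates alone, so no pointers are
    stored, and samples that are close in 2D tend to be close on disk.
    """
    return _part1by1(x) | (_part1by1(y) << 1)
```

With a row-major layout, the four samples of a 2x2 block are separated by a full row width; with this index they occupy four consecutive positions, so one page fetched by the operating system is more likely to serve several nearby accesses.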

External memory management and simplification of huge meshes

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2003, Vol. 9, No. 4, pp. 525-537

Authors: Cignoni, P; Montani, C; Rocchini, C; Scopigno, R (CNR, Ist Sci & Tecnol Informaz, Area Ric, I-56124 Pisa, Italy)

Very large triangle meshes, i.e., meshes composed of millions of faces, are becoming common in many applications. Obviously, processing, rendering, transmission, and archiving of these meshes are not simple tasks. Mesh simplification and LOD management are rather mature technologies that, in many cases, can efficiently manage complex data. However, only a few available systems can manage meshes characterized by a huge size: RAM size is often a severe bottleneck. In this paper, we present a data structure called Octree-based External Memory Mesh (OEMM). It supports external memory management of complex meshes, dynamically loading only the selected sections into main memory and preserving data consistency during local updates. The functionalities implemented on this data structure (simplification, detail preservation, mesh editing, visualization, and inspection) can be applied to huge triangle meshes on low-cost PC platforms. The time overhead due to the external memory management is affordable. Results of the test of our system on complex meshes are presented.

Keywords: out-of-core algorithms; hierarchical data structures; mesh simplification; level of detail; 3D scanning; texture synthesis

A multiresolution representation for massive meshes

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2005, Vol. 11, No. 2, pp. 139-148

Authors: Shaffer, E; Garland, M (Univ Illinois, Dept Comp Sci, Urbana, IL 61801, USA)

We present a new external memory multiresolution surface representation for massive polygonal meshes. Previous methods for building such data structures have relied on resampled surface data or employed memory intensive construction algorithms that do not scale well. Our proposed representation combines efficient access to sampled surface data with access to the original surface. The construction algorithm for the surface representation exhibits memory requirements that are insensitive to the size of the input mesh, allowing it to process meshes containing hundreds of millions of polygons. The multiresolution nature of the surface representation has allowed us to develop efficient algorithms for view-dependent rendering, approximate collision detection, and adaptive simplification of massive meshes. The empirical performance of these algorithms demonstrates that the underlying data structure is a powerful and flexible tool for operating on massive geometric data.

Keywords: hierarchical data structures; level of detail; mesh simplification; out-of-core algorithms

Layered point clouds: A simple and efficient multiresolution structure for distributing and rendering gigantic point-sampled models

COMPUTERS & GRAPHICS-UK, 2004, Vol. 28, No. 6, pp. 815-826

Authors: Gobbetti, E; Marton, F (CRS4, Visual Comp Grp, I-09010 Pula, CA, Italy)

We recently introduced an efficient multiresolution structure for distributing and rendering very large point sampled models on consumer graphics platforms [1]. The structure is based on a hierarchy of precomputed object-space point clouds, that are combined coarse-to-fine at rendering time to locally adapt sample densities according to the projected size in the image. The progressive block based refinement nature of the rendering traversal exploits on-board caching and object based rendering APIs, hides out-of-core data access latency through speculative prefetching, and lends itself well to incorporate backface, view frustum, and occlusion culling, as well as compression and view-dependent progressive transmission. The resulting system allows rendering of complex out-of-core models at high frame rates (over 60 M rendered points/second), supports network streaming, and is fundamentally simple to implement. We demonstrate the efficiency of the approach on a number of very large models, stored on local disks or accessed through a consumer level broadband network, including a massive 234 M samples isosurface generated by a compressible turbulence simulation and a 167 M samples model of Michelangelo's St. Matthew. Many of the details of our framework were presented in a previous study. We here provide a more thorough exposition, but also significant new material, including the presentation of a higher quality bottom-up construction method and additional qualitative and quantitative results. (C) 2004 Elsevier Ltd. All rights reserved.

Keywords: point-based graphics; large datasets; out-of-core algorithms; level of detail

Using desktop computers to solve large-scale dense linear algebra problems

JOURNAL OF SUPERCOMPUTING, 2011, Vol. 58, No. 2, pp. 145-150

Authors: Marques, M.; Quintana-Orti, G.; Quintana-Orti, E. S.; van de Geijn, R. (Univ Jaime I, Depto Ingn & Ciencia Comp, Castellon de La Plana 12071, Spain; Univ Texas Austin, Dept Comp Sci, Austin, TX 78712, USA)

We provide experimental evidence that current desktop computers feature enough computational power to solve large-scale dense linear algebra problems. While the high computational cost of the numerical methods for solving these problems can be tackled by the multiple cores of current processors, we propose to use the disk to store the large data structures associated with these applications. Our results also show that the limited amount of RAM and the comparatively slow disk of the system pose no problem for the solution of very large dense linear systems and linear least-squares problems. Thus, current desktop computers are revealed as an appealing, cost-effective platform for research groups that have to deal with large dense linear algebra problems but have no direct access to large computing facilities.

Keywords: Dense linear algebra; out-of-core algorithms; LU factorization; High-performance computing
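
The abstract above proposes keeping large dense matrices on disk. As a much simpler stand-in for the factorizations the paper targets, the sketch below computes a matrix-vector product while reading the matrix from a file one block of rows at a time. It is an illustration under stated assumptions, not the authors' method: the function names, the row-major file of 8-byte floats, and the `rows_per_block` parameter are all hypothetical.

```python
import os
import tempfile
from array import array
from typing import List, Sequence


def write_matrix(path: str, rows: Sequence[Sequence[float]]) -> None:
    """Store a dense matrix on disk in row-major order as 8-byte floats."""
    with open(path, "wb") as f:
        for row in rows:
            array("d", row).tofile(f)


def matvec_from_disk(path: str, n_rows: int, n_cols: int,
                     x: Sequence[float], rows_per_block: int) -> List[float]:
    """Compute y = A x while holding only rows_per_block rows of A in memory."""
    if len(x) != n_cols:
        raise ValueError("x must have n_cols entries")
    if rows_per_block < 1:
        raise ValueError("rows_per_block must be positive")
    y: List[float] = []
    with open(path, "rb") as f:
        remaining = n_rows
        while remaining > 0:
            count = min(rows_per_block, remaining)
            block = array("d")
            block.fromfile(f, count * n_cols)  # read one block of rows
            for r in range(count):
                row = block[r * n_cols:(r + 1) * n_cols]
                y.append(sum(a * b for a, b in zip(row, x)))
            remaining -= count
    return y


def demo() -> List[float]:
    """Round-trip a small 3x2 matrix through a temporary file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.bin")
        write_matrix(path, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        return matvec_from_disk(path, 3, 2, [1.0, 1.0], rows_per_block=2)
```

Only `rows_per_block * n_cols` matrix entries are resident at any time, so the block size can be chosen from the available RAM rather than from the matrix size.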

Grouper: A Compact, Streamable Triangle Mesh Data Structure

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2014, Vol. 20, No. 1, pp. 84-98

Authors: Luffel, Mark; Gurung, Topraj; Lindstrom, Peter; Rossignac, Jarek (Georgia Inst Technol, Graph Visualizat & Usabil Ctr GVU, Atlanta, GA 30308, USA; Lawrence Livermore Natl Lab, CASC, Livermore, CA 94551, USA)

We present Grouper: an all-in-one compact file format, random-access data structure, and streamable representation for large triangle meshes. Similarly to the recently published SQuad representation, Grouper represents the geometry and connectivity of a mesh by grouping vertices and triangles into fixed-size records, most of which store two adjacent triangles and a shared vertex. Unlike SQuad, however, Grouper interleaves geometry with connectivity and uses a new connectivity representation to ensure that vertices and triangles can be stored in a coherent order that enables memory-efficient sequential stream processing. We present a linear-time construction algorithm that allows streaming out Grouper meshes using a small memory footprint while preserving the initial ordering of vertices. As a part of this construction, we show how the problem of assigning vertices and triangles to groups reduces to a well-known NP-hard optimization problem, and present a simple yet effective heuristic solution that performs well in practice. Our array-based Grouper representation also doubles as a triangle mesh data structure that allows direct access to vertices and triangles. Storing only about two integer references per triangle (i.e., less than the three vertex references stored with each triangle in a conventional indexed mesh format), Grouper answers both incidence and adjacency queries in amortized constant time. Our compact representation enables data-parallel processing on multicore computers, instant partitioning and fast transmission for distributed processing, as well as efficient out-of-core access. We demonstrate the versatility and performance benefits of Grouper using a suite of example meshes and processing kernels.

Keywords: Mesh compression; mesh data structures; random access; out-of-core algorithms; large meshes
