ISBN (print): 9781728147345
We propose a new framework for deploying Reverse Time Migration (RTM) simulations on distributed-memory systems equipped with multiple GPUs. Our software infrastructure engine, TB-RTM, relies on the StarPU dynamic runtime system to orchestrate the asynchronous scheduling of RTM computational tasks on the underlying resources. Besides dealing with challenging hardware heterogeneity, TB-RTM supports tasks with different workload characteristics, which stress disparate components of the hardware system. RTM is challenging in that it operates intensively at both ends of the memory hierarchy: compute kernels run at the highest level of the memory system, possibly in GPU main memory, while I/O kernels save solution data to fast storage. We consider how to span the wide performance gap between these two extremes, i.e., GPU memory and fast storage, on which large-scale RTM simulations routinely execute. To maximize hardware occupancy while maintaining high memory bandwidth throughout the memory subsystem, our framework leverages the new out-of-core (OOC) feature of StarPU to prefetch solution data in and out, not only of GPU/CPU main memory but also of the fast storage system. The OOC technique opens opportunities for overlapping expensive data movement with computation. The TB-RTM framework addresses this challenging heterogeneity problem with a systematic approach that is oblivious to the targeted hardware architectures. The resulting RTM framework can effectively be deployed on massively parallel GPU-based systems, delivering performance scalability up to 500 GPUs.
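The key mechanism here is keeping compute tasks busy while solution snapshots move between fast storage and device memory. The sketch below is not TB-RTM or the StarPU API; it is a plain-Python illustration of the prefetch-and-overlap idea, with assumed file names and a placeholder imaging kernel standing in for the GPU tasks.

```python
# Minimal sketch (not TB-RTM/StarPU): overlap asynchronous prefetch of
# checkpointed wavefield snapshots from storage with compute tasks.
import concurrent.futures as cf
import numpy as np
import tempfile, os

def load_snapshot(path):
    # I/O task: read a previously checkpointed wavefield from fast storage.
    return np.load(path)

def correlate(src_wavefield, rcv_wavefield):
    # Compute task: placeholder imaging condition (stand-in for the GPU kernel).
    return src_wavefield * rcv_wavefield

tmpdir = tempfile.mkdtemp()
paths = []
for step in range(8):                                   # fake checkpoints on "fast storage"
    p = os.path.join(tmpdir, f"snap_{step}.npy")
    np.save(p, np.random.rand(256, 256))
    paths.append(p)

image = np.zeros((256, 256))
with cf.ThreadPoolExecutor(max_workers=2) as pool:
    pending = pool.submit(load_snapshot, paths[0])      # prefetch the first snapshot
    for step in range(len(paths)):
        snap = pending.result()                         # waits only if I/O lags behind
        if step + 1 < len(paths):
            pending = pool.submit(load_snapshot, paths[step + 1])  # prefetch the next one
        image += correlate(snap, np.random.rand(256, 256))         # compute overlaps the I/O
print("accumulated image checksum:", image.sum())
```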
ISBN (print): 9781450323789
In this paper, we present a new out-of-core sort algorithm, designed for problems that are too large to fit into the aggregate RAM available on modern supercomputers. We analyze the performance including the cost of I/O and demonstrate the fastest (to the best of our knowledge) reported throughput using the canonical sort benchmark on a general-purpose, production HPC resource running Lustre. By clever use of available storage and a formulation of asynchronous data transfer mechanisms, we are able to almost completely hide the computation (sorting) behind the I/O latency. This latency hiding enables us to achieve comparable execution times, including the additional temporary I/O required, between a large sort problem (5TB) run as a single, in-RAM sort and our out-of-core approach using 1/10th the amount of RAM. In our largest run, sorting 100TB of records using 1792 hosts, we achieved an end-to-end throughput of 1.24TB/min using our general-purpose sorter, improving on the current Daytona record holder by 65%.
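A minimal sketch of the out-of-core sort structure assumed above: records that exceed RAM are sorted as fixed-size runs, spilled to storage, and then k-way merged. The asynchronous transfer machinery that hides sorting behind I/O is omitted for brevity; the run size, file layout, and data types are placeholders, not the paper's implementation.

```python
# External merge sort sketch: sort RAM-sized runs, spill them, then stream a
# k-way merge so only one record per run needs to be resident at a time.
import heapq, os, tempfile
import numpy as np

def external_sort(values, run_size, workdir):
    # Phase 1: sort runs that fit in memory and spill them to storage.
    run_paths = []
    for start in range(0, len(values), run_size):
        run = np.sort(values[start:start + run_size])
        path = os.path.join(workdir, f"run_{len(run_paths)}.npy")
        np.save(path, run)
        run_paths.append(path)
    # Phase 2: k-way merge of the sorted runs (streamed, heap-based).
    iterators = [iter(np.load(p)) for p in run_paths]
    return np.fromiter(heapq.merge(*iterators), dtype=values.dtype)

workdir = tempfile.mkdtemp()
data = np.random.randint(0, 1 << 30, size=1_000_000)
result = external_sort(data, run_size=100_000, workdir=workdir)
print("sorted:", bool(np.all(np.diff(result) >= 0)))
```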
ISBN (print): 9781605581064
Improving the quality of tetrahedral meshes is an important operation in many scientific computing applications. Meshes with badly shaped elements impact both the accuracy and convergence of scientific applications. State-of-the-art mesh improvement techniques rely on sophisticated numerical optimization methods such as feasible Newton or conjugate gradient. Unfortunately, these methods cannot be practically applied to very large meshes due to their global nature. Our contribution in this paper is to describe a streaming framework for tetrahedral mesh optimization. This framework enables the optimization of meshes an order of magnitude larger than previously feasible, effectively optimizing meshes too large to fit in memory. Our results show that streaming is typically faster than global optimization and results in comparable mesh quality. This leads us to conclude that streaming extends mesh optimization to a new class of mesh sizes without compromising the quality of the optimized mesh.
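The essence of the streaming approach is that only a bounded window of the mesh is ever resident while a local improvement pass runs on it. The sketch below uses a simple Laplacian-style smoothing pass and synthetic vertex data as stand-ins for the paper's numerical optimization; it illustrates the windowed processing pattern, not the authors' framework.

```python
# Streaming sketch: elements arrive in stream order, only a fixed-size window
# is resident, and a local improvement pass runs before the window is evicted.
import numpy as np

def smooth_block(block, passes):
    # Local "optimization" within the resident window only (Laplacian stand-in).
    for _ in range(passes):
        block[1:-1] = 0.5 * block[1:-1] + 0.25 * (block[:-2] + block[2:])
    return block

def stream_optimize(vertex_stream, window=1024, passes=3):
    """Yield optimized vertex blocks while holding at most `window` vertices."""
    buffer = []
    for v in vertex_stream:
        buffer.append(v)
        if len(buffer) == window:
            yield smooth_block(np.array(buffer), passes)
            buffer = []                         # evict the finalized window
    if buffer:
        yield smooth_block(np.array(buffer), passes)

# Usage: stream 100,000 synthetic 3-D vertex positions without holding them all.
stream = (np.random.rand(3) for _ in range(100_000))
total = sum(len(block) for block in stream_optimize(stream))
print("vertices processed:", total)
```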
ISBN (print): 9781479952151
Streamline tracing is an important tool used in many scientific domains for visualizing and analyzing flow fields. In this work, we examine a shared-memory multi-threaded approach to streamline tracing that targets emerging data-intensive architectures. We take an in-depth look at data management strategies for streamline tracing in terms of issues such as memory latency, bandwidth, and capacity limitations that are applicable to future HPC platforms. We present two data management strategies for streamline tracing and evaluate their effectiveness for data-intensive architectures with locally attached Flash. We provide a comprehensive evaluation of both strategies by examining the strong and weak scaling implications of a variety of parameters. We also characterize the relationship between I/O concurrency and I/O efficiency to guide the selection of strategy based on use case. From our experiments, we find that using kernel-managed memory mapping for out-of-core streamline tracing can outperform an optimized user-managed cache.
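As a rough illustration of the kernel-managed strategy, the sketch below stores a synthetic vector field in a file and traces a streamline through a memory map, so the operating system pages in only the blocks that are actually touched. The field dimensions, the Euler integration, and the seed point are assumptions, not the paper's setup.

```python
# Out-of-core streamline tracing sketch: the vector field lives on disk and is
# accessed through a memory map; residency is managed by the kernel.
import numpy as np
import tempfile, os

# Build a synthetic 2-D rotational vector field on disk.
path = os.path.join(tempfile.mkdtemp(), "field.dat")
ny, nx = 512, 512
field = np.memmap(path, dtype=np.float32, mode="w+", shape=(ny, nx, 2))
ys, xs = np.mgrid[0:ny, 0:nx]
field[..., 0] = -(ys - ny / 2)          # u component
field[..., 1] = (xs - nx / 2)           # v component
field.flush()

# Re-open read-only: the OS, not the application, decides what is resident.
field = np.memmap(path, dtype=np.float32, mode="r", shape=(ny, nx, 2))

def trace(seed, steps=500, h=1e-3):
    """Euler integration; each lookup touches only the pages it needs."""
    p = np.array(seed, dtype=np.float64)
    line = [p.copy()]
    for _ in range(steps):
        i = int(np.clip(p[1], 0, ny - 1))
        j = int(np.clip(p[0], 0, nx - 1))
        v = field[i, j]                  # out-of-core access via the memory map
        p += h * v.astype(np.float64)
        line.append(p.copy())
    return np.array(line)

print("streamline points:", trace((300.0, 200.0)).shape)
```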
The real-time display of huge geometry and imagery databases involves view-dependent approximations, typically through the use of precomputed hierarchies that are selectively refined at runtime. A classic motivating problem is terrain visualization, in which planetary databases involving billions of elevation and color values are displayed on PC graphics hardware at high frame rates. This paper introduces a new diamond data structure for the basic selective-refinement processing, which is a streamlined method of representing the well-known hierarchies of right triangles that have enjoyed much success in real-time, view-dependent terrain display. Regular-grid tiles are proposed as the payload data per diamond for both geometry and texture. The use of 4-8 grid refinement and coarsening schemes allows level-of-detail transitions that are twice as gradual as traditional quadtree-based hierarchies, as well as very high-quality low-pass filtering compared to subsampling-based hierarchies. An out-of-core storage organization is introduced based on Sierpinski indices per diamond, along with a tile preprocessing framework based on fine-to-coarse, same-level, and coarse-to-fine gathering operations. To attain optimal frame-to-frame coherence and processing-order priorities, dual split and merge queues are developed similar to the Real-time Optimally Adapting Meshes (ROAM) algorithm, as well as an adaptation of the ROAM frustum culling technique. Example applications of lake detection and procedural terrain generation demonstrate the flexibility of the tile processing framework.
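The hierarchies of right triangles underlying the diamond structure are generated by longest-edge bisection, sketched below on placeholder geometry; this illustrates the refinement rule only, not the paper's diamond data structure, Sierpinski indexing, or split/merge queues.

```python
# Longest-edge bisection sketch: each split halves a right triangle across its
# hypotenuse, so two refinement steps quadruple the triangle count (4-8 style).
import numpy as np

def bisect(tri):
    """Split a right triangle (apex, left, right) across its longest edge."""
    apex, left, right = tri
    mid = 0.5 * (left + right)            # midpoint of the hypotenuse
    return (mid, apex, left), (mid, right, apex)

def refine(tris, levels):
    for _ in range(levels):
        tris = [child for t in tris for child in bisect(t)]
    return tris

root = (np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
leaves = refine([root], levels=4)
print("triangles after 4 levels:", len(leaves))   # 2**4 = 16
```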
This dissertation addresses several performance optimization issues in the context of the Tensor Contraction Engine (TCE), a domain-specific compiler to synthesize parallel, out-of-core programs for a class of scientific computations encountered in computational chemistry and physics. The domain of our focus is electronic structure calculations, where many computationally intensive components are expressible as a set of tensor contractions. These scientific applications are extremely compute-intensive and consume significant computer resources at national supercomputer centers. The manual development of high-performance parallel programs for them is usually very tedious and time consuming. The TCE system is targeted at reducing the burden on application scientists, by having them specify computations in a high-level form, from which efficient parallel programs are automatically synthesized.

The goal of this research is to develop an optimization framework to derive high-performance implementations for a set of given tensor contractions. In particular, the issues investigated include: (1) Development of an efficient in-memory parallel algorithm for a tensor contraction: A tensor contraction is essentially a generalized matrix multiplication involving multi-dimensional arrays. A novel parallel tensor contraction algorithm is developed by extending Cannon's memory-efficient parallel matrix multiplication algorithm. (2) Design of a performance-model driven framework for a parallel out-of-core tensor contraction: For a parallel out-of-core tensor contraction, besides the in-core parallel algorithm used, several other factors can affect the overall performance, such as the nested-loop structure (permutation), tile size selection, disk I/O placement and the data partitioning pattern. The best choice here depends on the characteristics of the target machine and the input data. We develop performance models for different parallel out-of-core alternatives and use p
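A minimal sketch of the out-of-core tiling pattern discussed in item (2), using a plain matrix product C[i,j] = sum_k A[i,k]·B[k,j] as the contraction: the operand tiles live on disk and only the active tiles are resident. Tile size, loop order, and file layout are placeholders; choosing them well is exactly what the dissertation's performance models are meant to do. This is not TCE-generated code.

```python
# Out-of-core tiled contraction sketch: stage operands as tiles on disk, then
# contract tile by tile with only three tiles resident at a time.
import numpy as np
import tempfile, os

N, T = 512, 128                                  # problem size and tile size
work = tempfile.mkdtemp()

def tile_path(name, bi, bj):
    return os.path.join(work, f"{name}_{bi}_{bj}.npy")

A = np.random.rand(N, N); B = np.random.rand(N, N)
for bi in range(N // T):
    for bj in range(N // T):
        np.save(tile_path("A", bi, bj), A[bi*T:(bi+1)*T, bj*T:(bj+1)*T])
        np.save(tile_path("B", bi, bj), B[bi*T:(bi+1)*T, bj*T:(bj+1)*T])

C = np.zeros((N, N))
for bi in range(N // T):
    for bj in range(N // T):
        acc = np.zeros((T, T))
        for bk in range(N // T):
            a = np.load(tile_path("A", bi, bk))   # tile read (disk I/O)
            b = np.load(tile_path("B", bk, bj))
            acc += a @ b                          # in-core contraction kernel
        C[bi*T:(bi+1)*T, bj*T:(bj+1)*T] = acc

print("max error vs. in-core result:", np.abs(C - A @ B).max())
```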
We describe an efficient technique for out-of-core construction and accurate view-dependent visualization of very large surface models. The method uses a regular conformal hierarchy of tetrahedra to spatially partition the model. Each tetrahedral cell contains a precomputed simplified version of the original model, represented using cache-coherent indexed strips for fast rendering. The representation is constructed during a fine-to-coarse simplification of the surface contained in diamonds (sets of tetrahedral cells sharing their longest edge). The construction preprocess operates out-of-core and parallelizes nicely. Appropriate boundary constraints are introduced in the simplification to ensure that all conforming selective subdivisions of the tetrahedron hierarchy lead to correctly matching surface patches. For each frame at runtime, the hierarchy is traversed coarse-to-fine to select diamonds of the appropriate resolution given the view parameters. The resulting system can interactively render high-quality views of out-of-core models of hundreds of millions of triangles at over 40 Hz (or 70M triangles/s) on current commodity graphics platforms.
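The per-frame selection can be pictured as a coarse-to-fine descent that stops once a cell's projected error is small enough for the current viewpoint. The sketch below uses a placeholder hierarchy and a simple error-over-distance test; it is not the paper's tetrahedral hierarchy or its exact refinement criterion.

```python
# Coarse-to-fine view-dependent selection sketch over a synthetic hierarchy.
import numpy as np

class Cell:
    def __init__(self, center, size, error, children=()):
        self.center, self.size, self.error, self.children = center, size, error, children

def select(cell, eye, tolerance, selected):
    distance = max(np.linalg.norm(cell.center - eye), 1e-6)
    if cell.children and cell.error / distance > tolerance:
        for child in cell.children:          # refine: descend to finer cells
            select(child, eye, tolerance, selected)
    else:
        selected.append(cell)                # render this cell's precomputed patch
    return selected

def build(center, size, depth):
    if depth == 0:
        return Cell(center, size, error=0.0)
    offsets = size / 4 * np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
    kids = tuple(build(center + o, size / 2, depth - 1) for o in offsets)
    return Cell(center, size, error=size, children=kids)

root = build(np.zeros(3), size=100.0, depth=5)
chosen = select(root, eye=np.array([120.0, 0.0, 0.0]), tolerance=0.05, selected=[])
print("cells selected for this view:", len(chosen))
```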
Polygonal models acquired with emerging 3D scanning technology or from large-scale CAD applications easily reach sizes of several gigabytes and do not fit in the address space of common 32-bit desktop PCs. In this paper we propose an out-of-core mesh compression technique that converts such gigantic meshes into a streamable, highly compressed representation. During decompression only a small portion of the mesh needs to be kept in memory at any time. As full connectivity information is available along the decompression boundaries, this provides seamless mesh access for incremental in-core processing on gigantic meshes. Decompression speeds are CPU-limited and exceed one million vertices and two million triangles per second on a 1.8 GHz Athlon processor. A novel external-memory data structure provides our compression engine with transparent access to arbitrarily large meshes. This out-of-core mesh was designed to accommodate the access pattern of our region-growing based compressor, which, in return, performs mesh queries as seldom and as locally as possible by remembering previous queries as long as needed and by adapting its traversal slightly. The achieved compression rates are state-of-the-art.
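The incremental in-core processing enabled by such a streamable representation can be sketched as follows: triangles arrive in stream order, each vertex is kept only while triangles still reference it, and it is evicted once finalized, so only the decompression boundary stays resident. The mesh, reference counts, and eviction rule below are synthetic illustrations, not the paper's codec.

```python
# Streaming-mesh processing sketch: bounded memory via reference-counted
# vertices that are evicted when their last incident triangle has been seen.
import numpy as np

def streaming_process(triangles, use_count):
    resident = {}                       # vertex id -> position (the "boundary")
    remaining = dict(use_count)         # incident triangles not yet seen
    peak = 0
    for tri in triangles:
        for v in tri:
            if v not in resident:       # vertex enters the decompression boundary
                resident[v] = np.random.rand(3)
            remaining[v] -= 1
        # ... process the triangle here with all three vertices in memory ...
        for v in tri:
            if remaining[v] == 0:       # finalized: evict from the boundary
                del resident[v]
        peak = max(peak, len(resident))
    return peak

# Synthetic "gigantic" mesh: a long triangle strip over 100,000 vertices.
n = 100_000
tris = [(i, i + 1, i + 2) for i in range(n - 2)]
counts = {}
for t in tris:
    for v in t:
        counts[v] = counts.get(v, 0) + 1
print("peak resident vertices:", streaming_process(tris, counts))
```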
ISBN (print): 9781479983407
Volume rendering techniques have been widely used for high-quality visualization of 3D datasets, especially 3D images. However, when rendering very large (out-of-core) datasets, traditional in-core volume rendering algorithms do not work, because the entire input cannot fit in a computer's main memory, and their simple out-of-core versions perform poorly because of the overhead of slow external-memory accesses. To solve this problem, we propose a semi-adaptive partitioning strategy and an efficient out-of-core volume rendering method based on it. With this partitioning strategy, the out-of-core dataset is divided into small sub-blocks of different sizes, which are organized by a BSP tree. Each sub-block can be loaded into the fast texture memory of the graphics hardware and rendered by a 3D-texture-based volume rendering algorithm. The final result is then obtained by compositing the projection images of all sub-blocks from back to front after traversing the BSP tree according to the viewpoint position. The experimental results indicate that the new method is effective and efficient for volume visualization of out-of-core 3D images.
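The traversal-and-compositing step can be sketched as a back-to-front walk of the BSP tree with respect to the viewpoint, blending each sub-block's projection with the "over" operator. The tree, the per-block images, and the split planes below are placeholders, not the paper's renderer or its 3D-texture path.

```python
# Back-to-front BSP compositing sketch with placeholder sub-block images.
import numpy as np

class Node:
    def __init__(self, axis=None, plane=None, back=None, front=None, image=None):
        self.axis, self.plane, self.back, self.front, self.image = axis, plane, back, front, image

def back_to_front(node, eye, out):
    if node.image is not None:                    # leaf: one renderable sub-block
        color, alpha = node.image
        out[:] = color * alpha + (1.0 - alpha) * out     # "over" compositing
        return
    # Visit the child on the far side of the splitting plane first.
    near, far = (node.front, node.back) if eye[node.axis] >= node.plane else (node.back, node.front)
    back_to_front(far, eye, out)
    back_to_front(near, eye, out)

def leaf():
    # Placeholder "projection image" of a sub-block: (RGB, alpha).
    return Node(image=(np.random.rand(64, 64, 3), np.random.rand(64, 64, 1) * 0.5))

# Two-level BSP tree over a toy volume split at x=0.5 and y=0.5.
tree = Node(axis=0, plane=0.5,
            back=Node(axis=1, plane=0.5, back=leaf(), front=leaf()),
            front=Node(axis=1, plane=0.5, back=leaf(), front=leaf()))
framebuffer = np.zeros((64, 64, 3))
back_to_front(tree, eye=np.array([2.0, 0.3, 1.0]), out=framebuffer)
print("framebuffer mean:", framebuffer.mean())
```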
ISBN (print): 9781581136456
We present a method for end-to-end out-of-core simplification and view-dependent visualization of large surfaces. The method consists of three phases: (1) memory-insensitive simplification; (2) memory-insensitive construction of a multiresolution hierarchy; and (3) run-time, output-sensitive, view-dependent rendering and navigation of the mesh. The first two off-line phases are performed entirely on disk and use only a small, constant amount of memory, whereas the run-time system pages in only the rendered parts of the mesh in a cache-coherent manner. As a result, we are able to process and visualize arbitrarily large meshes given a sufficient amount of disk space; a constant multiple of the size of the input mesh.

Similar to recent work on out-of-core simplification, our memory-insensitive method uses vertex clustering on a rectilinear octree grid to coarsen and create a hierarchy for the mesh, and a quadric error metric to choose vertex positions at all levels of resolution. We show how the quadric information can be used to concisely represent vertex position, surface normal, error, and curvature information for anisotropic view-dependent coarsening and silhouette preservation.

The run-time component of our system uses asynchronous rendering and view-dependent refinement driven by screen-space error and visibility. The system exploits frame-to-frame coherence and has been designed to allow preemptive refinement at the granularity of individual vertices to support refinement on a time budget.

Our results indicate a significant improvement in processing speed over previous methods for out-of-core multiresolution surface construction. Meanwhile, all phases of the method are disk and memory efficient, and are fairly straightforward to implement.
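The clustering-plus-quadrics construction can be sketched as binning vertices on a uniform grid, accumulating a plane quadric per occupied cell, and placing each cell's representative at the quadric minimizer (falling back to the cell centroid). The synthetic mesh, the regularization, and the single-level grid below are assumptions; the paper builds a full octree hierarchy out of core.

```python
# Vertex clustering with plane quadrics (single grid level, in-core sketch).
import numpy as np

def plane_quadric(p0, p1, p2):
    n = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        return np.zeros((3, 3)), np.zeros(3)
    n /= norm
    d = -n @ p0
    return np.outer(n, n), d * n              # (A, b) of the quadric n n^T, d n

def cluster_simplify(vertices, triangles, cell_size):
    keys = tuple(map(tuple, np.floor(vertices / cell_size).astype(int)))
    A, b, centroid, count = {}, {}, {}, {}
    for tri in triangles:
        qa, qb = plane_quadric(*vertices[list(tri)])
        for v in tri:
            k = keys[v]
            A[k] = A.get(k, np.zeros((3, 3))) + qa
            b[k] = b.get(k, np.zeros(3)) + qb
            centroid[k] = centroid.get(k, np.zeros(3)) + vertices[v]
            count[k] = count.get(k, 0) + 1
    reps = {}
    for k in A:
        fallback = centroid[k] / count[k]
        rep = np.linalg.solve(A[k] + 1e-6 * np.eye(3), -b[k])   # quadric minimizer
        reps[k] = fallback if np.linalg.norm(rep - fallback) > cell_size else rep
    return reps

verts = np.random.rand(10_000, 3)
tris = np.random.randint(0, len(verts), size=(20_000, 3))
print("representative vertices:", len(cluster_simplify(verts, tris, cell_size=0.1)))
```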