ISBN (print): 9781479952151
Streamline tracing is an important tool used in many scientific domains for visualizing and analyzing flow fields. In this work, we examine a shared-memory multi-threaded approach to streamline tracing that targets emerging data-intensive architectures. We take an in-depth look at data management strategies for streamline tracing in terms of issues such as memory latency, bandwidth, and capacity limitations that apply to future HPC platforms. We present two data management strategies for streamline tracing and evaluate their effectiveness on data-intensive architectures with locally attached flash storage. We provide a comprehensive evaluation of both strategies by examining the strong- and weak-scaling implications of a variety of parameters. We also characterize the relationship between I/O concurrency and I/O efficiency to guide the selection of a strategy based on the use case. From our experiments, we find that using a kernel-managed memory map for out-of-core streamline tracing can outperform an optimized user-managed cache.
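The kernel-managed memory-map strategy the abstract favors can be sketched in a few lines: the vector field stays on flash, the tracer simply indexes a memory-mapped array, and the OS page cache decides what is resident. A minimal Python/NumPy illustration, assuming a hypothetical raw float32 field file and nearest-grid-point sampling (the paper's actual data layout and interpolation scheme are not specified here):

    import numpy as np

    NZ, NY, NX = 512, 512, 512                    # hypothetical grid size
    field = np.memmap("field.raw", dtype=np.float32,
                      mode="r", shape=(NZ, NY, NX, 3))

    def trace(seed, steps=1000, dt=0.1):
        """Forward-Euler integration; page faults pull blocks in on demand."""
        path = [np.asarray(seed, dtype=np.float64)]
        for _ in range(steps):
            i, j, k = (int(c) for c in path[-1])  # floor to a grid sample
            if not (0 <= i < NZ and 0 <= j < NY and 0 <= k < NX):
                break                             # streamline left the domain
            v = field[i, j, k]                    # the OS pages in this block
            path.append(path[-1] + dt * v)
        return np.array(path)

    streamline = trace(seed=(256.0, 256.0, 256.0))

A user-managed cache would replace the direct `field[i, j, k]` access with explicit block lookup, load, and eviction logic; the paper's finding is that the kernel's paging can beat that hand-tuned machinery.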
ISBN (print): 9781450323789
In this paper, we present a new out-of-core sort algorithm, designed for problems that are too large to fit into the aggregate RAM available on modern supercomputers. We analyze the performance, including the cost of I/O, and demonstrate the fastest (to the best of our knowledge) reported throughput using the canonical sort benchmark on a general-purpose, production HPC resource running Lustre. By clever use of available storage and a formulation of asynchronous data transfer mechanisms, we are able to almost completely hide the computation (sorting) behind the I/O latency. This latency hiding enables us to achieve comparable execution times, including the additional temporary I/O required, between a large sort problem (5TB) run as a single, in-RAM sort and our out-of-core approach using 1/10th the amount of RAM. In our largest run, sorting 100TB of records using 1792 hosts, we achieved an end-to-end throughput of 1.24TB/min using our general-purpose sorter, improving on the current Daytona record holder by 65%.
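The latency-hiding formulation can be illustrated with a toy run-generation loop: while one chunk is sorted in RAM, the next chunk's read and the previous run's write proceed on background threads. This is only a sketch of the overlap idea, not the paper's sorter; the merge of the sorted runs is omitted, and the record type, chunk size, and file names are invented:

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    CHUNK = 1 << 24                      # records per sorted run (hypothetical)

    def read_chunk(src, i):
        return np.fromfile(src, dtype=np.uint64,
                           count=CHUNK, offset=i * CHUNK * 8)

    def write_run(arr, i):
        arr.tofile(f"run-{i:04d}.bin")

    def make_runs(src, nchunks):
        with ThreadPoolExecutor(max_workers=2) as pool:
            nxt = pool.submit(read_chunk, src, 0)      # prefetch first chunk
            pending_write = None
            for i in range(nchunks):
                chunk = nxt.result()
                if i + 1 < nchunks:                    # overlap the next read
                    nxt = pool.submit(read_chunk, src, i + 1)
                chunk.sort()                           # compute while I/O runs
                if pending_write is not None:
                    pending_write.result()             # one write in flight
                pending_write = pool.submit(write_run, chunk, i)
            if pending_write is not None:
                pending_write.result()

With the read, sort, and write stages overlapped this way, the wall-clock time per run approaches the maximum of the three stage times rather than their sum, which is the effect the abstract reports.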
We provide experimental evidence that current desktop computers feature enough computational power to solve large-scale dense linear algebra problems. While the high computational cost of the numerical methods for solving these problems can be tackled by the multiple cores of current processors, we propose to use the disk to store the large data structures associated with these applications. Our results also show that the limited amount of RAM and the comparatively slow disk of the system pose no problem for the solution of very large dense linear systems and linear least-squares problems. Thus, current desktop computers are revealed as an appealing, cost-effective platform for research groups that have to deal with large dense linear algebra problems but have no direct access to large computing facilities.
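The basic pattern behind such out-of-core dense computations is to stream panels of a disk-resident matrix through RAM one at a time. A minimal sketch, assuming a hypothetical file A.bin and invented sizes (a real solver would tile the factorization itself, not just a multiply):

    import numpy as np

    m, k, n, PANEL = 100_000, 10_000, 256, 1_000   # hypothetical sizes
    A = np.memmap("A.bin", dtype=np.float64, mode="r", shape=(m, k))
    B = np.random.rand(k, n)                       # in-RAM operand
    C = np.zeros((m, n))

    for r0 in range(0, m, PANEL):
        r1 = min(r0 + PANEL, m)
        C[r0:r1] = A[r0:r1] @ B                    # one panel resident at a time

Here A occupies 8 GB on disk while only one 80 MB panel is ever in memory, which is why a desktop with modest RAM can still process matrices of this scale, provided disk traffic is amortized over enough arithmetic per panel.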
We address the problem of efficient out-of-core code generation for a special class of imperfectly nested loops encoding tensor contractions arising in quantum chemistry computations. These loops operate on arrays too large to fit in physical memory. The problem involves determining optimal tiling of loops and placement of disk I/O statements. This entails a search in an explosively large parameter space. We formulate the problem as a nonlinear optimization problem and use a discrete constraint solver to generate optimized out-of-core code. The solution generated using the discrete constraint solver consistently outperforms other approaches by up to a factor of four. Measurements on sequential and parallel versions of the generated code demonstrate the effectiveness of the approach. (c) 2005 Published by Elsevier Inc.
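The search space the constraint solver explores can be made concrete with a toy tiled contraction: the tile sizes and the loop level at which each disk read is placed are the free parameters. The tile size and the load/store helpers below are hypothetical stand-ins, and the dimensions are assumed to be multiples of the tile edge:

    import numpy as np

    T = 256   # tile edge; one of the tunables the solver searches over

    def contract(load_A, load_B, store_C, NI, NJ, NK):
        """C[i,j] = sum_k A[i,k] * B[k,j], tiled out-of-core."""
        for i in range(0, NI, T):
            for j in range(0, NJ, T):
                c = np.zeros((T, T))
                for k in range(0, NK, T):
                    a = load_A(i, k)      # disk read placed inside the k-loop
                    b = load_B(k, j)      # (an alternative placement would
                    c += a @ b            #  hoist the A read above the j-loop)
                store_C(i, j, c)          # each C tile is written exactly once

Each legal combination of loop permutation, tile sizes, and read/write placement gives a different disk traffic volume under the memory-capacity constraint; the paper's contribution is finding a near-optimal point in that explosively large space with a discrete constraint solver rather than by enumeration.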
ISBN (print): 9781424450756
Volume rendering techniques have been used widely for high-quality visualization of 3D data sets, especially in the field of biomedical image processing. However, when rendering very large (out-of-core) volume data sets, conventional in-core volume rendering algorithms cannot run efficiently because the entire input cannot fit in a computer's internal memory. To solve this problem, this paper proposes an efficient out-of-core volume rendering method based on volume ray casting and GPU acceleration, together with a new out-of-core framework for visualizing large volume data sets. The new framework gives transparent and efficient access to the volume data cached on the hard disk, while the new volume rendering method minimizes the number of times volume data is reloaded from the hard disk into internal memory and performs comparatively fast, high-quality volume rendering. The experimental results indicate that the new method and framework are effective and efficient for visualizing out-of-core medical data sets.
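The "transparent access" layer such a framework provides typically amounts to a brick cache: rays ask for voxels, bricks are faulted in from disk on demand, and least-recently-used bricks are evicted to bound memory, which is what minimizes reloads. A generic sketch, not the paper's code (brick size, file layout, and cache capacity are invented, and the GPU path is not modeled):

    import numpy as np
    from collections import OrderedDict

    BRICK = 64                                   # brick edge in voxels

    class BrickCache:
        def __init__(self, path, shape, capacity=256):
            self.vol = np.memmap(path, dtype=np.uint8, mode="r", shape=shape)
            self.cache = OrderedDict()           # insertion order = LRU order
            self.capacity = capacity

        def voxel(self, z, y, x):
            key = (z // BRICK, y // BRICK, x // BRICK)
            if key not in self.cache:
                bz, by, bx = (c * BRICK for c in key)
                self.cache[key] = np.array(      # copy the brick into RAM
                    self.vol[bz:bz+BRICK, by:by+BRICK, bx:bx+BRICK])
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)   # evict least recently used
            else:
                self.cache.move_to_end(key)          # mark as recently used
            b = self.cache[key]
            return b[z % BRICK, y % BRICK, x % BRICK]

A ray caster built on top of this sees an ordinary voxel lookup; because rays through neighboring pixels touch the same bricks, the cache converts spatial coherence directly into fewer disk reloads.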
We compare, in the same framework, out-of-core implementations of the Cholesky factorization algorithm. The candidate implementations are the classical blocked left-looking variant and a more recent recursive formulation. Both have been implemented for real positive-definite matrices: the former in the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) library and the latter in the Scalable Out-of-core Linear Algebra computations (SOLAR) library. We perform a theoretical analysis of the amount of input/output (I/O) required by each variant. We consider two alternatives for the left-looking algorithm: the one-tile and two-tile approaches. We show that when main memory is restricted, the one-tile approach yields less I/O volume. We then show that the left-looking implementation requires less I/O volume than the recursive variant. We have implemented all variants for complex matrices, and we report on numerical experiments.
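For reference, the blocked left-looking variant under comparison looks as follows when written in-RAM; an out-of-core version replaces each tile access with a disk read or write, and the one-tile versus two-tile distinction concerns how many update tiles are kept resident at once. A generic sketch, not the POOCLAPACK code:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def left_looking_cholesky(A, T=256):
        """In-place blocked left-looking Cholesky; returns the lower factor L."""
        n = A.shape[0]
        for j in range(0, n, T):
            je = min(j + T, n)
            # "look left": apply updates from all previously factored panels
            for k in range(0, j, T):
                ke = min(k + T, n)
                A[j:n, j:je] -= A[j:n, k:ke] @ A[j:je, k:ke].T
            # factor the diagonal tile, then solve for the tiles below it
            A[j:je, j:je] = cholesky(A[j:je, j:je], lower=True)
            if je < n:
                A[je:n, j:je] = solve_triangular(
                    A[j:je, j:je], A[je:n, j:je].T, lower=True).T
        return np.tril(A)

The I/O analysis in the paper counts how often the tiles `A[j:n, k:ke]` in the update loop must be re-read from disk for each block column, which is exactly where the one-tile scheme economizes when memory is tight.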
We report on a multiresolution rendering system driving light field displays based on a specially arranged array of projectors and a holographic screen. The system gives multiple freely moving naked-eye viewers the illusion of seeing and manipulating 3D objects with continuous, viewer-independent parallax. Multiresolution techniques that take into account the displayed light field geometry are employed to dynamically adapt model resolution to display capabilities and timing constraints. The approach is demonstrated at two different scales: a desktop PC driving a 7.4 Mbeam TV-size display, and a cluster-parallel solution driving a large (1.6 x 0.9 m) 35 Mbeam display that supports a room-size working space. In both cases, massive meshes of tens of millions of triangles are manipulated at interactive rates. (C) 2008 Elsevier Ltd. All rights reserved.
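The resolution-adaptation step can be caricatured as a budget test: pick the finest level of detail whose triangle count fits both what the display's beam count can resolve and what the frame-time budget allows. All constants and the finest-first level list below are invented for illustration and are not the paper's parameters:

    def pick_level(level_tris, beams, tris_per_beam=0.5,
                   frame_ms=40.0, tris_per_ms=250_000):
        """Return the index of the finest level whose size fits both budgets."""
        budget = min(beams * tris_per_beam,      # what the display can resolve
                     frame_ms * tris_per_ms)     # what the frame time allows
        for idx, tris in enumerate(level_tris):  # levels ordered finest-first
            if tris <= budget:
                return idx
        return len(level_tris) - 1               # fall back to the coarsest

    # e.g. a 35 Mbeam display, with levels from 50M down to 1M triangles:
    level = pick_level([50_000_000, 12_000_000, 3_000_000, 1_000_000],
                       beams=35_000_000)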
ISBN (print): 9781605581064
Improving the quality of tetrahedral meshes is an important operation in many scientific computing applications. Meshes with badly shaped elements impact both the accuracy and convergence of scientific applications. State-of-the-art mesh improvement techniques rely on sophisticated numerical optimization methods such as feasible Newton or conjugate gradient. Unfortunately, these methods cannot be practically applied to very large meshes due to their global nature. Our contribution in this paper is to describe a streaming framework for tetrahedral mesh optimization. This framework enables the optimization of meshes an order of magnitude larger than previously feasible, effectively optimizing meshes too large to fit in memory. Our results show that streaming is typically faster than global optimization and results in comparable mesh quality. This leads us to conclude that streaming extends mesh optimization to a new class of mesh sizes without compromising the quality of the optimized mesh.
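The streaming framework's structure can be sketched as a sliding window: cells arrive in an I/O-coherent order, a resident window is optimized locally and flushed, and the whole mesh is never in memory at once. The optimizer callback and window size below are hypothetical placeholders, not the paper's implementation:

    def stream_optimize(read_cells, write_cells, optimize, window=100_000):
        """Slide a fixed-size window of cells through core, optimizing locally."""
        buf = []
        for cell in read_cells():            # single pass over the mesh file
            buf.append(cell)
            if len(buf) == window:
                half = window // 2
                # optimize the older half against the newer half as context,
                # then flush it; the newer half stays resident for overlap
                write_cells(optimize(buf[:half], context=buf[half:]))
                buf = buf[half:]
        write_cells(optimize(buf, context=[]))   # flush the final window

Because each numerical optimization call only ever sees one window plus its overlap, peak memory is bounded by the window size regardless of total mesh size, which is what lifts the order-of-magnitude scale limit the abstract mentions.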
This dissertation addresses several performance optimization issues in the context of the Tensor Contraction Engine (TCE), a domain-specific compiler to synthesize parallel, out-of-core programs for a class of scientific computations encountered in computational chemistry and physics. The domain of our focus is electronic structure calculations, where many computationally intensive components are expressible as a set of tensor contractions. These scientific applications are extremely compute-intensive and consume significant computer resources at national supercomputer centers. The manual development of high-performance parallel programs for them is usually very tedious and time-consuming. The TCE system is targeted at reducing the burden on application scientists, by having them specify computations in a high-level form, from which efficient parallel programs are automatically synthesized.

The goal of this research is to develop an optimization framework to derive high-performance implementations for a set of given tensor contractions. In particular, the issues investigated include: (1) Development of an efficient in-memory parallel algorithm for a tensor contraction: a tensor contraction is essentially a generalized matrix multiplication involving multi-dimensional arrays. A novel parallel tensor contraction algorithm is developed by extending Cannon's memory-efficient parallel matrix multiplication algorithm. (2) Design of a performance-model-driven framework for a parallel out-of-core tensor contraction: for a parallel out-of-core tensor contraction, besides the in-core parallel algorithm used, several other factors can affect the overall performance, such as the nested-loop structure (permutation), tile size selection, disk I/O placement, and the data partitioning pattern. The best choice depends on the characteristics of the target machine and the input data. We develop performance models for different parallel out-of-core alternatives and use p…
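Point (1) rests on the observation that a tensor contraction is a generalized matrix multiplication, which is easy to verify concretely: any contraction can be turned into a single GEMM by grouping its indices. A small NumPy check with arbitrary extents:

    import numpy as np

    a, b, c, i, j, k = 4, 5, 6, 7, 8, 9            # arbitrary index extents
    A = np.random.rand(a, c, i, k)
    B = np.random.rand(c, b, k, j)

    # the contraction C[a,b,i,j] = sum over c,k of A[a,c,i,k] * B[c,b,k,j]
    ref = np.einsum("acik,cbkj->abij", A, B)

    # group (a,i) as GEMM rows, (c,k) as the summed dimension, (b,j) as columns
    A2 = A.transpose(0, 2, 1, 3).reshape(a * i, c * k)
    B2 = B.transpose(0, 2, 1, 3).reshape(c * k, b * j)
    C = (A2 @ B2).reshape(a, i, b, j).transpose(0, 2, 1, 3)

    assert np.allclose(ref, C)

This reduction is what allows Cannon's algorithm, defined for two-dimensional matrix multiplication, to be extended to multi-dimensional tensor contractions.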
Unstructured tetrahedral meshes are commonly used in scientific computing to represent scalar, vector, and tensor fields in three dimensions. Visualization of these meshes can be difficult to perform interactively due to their size and complexity. By reducing the size of the data, we can accomplish real-time visualization necessary for scientific analysis. We propose a two-step approach for streaming simplification of large tetrahedral meshes. Our algorithm arranges the data on disk in a streaming, I/O-efficient format that allows coherent access to the tetrahedral cells. A quadric-based simplification is sequentially performed on small portions of the mesh in-core. Our output is a coherent streaming mesh which facilitates future processing. Our technique is fast, produces high quality approximations, and operates out-of-core to process meshes too large for main memory.
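The quadric-based step is the classic quadric error metric in the Garland-Heckbert style: each face contributes a plane quadric, per-vertex quadrics are summed, and an edge collapse is charged the quadric error at the placed vertex. A surface-mesh sketch for clarity, not the paper's code (for tetrahedral meshes the quadrics are built from cell boundary planes instead):

    import numpy as np

    def plane_quadric(p0, p1, p2):
        """4x4 fundamental quadric of the plane through a triangle's vertices."""
        n = np.cross(p1 - p0, p2 - p0)
        n = n / np.linalg.norm(n)
        p = np.append(n, -np.dot(n, p0))      # plane coefficients [a, b, c, d]
        return np.outer(p, p)

    def collapse_cost(Q, v):
        """Summed squared distance of position v to all planes folded into Q."""
        h = np.append(v, 1.0)                 # homogeneous position
        return h @ Q @ h

    # cost of collapsing edge (u, v) to its midpoint, given per-vertex
    # quadrics Qu and Qv accumulated from incident faces:
    #   cost = collapse_cost(Qu + Qv, (u + v) / 2)

Because a quadric is just a 4x4 matrix per vertex and quadrics add under collapse, the metric needs only local state, which is what makes it compatible with the streaming, windowed processing the abstract describes.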