Scientific distributed applications have an increasing need to process and move large amounts of data across wide area networks. Existing systems either closely couple computation and data movement, or they require substantial human involvement during the end-to-end process. We propose a framework that enables scientists to build reliable and efficient data transfer and processing pipelines. Our framework provides a universal interface to different data transfer protocols and storage systems. It has sophisticated flow control and recovers automatically from network, storage-system, software, and hardware failures. We successfully used data pipelines to replicate and process three terabytes of the DPOSS astronomy image dataset and several terabytes of the WCER educational video dataset. In both cases, the entire process was performed without any human intervention, and the data pipeline recovered automatically from various failures. Copyright (c) 2005 John Wiley & Sons, Ltd.
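The core ideas in this abstract, a uniform interface over heterogeneous transfer protocols and automatic recovery from transient failures, can be sketched in a few lines. The handler table and `transfer` function below are illustrative stand-ins, not the paper's actual framework API:

```python
import shutil
import subprocess
import time

# Hypothetical handlers keyed by URL scheme; the framework in the abstract
# plugs real protocol clients (GridFTP, HTTP, local copy, ...) behind one
# such uniform interface.
HANDLERS = {
    "file": lambda src, dst: shutil.copy(src, dst),
    "gsiftp": lambda src, dst: subprocess.run(
        ["globus-url-copy", src, dst], check=True),  # GridFTP CLI client
}

def transfer(src_url, dst_url, retries=5, backoff=2.0):
    """Move one file, retrying with exponential backoff on any failure."""
    scheme = src_url.split("://", 1)[0] if "://" in src_url else "file"
    for attempt in range(1, retries + 1):
        try:
            HANDLERS[scheme](src_url, dst_url)
            return
        except Exception:
            if attempt == retries:
                raise  # give up only after exhausting the retry budget
            time.sleep(backoff ** attempt)  # back off, then retry the step
```

Retrying a failed step from scratch mirrors the simple recovery policy the abstract implies; the real framework's flow control is presumably more sophisticated.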
The main goal of oil reservoir management is to provide more efficient, cost-effective and environmentally safer production of oil from reservoirs. Numerical simulations can aid in the design and implementation of optimal production strategies. However, traditional simulation-based approaches to optimizing reservoir management are rapidly overwhelmed by data volume when large numbers of realizations are sought using detailed geologic descriptions. In this paper, we describe a software architecture to facilitate large-scale simulation studies, involving ensembles of long-running simulations and analysis of vast volumes of output data. Copyright (c) 2005 John Wiley & Sons, Ltd.
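The ensemble pattern this abstract describes, many independent long-running runs followed by analysis of their combined output, reduces to a fan-out/collect driver. A minimal sketch, assuming a hypothetical `reservoir-sim` solver binary rather than the paper's actual software:

```python
from concurrent.futures import ProcessPoolExecutor
import subprocess

def run_realization(geology_file):
    """Run one long-running simulation for a single geologic realization.
    `reservoir-sim` is a hypothetical solver binary, not the paper's tool."""
    out_file = geology_file + ".out"
    subprocess.run(["reservoir-sim", "--input", geology_file,
                    "--output", out_file], check=True)
    return out_file

def run_ensemble(realizations, workers=8):
    """Fan independent realizations out across worker processes and
    collect the output paths for a downstream analysis stage."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_realization, realizations))
```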
ISBN:
(print) 0780386949
When designing SAMGrid, a project for distributing high-energy physics computations on a grid, we discovered that it was challenging to decide where to place users' jobs. Jobs typically need to access hundreds of files, and each site has a different subset of the files. Our data system, SAM, knows what portion of a user's data may be at each site, but does not know how to submit grid jobs. Our job submission system, Condor-G, knows how to submit grid jobs, but originally it required users to choose grid sites and gave them no assistance in choosing. This paper describes how we enhanced Condor-G to interact with SAM to make good decisions about where jobs should be executed, and thereby improve the performance of grid jobs that access large amounts of data. These enhancements are general enough to be applicable to grid computing beyond data-intensive computing with SAMGrid.
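The placement decision described here is essentially a ranking of sites by data locality. A minimal sketch, assuming a hypothetical `site_catalog` mapping in place of SAM's real file-location metadata:

```python
def rank_sites(needed_files, site_catalog):
    """Score each grid site by the fraction of the job's input files
    already resident there, so the submitter can prefer data-local
    execution. `site_catalog` stands in for SAM's location tracking."""
    scores = [
        (site, len(needed_files & cached) / len(needed_files))
        for site, cached in site_catalog.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# A job needing three files ranks siteB (2/3 resident) above siteA (1/3).
print(rank_sites({"f1", "f2", "f3"},
                 {"siteA": {"f1"}, "siteB": {"f1", "f2"}}))
```

In Condor-G terms, a score like this could feed the job's Rank expression during matchmaking, steering jobs toward sites that already hold most of their input; the actual SAMGrid integration is more involved.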
Effective high-level data management is becoming an important issue as more and more scientific applications manipulate huge amounts of secondary-storage and tertiary-storage data using parallel processors. A major problem with current solutions is that they either require a deep understanding of specific data storage architectures and file layouts to obtain the best performance (as in high-performance storage management systems and parallel file systems), or they sacrifice significant performance in exchange for ease of use and portability (as in traditional database management systems). In this paper, we discuss the design, implementation, and evaluation of a novel application development environment for scientific computations. This environment includes a number of components that make it easy for programmers to code and run their applications with little programming effort and, at the same time, to harness the available computational and storage power of parallel architectures.
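The ease-of-use versus layout-awareness trade-off this abstract targets is easiest to see as an interface question. A hypothetical high-level API (all names invented here, not the paper's actual components) might let programmers name a dataset and a per-chunk operation while a backend picks the storage-specific access path:

```python
class ListBackend:
    """Toy in-memory backend; a real one would hide parallel file-system
    or tertiary-storage access behind the same read_chunks() call."""
    def __init__(self, chunks_by_name):
        self.chunks_by_name = chunks_by_name

    def read_chunks(self, name, chunk_mb):
        yield from self.chunks_by_name[name]

class Dataset:
    """Programmers name a dataset and a per-chunk operation; the backend
    chooses the layout-specific access strategy."""
    def __init__(self, name, backend):
        self.name, self.backend = name, backend

    def map_chunks(self, fn, chunk_mb=64):
        for chunk in self.backend.read_chunks(self.name, chunk_mb):
            yield fn(chunk)

# Example: summing chunks without knowing how they are stored.
ds = Dataset("temps", ListBackend({"temps": [[1, 2], [3, 4]]}))
print(list(ds.map_chunks(sum)))  # [3, 7]
```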
With increases in the amount of data available for analysis in commercial settings, online analytical processing (OLAP) and decision support have become important applications for high-performance computing. Implementing such applications on clusters requires substantial expertise and effort, particularly because of the sizes of the input and output datasets. In this paper, we describe our experiences in developing one such application using a cluster middleware called ADR. We focus on the problem of data cube construction, which commonly arises in multi-dimensional OLAP. We show how ADR, originally developed for scientific data-intensive applications, can be used to carry out an efficient and scalable data cube construction implementation. A particular issue with the use of ADR is the tiling of output datasets. We present new algorithms that combine interprocessor communication and tiling within each processor. These algorithms preserve the important properties that are desirable from any parallel data cube construction algorithm. We have carried out a detailed evaluation of our implementation. The main results from our experiments are as follows: (1) high speedups are achieved on both dense and sparse datasets, even though we have used simple algorithms that sequentialize a part of the computation; (2) the execution time depends only upon the amount of computation, and does not increase in a super-linear fashion as the dataset size or the number of tiles increases; and (3) as the datasets become more sparse, sequential performance degrades, but the parallel speedups are still quite good. As part of our ongoing work in this area, we are also looking at handling a larger number of dimensions and multi-dimensional partitionings. We describe our preliminary theoretical and experimental work in this direction. (C) 2003 Elsevier Science B.V. All rights reserved.
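For readers unfamiliar with data cube construction, the computation being parallelized is the set of all group-by aggregates over subsets of the dimensions. A toy sequential version in Python (not ADR's implementation) makes the cost structure clear; the paper's contribution is parallelizing this and tiling the output so cubes larger than memory can be built:

```python
from itertools import combinations

def data_cube(rows, dims, measure):
    """Compute all 2^len(dims) group-by aggregates (the full data cube).
    A toy sequential version: the paper parallelizes the aggregation and
    tiles the output arrays across and within processors."""
    cube = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            agg = {}
            for row in rows:
                key = tuple(row[d] for d in subset)
                agg[key] = agg.get(key, 0) + row[measure]
            cube[subset] = agg
    return cube

# Sales facts with dimensions (product, region) and measure qty.
rows = [{"product": "a", "region": "e", "qty": 2},
        {"product": "a", "region": "w", "qty": 3}]
cube = data_cube(rows, ("product", "region"), "qty")
print(cube[()])            # {(): 5}      -- grand total
print(cube[("product",)])  # {('a',): 5}  -- roll-up over region
```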
ISBN:
(print) 0769515827
With increases in the amount of data available for analysis in commercial settings, On-Line Analytical Processing (OLAP) and decision support have become important applications for high-performance computing. Implementing such applications on clusters requires substantial expertise and effort, particularly because of the sizes of the input and output datasets. In this paper we describe our experiences in developing one such application using a cluster middleware called ADR. We focus on the problem of data cube construction, which commonly arises in multi-dimensional OLAP. We show how ADR, originally developed for scientific data-intensive applications, can be used to carry out an efficient and scalable data cube construction implementation. A particular issue with the use of ADR is the tiling of output datasets. We present new algorithms that combine inter-processor communication and tiling within each processor. These algorithms preserve the important properties that are desirable from any parallel data cube construction algorithm. We have carried out a detailed evaluation of our implementation. The main results from our experiments are as follows: (1) high speedups are achieved on both dense and sparse datasets, even though we have used simple algorithms that sequentialize a part of the computation; (2) the execution time depends only upon the amount of computation, and does not increase in a super-linear fashion as the dataset size or the number of tiles increases; and (3) as the datasets become more sparse, sequential performance degrades, but the parallel speedups are still quite good.