In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
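The first of these features, aggregating transfer jobs by their source and destination addresses, can be sketched in a few lines. This is an illustrative grouping step only, not Stork's actual interface; the `gsiftp://` URLs and function names are assumed for the example:

```python
from collections import defaultdict
from urllib.parse import urlparse

def aggregate_transfers(jobs):
    """Group transfer jobs that share a source and destination host,
    so each group can reuse a single control/data channel."""
    groups = defaultdict(list)
    for src_url, dst_url in jobs:
        # urlparse().hostname normalizes hostnames to lowercase
        key = (urlparse(src_url).hostname, urlparse(dst_url).hostname)
        groups[key].append((src_url, dst_url))
    return dict(groups)

jobs = [
    ("gsiftp://hostA/data/f1", "gsiftp://hostB/store/f1"),
    ("gsiftp://hostA/data/f2", "gsiftp://hostB/store/f2"),
    ("gsiftp://hostC/data/f3", "gsiftp://hostB/store/f3"),
]
batches = aggregate_transfers(jobs)
```

Here the two hostA-to-hostB jobs land in one batch, so a scheduler could serve them over one connection instead of two.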
Hadoop is a reasonable tool for cloud computing on big data, and the MapReduce paradigm may be a highly successful programming model for large-scale data-intensive computing. However, traditional Hadoop and MapReduce have been deployed over local or tightly-coupled cloud resources with one data center. This paper focuses on the issue of running Hadoop applications across multiple data centers. A hierarchical distributed computing architecture of Hadoop is designed and presented. A job submitted by a user can be decomposed automatically into several subtasks, which are then allocated and executed on the corresponding cluster by location-aware scheduling. A presentation of the workflow shows the operating principle of this architecture.
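The location-aware decomposition described above can be sketched as follows. The function and data-center names are hypothetical; in a real deployment the placement information would come from the filesystem's block metadata:

```python
def decompose_job(input_blocks, block_locations):
    """Split a job's input into per-data-center subtasks based on
    where each block is stored (location-aware scheduling sketch)."""
    subtasks = {}
    for block in input_blocks:
        dc = block_locations[block]          # data center hosting this block
        subtasks.setdefault(dc, []).append(block)
    return subtasks

# Invented block-to-site mapping for illustration
locations = {"b1": "dc-east", "b2": "dc-west", "b3": "dc-east"}
subtasks = decompose_job(["b1", "b2", "b3"], locations)
```

Each entry of `subtasks` can then be dispatched to the cluster that already holds the data, avoiding cross-site input transfer.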
We are well into the era of data-intensive digital scientific discovery, an era defined by Jim Gray as the Fourth Paradigm. From my own perspective of the life sciences, much has been accomplished, but there is much to do if we are to maximize our understanding of biological systems given the data we have today, let alone what is coming. In my 2010 Jim Gray eScience Award Lecture, I gave my own thoughts on what needs to be accomplished, and with an additional year of hindsight, I expand on that here. Copyright (C) 2012 John Wiley & Sons, Ltd.
Function-as-a-Service (FaaS) platforms have recently gained rapid popularity. Many stateful applications have been migrated to FaaS platforms due to their ease of deployment, scalability, and minimal management overhead. However, failures in FaaS have not been thoroughly investigated, thus making these desirable platforms unreliable for guaranteeing function execution and ensuring performance requirements. In this paper, we propose Canary, a highly resilient and fault-tolerant framework for FaaS that mitigates the impact of failures and reduces the overhead of function restart. Canary utilizes replicated container runtimes and application-level checkpoints to reduce application recovery time over FaaS platforms. Our evaluations using representative stateful FaaS applications show that Canary reduces the application recovery time and dollar cost by up to 83% and 12%, respectively, over the default retry-based strategy. Moreover, it improves application availability with an additional average execution time and cost overhead of 14% and 8%, respectively, compared to the ideal failure-free execution.
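A minimal sketch of the kind of application-level checkpointing Canary relies on, assuming a fold-style function whose state fits in a small JSON file. The names and on-disk format are illustrative, not Canary's:

```python
import json
import os
import tempfile

def run_with_checkpoint(items, work, ckpt_path):
    """Process `items` left to right, persisting (index, accumulator) after
    each step so a restarted function resumes instead of recomputing."""
    state = {"done": 0, "acc": 0}
    if os.path.exists(ckpt_path):            # a restart finds prior progress
        with open(ckpt_path) as f:
            state = json.load(f)
    for i in range(state["done"], len(items)):
        state["acc"] = work(state["acc"], items[i])
        state["done"] = i + 1
        with open(ckpt_path, "w") as f:      # checkpoint after every step
            json.dump(state, f)
    return state["acc"]

ckpt = os.path.join(tempfile.mkdtemp(), "canary.ckpt")
total = run_with_checkpoint([1, 2, 3, 4], lambda acc, x: acc + x, ckpt)
# A second invocation (simulating a restart) resumes from the checkpoint
resumed = run_with_checkpoint([1, 2, 3, 4], lambda acc, x: acc + x, ckpt)
```

On restart the function reloads the saved state and skips completed work, which is the mechanism that shrinks recovery time relative to a full retry.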
ISBN: 9781450319102 (print)
This paper presents SCALANYTICS, a declarative platform that supports high-performance application layer analysis of network traffic. SCALANYTICS uses (1) stateful network packet processing techniques for extracting application-layer data from network packets, (2) a declarative rule-based language called ANALOG for compactly specifying analysis pipelines from reusable modules, and (3) a task-stealing architecture for processing network packets at high throughput within these pipelines, by leveraging multi-core processing capabilities in a load-balanced manner without the need for explicit performance profiling. We have developed a prototype of SCALANYTICS that enhances a declarative networking engine with support for ANALOG and various stateful components, integrated with a parallel task-stealing execution model. We evaluate our SCALANYTICS prototype on a wide range of pipelines for analyzing SMTP and SIP traffic, and for detecting malicious traffic flows. Our evaluation on a 16-core machine demonstrates that SCALANYTICS achieves up to 11.4× improvement in throughput compared with the best uniprocessor implementation. Moreover, SCALANYTICS outperforms the Bro intrusion detection system by an order of magnitude when used for analyzing SMTP traffic.
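The task-stealing idea can be sketched single-threaded with one double-ended queue per worker: a worker pops local work from the front of its own deque and, when idle, steals from the back of the longest peer deque. The real system runs workers in parallel across cores; the names here are illustrative:

```python
from collections import deque

def run_task_stealing(queues, process):
    """Single-threaded sketch of task stealing. Returns how many tasks
    each worker executed; load balances without any profiling."""
    executed = [0] * len(queues)
    while any(queues):
        for w, q in enumerate(queues):
            if not q:                          # idle: steal from busiest peer
                victim = max(queues, key=len)
                if victim:
                    q.append(victim.pop())     # take from the victim's back
            if q:
                process(q.popleft())           # run from our own front
                executed[w] += 1
    return executed

seen = []
# Worker 0 starts with all 8 tasks; worker 1 starts idle and must steal
counts = run_task_stealing([deque(range(8)), deque()], seen.append)
```

Even with a maximally skewed initial assignment, the idle worker steals its way to a share of the work, which is the load-balancing property the pipeline architecture depends on.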
ISBN: 9781450316026 (print)
With the world moving to web-based tools for everything from photo sharing to research publication, it's no wonder scientists are now seeking online technologies to support their research. But the requirements of large-scale computational research are both unique and daunting: massive data, complex software, limited budgets, and demand for increased collaboration. While "the cloud" promises to alleviate some of these pressures, concerns about feasibility still exist for scientists and the resource providers that support them. This panel will explore the capacity of Software as a Service (SaaS) to transform computational research so that the challenges above advance, rather than hinder, innovation and discovery. Leaders from each constituency of a scientific research environment (investigator, campus champion, supercomputing facility, SaaS provider) will debate the feasibility of SaaS-based research, examining the delta between current and desired state from a technology and adoptability perspective. We will explore the delta between where we are, and where we need to be, for scientists to reliably and securely perform research in the cloud.
ISBN: 9781538678800 (print)
A huge volume of data is produced every day by social networks (e.g. Facebook, Instagram, WhatsApp), sensors, mobile devices and other applications. Although Cloud computing has grown rapidly in recent years, it still lacks standardization in resource management for Big Data applications such as MapReduce. In this context, users face a considerable challenge in understanding the requirements of an application and consolidating resources properly. This scenario raises significant challenges in different areas (systems, infrastructure, and platforms) and offers several research opportunities in Big Data analytics. This work proposes the use of hybrid infrastructures, such as Cloud and Volunteer computing, for Big Data processing and analysis. In addition, it provides a data distribution model that improves the resource management of Big Data applications in hybrid infrastructures. The results indicate the feasibility of such hybrid infrastructures, since they support the reproducibility and predictability of Big Data processing through low- and high-scale simulation.
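One way a capacity-proportional data distribution across a hybrid infrastructure could look, as a hedged sketch; the paper's actual model is not reproduced here, and the site names and capacities are invented for illustration:

```python
def distribute_blocks(blocks, capacities):
    """Assign data blocks to sites in proportion to their capacity;
    the last site absorbs any rounding remainder so no block is dropped."""
    total = sum(capacities.values())
    plan, start = {}, 0
    sites = list(capacities)
    for i, site in enumerate(sites):
        if i == len(sites) - 1:
            end = len(blocks)                # remainder goes to the last site
        else:
            end = start + round(len(blocks) * capacities[site] / total)
        plan[site] = blocks[start:end]
        start = end
    return plan

# Cloud nodes assumed 3x the capacity of volunteer nodes (illustrative)
plan = distribute_blocks(list(range(12)), {"cloud": 3, "volunteer": 1})
```

With a 3:1 capacity ratio, 9 of the 12 blocks go to the cloud tier and 3 to volunteer resources, and every block is assigned exactly once.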
ISBN: 9781479913725 (print)
Distributed data-intensive workflow applications are increasingly relying on and integrating remote resources, including community data sources, services, and computational platforms. Increasingly, these are made available as data, SaaS, and IaaS clouds. The execution of distributed data-intensive workflow applications can expose network bottlenecks between clouds that compromise performance. In this paper, we focus on alleviating network bottlenecks by using a proxy network. In particular, we show how proxies can eliminate network bottlenecks through smart routing and can perform in-network computations to boost workflow application performance. A novel aspect of our work is the inclusion of multiple proxies that accelerate different workflow stages while optimizing different performance metrics. We show that the approach is effective for workflow applications and broadly applicable. Using Montage as an exemplar workflow application, experiments on PlanetLab showed how different proxies acting in a variety of roles can accelerate distinct stages of Montage. Our microbenchmarks also show that routing data through select proxies can, in general, accelerate network transfer in terms of TCP/UDP bandwidth, delay, and jitter.
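The smart-routing idea, choosing a relay proxy from measured pairwise path metrics, can be sketched as follows; the proxy names and delay values are invented for illustration:

```python
def best_proxy(src, dst, proxies, delay_ms):
    """Pick the relay minimizing the two-hop delay src -> proxy -> dst,
    given a table of measured pairwise delays (smart-routing sketch)."""
    return min(proxies, key=lambda p: delay_ms[(src, p)] + delay_ms[(p, dst)])

delay_ms = {
    ("src", "p1"): 40, ("p1", "dst"): 35,   # via p1: 75 ms total
    ("src", "p2"): 25, ("p2", "dst"): 30,   # via p2: 55 ms total
    ("src", "p3"): 10, ("p3", "dst"): 70,   # via p3: 80 ms total
}
choice = best_proxy("src", "dst", ["p1", "p2", "p3"], delay_ms)
```

The same selection could be run per workflow stage with a different metric table (bandwidth, jitter) to realize the multi-proxy, multi-metric acceleration the abstract describes.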