In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
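The first of these features, aggregating transfer jobs by their source and destination addresses, can be sketched in a few lines. This is an illustrative grouping step only, not Stork's actual interface; the `gsiftp://` URLs and function names are assumed for the example:

```python
from collections import defaultdict
from urllib.parse import urlparse

def aggregate_transfers(jobs):
    """Group transfer jobs that share a source and destination host,
    so each group can reuse a single control/data channel."""
    groups = defaultdict(list)
    for src_url, dst_url in jobs:
        # urlparse().hostname normalizes hostnames to lowercase
        key = (urlparse(src_url).hostname, urlparse(dst_url).hostname)
        groups[key].append((src_url, dst_url))
    return dict(groups)

jobs = [
    ("gsiftp://hostA/data/f1", "gsiftp://hostB/store/f1"),
    ("gsiftp://hostA/data/f2", "gsiftp://hostB/store/f2"),
    ("gsiftp://hostC/data/f3", "gsiftp://hostB/store/f3"),
]
batches = aggregate_transfers(jobs)
```

Here the two hostA-to-hostB jobs land in one batch, so a scheduler could serve them over one connection instead of two.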
Hadoop is a reasonable tool for cloud computing on big data, and the MapReduce paradigm may be a highly successful programming model for large-scale data-intensive computing. However, traditional Hadoop and MapReduce have been deployed over local or tightly-coupled cloud resources with one data center. This paper focuses on the issue of running Hadoop applications across multiple data centers. A hierarchical distributed computing architecture of Hadoop is designed and presented. A job submitted by a user can be decomposed automatically into several subtasks, which are then allocated and executed on the corresponding cluster by location-aware scheduling. A presentation of the workflow shows the operating principle of this architecture.
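The location-aware decomposition described above can be sketched as follows. The function and data-center names are hypothetical; in a real deployment the placement information would come from the filesystem's block metadata:

```python
def decompose_job(input_blocks, block_locations):
    """Split a job's input into per-data-center subtasks based on
    where each block is stored (location-aware scheduling sketch)."""
    subtasks = {}
    for block in input_blocks:
        dc = block_locations[block]          # data center hosting this block
        subtasks.setdefault(dc, []).append(block)
    return subtasks

# Invented block-to-site mapping for illustration
locations = {"b1": "dc-east", "b2": "dc-west", "b3": "dc-east"}
subtasks = decompose_job(["b1", "b2", "b3"], locations)
```

Each entry of `subtasks` can then be dispatched to the cluster that already holds the data, avoiding cross-site input transfer.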
We are well into the era of data-intensive digital scientific discovery, an era defined by Jim Gray as the Fourth Paradigm. From my own perspective of the life sciences, much has been accomplished, but there is much to do if we are to maximize our understanding of biological systems given the data we have today, let alone what is coming. In my 2010 Jim Gray eScience Award Lecture, I gave my own thoughts on what needs to be accomplished, and with an additional year of hindsight, I expand on that here. Copyright (C) 2012 John Wiley & Sons, Ltd.
Function-as-a-Service (FaaS) platforms have recently gained rapid popularity. Many stateful applications have been migrated to FaaS platforms due to their ease of deployment, scalability, and minimal management overhead. However, failures in FaaS have not been thoroughly investigated, thus making these desirable platforms unreliable for guaranteeing function execution and ensuring performance requirements. In this paper, we propose Canary, a highly resilient and fault-tolerant framework for FaaS that mitigates the impact of failures and reduces the overhead of function restart. Canary utilizes replicated container runtimes and application-level checkpoints to reduce application recovery time over FaaS platforms. Our evaluations using representative stateful FaaS applications show that Canary reduces the application recovery time and dollar cost by up to 83% and 12%, respectively, over the default retry-based strategy. Moreover, it improves application availability with an additional average execution time and cost overhead of 14% and 8%, respectively, compared to the ideal failure-free execution.
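A minimal sketch of the kind of application-level checkpointing Canary relies on, assuming a fold-style function whose state fits in a small JSON file. The names and on-disk format are illustrative, not Canary's:

```python
import json
import os
import tempfile

def run_with_checkpoint(items, work, ckpt_path):
    """Process `items` left to right, persisting (index, accumulator) after
    each step so a restarted function resumes instead of recomputing."""
    state = {"done": 0, "acc": 0}
    if os.path.exists(ckpt_path):            # a restart finds prior progress
        with open(ckpt_path) as f:
            state = json.load(f)
    for i in range(state["done"], len(items)):
        state["acc"] = work(state["acc"], items[i])
        state["done"] = i + 1
        with open(ckpt_path, "w") as f:      # checkpoint after every step
            json.dump(state, f)
    return state["acc"]

ckpt = os.path.join(tempfile.mkdtemp(), "canary.ckpt")
total = run_with_checkpoint([1, 2, 3, 4], lambda acc, x: acc + x, ckpt)
# A second invocation (simulating a restart) resumes from the checkpoint
resumed = run_with_checkpoint([1, 2, 3, 4], lambda acc, x: acc + x, ckpt)
```

On restart the function reloads the saved state and skips completed work, which is the mechanism that shrinks recovery time relative to a full retry.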
ISBN: 9781450319102 (print)
This paper presents SCALANYTICS, a declarative platform that supports high-performance application layer analysis of network traffic. SCALANYTICS uses (1) stateful network packet processing techniques for extracting application-layer data from network packets, (2) a declarative rule-based language called ANALOG for compactly specifying analysis pipelines from reusable modules, and (3) a task-stealing architecture for processing network packets at high throughput within these pipelines, by leveraging multi-core processing capabilities in a load-balanced manner without the need for explicit performance profiling. We have developed a prototype of SCALANYTICS that enhances a declarative networking engine with support for ANALOG and various stateful components, integrated with a parallel task-stealing execution model. We evaluate our SCALANYTICS prototype on a wide range of pipelines for analyzing SMTP and SIP traffic, and for detecting malicious traffic flows. Our evaluation on a 16-core machine demonstrates that SCALANYTICS achieves up to 11.4× improvement in throughput compared with the best uniprocessor implementation. Moreover, SCALANYTICS outperforms the Bro intrusion detection system by an order of magnitude when used for analyzing SMTP traffic.
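The task-stealing idea can be sketched single-threaded with one double-ended queue per worker: a worker pops local work from the front of its own deque and, when idle, steals from the back of the longest peer deque. The real system runs workers in parallel across cores; the names here are illustrative:

```python
from collections import deque

def run_task_stealing(queues, process):
    """Single-threaded sketch of task stealing. Returns how many tasks
    each worker executed; load balances without any profiling."""
    executed = [0] * len(queues)
    while any(queues):
        for w, q in enumerate(queues):
            if not q:                          # idle: steal from busiest peer
                victim = max(queues, key=len)
                if victim:
                    q.append(victim.pop())     # take from the victim's back
            if q:
                process(q.popleft())           # run from our own front
                executed[w] += 1
    return executed

seen = []
# Worker 0 starts with all 8 tasks; worker 1 starts idle and must steal
counts = run_task_stealing([deque(range(8)), deque()], seen.append)
```

Even with a maximally skewed initial assignment, the idle worker steals its way to a share of the work, which is the load-balancing property the pipeline architecture depends on.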
ISBN: 9781450316026 (print)
With the world moving to web-based tools for everything from photo sharing to research publication, it's no wonder scientists are now seeking online technologies to support their research. But the requirements of large-scale computational research are both unique and daunting: massive data, complex software, limited budgets, and demand for increased collaboration. While "the cloud" promises to alleviate some of these pressures, concerns about feasibility still exist for scientists and the resource providers that support them. This panel will explore the capacity of Software as a Service (SaaS) to transform computational research so that the challenges above advance, rather than hinder, innovation and discovery. Leaders from each constituency of a scientific research environment (investigator, campus champion, supercomputing facility, SaaS provider) will debate the feasibility of SaaS-based research, examining the delta between current and desired state from a technology and adoptability perspective. We will explore the delta between where we are, and where we need to be, for scientists to reliably and securely perform research in the cloud.
ISBN: 9781538678800 (print)
A huge volume of data is produced every day by social networks (e.g. Facebook, Instagram, WhatsApp), sensors, mobile devices and other applications. Although Cloud computing has grown rapidly in recent years, it still lacks standardization in resource management for Big Data applications such as MapReduce. In this context, users face a considerable challenge in understanding the requirements of an application and consolidating resources properly. This scenario raises significant challenges in different areas (systems, infrastructure, and platforms) and offers several research opportunities in Big Data analytics. This work proposes the use of hybrid infrastructures, such as Cloud and Volunteer computing, for Big Data processing and analysis. In addition, it provides a data distribution model that improves the resource management of Big Data applications in hybrid infrastructures. The results indicate the feasibility of such hybrid infrastructures, since they support the reproducibility and predictability of Big Data processing through low- and high-scale simulation.
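One way a capacity-proportional data distribution across a hybrid infrastructure could look, as a hedged sketch; the paper's actual model is not reproduced here, and the site names and capacities are invented for illustration:

```python
def distribute_blocks(blocks, capacities):
    """Assign data blocks to sites in proportion to their capacity;
    the last site absorbs any rounding remainder so no block is dropped."""
    total = sum(capacities.values())
    plan, start = {}, 0
    sites = list(capacities)
    for i, site in enumerate(sites):
        if i == len(sites) - 1:
            end = len(blocks)                # remainder goes to the last site
        else:
            end = start + round(len(blocks) * capacities[site] / total)
        plan[site] = blocks[start:end]
        start = end
    return plan

# Cloud nodes assumed 3x the capacity of volunteer nodes (illustrative)
plan = distribute_blocks(list(range(12)), {"cloud": 3, "volunteer": 1})
```

With a 3:1 capacity ratio, 9 of the 12 blocks go to the cloud tier and 3 to volunteer resources, and every block is assigned exactly once.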
ISBN: 9781479913725 (print)
Distributed data-intensive workflow applications are increasingly relying on and integrating remote resources, including community data sources, services, and computational platforms. Increasingly, these are made available as data, SaaS, and IaaS clouds. The execution of distributed data-intensive workflow applications can expose network bottlenecks between clouds that compromise performance. In this paper, we focus on alleviating network bottlenecks by using a proxy network. In particular, we show how proxies can eliminate network bottlenecks through smart routing and can perform in-network computations to boost workflow application performance. A novel aspect of our work is the inclusion of multiple proxies that accelerate different workflow stages while optimizing different performance metrics. We show that the approach is effective for workflow applications and broadly applicable. Using Montage as an exemplar workflow application, experiments on PlanetLab showed how different proxies acting in a variety of roles can accelerate distinct stages of Montage. Our microbenchmarks also show that routing data through select proxies can, in general, accelerate network transfer in terms of TCP/UDP bandwidth, delay, and jitter.
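The smart-routing idea, choosing a relay proxy from measured pairwise path metrics, can be sketched as follows; the proxy names and delay values are invented for illustration:

```python
def best_proxy(src, dst, proxies, delay_ms):
    """Pick the relay minimizing the two-hop delay src -> proxy -> dst,
    given a table of measured pairwise delays (smart-routing sketch)."""
    return min(proxies, key=lambda p: delay_ms[(src, p)] + delay_ms[(p, dst)])

delay_ms = {
    ("src", "p1"): 40, ("p1", "dst"): 35,   # via p1: 75 ms total
    ("src", "p2"): 25, ("p2", "dst"): 30,   # via p2: 55 ms total
    ("src", "p3"): 10, ("p3", "dst"): 70,   # via p3: 80 ms total
}
choice = best_proxy("src", "dst", ["p1", "p2", "p3"], delay_ms)
```

The same selection could be run per workflow stage with a different metric table (bandwidth, jitter) to realize the multi-proxy, multi-metric acceleration the abstract describes.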