Over the last decade, much scheduling research has concentrated on single-cluster systems. Less attention has been paid to multicluster systems, although they are gaining more and more importance in pra...
In systems consisting of multiple clusters of processors interconnected by relatively slow connections, such as our Distributed ASCI Supercomputer (DAS), jobs may request co-allocation, i.e., the simultaneous allocatio...
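The co-allocation idea can be illustrated with a toy feasibility check (a hypothetical model for illustration, not the DAS scheduler): a request names how many processors it needs in each cluster, and the scheduler admits it only if every per-cluster component can be satisfied at the same time.

```python
# Toy co-allocation feasibility check (illustrative sketch; not the DAS scheduler).
# A request asks for processors in several clusters at once and is admitted
# only if every per-cluster component fits simultaneously.

def can_coallocate(free, request):
    """free: {cluster: idle processors}; request: {cluster: processors wanted}."""
    return all(free.get(c, 0) >= n for c, n in request.items())

def admit(free, request):
    """Reserve processors in all clusters atomically, or reserve nothing."""
    if not can_coallocate(free, request):
        return False
    for c, n in request.items():
        free[c] -= n
    return True

free = {"cluster-A": 32, "cluster-B": 16}
print(admit(free, {"cluster-A": 24, "cluster-B": 8}))  # True
print(admit(free, {"cluster-A": 24, "cluster-B": 8}))  # False: cluster-A lacks capacity
```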
This paper presents the first steps toward a graph comparison method based on matching matchings, or in other words, comparison of independent edge sets in graphs. The novelty of our approach is to use matchings for c...
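As a minimal illustration of comparing graphs through matchings (a sketch of the general idea only, not the similarity measure proposed in the paper), one can contrast the maximum matchings of two graphs using networkx:

```python
# Sketch: compare two graphs by the sizes of their maximum matchings
# (illustrative only; not the paper's comparison method).
import networkx as nx

def matching_size(g):
    # max_weight_matching with maxcardinality=True yields a maximum matching
    # on an unweighted graph; each element is one matched edge (u, v).
    return len(nx.max_weight_matching(g, maxcardinality=True))

def matching_similarity(g1, g2):
    """Crude similarity: ratio of maximum-matching sizes (1.0 = equal)."""
    m1, m2 = matching_size(g1), matching_size(g2)
    if max(m1, m2) == 0:
        return 1.0
    return min(m1, m2) / max(m1, m2)

path = nx.path_graph(5)    # maximum matching has 2 edges
cycle = nx.cycle_graph(6)  # perfect matching with 3 edges
print(matching_similarity(path, cycle))  # 2/3 ≈ 0.67
```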
Garbage collection (GC) pauses are a notorious issue threatening the latency of applications. To mitigate this problem, state-of-the-art concurrent copying collectors allow GC threads to run simultaneously with applic...
Exchanging large amounts of floating-point data is common in distributed scientific computing applications. Data compression, when fast enough, can speed up such workloads by reducing the time spent waiting for data transfers. We propose ndzip, a high-throughput, lossless compression algorithm for multi-dimensional univariate regular grids of single- and double-precision floating point data. Tailored towards efficient implementation on modern SIMD-capable multicore processors, it compresses and decompresses data at speeds close to main memory bandwidth, significantly outperforming existing schemes. We evaluate this novel method using a representative set of scientific data, demonstrating a competitive trade-off between compression effectiveness and throughput.
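The following is not ndzip itself, but a minimal sketch of a building block common to fast lossless float compressors: reinterpret each float's bit pattern as an integer, predict it from its neighbor, and store the XOR residual, which for smooth data has mostly zero bits and is therefore cheap to entropy-code afterwards.

```python
# Minimal sketch of XOR residual coding for floats (illustrative only;
# a common ingredient of fast lossless float codecs, not the ndzip algorithm).
import numpy as np

def encode_residuals(values):
    """XOR each value's bit pattern with its predecessor's (predictor = previous value)."""
    bits = values.astype(np.float64).view(np.uint64)
    residuals = bits.copy()
    residuals[1:] ^= bits[:-1]  # smooth data -> residuals with many zero bits
    return residuals            # a real codec would entropy-code these

def decode_residuals(residuals):
    bits = residuals.copy()
    for i in range(1, len(bits)):  # prefix-XOR undoes the encoding
        bits[i] ^= bits[i - 1]
    return bits.view(np.float64)

data = np.linspace(0.0, 1.0, 8)
assert np.array_equal(decode_residuals(encode_residuals(data)), data)
```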
ISBN (print): 9781424430116
Extraordinarily long lifecycles of many scientific applications commonly surpass multiple generations of Grid technologies, so smooth adaptation and migration to newer environments remains an interesting research question. This paper presents the Otho Toolkit for synthesis of application-specific Grid service wrappers based on specifications of scientific legacy programs. The services are customised and tailor-made for a specific application, service hosting environment, and computational infrastructure, and include source code for optional manual refinement. We demonstrate its unique combination of advanced features, including support for multiple service platforms, parameter sweeping, iterative and parallel programs, progress reporting, file staging, and security credential management. Moreover, our services reliably identify program termination causes based on programmatically evaluated post-mortem program states. We applied the Otho Toolkit recursively to itself to synthesise a sophisticated Factory service that creates application-specific Grid services on demand.
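The termination-cause idea can be sketched as follows (a hypothetical wrapper for illustration, not the Otho Toolkit's actual code): run the legacy program, then classify its post-mortem state from the exit status and the presence of expected output files.

```python
# Hypothetical sketch of a legacy-program wrapper that classifies the
# termination cause from the post-mortem process state (not Otho Toolkit code).
import os
import subprocess

def run_and_classify(cmd, expected_outputs=()):
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        return f"killed by signal {-proc.returncode}"      # POSIX convention
    if proc.returncode != 0:
        return f"failed with exit code {proc.returncode}"
    missing = [p for p in expected_outputs if not os.path.exists(p)]
    if missing:
        return f"exited 0 but outputs missing: {missing}"  # silent failure
    return "completed successfully"

print(run_and_classify(["echo", "hello"]))
```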
Compression of floating-point data, both lossy and lossless, is a topic of increasing interest in scientific computing. Developing and evaluating suitable compression algorithms requires representative samples of data...
Today, many commercial and private cloud computing providers offer resources for leasing under the infrastructure as a service (IaaS) paradigm. Although an abundance of mechanisms already facilitate the lease and use ...
The commoditization of big data analytics, that is, the deployment, tuning, and future development of big data processing platforms such as MapReduce, relies on a thorough understanding of relevant use cases and workloads. In this work we propose BTWorld, a use case for time-based big data analytics that is representative for processing data collected periodically from a global-scale distributed system. BTWorld enables a data-driven approach to understanding the evolution of BitTorrent, a global file-sharing network that has over 100 million users and accounts for a third of today's upstream traffic. We describe for this use case the analyst questions and the structure of a multi-terabyte data set. We design a MapReduce-based logical workflow, which includes three levels of data dependency (inter-query, inter-job, and intra-job) and a query diversity that together make the BTWorld use case challenging for today's big data processing tools; the workflow can be instantiated in various ways in the MapReduce stack. Lastly, we instantiate this complex workflow using Pig-Hadoop-HDFS and evaluate the use case empirically. Our MapReduce use case has challenging features: small (kilobytes) to large (250 MB) data sizes per observed item, excellent (10^-6) and very poor (10^2) selectivity, and short (seconds) to long (hours) job duration.
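The inter-job dependency described above can be sketched in plain map/reduce style (a toy Python analogue with hypothetical records; the paper's actual workflow is implemented in Pig on Hadoop/HDFS): one job aggregates observations per tracker per timestamp, and a dependent job then selects the top tracker from the first job's output.

```python
# Toy map/reduce analogue of a time-based BTWorld-style query chain
# (illustrative only; the paper's workflow runs in Pig on Hadoop/HDFS).
from collections import defaultdict

# Hypothetical records: (timestamp, tracker, swarm_size)
records = [
    ("2023-01-01T00", "tracker-a", 120),
    ("2023-01-01T00", "tracker-b", 80),
    ("2023-01-01T01", "tracker-a", 150),
]

# Job 1 (map + reduce): total swarm size per (timestamp, tracker).
totals = defaultdict(int)
for ts, tracker, size in records:  # map: emit ((ts, tracker), size)
    totals[(ts, tracker)] += size  # reduce: sum per key

# Job 2 (inter-job dependency): top tracker per timestamp, built on Job 1's output.
top = {}
for (ts, tracker), size in totals.items():
    if ts not in top or size > top[ts][1]:
        top[ts] = (tracker, size)

print(top)  # {'2023-01-01T00': ('tracker-a', 120), '2023-01-01T01': ('tracker-a', 150)}
```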