ISBN: 0769507840 (print)
We describe a software architecture for storage services in computational grid environments. Based upon a lightweight message-passing paradigm, the architecture enables the provision and composition of active, distributed storage services. These services can then cooperatively provide access to distributed storage in a manner potentially optimized for dataset and resource environments. We report on the design and implementation of a distributed file system and a dataset-specific satellite imagery service using the architecture. We discuss data movement and storage issues and implications for future work with the architecture.
ISBN: 0769516866 (print)
With the advent of Grid computing, scheduling strategies for distributed heterogeneous systems have either become irrelevant or have to be extended significantly to support Grid dynamics. In this paper, we describe a metascheduling architecture for a Grid system that takes into account both the application and system level considerations. Results are presented to demonstrate the usefulness of the metascheduler.
ISBN: 9781509036820 (print)
About ten years ago, we presented the results of an effort to identify the "right metric" for efficient supercomputing at this workshop, the Workshop on High-Performance, Power-Aware Computing. In this paper, we review the advances that the community has made in this area of research. The intention of this ten-year retrospective is two-fold: (1) to acknowledge the past work through a historical narrative and (2) to highlight the essence of the remaining issues in this research.
ISBN: 9780769546766 (print)
This paper details recent experience in teaching parallel computing concepts to undergraduate Computer Science students. By taking a practical approach in delivering the material, students are shown to have grasped essential multi-threading concepts in Java, ensuring they are able to implement the necessary skills themselves. The motivation for parallel computing is clearly demonstrated early in the course, to immediately convince students of the importance of developing their parallel computing skills, should they wish to be effective software developers. Within only 4 weeks, students are able to correctly and efficiently multi-thread a sequential desktop application (with a Graphical User Interface) so that it is both responsive (does not freeze the user interface) and performant (utilises the underlying multi-core processor). The student evaluations confirm that live coding demonstrations and analogies were most helpful in learning parallel computing.
ISBN: 9780769546766 (print)
The Phylogenetic Likelihood Function (PLF) is an important statistical function for evaluating phylogenetic trees. To this end, the PLF is the computational kernel of all state-of-the-art likelihood-based phylogenetic inference programs. Typically, it accounts for more than 85% of total execution time in such programs. We present a substantially improved hardware architecture for computing the PLF based on previous experiences with implementing the PLF on reconfigurable logic. Our new design is optimized for computing the PLF on four-state (DNA) input data. It is also adapted to the computational requirements of real-world tree inference programs and completely independent of the specific tree search algorithm at hand. Furthermore, we describe how our architecture can be modified and adapted to handle general n-state data, such as protein (20 states) or RNA secondary structure data (6, 7, or 16 states, depending on the model). Finally, we designed an interface mechanism such that our PLF hardware architecture can interact with the widely-used phylogenetic inference tool RAxML. We deploy FPGA technology to verify the correctness of the architecture and to evaluate performance.
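As a rough illustration of what the PLF computes on four-state (DNA) data, the following Python sketch evaluates the conditional likelihood vector of one inner node from its two children for a single alignment site. It is a software sketch only, not the hardware design described in the abstract. The Jukes-Cantor (JC69) substitution model, the uniform base frequencies, and all function names (`jc69_matrix`, `tip_vector`, `combine`, `site_likelihood`) are assumptions chosen for brevity.

```python
import math

STATES = "ACGT"  # four-state DNA alphabet


def jc69_matrix(t):
    """Transition probabilities under the Jukes-Cantor model (an assumed
    model for this sketch) for a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e   # probability of ending in the same state
    diff = 0.25 - 0.25 * e   # probability of ending in one specific other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def tip_vector(base):
    """Conditional likelihood vector of a leaf with an observed base."""
    return [1.0 if s == base else 0.0 for s in STATES]


def combine(left, t_left, right, t_right):
    """Conditional likelihood vector of a parent node, given the vectors of
    its two children and the lengths of the connecting branches."""
    p_left = jc69_matrix(t_left)
    p_right = jc69_matrix(t_right)
    parent = []
    for s in range(4):
        from_left = sum(p_left[s][x] * left[x] for x in range(4))
        from_right = sum(p_right[s][x] * right[x] for x in range(4))
        parent.append(from_left * from_right)
    return parent


def site_likelihood(root_vector):
    """Likelihood of one site at the root, assuming uniform base frequencies."""
    return sum(0.25 * v for v in root_vector)


if __name__ == "__main__":
    # Hypothetical two-leaf tree: leaves "A" and "C", both branches of length 0.1.
    root = combine(tip_vector("A"), 0.1, tip_vector("C"), 0.1)
    print(site_likelihood(root))
```

A full tree evaluation would repeat `combine` bottom-up for every inner node and every alignment site, which is why this kernel tends to dominate run time.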
ISBN: 1595936734 (print)
This paper deals with a novel, distributed, QoS-aware, peer-to-peer checkpointing arrangement component for Mobile Grid (MoG) computing systems middleware. Checkpointing is more crucial in MoG systems than in their wired counterparts due to node mobility and less reliable wireless links resulting in frequent and dynamic connections and disconnections. Having determined the globally optimal checkpoint arrangement to be NP-complete, we consider ReD, our Reliability Driven protocol, which employs QoS-aware heuristics to construct superior peer-to-peer checkpointing arrangements efficiently.
ISBN: 9781538677698 (print)
Ensembles of the Online Sequential Extreme Learning Machine algorithm are suitable for forecasting Data Streams with Concept Drifts. Nevertheless, data stream forecasting requires high-performance implementations due to the high rate of incoming samples. In this work, we propose to tune up three ensembles, which operate with the Online Sequential Extreme Learning Machine, using high-performance techniques. We reimplemented them in the C programming language with the Intel MKL and MPI libraries. Intel MKL provides functions that exploit the multithreading features of multicore CPUs, which expands the parallelism to multiprocessor architectures. MPI allows us to parallelize tasks with distributed memory on several processes, which can be allocated within a single computational node or spread over several nodes. In summary, our proposal consists of a two-level parallelization, where we allocate each ensemble model to an MPI process and parallelize the internal functions of each model over a set of threads through Intel MKL. Thus, the objective of this work is to verify whether our proposals provide a significant improvement in execution time when compared to the respective conventional serial approaches. For the experiments, we used a synthetic and a real dataset. Experimental results showed that, in general, the high-performance ensembles improve the execution time when compared with their serial versions, performing up to 10-fold faster.
ISBN: 9781467386210 (print)
The proceedings contain 20 papers. The topics discussed include: using hardware transactional memory to enable speculative trace optimization; energy consumption and scalability evaluation for software transactional memory on a real computing environment; replicating the performance evaluation of an n-body application on a manycore accelerator; characterizing anomalies of a multicore ARMv7 cluster with parallel N-body simulations; MDACCER: modified distributed assessment of the closeness CEntrality ranking in complex networks for massively parallel environments; intra-clustering: accelerating on-chip communication for data parallel architectures; Kanga: a skeleton-based generic interface for parallel programming; painless parallelism on heterogeneous hardware leveraging the functional paradigm; CHAOS-MCAPI: an optimized mechanism to support multicore parallel programming; and exploiting parallelism in linear algebra kernels through dataflow execution.
ISBN: 9781665451550 (digital)
ISBN: 9781665451550 (print)
Approximate memories provide energy savings or performance improvements at the cost of occasional errors in stored data. Applications that tolerate errors in their data profit from this trade-off by controlling these errors so that they do not affect critical data. This control usually involves programmer intervention with annotations in the source code. To avoid annotations, some techniques protect critical data that are common to many applications, isolating specific memory regions from errors. In this work, we propose and explore alternatives for the protection of application critical data by managing a supervisor execution environment with an approximate memory system. We expose only dynamically allocated data to errors with secure data manipulation through an approximate allocation scheme that divides stored data based on the approximation of the heap area. We evaluate 6 applications with different data access profiles and obtain up to 20% energy savings.
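The idea of exposing only part of the heap to errors can be pictured with a toy model. The Python sketch below is an illustration under stated assumptions, not the paper's allocation scheme: the `SplitHeap` class, its bump-pointer allocation, and the bit-flip error model are all hypothetical. It splits a simulated heap into a protected region and an approximate region and injects bit flips only into the latter.

```python
import random


class SplitHeap:
    """Toy heap with a protected region followed by an approximate region.
    Hypothetical model: the layout and error injection are assumptions."""

    def __init__(self, protected_size, approx_size):
        self.memory = bytearray(protected_size + approx_size)
        self.approx_start = protected_size   # first approximate address
        self.next_protected = 0              # bump pointer, protected region
        self.next_approx = protected_size    # bump pointer, approximate region

    def alloc_protected(self, size):
        """Allocate critical data that must never see errors."""
        if self.next_protected + size > self.approx_start:
            raise MemoryError("protected region exhausted")
        addr = self.next_protected
        self.next_protected += size
        return addr

    def alloc_approx(self, size):
        """Allocate error-tolerant data in the approximate region."""
        if self.next_approx + size > len(self.memory):
            raise MemoryError("approximate region exhausted")
        addr = self.next_approx
        self.next_approx += size
        return addr

    def write(self, addr, data):
        self.memory[addr:addr + len(data)] = data

    def read(self, addr, size):
        return bytes(self.memory[addr:addr + size])

    def inject_errors(self, n_flips, rng):
        """Flip n_flips random bits, restricted to the approximate region."""
        for _ in range(n_flips):
            addr = rng.randrange(self.approx_start, len(self.memory))
            self.memory[addr] ^= 1 << rng.randrange(8)


if __name__ == "__main__":
    heap = SplitHeap(protected_size=16, approx_size=64)
    header = heap.alloc_protected(8)   # e.g. sizes or pointers: critical
    pixels = heap.alloc_approx(64)     # e.g. image samples: error tolerant
    heap.write(header, b"CRITICAL")
    heap.inject_errors(5, random.Random(0))
    print(heap.read(header, 8))        # protected bytes are never touched
```

In this model the allocation call decides which region the data lands in, so the application code does not annotate individual variables.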
ISBN: 9781538655559 (print)
HMMER3 is a biological sequence search suite used in significant volume on systems hosted at the National Energy Research Scientific Computing Center. This heavy usage has revealed ways in which HMMER3 underutilizes the resources available in an HPC environment, such as the manycore Knights Landing processors available in the Cori supercomputer. After rigorous performance analysis, it was determined that the thread architecture of HMMER3 is the most promising optimization target to increase throughput and efficiency. A refactoring effort introduced an OpenMP task-based threading design, the ability to respond to imbalanced computation with work stealing, and input buffering to eliminate a large amount of redundant parsing. These changes have been implemented and have been in production on Cori for over a year. In that time they have simplified the best practice for use of HMMER3 in workflows and conserved hundreds of thousands of CPU hours.
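Work stealing as a response to imbalanced computation can be sketched in a few lines. The example below is not the HMMER3 implementation, which the abstract describes as OpenMP task based. It swaps in plain Python threads and per-worker deques purely for illustration, and the name `run_work_stealing` is hypothetical. Each worker takes tasks from its own queue and, once that queue is empty, steals from the opposite end of another worker's queue.

```python
import threading
from collections import deque

_EMPTY = object()  # sentinel meaning "no task available anywhere"


def run_work_stealing(task_lists, work):
    """Run work(task) for every task. task_lists[i] is the initial queue of
    worker i. Idle workers steal from the other workers' queues."""
    queues = [deque(tasks) for tasks in task_lists]
    locks = [threading.Lock() for _ in queues]
    results = [[] for _ in queues]  # results[i] is written only by worker i

    def take(worker):
        # Prefer the worker's own queue, taking the most recently added task.
        with locks[worker]:
            if queues[worker]:
                return queues[worker].pop()
        # Otherwise steal the oldest task from the first non-empty victim.
        for offset in range(1, len(queues)):
            victim = (worker + offset) % len(queues)
            with locks[victim]:
                if queues[victim]:
                    return queues[victim].popleft()
        return _EMPTY

    def loop(worker):
        # Tasks never create new tasks in this sketch, so a worker may stop
        # as soon as every queue is empty.
        while True:
            task = take(worker)
            if task is _EMPTY:
                return
            results[worker].append(work(task))

    threads = [threading.Thread(target=loop, args=(w,))
               for w in range(len(queues))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Deliberately imbalanced start: all 100 tasks sit in worker 0's queue.
    per_worker = run_work_stealing([list(range(100)), [], [], []],
                                   lambda x: x * x)
    print([len(r) for r in per_worker])  # how many tasks each worker ran
```

How the tasks end up distributed depends on thread scheduling, so the per-worker counts vary between runs. Only the combined set of results is deterministic.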