ISBN: 0769520464 (print)
In this paper we describe a parallel algorithm that uses the BSP/CGM (Bulk Synchronous Parallel/Coarse Grained Multicomputer) model to obtain Euler tours in graphs. It is based on the PRAM (Parallel Random Access Machine) algorithm by Caceres et al. For an input graph of n vertices and m edges, the algorithm requires O((m + n)/p) local computation time, O((m + n)/p) memory, and O(log p) communication rounds, where p is the number of processors. To our knowledge there are no other parallel algorithms under the coarse-grained models for Euler tours in graphs. The proposed algorithm is implemented using MPI (Message Passing Interface) and the C language. The parallel program runs on a Beowulf cluster with 66 nodes. The implementation results confirm the theoretical complexity results of the algorithm.
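For readers unfamiliar with the underlying problem, the following is a minimal sequential sketch of finding an Euler circuit with Hierholzer's algorithm. It is not the BSP/CGM algorithm described in the abstract and does not use MPI; the function name `euler_circuit` and the edge-list input format are assumptions made for illustration only.

```python
from collections import defaultdict


def euler_circuit(edges):
    """Return an Euler circuit as a list of vertices, or None if none exists.

    Sequential Hierholzer's algorithm on an undirected multigraph given as
    a list of (u, v) pairs. Illustrative only; not the parallel algorithm.
    """
    if not edges:
        return []
    adj = defaultdict(list)  # vertex -> list of (neighbour, edge id)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    # An Euler circuit requires every vertex to have even degree.
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return None
    used = [False] * len(edges)
    stack = [edges[0][0]]
    circuit = []
    while stack:
        v = stack[-1]
        # Discard edges already traversed from their other endpoint.
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, eid = adj[v].pop()
            used[eid] = True
            stack.append(w)
        else:
            # No unused edge leaves v: v is final in the circuit.
            circuit.append(stack.pop())
    # If some edge was never traversed, the edges are not all connected.
    if len(circuit) != len(edges) + 1:
        return None
    return circuit
```

The sketch runs in O(m + n) time on one processor, which matches the total work implied by the O((m + n)/p) per-processor bound quoted in the abstract.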
ISBN: 9780769533070 (print)
This work proposes a set of requirements for programming emerging FPGA-based high-performance computing systems, and uses them to evaluate a number of existing parallel programming models.
ISBN: 9781467386210 (print)
In this paper we present a homography algorithm to produce image mosaics, using parallelism to solve a multiple Singular Value Decomposition (SVD) system. We analyse four state-of-the-art SVD methods and choose the one that best suits the expected size of the matrices derived from the datasets of interest. Then we use CUDA to accelerate the solution of the homogeneous transformation matrices.
ISBN: 9781665476522 (print)
Serverless computing has emerged as a popular cloud computing paradigm. Serverless environments are convenient to users and efficient for cloud providers. However, they can induce substantial application execution overheads, especially in applications with many functions. In this paper, we propose to accelerate serverless applications with a novel approach based on software-supported speculative execution of functions. Our proposal is termed Speculative Function-as-a-Service (SpecFaaS). It is inspired by out-of-order execution in modern processors, and is grounded in a characterization analysis of FaaS applications. In SpecFaaS, functions in an application are executed early, speculatively, before their control and data dependences are resolved. Control dependences are predicted as in pipeline branch prediction, and data dependences are speculatively satisfied with memoization. With this support, the execution of downstream functions is overlapped with that of upstream functions, substantially reducing the end-to-end execution time of applications. We prototype SpecFaaS on Apache OpenWhisk, an open-source serverless computing platform. For a set of applications in a warmed-up environment, SpecFaaS attains an average speedup of 4.6x. Further, on average, the application throughput increases by 3.9x and the tail latency decreases by 58.7%.
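The idea of satisfying a data dependence speculatively through memoization can be illustrated with a toy two-function chain. This is a minimal sketch under stated assumptions, not the SpecFaaS implementation or the OpenWhisk API: the helper `run_chain_speculatively`, the `memo` dictionary, and the use of a thread pool are all hypothetical, and both functions are assumed to be free of side effects so that discarding a wrong speculative run is safe.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chain_speculatively(upstream, downstream, arg, memo):
    """Compute downstream(upstream(arg)), overlapping the two calls if possible.

    memo maps an upstream argument to the output last observed for it and
    supplies the speculative input for downstream (hypothetical design).
    Returns (result, speculation_was_correct).
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        up_future = pool.submit(upstream, arg)
        spec_future = None
        if arg in memo:
            # Speculate: start downstream early on the remembered output.
            spec_future = pool.submit(downstream, memo[arg])
        actual = up_future.result()
        # Validate the prediction against the real upstream output.
        hit = spec_future is not None and memo[arg] == actual
        memo[arg] = actual  # remember the real output for future runs
        if hit:
            return spec_future.result(), True
        # No prediction or misprediction: ignore speculative work and re-run.
        return pool.submit(downstream, actual).result(), False
```

On a correct prediction the downstream call has already been running alongside the upstream call, which is the overlap the abstract describes; on a misprediction the result is still correct, at the cost of the wasted speculative run.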
ISBN: 9781467355872 (print)
The necessity of capping carbon emissions has significantly restricted the potential of modern data centers. For this reason, both industry and academia are proactively seeking cross-layer power management schemes that could open the door to sustainable high-performance computing platforms. In this paper we investigate an emerging trend in the IT industry: using promising onsite distributed generation (DG) techniques to provide premium clean energy to the computing load. We develop data center power demand shaping (PDS), a novel technique that allows data centers to utilize onsite green energy efficiently. In contrast to prior designs, PDS takes advantage of a so-far unexplored power supply feature, i.e., the load-following capabilities of DG systems, to avoid the high performance penalty incurred during supply tracking. In addition, PDS features two adaptive power management schemes: DGR Boost and UPS Boost. These two workload-aware optimization methods leverage mature computer tuning knobs to achieve attractive data center performance improvement. Using real-world data center traces and industry data of distributed generation systems, we show that our technique can come within 1.2% of the performance of an ideal oracle, which is roughly a 37% improvement over existing supply-tracking-based designs. Our design could save over 100 metric tons of carbon emissions annually for a 10MW data center.
ISBN: 1424403073 (print)
The poster presents the dark fibre "project architecture" deployed by RENATER to support research projects with high network resource requirements. We show maps of the RENATER standard and dark fibre architectures. We summarize requirements and results for projects currently using the architecture (DEISA, LHC, Grid5000).
ISBN: 1424403073 (print)
Achieving high-performance parallel computing requires a system that is both large-scale and reliable. To address this challenge, we describe our design and implementation of the Message Passing Interface, called MPICH-OPeN, for parallel computing over a peer-to-peer network. For reliability, our implementation uses the Condor standalone checkpoint library and the Chandy-Lamport algorithm, with extensions to make it decentralized. We use the OPeN architecture with an adaptive peer-to-peer protocol that caches connections between peers according to the communication requirements of the parallel processes. We used PlanetLab to compare the performance of our implementation to MPICH-P4 and to measure the impact of dynamic peers on parallel program execution.
ISBN: 0769516866 (print)
Computational portal developers often re-implement functionality found in existing portals because there are no common discovery and access mechanisms in place to enable sharing of portal functions. The web services architecture provides an implementation-independent, protocol-based mechanism for entities to find, share, and invoke remote services. We developed complementary web services at SDSC and IU that generate batch scripts for different batch queuing schedulers.
ISBN: 9781450366854 (print)
We propose an idea to speed up instruction execution through a probabilistic approach, using the parallelism offered by quantum computers. For this, we divide the instruction set of an arbitrary quantum instruction set architecture (QISA) into separate groups and then bias certain qubits representing the group so that only the instructions within the group have a high probability of being executed in a quantum processor. Therefore, the result generated will be the superposition of the qubits, as if all the instructions within the group were executed simultaneously. We show that we can achieve a significant design improvement compared to a classical computer.
ISBN: 0769515703 (print)
The paper presents a pragmatic scan partitioning architecture that allows less-than-perfect scan design in high-performance VLSI circuits to cost-effectively achieve test development and manufacturing test goals. The paper then describes an implementation of the architecture on Compaq's Alpha 21364 microprocessor.