ISBN: 9781665422352 (print)
For a specific quantum chip, multi-programming improves overall throughput and resource utilization. Previous approaches to mapping multiple programs often lead to resource under-utilization, high error rates, and low fidelity. This paper proposes QuCloud, a new approach for mapping quantum programs in the cloud environment. QuCloud introduces three new designs. (1) We leverage community detection to partition physical qubits among concurrent quantum programs, avoiding the waste of robust resources. (2) We design the X-SWAP scheme, which enables inter-program SWAPs and prioritizes SWAPs associated with critical gates to reduce SWAP overhead. (3) We propose a compilation task scheduler that schedules concurrent quantum programs for compilation and execution based on estimated fidelity. We evaluate our work on the publicly available quantum computer IBMQ16 and a simulated quantum chip, IBMQ50. Our work outperforms the state-of-the-art work for multi-programming on fidelity and compilation overheads by 9.7% and 11.6%, respectively.
ISBN: 1595936734 (print)
Traditional scientific computing has been associated with harnessing computation cycles within and across clusters of machines. In recent years, scientific applications have become increasingly data-intensive. This is especially true in the fields of astronomy and high-energy physics. Furthermore, the lowered cost of disks and commodity machines has led to a dramatic increase in the amount of free disk space spread across machines in a cluster. This space is not being exploited by traditional distributed computing tools. In this paper, we evaluate ways to improve the data management capabilities of Condor, a popular distributed computing system. We have augmented the Condor system with the capability to store data used and produced by workflows on the disks of machines in the cluster. We have also replaced the Condor matchmaker with a new workflow planning framework that is cognizant of dependencies between jobs in a workflow and exploits these new data storage capabilities to produce workflow schedules. We show that our data caching and workflow planning framework can significantly reduce response times for data-intensive workflows by reducing data transfer over the network in a cluster. We also consider ways in which this planning framework can be made adaptive in a dynamic, multi-user, failure-prone environment. Copyright 2007 ACM.
ISBN: 9781538666142 (print)
The proceedings contain 248 papers. The topics discussed include: parallel computing implementation for real-time image dehazing based on dark channel; improved parallel algorithms for sequential minimal optimization of classification problems; heterogeneous assignment of functional units with Gaussian execution time on a tree; high-performance and low-latency vision system with hardware accelerator; merge-based parallel sparse matrix-sparse vector multiplication with a vector architecture; a learning-based adjustment model with genetic algorithm of function point estimation; high-performance implementation of matrix-free Runge-Kutta discontinuous Galerkin method for Euler equations; a step towards Hadoop dynamic scaling; and towards building a distributed data management architecture to integrate multi-sources remote sensing big data.
ISBN: 0769520464 (print)
Trace reuse improves the performance of processors by skipping the execution of sequences of redundant instructions. However, many reusable traces do not have all of their inputs ready by the time the reuse test is done. For these cases, we developed a new technique called Reuse through Speculation on Traces (RST), where trace inputs may be predicted. This paper studies the limits of RST for modern processors with deep pipelines, as well as the effects of constraining resources on performance. We show that our approach reuses more traces than the non-speculative trace reuse technique, with speedups of 43% over non-speculative trace reuse and 57% when memory accesses are reused.
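The difference between a plain reuse test and a speculative one can be illustrated with a small software model. This is only a sketch under stated assumptions, not the mechanism evaluated in the paper: the `TraceEntry` layout, the `ready` set, and the policy of predicting that a not-yet-ready input equals its recorded value are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TraceEntry:
    """Hypothetical memo-table entry for one previously executed trace."""
    inputs: dict   # register name -> value seen when the trace was recorded
    outputs: dict  # register name -> value the trace produced
    next_pc: int   # address at which execution resumes after the trace


def reuse_test(entry, regs, ready, speculate=False):
    """Return (reusable, predicted_inputs).

    regs  : current register values
    ready : set of registers whose values are already available
    With speculate=False, any input that is not ready blocks reuse.
    With speculate=True (assumed policy), a not-ready input is predicted
    to match the recorded value and must be verified later.
    """
    predicted = []
    for reg, expected in entry.inputs.items():
        if reg in ready:
            if regs[reg] != expected:
                return False, []      # known mismatch: never reusable
        elif speculate:
            predicted.append(reg)     # prediction to be checked afterwards
        else:
            return False, []          # input unavailable, no speculation
    return True, predicted


def apply_trace(entry, regs):
    """Skip the trace: write its recorded outputs and jump past it."""
    updated = dict(regs)
    updated.update(entry.outputs)
    return updated, entry.next_pc
```

In this model, a trace whose input `r2` is still being computed fails the non-speculative test but passes the speculative one with `r2` listed as a prediction; a real design would also need a recovery path for mispredictions, which is omitted here.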
ISBN: 9781595937063 (print)
Stream architecture is a novel microprocessor architecture with wide application potential. Whether it can be used efficiently in scientific computing, however, raises many issues that await further study. First, this paper presents the design and implementation of a 64-bit stream processor, FT64 (Fei Teng 64), for scientific computing. The 64-bit extension design and the scientific-computing-oriented optimizations are described for the instruction set architecture, stream controller, microcontroller, ALU cluster, memory hierarchy, and interconnection interface. Second, two kinds of communication, message passing and stream communication, are put forward, and an interconnection based on them is designed for FT64-based high-performance computers. Third, a novel stream programming language, SF95 (Stream FORTRAN95), and its compiler, SF95Compiler (Stream FORTRAN95 Compiler), are developed to facilitate the development of scientific applications. Finally, nine typical scientific application kernels are tested, and the results show the efficiency of stream architecture for scientific computing.
ISBN: 0769520464 (print)
The use of clusters of computers as an environment for high-performance computing has been shown to be promising. However, the efficient use of such systems still requires advances that make the application development process simpler and more productive. The development of cluster monitoring tools is essential to achieve these advances. In this paper, we present (PMP)-P-2, a tool for clusters of personal computers that provides a graphical visualization of the temporal execution of distributed applications that use the MPI standard for message passing. The tool uses an approach involving the parallel port to read the time of events that occur on the different machines of a cluster. It also simulates the execution of task precedence graphs and allocates the tasks of a graph to the machines of a cluster, among other functionalities.
ISBN: 0769520464 (print)
Scheduling by Edge Reversal (SER) is a fully distributed scheduling mechanism based on the manipulation of acyclic orientations of a graph. This work uses SER to perform constraint partitioning of Constraint Satisfaction Problems (CSP). In order to apply the SER mechanism, the graph representing the constraints must receive an acyclic orientation. Since obtaining an optimal acyclic orientation is an NP-hard problem, this work studies three non-deterministic strategies known in the literature: Alg-Neigh, Alg-Edges, and Alg-Colour. We implemented the three algorithms and the SER scheduling mechanism, applying them to the CSP constraint networks generated from three applications. Our results show that SER has great potential to perform a good partitioning of the constraint graphs.
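The core SER rule, in which every sink of the current acyclic orientation operates and then reverses its incident edges, can be sketched centrally in a few lines. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the initial orientation used here (each edge points from the smaller node id to the larger) is a simple deterministic stand-in, not Alg-Neigh, Alg-Edges, or Alg-Colour.

```python
def initial_orientation(undirected_edges):
    """Assumed stand-in: orient each edge from smaller id to larger id.

    Any orientation that follows a total order on the nodes is acyclic.
    """
    return {(min(u, v), max(u, v)) for u, v in undirected_edges}


def find_sinks(nodes, directed_edges):
    """A sink is a node with no outgoing edge (all incident edges point in)."""
    has_outgoing = {u for u, _ in directed_edges}
    return {n for n in nodes if n not in has_outgoing}


def ser_step(nodes, directed_edges):
    """One SER round: all sinks operate, then reverse their incident edges."""
    operating = find_sinks(nodes, directed_edges)
    # Two neighbours cannot both be sinks, so each edge flips at most once.
    reversed_edges = {
        (v, u) if v in operating else (u, v) for u, v in directed_edges
    }
    return operating, reversed_edges


def ser_schedule(nodes, undirected_edges, steps):
    """Return the set of nodes that operate in each of the first `steps` rounds."""
    edges = initial_orientation(undirected_edges)
    schedule = []
    for _ in range(steps):
        operating, edges = ser_step(nodes, edges)
        schedule.append(operating)
    return schedule
```

For a three-node path `0 - 1 - 2`, `ser_schedule([0, 1, 2], [(0, 1), (1, 2)], 4)` yields `[{2}, {1}, {0, 2}, {1}]`: neighbouring nodes never operate in the same round, which is the mutual-exclusion property SER relies on.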
The list of applications requiring high-performance computing resources is constantly growing. The cost of inter-processor communication is critical in determining the performance of massively parallel computing syste...
ISBN: 0769520464 (print)
Program tracing is one of the most widely used techniques for debugging parallel and distributed programs. In this technique, events are recorded in trace files during the execution of the program for post-mortem visualization of its behavior. This article describes JRastro, a trace agent capable of tracing Java programs. The agent was designed to cover three key features: to be transparent to the application developer, to use unmodified Java Virtual Machines, and to observe Remote Method Invocations. By integrating these three features, JRastro differentiates itself from similar tools. Unfortunately, a complete and clean implementation of RMI visualization requires additional support from the Java monitoring system.
ISBN: 0818673982 (print)
We claim in this paper that both remote process creation and process migration are efficient mechanisms for improving or developing high-performance computer systems. In particular, we demonstrate that the claims made by some researchers that process migration is too heavy to support dynamic load balancing are unsubstantiated. We support our claim by presenting these two mechanisms as available in the RHODOS distributed operating system, comparing and contrasting them, and reporting on their performance.