Distributed systems are expected to support mobile computations executed over a computer network of fixed and mobile hosts. The authors examine the requirements for structuring such mobile computations that access sha...
The paper describes a number of distributed approaches to implementing a parallel visibility algorithm for viewshed analysis. The problem can be simplified by considering a range of domain partitioning strategies for ...
The Internet, best known by most users as the World-Wide-Web, continues to expand at an amazing pace. We propose a new infrastructure to harness the combined resources, such as CPU cycles or disk storage, and make them available to everyone interested. This infrastructure has the potential for solving parallel supercomputing applications involving thousands of cooperating components. Our approach is based on recent advances in Internet connectivity and the implementation of safe distributed computing embodied in languages such as Java. We developed a prototype of a global computing infrastructure, called SuperWeb, that consists of hosts, brokers and clients. Hosts register a fraction of their computing resources (CPU time, memory, bandwidth, disk space) with resource brokers. Client computations are then mapped by the broker onto the registered resources. We examine an economic model for trading computing resources, and discuss several technical challenges associated with such a global computing environment.
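The host, broker and client roles described above can be illustrated with a minimal sketch. This is not SuperWeb's actual design (the prototype targets Java): the `Offer` and `Broker` names, their fields, and the first-fit matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources a host registers with the broker (hypothetical fields)."""
    host: str
    cpu_seconds: float
    memory_mb: int


class Broker:
    """Toy resource broker: hosts register offers, clients request resources."""

    def __init__(self):
        self.offers = []

    def register(self, offer):
        self.offers.append(offer)

    def allocate(self, cpu_seconds, memory_mb):
        """Return the first host whose offer covers the request, or None."""
        for offer in self.offers:
            if offer.cpu_seconds >= cpu_seconds and offer.memory_mb >= memory_mb:
                # Deduct the granted resources so later requests see what remains.
                offer.cpu_seconds -= cpu_seconds
                offer.memory_mb -= memory_mb
                return offer.host
        return None


broker = Broker()
broker.register(Offer("host-a", cpu_seconds=10.0, memory_mb=256))
broker.register(Offer("host-b", cpu_seconds=60.0, memory_mb=1024))

# Only host-b is large enough for the first request.
first = broker.allocate(cpu_seconds=30.0, memory_mb=512)
# host-b now has 30 CPU seconds left, so no host can serve the second request.
second = broker.allocate(cpu_seconds=40.0, memory_mb=128)
```

A first-fit rule is the simplest possible mapping; an economic model of the kind the abstract mentions would presumably replace it with a price-based choice among offers.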
ISBN (print): 0780318722
Advances in communications technology, particularly in fibre optics, have accelerated the application of parallel programming by enabling the interconnection of a huge number of high-performance processors. Further advances in this field will also enable low-cost implementation of the process and user interfaces required for plant automation and remote data acquisition. At the same time, they will considerably help the implementation of highly reliable distributed multicomputer systems. Recently, a number of valuable applications of parallel real-time computing have been reported, helping to control laboratory experiments, moving vehicles, power plants, working robots, etc. This paper gives a state-of-the-art survey of the techniques and tools for parallel processing in real-time applications.
ISBN (print): 9798400701559
Providing efficient Functions as a Service (FaaS) is challenging due to the serverless programming model and highly heterogeneous and dynamic workloads. Great strides have been made in optimizing FaaS performance through scheduling, caching, virtualization, and other resource management techniques. The combination of these advances and growing FaaS workloads has pushed the performance bottleneck into the control plane itself. Current FaaS control planes like OpenWhisk introduce hundreds of milliseconds of latency overhead, and are becoming unsuitable for high-performance FaaS research and deployments. We present the design and implementation of Iluvatar, a fast, modular, extensible FaaS control plane which reduces the latency overhead by more than two orders of magnitude. Iluvatar has a worker-centric architecture and introduces a new function queue technique for managing function scheduling and overcommitment. Iluvatar is implemented in Rust in about 13,000 lines of code, and introduces only 3 ms of latency overhead under a wide range of loads, which is more than two orders of magnitude lower than OpenWhisk.
ISBN (print): 9781424416943
Graph partitioning requires the division of a graph's vertex set into k equally sized subsets such that some objective function is optimized. High-quality partitions are important for many applications, whose objective functions are often NP-hard to optimize. Most state-of-the-art graph partitioning libraries use a variant of the Kernighan-Lin (KL) heuristic within a multilevel framework. While these libraries are very fast, their solutions do not always meet all user requirements. Moreover, due to its sequential nature, KL is not easy to parallelize. Its use as a load balancer in parallel numerical applications therefore requires complicated adaptations. That is why we previously developed an inherently parallel algorithm, called BUBBLE-FOS/C [H. Meyerhenke, B. Monien, S. Schamberger, Accelerating shape optimizing load balancing for parallel FEM simulations by algebraic multigrid, in: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, IPDPS'06, IEEE Computer Society, 2006, p. 57 (CD)], which optimizes partition shapes by a diffusive mechanism. However, it is too slow for practical use, despite its high solution quality. In this paper, besides proving that BUBBLE-FOS/C converges towards a local optimum of a potential function, we develop a much faster method for the improvement of partitionings. This faster method, called TRUNCCONS, is based on a different diffusive process, which is restricted to local areas of the graph and also contains a high degree of parallelism. By coupling TRUNCCONS with BUBBLE-FOS/C in a multilevel framework based on two different hierarchy construction methods, we obtain our new graph partitioning heuristic DIBAP. Compared to BUBBLE-FOS/C, DIBAP shows a considerable acceleration, while retaining the positive properties of the slower algorithm. Experiments with popular benchmark graphs show that DIBAP computes consistently better results than the state-of-the-art libraries METIS and JOSTLE. Moreover, with our
ISBN (print): 9798400704130
Quantum computing addresses the construction and operation of quantum computers to solve instances of specific problems that are difficult to tackle with classical computers more efficiently. Even if we are currently in the so-called Noisy Intermediate Scale Quantum (NISQ) era, steady progress is being made towards the realization of a fast and reliable quantum computer, materializing the basic building blocks of quantum circuits, i.e., quantum bits and gates. On the other hand, quantum communications cover the transmission of quantum states across distances. Recent advances in this context have led to the novel research area of quantum networking, which is set to define the programming interfaces and protocols for the practical operation of quantum communication and computing infrastructures. The tutorial aims to raise awareness of these emerging topics, i.e., quantum computing and quantum networking, in the research community by i) briefly introducing the latest technologies developed in each, ii) providing hands-on examples of how to use them for simple use cases, and iii) sketching the most promising open research challenges.
ISBN (digital): 9788132204879
ISBN (print): 9788132204862
A distributed processing system is a collection of heterogeneous processors which requires systematic assignment of a set of "m" tasks T = {t(1), t(2) ... t(m)} of a program to a set of "n" processors P = {p(1), p(2) ... p(n)} (where m >> n) to achieve efficient utilization of the available processors' capacity. If this step is not performed properly, an increase in the number of processors may actually result in a decrease in the total system throughput. The Inter-Task Communication (ITC) time is always the most costly and the least reliable factor in a distributed processing environment. This paper presents a heuristic task allocation model which allocates each task to the most suitable processor to obtain an optimal solution. A fuzzy membership function is developed for clustering tasks, subject to the constraints of maximizing the throughput and minimizing the parallel execution time of the system.
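To make the role of ITC time concrete, here is a small sketch that scores a task-to-processor assignment. It is not the paper's fuzzy clustering model. The cost model (total execution cost plus the communication cost of every task pair placed on different processors), the function name `allocation_cost`, and all numbers are assumptions chosen for illustration.

```python
def allocation_cost(assignment, exec_cost, itc):
    """Execution cost plus ITC cost of task pairs split across processors.

    assignment[t] is the processor index of task t (hypothetical encoding).
    exec_cost[t][p] is the cost of running task t on processor p.
    itc[(i, j)] is the communication cost between tasks i and j,
    paid only when the two tasks run on different processors.
    """
    total = sum(exec_cost[t][p] for t, p in enumerate(assignment))
    for (i, j), cost in itc.items():
        if assignment[i] != assignment[j]:
            total += cost
    return total


# Four tasks, two heterogeneous processors (made-up costs).
exec_cost = [
    [4, 6],
    [3, 5],
    [7, 2],
    [6, 3],
]
itc = {(0, 1): 8, (1, 2): 1, (2, 3): 9}

clustered = [0, 0, 1, 1]  # heavily communicating tasks share a processor
scattered = [0, 1, 0, 1]  # every communicating pair is split

clustered_cost = allocation_cost(clustered, exec_cost, itc)
scattered_cost = allocation_cost(scattered, exec_cost, itc)
```

Under these assumed numbers the clustered assignment pays only the cheap (1, 2) link, which is the intuition behind grouping tasks before assigning them to processors.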
ISBN (print): 9789811024719; 9789811024702
Synthetic aperture radar (SAR)-based platforms have to process an increasingly large number of complex floating-point operations and have to meet hard real-time deadlines. However, real-time use of SAR is severely restricted by the computation time required for image formation. One of the classical methods of reducing this computation time to make it suitable for real-time application is multi-processing. The authors have developed and tested a parallel algorithm for synthetic aperture radar image formation, and the results are presented in this paper.
ISBN (print): 088986568X
Large clustered computers provide low-cost compute cycles, and therefore have promoted the development of sophisticated parallel-programming algorithms based on the Message Passing Interface. Storage platforms, however, fail to keep pace with similar advances. This paper compares standard 4X InfiniBand (IB) to 10-Gigabit Ethernet (GbE) for use as a common storage infrastructure in addition to message passing. Considering IB's native ability to accelerate protocol processing in hardware, the Ethernet hardware in this study provided similar acceleration using TCP Offload Engines. We evaluated their I/O performance using the IOZONE benchmark on the iSCSI-based TerraGRID parallel filesystem. Our evaluations show that 10GbE, with or without protocol offload, offered better throughput and latency than IB to socket-based applications. Although protocol offload in both 10GbE and IB demonstrated significant improvement in I/O performance, a large amount of CPU time is still consumed in handling the associated data copies and interrupts. The emerging RDMA technologies hold promise for removing the remaining CPU overhead. We plan to continue our study to research the applications of RDMA in parallel I/O.