ISBN: 1892512459 (print)
As Web sites provide more complex and critical business-oriented services, Web servers must support not only quantity of service but also quality of service (QoS). The need for differentiated classes of users and services is also greater than ever. Enabling differentiated services in the Web server requires a mechanism for delivering end-to-end QoS. We focus on solutions for providing QoS support in cluster platforms with multiple server nodes that host a single Web site. In this paper, we present a new server partition mechanism (AdaptivePart) for differentiated service in Web server clusters. Our method enables the Web server to guarantee Service Level Agreements (SLAs) for different classes of users and Web services. Through simulations using realistic workload models, we demonstrate that the proposed scheme is able to guarantee the SLA of users in higher-priority classes, especially when the system is heavily loaded.
ISBN: 9781538621295 (print)
Scheduling streaming applications in Data Stream Management Systems (DSMS) has been investigated for years. However, no intelligent system yet exists that can monitor application execution, model its resource usage, and then adjust the scheduling plan for different input sizes without user intervention. In this paper, we model the scheduling problem as a bin-packing variant and propose a heuristic-based algorithm to solve it with minimised inter-node communication. We also implement the D-Storm prototype to validate the efficacy and efficiency of our scheduling algorithm, by extending the Apache Storm framework into a self-adaptive MAPE (Monitoring, Analysis, Planning, Execution) architecture. The evaluation, carried out on both synthetic and realistic applications, shows that D-Storm outperforms the existing resource-aware scheduler and the default Storm scheduler by at least 16.25% in terms of inter-node traffic reduction and yields significant resource savings through consolidation.
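To make the "bin-packing variant with minimised inter-node communication" idea concrete, here is a minimal Python sketch. It is not the D-Storm algorithm, whose details the abstract does not give. It assumes a hypothetical greedy rule: tasks are placed largest first, each on the open node that already hosts most of its traffic partners and still has room. The names `place_tasks`, `inter_node_traffic`, `demand`, `traffic`, and `capacity` are all illustrative.

```python
def place_tasks(demand, traffic, capacity):
    """Greedy bin-packing sketch (an assumption, not the published heuristic).

    demand:   {task: resource units needed}
    traffic:  {frozenset({task_a, task_b}): tuples per second between them}
    capacity: resource units available on every node
    """
    nodes = []        # each node is {"free": units left, "tasks": set of tasks}
    assignment = {}   # task -> index of the node it was placed on
    # Largest tasks first; ties keep their original order because sorted() is stable.
    for task in sorted(demand, key=demand.get, reverse=True):
        if demand[task] > capacity:
            raise ValueError(f"task {task!r} does not fit on any node")
        best_node, best_gain = None, -1.0
        for idx, node in enumerate(nodes):
            if node["free"] < demand[task]:
                continue  # not enough room on this node
            # Traffic that becomes node-local if the task is placed here.
            gain = sum(traffic.get(frozenset((task, other)), 0.0)
                       for other in node["tasks"])
            if gain > best_gain:
                best_node, best_gain = idx, gain
        if best_node is None:  # no open node has room, so open a new one
            nodes.append({"free": capacity, "tasks": set()})
            best_node = len(nodes) - 1
        nodes[best_node]["free"] -= demand[task]
        nodes[best_node]["tasks"].add(task)
        assignment[task] = best_node
    return assignment


def inter_node_traffic(assignment, traffic):
    """Sum the traffic between task pairs that ended up on different nodes."""
    return sum(rate for pair, rate in traffic.items()
               if len({assignment[t] for t in pair}) > 1)
```

For a hypothetical four-task pipeline such as `spout -> parse -> count -> sink`, calling `place_tasks` and then `inter_node_traffic` shows how much traffic still crosses node boundaries after consolidation.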
ISBN: 1932415599 (print)
Given a discrete optimization problem that is solved by an exhaustive depth-first tree search, one method of solving the problem in parallel is to perform a preliminary bounded search to seed a work queue, which is then divided among multiple processors to complete the search of the sub-trees in the queue. If pruning decisions are made during the search with knowledge of the best-seen solution at the time of the decision, then work scheduling can affect program performance, depending on how solutions are distributed in terms of optimality. We show that a randomized work-scheduling algorithm generally performs better than halfway between the performance of scheduling algorithms that implement a left-to-right traversal of the search tree and those that implement a right-to-left traversal. The performance of random work scheduling is often near the better of the two traditional methods and can actually outperform both.
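The scheme described above can be illustrated with a small sequential Python sketch. It is only a simulation under stated assumptions, not the paper's implementation: the problem is a toy 0/1 knapsack chosen for illustration, the work queue is processed by a single worker, and the names `seed_work_queue`, `search_subtree`, and `run` are hypothetical. The point it shows is that the order in which seeded sub-trees are searched changes how early a good bound is found and therefore how many nodes are visited, while the optimum stays the same.

```python
import random


def seed_work_queue(depth):
    """Preliminary bounded search: every take/skip prefix of the given depth,
    listed in left-to-right tree order (take before skip)."""
    queue = [()]
    for _ in range(depth):
        queue = [prefix + (bit,) for prefix in queue for bit in (1, 0)]
    return queue


def search_subtree(prefix, values, weights, capacity, best):
    """Depth-first search below `prefix`. `best` is a one-element list that
    holds the best-seen value shared across all sub-trees.
    Returns the number of tree nodes visited."""
    visited = 0

    def dfs(i, value, weight):
        nonlocal visited
        visited += 1
        if weight > capacity:
            return  # infeasible branch
        if i == len(values):
            best[0] = max(best[0], value)
            return
        # Prune: taking every remaining item still cannot beat the best-seen value.
        if value + sum(values[i:]) <= best[0]:
            return
        dfs(i + 1, value + values[i], weight + weights[i])  # take item i
        dfs(i + 1, value, weight)                           # skip item i

    value = sum(v for v, bit in zip(values, prefix) if bit)
    weight = sum(w for w, bit in zip(weights, prefix) if bit)
    dfs(len(prefix), value, weight)
    return visited


def run(order, values, weights, capacity, depth=3, seed=0):
    """Process the seeded queue in the requested order; return (optimum, nodes visited)."""
    queue = seed_work_queue(depth)
    if order == "right_to_left":
        queue.reverse()
    elif order == "random":
        random.Random(seed).shuffle(queue)
    best = [-1]  # below any feasible value, so nothing is pruned prematurely
    visited = 0
    for prefix in queue:
        visited += search_subtree(prefix, values, weights, capacity, best)
    return best[0], visited
```

Calling `run` with `"left_to_right"`, `"right_to_left"`, and `"random"` on the same instance returns the same optimum with possibly different visit counts. A real parallel version would additionally need a thread-safe shared bound.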
ISBN: 9781467387767 (print)
The extreme scale, complexity and performance variability of future high performance computing systems pose many new challenges to parallel programming models and runtime systems. The Open Community Runtime (OCR) is a recent effort for a task-based runtime system for extreme scale parallel systems. We have implemented the OCR specification in a shared-memory environment on top of TBB, providing an alternative to the implementation created by the OCR consortium. We have created an experimental extension that supports parallel accelerators programmed with OpenCL. We also have an implementation that targets distributed-memory systems. Despite being in an early stage of development, our implementations can achieve reasonable performance with some applications. We describe the main aspects of our OCR implementations and report on early experimental results on shared-memory and distributed-memory systems.
ISBN: 9780769561493 (print)
With the increase in the complexity and number of nodes in large-scale high performance computing (HPC) systems, the probability of applications experiencing failures has increased significantly. As the computational demands of applications that execute on HPC systems increase, projections indicate that applications executing on exascale-sized systems are likely to operate with a mean time between failures (MTBF) of as little as a few minutes. A number of strategies for enabling fault resilience in systems of extreme sizes have been proposed in recent years. However, few studies provide performance comparisons for these resilience techniques. This work provides a comparison of four state-of-the-art HPC resilience techniques that are being considered for use in exascale systems. We explore the behavior of each resilience technique under simulated execution of a diverse set of applications varying in communication behavior and memory use. We examine how each resilience technique behaves as application size scales from what is considered large today through to exascale-sized applications. We further study the performance degradation that a large-scale system experiences from the overhead associated with each resilience technique as well as the application computation needed to continue execution when a failure occurs. Using the results from these analyses, we examine how application performance on exascale systems can be improved by allowing the system to select the optimal resilience technique for use in an application-specific manner, depending upon each application's execution characteristics.
ISBN: 1601320841 (print)
The Cell Broadband Engine is a high-performance multicore processor with superb performance on certain types of problems. However, it does not perform as well on other algorithms, particularly those with heavy branching. The Intel Xeon processor is a high-performance superscalar processor. It uses a high clock speed and deep pipelines to achieve superior performance, but deep pipelines can perform poorly with frequent memory accesses. This paper studies and attempts to quantify the types of program structures that are better suited to each architecture. It focuses on the issues of pipelines, memory access, and branching on these two microprocessor architectures.
ISBN: 9781728165820 (print)
Due to the ever-increasing computational demand of automotive applications, and in particular autonomous driving functionalities, the automotive industry and its suppliers are starting to adopt parallel and heterogeneous embedded platforms for their products. However, C and C++, the currently dominant programming languages in this industry, do not provide sufficient mechanisms to target such platforms. Established parallel programming models such as OpenMP and OpenCL, on the other hand, are tailored towards HPC systems. In this case study, we investigate the applicability of established parallel programming models to automotive workloads on heterogeneous platforms. We pursue a practical approach by re-enacting a typical development process for typical embedded platforms and representative benchmarks.
ISBN: 1892512416 (print)
To maintain load balancing in a distributed system, workload information must be obtained from all the nodes on the network. This requires O(v^2) communication overhead, where v is the number of nodes. In this paper, we present a new synchronous dynamic distributed load balancing algorithm on a (v, k + 1, 1)-configured network that applies a symmetric balanced incomplete block design, where v = k^2 + k + 1. Our algorithm needs only O(v√v) message overhead, and each node receives workload information from all the nodes without redundancy. Therefore, load balancing is maintained since every link carries the same amount of traffic for transferring workload information.
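The arithmetic behind the message bound can be checked on the smallest case, k = 2 and v = k^2 + k + 1 = 7. The Python sketch below builds a symmetric (7, 3, 1) design from the perfect difference set {0, 1, 3} modulo 7 (a standard construction, used here as an assumption since the abstract does not say how the design is built) and simulates one possible two-round exchange. The exchange is an illustration consistent with the abstract, not necessarily the paper's protocol, and the function names are hypothetical.

```python
from itertools import combinations


def blocks_from_difference_set(diff_set, v):
    """Block i is the difference set shifted by i modulo v."""
    return [frozenset((d + i) % v for d in diff_set) for i in range(v)]


def pair_coverage(blocks, v):
    """For every unordered pair of nodes, count the blocks containing both."""
    counts = {pair: 0 for pair in combinations(range(v), 2)}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def two_round_gather(blocks, v):
    """Assumed exchange: in round 1 node j collects the loads of the nodes in
    its own block; in round 2 node m merges what was collected by every node
    whose block contains m. Returns the set of nodes each node knows about."""
    after_round1 = {j: set(blocks[j]) for j in range(v)}
    after_round2 = {}
    for m in range(v):
        merged = set()
        for j in range(v):
            if m in blocks[j]:
                merged |= after_round1[j]
        after_round2[m] = merged
    return after_round2


k = 2
v = k * k + k + 1                      # 7 nodes
blocks = blocks_from_difference_set({0, 1, 3}, v)

all_to_all = v * (v - 1)               # every node messages every other node: 42
two_round = 2 * v * k                  # k incoming messages per node per round: 28
```

Because every pair of nodes shares exactly one block, the blocks that contain a given node overlap only in that node, so after the second round each node has heard about all v nodes exactly once. Since k is roughly √v, the 2vk messages grow as O(v√v) rather than O(v^2).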
ISBN: 0769526373 (print)
Many important parallel applications require multiple flows of control to run on a single processor. In this paper, we present a study of four flow-of-control mechanisms: processes, kernel threads, user-level threads, and event-driven objects. Through experiments, we demonstrate the practical performance and limitations of these techniques on a variety of platforms. We also examine migration of these flows of control, with a focus on thread migration, which is critical for application-independent dynamic load balancing in parallel computing applications. Thread migration, however, is challenging due to the complexity of both the user and system state involved. We present several techniques to support migratable threads and compare their performance.
ISBN: 0780378407 (print)
Using a parallel Geographic Image Processing system, flood disasters can be monitored and evaluated in a timely manner. By using ParGIP to establish a background database and process remote sensing (RS) images, the losses from a disaster can be obtained through an overlay operation within 24 hours. According to an experiment in the Poyang Lake region, this method improves the speed and efficiency of flood disaster monitoring and evaluation several times over.