ISBN: 9781450319102 (print)
In the last few years, thanks to their computational power, their progressively increasing programmability and their wide adoption in both the research community and industry, GPUs have become part of HPC clusters (for example, the US Titan and Stampede and the Chinese Tianhe-1A supercomputers). As a result, widely used open-source cluster resource managers (e.g. SLURM and TORQUE) have recently been extended with GPU support capabilities. These systems, however, provide simple scheduling mechanisms that often result in resource underutilization and, thereby, in suboptimal performance. In this paper, we propose a runtime system that can be integrated with existing cluster resource managers to enable a more efficient use of heterogeneous clusters with GPUs. Unlike previous work, we focus on multi-process GPU applications that include synchronization (for example, hybrid MPI-CUDA applications). We discuss the limitations and inefficiencies of existing scheduling and resource sharing schemes in the presence of synchronization. We show that preemption is an effective mechanism to allow efficient scheduling of hybrid MPI-CUDA applications. We validate our runtime on a variety of benchmark programs with different computation and communication patterns.
ISBN: 9781450318686 (print)
The new CS curricular recommendations call for a heightened emphasis on parallel and distributed computing (PDC), in response to the explosive growth of multicore processors and "cloud" distributed computing. How can an educator incorporate this urgent priority into undergraduate CS courses? This panel describes four approaches: exploring GPU architecture and programming in a Computer Organization course; incorporating shared memory parallelism into several core courses; adding the PDC notion of reduction to multiple CS courses; and inserting short PDC modules into many courses at multiple curricular levels. We will illustrate how these contrasting approaches all respond to PDC recommendations within the feasibility constraint of incrementally modifying individual courses.
Teaching-learning based optimization (TLBO), inspired by the teaching-learning process in a classroom, is a newly developed population-based algorithm. Apart from the population size and the maximum number of iterations, it does not require any algorithm-specific parameters. TLBO consists of two search phases, the teacher phase and the learner phase. In this paper, every learner is assigned to at least one group and, instead of learners studying by interacting directly with one another, the group leader is responsible for raising the members' knowledge, i.e., for exploring for the optimal solution. The idea is analogous to a group discussion in which the group leader always dominates the direction and performance of the discussion. For simplicity, the proposed algorithm will be denoted as LTLBO. The effectiveness of the method is tested on many benchmark problems with different characteristics, and the results are compared with the original TLBO and particle swarm optimization (PSO).
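The two phases of the original TLBO mentioned in this abstract can be illustrated with a minimal sketch. It assumes a minimization problem over box constraints and the commonly described update rules (teaching factor drawn from {1, 2}, greedy acceptance). It does not implement the group-leader variant (LTLBO) proposed in the paper, and the function name `tlbo` and its parameters are hypothetical.

```python
import random


def tlbo(objective, bounds, pop_size=20, iterations=100, seed=0):
    """Minimise `objective` over the box `bounds` with basic TLBO (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its (low, high) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    for _ in range(iterations):
        # Teacher phase: move each learner toward the best learner,
        # away from the class mean scaled by the teaching factor.
        teacher = pop[min(range(pop_size), key=fit.__getitem__)]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor, 1 or 2
            cand = clip([pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                         for d in range(dim)])
            f = objective(cand)
            if f < fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, f

        # Learner phase: interact with one random peer; step away from a
        # worse peer or toward a better one.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if fit[i] < fit[j] else -1.0
            cand = clip([pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d])
                         for d in range(dim)])
            f = objective(cand)
            if f < fit[i]:
                pop[i], fit[i] = cand, f

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Because a candidate only replaces a learner when it improves that learner's objective value, the best value in the population never gets worse from one iteration to the next. Only the population size and iteration count need to be chosen, which matches the abstract's remark about parameters.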
ISBN: 9781467357159 (print)
This paper extends the gossip algorithm, widely studied in the literature on distributed computing and control algorithms, to networks of quantum systems. In doing so, we reinterpret the classical algorithm and the average consensus task as a symmetrization problem with respect to the action of the permutation group. This allows us to extend in a natural way the gossip consensus algorithm to the quantum setting and prove its convergence properties to symmetric states while preserving the expectation of permutation-invariant global observables.
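For readers unfamiliar with the classical starting point, the following is a minimal sketch of randomized pairwise gossip for average consensus. It covers only the classical algorithm, not the quantum extension described in the abstract; the function name `gossip_average` and the graph encoding as an edge list are assumptions made for illustration.

```python
import random


def gossip_average(values, edges, steps=2000, seed=0):
    """Classical randomized gossip: repeatedly average the two endpoints of a random edge."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)          # pick one communicating pair
        x[i] = x[j] = (x[i] + x[j]) / 2.0  # both nodes adopt the pair average
    return x


# Four nodes on a ring; every node drifts toward the global mean (3.0 here).
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
state = gossip_average([1.0, 2.0, 3.0, 6.0], ring)
```

Each pairwise step leaves the sum of all values unchanged, so the only state that is both reachable and invariant under every swap of nodes is the one in which all nodes hold the average. This is the sense in which consensus can be read as symmetrization with respect to permutations.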
ISBN: 9781632666284 (print)
In order to solve the problem that traditional web filtering systems cannot filter sensitive webpages effectively in real time, a multi-level web content filtering model based on MapReduce is *** kinds of filtering strategies are employed in this model, which are blocking of IP address and URL, keyword filtering and intelligent *** intelligent filtering mechanism, the improved KNN algorithm based on maximum category space is adopted to classify the webpages intelligently and filter out the sensitive *** reduce the filtering time, we propose a parallelization framework based on the distributed computing *** framework will carry out the heavy computation of feature vectors and Euclidean distances in parallel, which improves the filter efficiency a *** result shows that the distributed filtering model can filter the sensitive webpages effectively in real time, and that a higher web filtering rate with high accuracy can be achieved by increasing the number of distributed nodes.
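The idea of computing Euclidean distances in parallel and then combining the results can be sketched in a map/reduce style. This is plain k-nearest-neighbour voting in single-process Python, not the improved KNN based on maximum category space from the paper and not an actual MapReduce job; all function names and the toy data layout (lists of `(vector, label)` pairs) are assumptions.

```python
import heapq
import math
from collections import Counter
from functools import reduce


def map_partition(partition, query, k):
    """Map step: score one partition of (vector, label) pairs and keep its k nearest."""
    scored = ((math.dist(vec, query), label) for vec, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_neighbours(a, b, k):
    """Reduce step: merge two candidate lists and keep the k nearest overall."""
    return heapq.nsmallest(k, a + b)


def classify(partitions, query, k=3):
    """Label `query` by majority vote among its k nearest labelled vectors."""
    # Each map call is independent, so in a real cluster it could run on its own node.
    local = [map_partition(p, query, k) for p in partitions]
    nearest = reduce(lambda a, b: reduce_neighbours(a, b, k), local)
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Keeping only the k best candidates per partition bounds the data sent to the reduce step at k items per node, regardless of how many labelled pages each node holds.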
The certification of timeliness of distributed real-time embedded computing systems is the most challenging part of the reliability certification process. Techniques for systematic derivation of tight service time bounds of subsystems of distributed real-time systems are key to enabling practical and highly meaningful certification of timeliness. The use of a hybrid of measurement-based statistical derivation approaches and program structure analysis approaches in deriving tight service time bounds is proposed. The use of a hybrid approach together with a divide-and-conquer strategy in deriving the service time bounds of complex systems is also proposed. Various possible ways of formulating such hybrid approaches are indicated, and then one specific approach is discussed. Major research issues that need to be resolved before the hybrid approach can be practiced are mentioned.
The problem of migrating sensitive information between systems in dynamic environments is increasingly important as distributed computing expands. A proposed policy-based approach provides controlled and secure transfer of user credentials and data across platforms. We propose a policy-driven data-protection system to address the inadequacies of current technological solutions in preserving the confidentiality and privacy of data while it migrates between platforms. More specifically, we describe our solution for securing credential migration that we're developing for productization.
Checkpointing, the process of saving program/application state, usually to stable storage, has been the most common fault-tolerance methodology for high-performance applications. The rate of checkpointing (how often) is primarily driven by the failure rate of the system. If the checkpointing rate is low, fewer resources are consumed but the chance of high computational loss is increased, and vice versa if the checkpointing rate is high. It is important to strike a balance, and an optimum rate of checkpointing is required. In this paper, we analytically model the process of checkpointing in terms of the mean time between failures of the system, the amount of memory being checkpointed, the sustainable I/O bandwidth to the stable storage, and the frequency of checkpointing. We identify the optimum frequency of checkpointing to be used on systems with given specifications, thereby making way for efficient use of available resources and maximum performance of the system without compromising on the fault-tolerance aspects. Further, we develop discrete-event models simulating the checkpointing process to verify the analytical model for optimum checkpointing. Using the analytical model, we also investigate the optimum rate of checkpointing for systems of varying resource levels, ranging from small embedded cluster systems to large supercomputers.
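The trade-off described above can be made concrete with a well-known first-order estimate, Young's approximation, in which the interval between checkpoints is the square root of twice the checkpoint cost times the mean time between failures. This is a swapped-in textbook formula used for illustration, not the analytical model developed in the paper; the function names and the sample figures are assumptions.

```python
import math


def checkpoint_cost_s(memory_bytes, io_bandwidth_bytes_per_s):
    """Time to write one checkpoint: state size divided by sustained I/O bandwidth."""
    return memory_bytes / io_bandwidth_bytes_per_s


def young_interval_s(checkpoint_cost, mtbf_s):
    """Young's first-order estimate of the interval between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf_s)


def overhead_fraction(interval_s, checkpoint_cost, mtbf_s):
    """First-order fraction of time lost: writing checkpoints plus redoing lost work."""
    return checkpoint_cost / interval_s + interval_s / (2.0 * mtbf_s)


# Illustrative figures only: 64 GiB of state, 2 GiB/s to storage, 24 h MTBF.
GIB = 1024 ** 3
cost = checkpoint_cost_s(64 * GIB, 2 * GIB)
interval = young_interval_s(cost, 24 * 3600)
```

With these illustrative figures the checkpoint takes 32 s and the suggested interval is about 39 minutes. Checkpointing much more often inflates the first overhead term (time spent writing), while checkpointing much less often inflates the second (work lost per failure), which is the balance the abstract refers to.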
ISBN: 9781467350754 (print)
We propose a technique to distribute the workload of online route planners, such as those offered by Bing, Google, or Yahoo Maps, among the clients requesting the routes. Our scheme not only increases the throughput of a server answering the requests of clients but also yields a simple way of providing some degree of privacy for the user. A prototype implementation of our system is available as an Android app in Google Play and on GitHub.