ISBN: (Print) 9781467375894
Traditional distributed schedulers only consider the scheduling of jobs and treat the storage system as static and already deployed. However, each application often needs its own storage-system configuration, so traditional distributed schedulers cannot serve multiple tenants. To improve resource utilization, mechanisms are needed to consolidate multiple applications running on top of the same computing resources. Flex is a system that helps the distributed scheduler deploy a customized storage system before a job is scheduled, without incurring much overhead. Lightweight operating-system-level virtualization, i.e., Linux containers, is used to wrap the computing environment and isolate applications from one another. Flex achieves two goals that traditional distributed schedulers cannot: deploying a storage system matched to the requirements of an individual application, and consolidating applications for better use of the underlying computing resources. Thus, by scheduling storage, Flex improves the resource utilization of distributed systems.
ISBN: (Print) 9781538637906
Matrix factorization (MF), as one of the most accurate and scalable dimension-reduction techniques, has become popular in the collaborative filtering (CF) recommender systems community. Non-negative Matrix Factorization (NMF) is currently one of the best-known MF approaches, owing to the non-negativity that suits CF models. However, it is non-trivial to achieve high performance for sparse NMF (SNMF) on Graphics Processing Units (GPUs) for large-scale problems, due to redundant large-scale intermediate data, frequent matrix manipulation, and accesses to a sparse rating matrix whose non-zero entries are irregularly distributed. In this work, we propose a single-thread-based SNMF that relies on multiplication and summation of the involved feature tuples, and then present an L2-norm-regularized single-thread-based SNMF. On this basis, a novel CUDA-parallelized NMF approach (CuSNMF) is presented for GPU computing. Furthermore, to process large-scale CF data sets and take advantage of GPU computing power, we propose multi-GPU CuSNMF (MCuSNMF). Compared with the state-of-the-art parallel algorithms CCD++ and CUMF, MCuSNMF achieves the highest performance.
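Sparse NMF of the kind the abstract describes is often formulated as L2-regularized factorization over only the observed rating entries. The sketch below (projected stochastic gradient descent; the function name, hyperparameters, and non-negativity-by-clipping projection are illustrative assumptions, not the paper's actual single-thread kernels) shows one common single-threaded formulation:

```python
import numpy as np

def snmf_sgd(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=50, seed=0):
    """Sparse NMF via projected SGD on observed (user, item, rating) triples.

    Minimizes L2-regularized squared error; non-negativity is enforced by
    clipping after each update (a simple projection, one of several options).
    """
    rng = np.random.default_rng(seed)
    P = rng.random((n_users, k)) * 0.1   # user feature matrix, non-negative init
    Q = rng.random((n_items, k)) * 0.1   # item feature matrix, non-negative init
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]          # error on one observed entry
            # simultaneous update: both right-hand sides use the old values
            P[u], Q[i] = (
                np.maximum(P[u] + lr * (e * Q[i] - reg * P[u]), 0.0),
                np.maximum(Q[i] + lr * (e * P[u] - reg * Q[i]), 0.0),
            )
    return P, Q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
P, Q = snmf_sgd(ratings, n_users=3, n_items=2)
```

Note that only the non-zero entries are ever touched, which is what makes the irregular sparsity pattern the main obstacle when mapping such updates onto GPU threads.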
ISBN: (Print) 1892512416
We have been building the Seoul Grid testbed, which covers the whole area of Seoul, the capital city of Korea. For it, we use Globus, enabling software that allows a vertically integrated treatment of networked computing resources. However, Globus currently does not provide a complete packaged solution for our Grid testbed. In order to run the testbed, we need a unified resource management framework that meets our requirements. We have therefore developed such a unified resource management framework for the Seoul Grid testbed and implemented it in the Seoul Grid portal. In this paper, we explain the details of the framework in our Seoul Grid portal.
ISBN: (Print) 1601320841
The performance of a conservative time management algorithm in a distributed simulation system degrades significantly if a large number of null messages are exchanged across the logical processes to avoid deadlock. The situation becomes more severe when the exchange of null messages increases due to poor selection of key parameters such as lookahead values. This paper presents a generic mathematical model that uses null messages to avoid deadlock. Since the proposed model is generic, the performance of any conservative synchronization algorithm can be approximated. In addition, we develop a performance model that demonstrates how a conservative distributed simulation system performs with the null message algorithm (NMA). The simulation results show that the performance of a distributed system degrades if the NMA generates an excessive number of null messages due to improper parameter selection.
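The sensitivity to lookahead can be seen with a back-of-the-envelope count: when no real events flow between two logical processes, one LP advances its neighbor's clock purely through null messages, one per lookahead increment. A toy sketch (the function and its parameters are illustrative, not the paper's mathematical model):

```python
def null_messages_needed(sim_end_time, lookahead):
    """Count null messages an otherwise idle LP sends to advance a
    neighbor's simulation clock from 0 to sim_end_time.

    Each null message carries a timestamp of (local clock + lookahead),
    so a smaller lookahead forces proportionally more null messages.
    """
    clock, count = 0.0, 0
    while clock < sim_end_time:
        clock += lookahead   # neighbor may safely advance by the lookahead
        count += 1           # one null message per advance
    return count

# A 10x worse lookahead choice inflates null-message traffic 10x:
assert null_messages_needed(100.0, 10.0) == 10
assert null_messages_needed(100.0, 1.0) == 100
```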
ISBN: (Print) 0769526373
Many important parallel applications require multiple flows of control to run on a single processor. In this paper, we present a study of four flow-of-control mechanisms: processes, kernel threads, user-level threads, and event-driven objects. Through experiments, we demonstrate the practical performance and limitations of these techniques on a variety of platforms. We also examine migration of these flows of control, with a focus on thread migration, which is critical for application-independent dynamic load balancing in parallel computing applications. Thread migration, however, is challenging due to the complexity of both the user and system state involved. We present several techniques to support migratable threads and compare their performance.
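Two of the four mechanisms, user-level threads and event-driven objects, share the idea of multiplexing many flows of control onto one kernel entity in user space. A minimal illustration (cooperative scheduling over coroutines; this sketch is purely illustrative and not the paper's runtime) interleaves tasks without any kernel threads:

```python
from collections import deque

def worker(name, steps):
    """A user-level 'thread': runs until it yields, surrendering control."""
    for i in range(steps):
        yield f"{name}:{i}"   # cooperative yield point

def run(tasks):
    """A minimal round-robin user-level scheduler over generator tasks."""
    ready, trace = deque(tasks), []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # resume the task until its next yield
            ready.append(task)         # still runnable: requeue it
        except StopIteration:
            pass                       # task finished: drop it
    return trace

trace = run([worker("A", 2), worker("B", 2)])
# → ['A:0', 'B:0', 'A:1', 'B:1'] : interleaving with no kernel involvement
```

Because all state lives in user space, switching between such flows is cheap, but migrating one to another processor requires capturing exactly the user-level state the abstract identifies as the hard part.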
ISBN: (Print) 9781479920815
A distributed computing system is considered a fundamental architecture for extending resources such as computation speed, storage capacity, and network bandwidth, which are limited on a single processor. Emerging big-data processing techniques like Hadoop take advantage of distributed servers to accomplish scalable parallel computations. Large-scale processing jobs can run interdependently on different servers or even different clusters and be combined as a workflow to provide meaningful outputs. In this paper, we analyze the common demands of big-data processing and of distributed big-data workflow processing. Accordingly, we design the PipeFlow Engine, whose features match each of these demands. It orchestrates all involved jobs and schedules them in a batched pipeline mode. We also present two online ranking algorithms that make use of PipeFlow, sharing our experience and best practices in using it.
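A batched pipeline mode of the kind described can be sketched as repeatedly launching every job whose dependencies have completed. The job names and the `run_pipeline` helper below are hypothetical, not PipeFlow's actual API:

```python
from graphlib import TopologicalSorter

# A toy workflow: node -> list of jobs it depends on
jobs = {
    "extract": [],
    "clean": ["extract"],
    "rank_a": ["clean"],
    "rank_b": ["clean"],
    "publish": ["rank_a", "rank_b"],
}

def run_pipeline(deps):
    """Schedule jobs in batches: each batch holds every job whose
    dependencies have all completed (breadth-first topological order)."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())   # all jobs runnable right now
        batches.append(ready)            # these could run in parallel
        ts.done(*ready)                  # mark the whole batch complete
    return batches

batches = run_pipeline(jobs)
# → [['extract'], ['clean'], ['rank_a', 'rank_b'], ['publish']]
```

Jobs inside one batch have no mutual dependencies, so an engine is free to dispatch them to different servers or clusters concurrently.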
ISBN: (Print) 9781450388160
Recently, the abstraction of the coflow has been introduced to capture the collective data transmission patterns of modern distributed data-parallel applications. During processing, coflows generally act as barriers; accordingly, time-sensitive applications prefer their coflows to complete within deadlines, and deadline-aware coflow scheduling becomes crucial. Regarding these data-parallel applications, we notice that many of them, including large-scale query systems, distributed iterative training, and erasure-coded storage, can by design tolerate loss-bounded incomplete inputs. This tolerance brings a flexible design space to the scheduling of their coflows: when overloaded, the network can trade coflow completeness for timeliness and balance the completeness of different coflows on demand. Unfortunately, existing coflow schedulers neglect this tolerance, resulting in inflexible and inefficient bandwidth allocations. In this paper, we explore this fundamental trade-off and design Poco, a POlicy-based COflow scheduler, that achieves customizable selective coflow completions for these emerging time-sensitive distributed applications. Internally, Poco employs a suite of novel designs, along with admission control, to make flexible, work-conserving, and performance-guaranteed rate allocations to online coflow requests very efficiently. Extensive trace-based simulations indicate that Poco is highly flexible and achieves optimal coflow schedules that respect the requirements specified by applications.
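The completeness-versus-timeliness trade-off can be illustrated with a toy admission check: given a bandwidth budget and a deadline, compute the achievable fraction of a coflow's bytes and admit the coflow only if that fraction meets the application's loss bound. The function and numbers below are illustrative, not Poco's actual allocation algorithm:

```python
def admissible_fraction(flow_sizes_mb, bandwidth_mbps, deadline_s, min_fraction):
    """Selective completion: what fraction of a coflow's bytes can finish
    by the deadline, and should the coflow be admitted at all?

    The coflow is admitted only if the achievable fraction meets the
    application-specified completeness bound (min_fraction)."""
    total = sum(flow_sizes_mb)                   # MB the coflow wants to move
    capacity = bandwidth_mbps * deadline_s / 8   # MB deliverable by deadline
    fraction = min(1.0, capacity / total)
    return fraction, fraction >= min_fraction

frac, admitted = admissible_fraction(
    [400, 600], bandwidth_mbps=80, deadline_s=60, min_fraction=0.7
)
# frac == 0.6 → rejected under a 0.7 completeness bound
```

A deadline-only scheduler would have to reject this coflow outright or miss the deadline; loss tolerance turns the decision into a tunable policy knob.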
ISBN: (Print) 9781479942930
Data Stream Processing (DaSP) is a paradigm characterized by online (often real-time) applications working on unlimited data streams whose elements must be processed efficiently "on the fly". DaSP computations are characterized by data-flow graphs of operators connected via streams and working on the received elements under high-throughput and low-latency requirements. To meet these constraints, high-performance DaSP operators require advanced parallelism models, as well as related design and implementation techniques targeting multi-core architectures. In this paper we focus on the parallelization of the window-based stream join, an important operator that raises challenging issues in terms of parallel window management. We review the state-of-the-art solutions for stream-join parallelization and propose a novel parallel strategy and its implementation on multicores. As demonstrated by the experimental results, our parallel solution offers two important advantages over existing solutions: (i) it features a high degree of configurability in order to address the symmetry or asymmetry of the input streams (in terms of their arrival rates and window lengths); (ii) it provides high throughput and is clearly better than the compared solutions in terms of latency, offering an efficient way to perform stream joins in latency-sensitive applications.
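A sequential baseline of the window-based stream join helps frame what must be parallelized: each arriving tuple probes the opposite stream's window, then is inserted into its own. The sketch below is a simplified single-threaded formulation over a pre-merged timestamped trace, not the paper's parallel implementation:

```python
from collections import deque

def window_join(stream_a, stream_b, window):
    """Symmetric window-based stream join over (timestamp, key) tuples:
    probe the opposite window within `window` time units, then insert."""
    wa, wb, out = deque(), deque(), []
    merged = sorted(
        [(t, k, "A") for t, k in stream_a] + [(t, k, "B") for t, k in stream_b]
    )
    for t, key, side in merged:
        own, other = (wa, wb) if side == "A" else (wb, wa)
        while other and other[0][0] < t - window:
            other.popleft()                          # expire stale tuples
        out += [(t, ot) for ot, ok in other if ok == key]  # probe on the key
        own.append((t, key))                         # insert into own window
    return out

matches = window_join([(1, "x"), (5, "y")], [(2, "x"), (9, "y")], window=3)
# → [(2, 1)] : B@2 joins A@1 on "x"; B@9 finds A@5 already expired (9-5 > 3)
```

The parallel-window-management challenge the abstract mentions arises because both the expiry step and the probe step touch shared, constantly changing window state.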
ISBN: (Print) 9780769549392; 9781467353212
The paper proposes the architecture of the visualization component for a Security Information and Event Management (SIEM) system. SIEM systems help to comprehend large amounts of security data, and visualization is an essential part of them. The suggested architecture of the visualization component allows incorporating different visualization technologies and easily extending the application's functionality. To illustrate the approach, we developed a prototype of the SIEM visualization component, and the paper demonstrates the graphical user interface of the attack modeling component. To increase the efficiency of the visualization techniques, we applied principles of human information perception and considered interaction issues when designing the graphical components.
ISBN: (Print) 1892512459
Supercomputer user groups use FTP for their large file transfers, which include extensive research data such as automobile and airplane designs, medicine development, weather forecasts, and complex mathematical computations that require a high-performance supercomputer. The existing FTP causes users discomfort through delay and data loss depending on network status. Therefore, it is necessary to raise transfer performance efficiently and minimize data loss. In this research, auto-tuning is used for large file transfers, and Quality of Service (QoS) is provided based on network status. As a result of the performance enhancement, the data-loss ratio is minimized and the network becomes more stable, which lets supercomputer users trust that their computational results are precise. This paper surveys related work on performance enhancement of large file transfers and then evaluates the proposed scheme, which provides QoS during the bulk file transfer phase based on network status.
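One standard form of auto-tuning for bulk transfers sizes the socket buffer to the path's bandwidth-delay product, so the sender can keep the pipe full. The helper below is a generic sketch of that calculation under assumed units, not the specific mechanism evaluated in the paper:

```python
def tuned_buffer_bytes(bandwidth_mbps, rtt_ms):
    """Auto-tuning sketch: size the socket buffer to the bandwidth-delay
    product (bytes in flight needed to keep a bulk transfer saturated)."""
    bytes_per_second = bandwidth_mbps * 1e6 / 8   # link rate in bytes/s
    rtt_seconds = rtt_ms / 1e3
    return int(bytes_per_second * rtt_seconds)

# A 1 Gbit/s path with a 50 ms round-trip time needs a ~6.25 MB window:
assert tuned_buffer_bytes(1000, 50) == 6_250_000
```

With a buffer smaller than this product, the sender stalls every round trip waiting for acknowledgements, which is one source of the throughput loss the abstract describes.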