ISBN: (Print) 9780769546766
OLAP (online analytical processing) applications are based on a variety of aggregate queries over large-scale data. Because aggregation is performed on columns, traditional row-oriented storage, in which all the columns of a data row are stored together, seriously restricts aggregation performance. This paper proposes a dimension-oriented storage model based on HBase and a new parallel aggregation technique that carries out aggregation operations with parallel MapReduce jobs. Finally, compared with Hive on the standard TPC-H data set, our technique is shown to significantly improve the performance of core aggregate operations.
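To make the aggregation pattern concrete, here is a minimal Python sketch of a MapReduce-style SUM over one measure column grouped by a dimension key. The records, phase functions, and names are invented for illustration; this is not the paper's HBase implementation.

```python
from collections import defaultdict

# Hypothetical column slice of a fact table: each record is
# (dimension_key, measure_value), as a dimension-oriented store might expose it.
records = [("asia", 10.0), ("europe", 4.0), ("asia", 6.5), ("europe", 3.5)]

def map_phase(records):
    """Emit (key, value) pairs; in MapReduce this runs in parallel per split."""
    for key, value in records:
        yield key, value

def reduce_phase(pairs):
    """Aggregate values per key; each reducer handles one key group."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    return dict(sums)

print(reduce_phase(map_phase(records)))  # {'asia': 16.5, 'europe': 7.5}
```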
ISBN: (Print) 9781424437511
Recognition and mining (RM) applications are an emerging class of computing workloads that will be commonly executed on future multi-core and many-core computing platforms. The explosive growth of input data and the use of more sophisticated algorithms in RM applications will ensure, for the foreseeable future, a significant gap between the computational needs of RM applications and the capabilities of rapidly evolving multi- or many-core platforms. To address this gap, we propose a new parallel programming model that inherently embodies the notion of best-effort computing, wherein the underlying parallel computing environment is not expected to be perfect. The proposed best-effort programming model leverages three key characteristics of RM applications: (1) the input data is noisy and often contains significant redundancy, (2) the computations performed on the input data are statistical in nature, and (3) some degree of imprecision in the output is acceptable. As a specific instance of the best-effort parallel programming model, we describe an "iterative-convergence" parallel template, which is used by a significant class of RM applications. We show how best-effort computing can be used not only to reduce the computational workload but also to eliminate dependencies between computations and further increase parallelism. Our experiments on an 8-core machine demonstrate speedups of 3.5X and 4.3X for the K-means and GLVQ algorithms, respectively, over a conventional parallel implementation. We also show that there is almost no material impact on the accuracy of results obtained from best-effort implementations in the application contexts of image segmentation using K-means and eye detection in images using GLVQ.
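As a rough illustration of the iterative-convergence template (not the authors' code), the following toy 1-D K-means skips points whose cluster assignment has been stable for a few iterations, trading a little precision for less work. All names and parameters are invented.

```python
import random

def best_effort_kmeans(points, k, iters=20, stable_skip=2):
    """Toy 1-D K-means that stops recomputing points whose assignment has
    been stable for `stable_skip` iterations (best-effort computation drop)."""
    centers = random.sample(points, k)
    assign = [0] * len(points)
    stable = [0] * len(points)
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for i, p in enumerate(points):
            if stable[i] < stable_skip:          # only "active" points recompute
                c = min(range(k), key=lambda j: abs(p - centers[j]))
                stable[i] = stable[i] + 1 if c == assign[i] else 0
                assign[i] = c
            sums[assign[i]] += p
            counts[assign[i]] += 1
        centers = [sums[j] / counts[j] if counts[j] else centers[j]
                   for j in range(k)]
    return centers, assign

random.seed(0)
pts = [random.gauss(0, 1) for _ in range(50)] + [random.gauss(8, 1) for _ in range(50)]
centers, _ = best_effort_kmeans(pts, 2)
print(sorted(round(c, 2) for c in centers))
```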
ISBN: (Print) 9781538637906
The replicated state machine (RSM) approach is widely used to build highly available services in today's cloud systems. By analyzing typical instances and summarizing their common characteristics, we introduce a modular abstract framework for devising RSMs optimized from the load-distribution perspective, an active exploration toward a new optimization design method. The framework provides a faithful deconstruction of a class of RSMs optimized by load distribution. Through the modularization of functionalities and mechanisms, the framework decouples the design of an RSM's components to some extent and facilitates the reuse of existing, optimized functional components. By abstracting the configuration and specifying the execution procedure, the abstract skeletal protocol in this framework simplifies the description of optimized RSMs. We finally present the reconstruction of S-RSM, an RSM optimized based on load distribution, to illustrate the effectiveness of our framework.
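A minimal sketch of what such a modular deconstruction might look like, with invented module names and stub components standing in for real ordering, execution, and load-distribution logic; the paper's actual framework is an abstraction, not this code.

```python
class ReplicatedStateMachine:
    """Skeletal, hypothetical decomposition of an RSM into pluggable
    modules, in the spirit of the framework described above."""
    def __init__(self, order_module, exec_module, distribute):
        self.order = order_module      # agrees on a total order of commands
        self.execute = exec_module     # applies ordered commands to local state
        self.distribute = distribute   # load-distribution policy: picks a replica
        self.log, self.state = [], {}

    def submit(self, cmd):
        replica = self.distribute(cmd)        # e.g. route reads to followers
        seq = self.order(self.log, cmd)       # consensus stub: append order
        self.log.append((seq, cmd))
        self.execute(self.state, cmd)
        return replica, seq

# Minimal stub components, for illustration only.
rsm = ReplicatedStateMachine(
    order_module=lambda log, cmd: len(log),
    exec_module=lambda state, cmd: state.__setitem__(*cmd),
    distribute=lambda cmd: "leader",
)
print(rsm.submit(("x", 1)), rsm.state)  # ('leader', 0) {'x': 1}
```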
ISBN: (Print) 9780769549712
Computational scientists and engineers commonly rely on established software libraries to achieve high performance and reliability in their numerical applications. Unfortunately, this approach does not work well if the desired functionality is absent from existing libraries or if the integration is difficult. In such scenarios, one is often forced to explore alternative algorithms and in-house implementations. Such exploration can be a challenging task for computational scientists and engineers without a sufficient computer science background. To address this issue, we design and build an automated rapid prototyping tool for regular grid-based numerical applications. This new tool allows programmers to specify algorithms as compositions of familiar computation patterns, such as those readily found in the open literature, expressed as generalized elemental subroutines. The tool then automatically transforms such subroutines into code that adapts to the prescribed data structures and delivers the performance expected from the underlying algorithms. We demonstrate the tool in use cases including a production-grade computational fluid dynamics application.
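The following hypothetical sketch shows the flavor of a generalized elemental subroutine (here a Jacobi-style stencil) and the loop nest a tool like this might generate around it; the nested-list grid is an invented stand-in for the tool's real target data structures.

```python
def elemental_jacobi(u, i, j):
    """Elemental subroutine: new value at (i, j) from its four neighbours,
    written independently of the underlying grid data structure."""
    return 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])

def apply_elemental(u, kernel):
    """What a prototyping tool would generate: a loop nest adapted to a
    plain nested-list grid (a hypothetical stand-in for its real targets)."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = kernel(u, i, j)
    return new

# Toy 2-D Laplace smoothing pass on a 5x5 grid with a hot boundary row.
grid = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
grid = apply_elemental(grid, elemental_jacobi)
print([round(v, 3) for v in grid[1]])  # [0.0, 0.25, 0.25, 0.25, 0.0]
```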
ISBN: (Print) 9780769561493
The strict power efficiency constraints required to achieve exascale systems will dramatically increase the number of detected and undetected transient errors in future high performance computing (HPC) systems. Among the various factors that affect system resiliency, the impact of compiler optimizations on the vulnerability of scientific applications executed on HPC systems has not been widely explored. In this work, we analyze whether and how the most common compiler optimizations affect the vulnerability of several mission-critical applications, what the trade-offs between performance and vulnerability are, and the causal relations between compiler optimization and application vulnerability. We show that highly optimized code is generally more vulnerable than unoptimized code. We also show that, while increasing the optimization level can drastically improve application performance as expected, certain optimizations provide only marginal performance benefits while considerably increasing application vulnerability.
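The study itself is empirical, but the underlying measurement idea can be sketched as fault injection: flip one bit in an intermediate value and test whether the final output is corrupted. The kernel and injection site below are invented for illustration and are not the authors' methodology.

```python
import random
import struct

def flip_bit(x, bit):
    """Flip one bit of a float's IEEE-754 encoding (transient-error model)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y

def dot(a, b, inject_at=None, bit=0):
    acc = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        acc += x * y
        if i == inject_at:
            acc = flip_bit(acc, bit)   # simulate a soft error in a live value
    return acc

random.seed(1)
a = [random.random() for _ in range(100)]
b = [random.random() for _ in range(100)]
golden = dot(a, b)
corrupted = 0
for bit in range(64):
    res = dot(a, b, inject_at=50, bit=bit)
    if res != res or abs(res - golden) > 1e-9:   # NaN-safe corruption check
        corrupted += 1
print(f"{corrupted}/64 injected bit flips visibly corrupted the result")
```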
ISBN: (Print) 9781509036820
Analyzing large dynamic networks is an important problem with applications in a wide range of disciplines. A key operation is updating the network properties as its topology changes. In this paper we present graph sparsification as an efficient abstraction for updating the properties of dynamic networks. We demonstrate the applicability of graph sparsification to updating the connected components in random and scale-free networks on shared memory systems. Our results show that the updating is scalable (a 10X speedup on 16 processors for larger networks). To the best of our knowledge, this is the first parallel implementation of graph sparsification. Based on these initial results, we discuss how the current implementation can be further improved and how graph sparsification can be applied to updating other network properties.
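The incremental half of the problem can be illustrated with a standard union-find structure that updates connected components as edges arrive; handling batched, parallel updates efficiently is where the paper's sparsification comes in, which this toy sketch does not attempt.

```python
class UnionFind:
    """Incrementally maintains connected components as edges arrive, a
    basic building block for updating properties of a dynamic graph."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

uf = UnionFind(6)
for u, v in [(0, 1), (2, 3), (1, 2)]:   # stream of edge insertions
    uf.union(u, v)
print(len({uf.find(i) for i in range(6)}))  # 3 components: {0,1,2,3}, {4}, {5}
```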
ISBN: (Print) 9781538637906
Modern sensor technologies, the internet, and advanced irrigation equipment allow relatively precise control of agricultural irrigation, leading to high water-use efficiency. However, the core control algorithms that make use of these technologies have not been well studied. In this work, a reinforcement-learning-based irrigation control technique is investigated. The delayed reward of crop yield is handled by the temporal difference technique. The learning process can be based on both off-line simulation and real data from sensors and crop yields. Neural-network-based fast models for soil water level and crop yield are developed to improve the scalability of learning. Simulations for various geographic locations and crop types show that the proposed method can significantly increase net return, considering both crop yield and water expense.
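A minimal tabular sketch of the temporal-difference idea with a delayed end-of-season reward; the three-level soil-moisture model, costs, and rewards are all invented for illustration and are far simpler than the paper's neural-network models.

```python
import random

# Toy irrigation episode: states are coarse soil-moisture levels (0..2),
# actions are irrigate (1) or not (0); the yield reward arrives only at
# season end, so the agent must learn from a delayed signal.
ALPHA, GAMMA, EPISODES = 0.1, 0.95, 5000
Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}

def step(s, a):
    s2 = min(2, s + 1) if a else max(0, s - 1)  # irrigation raises moisture
    return s2, (-0.1 if a else 0.0)             # immediate water cost only

random.seed(0)
for _ in range(EPISODES):
    s = 1
    for t in range(10):
        a = random.choice((0, 1)) if random.random() < 0.1 else \
            max((0, 1), key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        if t == 9:
            r += 2.0 if s2 == 1 else 0.0        # delayed "crop yield" reward
            target = r                           # terminal state
        else:
            target = r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])  # temporal-difference update
        s = s2

print({s: max((0, 1), key=lambda a: Q[(s, a)]) for s in range(3)})
```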
ISBN: (Print) 9781538637906
Cloud data centers currently suffer from load imbalance and high power consumption. This paper studies virtual machine placement policies in cloud environments using live migration techniques, and proposes MOGA-THSA, a target host selection algorithm based on the multi-objective genetic algorithm (MOGA). By designing suitable genetic operators and fitness functions, this heuristic algorithm optimizes the load balance and power consumption of the data center while keeping the SLA violation rate low. The algorithm is implemented on the CloudSim simulation platform, and experiments show that it can effectively improve the load balance of a cloud data center and decrease total power consumption, providing useful guidance for research on virtual machine placement policies.
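To show the genetic-algorithm skeleton, here is a toy sketch with invented VM sizes, host counts, and weights; for brevity it scalarizes load imbalance and power into one fitness value, unlike a true multi-objective GA, and it is not the MOGA-THSA algorithm itself.

```python
import random
random.seed(0)

VMS, HOSTS, POP, GENS = [20, 35, 15, 40, 25, 30], 3, 30, 60

def fitness(assign):
    """Lower is better: load imbalance (std dev of host loads) plus a toy
    power term (idle hosts cost nothing, busy hosts cost base + load)."""
    loads = [0] * HOSTS
    for vm, h in zip(VMS, assign):
        loads[h] += vm
    mean = sum(loads) / HOSTS
    imbalance = (sum((l - mean) ** 2 for l in loads) / HOSTS) ** 0.5
    power = sum(100 + l for l in loads if l > 0)
    return imbalance + 0.1 * power

pop = [[random.randrange(HOSTS) for _ in VMS] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness)
    survivors = pop[:POP // 2]                   # elitist selection
    children = []
    while len(children) < POP - len(survivors):
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, len(VMS))
        child = a[:cut] + b[cut:]                # one-point crossover
        if random.random() < 0.2:                # mutation: move one VM
            child[random.randrange(len(VMS))] = random.randrange(HOSTS)
        children.append(child)
    pop = survivors + children

best = min(pop, key=fitness)
print(best, round(fitness(best), 2))
```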
ISBN: (Print) 9781538637906
Mining job scheduling features by extracting and analyzing workload traces from high performance computing clusters can be used to optimize scheduling strategies and enhance system performance. Based on a detailed analysis of a workload trace from a gene-sequencing high performance computing system, this paper proposes a multi-queue backfilling scheduling algorithm built on traditional backfilling. While optimizing for memory resource demands, the algorithm provides queue-level load balancing to deal with the innate load imbalance of such systems. Experimental results based on a practical gene-sequencing workload trace clearly demonstrate that, compared with traditional scheduling algorithms, the proposed algorithm is a good strategy for reducing job waiting time and improving resource utilization.
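A toy sketch of the multi-queue idea: split waiting jobs by memory demand and let each queue backfill into the currently free cores and memory. The jobs and thresholds are invented, and real backfilling also reasons about job runtimes and reservations, which this omits.

```python
from collections import deque

# Hypothetical jobs: (name, cores, mem_gb). The real algorithm uses a
# gene-sequencing trace; these numbers are invented for illustration.
jobs = [("a", 8, 64), ("b", 4, 8), ("c", 2, 4), ("d", 8, 128), ("e", 1, 2)]
TOTAL_CORES, TOTAL_MEM = 16, 96

# Multi-queue split by memory demand, so memory-hungry jobs cannot starve
# small ones (queue-level load balancing).
big = deque(j for j in jobs if j[2] >= 32)
small = deque(j for j in jobs if j[2] < 32)

free_cores, free_mem, started = TOTAL_CORES, TOTAL_MEM, []

def backfill(queue):
    """Start any job in `queue` that fits the current hole in the machine."""
    global free_cores, free_mem
    for job in list(queue):
        name, cores, mem = job
        if cores <= free_cores and mem <= free_mem:
            free_cores -= cores
            free_mem -= mem
            started.append(name)
            queue.remove(job)

backfill(big)     # "a" fits; "d" needs 128 GB and must wait
backfill(small)   # small jobs backfill into the leftover cores/memory
print(started, "waiting:", [j[0] for j in big + small])
```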
ISBN: (Print) 9781665435741
As the scale of big data continues to grow and the complexity of data analysis algorithms increases, the desire for greater computing power is increasingly evident. A popular approach is to use heterogeneous systems for computation. A discrete CPU-GPU system, which has the CPU and GPU on different chips connected through a PCI-e bus, is a typical heterogeneous system. Many schemes have been proposed to improve performance on discrete CPU-GPU systems through workload partitioning. However, most of them target regular applications and fall far short of ideal resource utilization and performance for irregular applications: while many regular applications benefit greatly from heterogeneous computing, accelerating irregular applications remains an open problem. In this paper, we propose a dynamic fine-grained workload partitioning approach for irregular applications that boosts resource utilization to achieve better load balance on heterogeneous platforms. The approach monitors the kernel execution of the CPU and GPU at runtime and finely partitions the workload according to their processing speeds, assigning relatively regular data to the GPU and the rest to the CPU. Evaluated with various irregular workloads, our scheme achieves up to 20% performance improvement over the state-of-the-art coarse-grained scheme, and the performance gap compared to oracle-based partitioning is less than 5% in most cases.
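A minimal sketch of dynamic fine-grained partitioning, using two threads that pull small chunks from a shared queue so the faster "device" naturally claims more work. The speed ratio, chunk size, and kernel are invented; the paper's scheme additionally steers relatively regular data toward the GPU, which this simulation does not model.

```python
import threading
import time

N, CHUNK = 100_000, 2_000
data = list(range(N))
next_idx = 0
lock = threading.Lock()
done = {"gpu": 0, "cpu": 0}

def worker(name, per_item_cost):
    """Pull fine-grained chunks from a shared queue; a faster device simply
    claims more chunks, so load balances dynamically without a static split."""
    global next_idx
    while True:
        with lock:
            start = next_idx
            next_idx += CHUNK
        if start >= N:
            return
        chunk = data[start:start + CHUNK]
        _ = sum(chunk)                       # stand-in for the real kernel
        time.sleep(per_item_cost * len(chunk))
        done[name] += len(chunk)

# The "gpu" thread is 4x faster per item than the "cpu" thread (invented ratio).
threads = [threading.Thread(target=worker, args=("gpu", 1e-7)),
           threading.Thread(target=worker, args=("cpu", 4e-7))]
for t in threads: t.start()
for t in threads: t.join()
print(done)   # the faster worker ends up with roughly 4x the items
```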