Many important parallel applications are data-parallel and may be efficiently implemented on a workstation cluster by allocating each workstation a contiguous partition of the data domain. Implementation on non-dedicated clusters, however, is complicated by the possibility of changes in workstation availability. For example, a personal workstation may be reclaimed by its primary user for interactive use. In such situations, a node must be removed from the collection of workstations forming the "virtual parallel machine" allocated to the application, and the data redistributed accordingly. Conversely, workstations may become available to join the virtual parallel machine. This paper identifies fundamental characteristics of efficient policies for data redistribution following the addition or removal of workstations from the cluster. The following conclusions are obtained based on mathematical analysis and simulations: (a) allocating data to a new node from the center of the data domain substantially reduces data migration costs compared to allocation from the edge; (b) addition in groups is beneficial compared to repeated single additions; and (c) even a large number of incremental adjustments of the data domain partitions, owing to successive additions and removals of nodes, does not appear to substantially degrade partition quality compared to that obtained by partitioning from scratch. We believe that these observations can be fruitfully incorporated in the design of workstation cluster support systems for data-parallel computing. (C) 2004 Elsevier Inc. All rights reserved.
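Conclusion (a) can be illustrated with a minimal sketch: a balanced 1-D block partition where the cost of adding a node is the number of elements that change owner. This is an illustration only, not the paper's model; the function and its parameters are hypothetical.

    def migration_cost(n, p, insert_pos):
        """Elements changing owner when a node is inserted at position
        `insert_pos` in a balanced 1-D block partition of n elements
        over p nodes (sketch, not the paper's cost model)."""
        def owners(k, order):
            # Owner of element i under a balanced block partition over
            # k nodes, visited in the order given by `order`.
            return [order[min(i * k // n, k - 1)] for i in range(n)]

        old = owners(p, list(range(p)))
        new_order = list(range(insert_pos)) + [p] + list(range(insert_pos, p))
        new = owners(p + 1, new_order)
        return sum(o != w for o, w in zip(old, new))

    # n = 12 elements, p = 3 nodes: inserting at the edge moves 6
    # elements, inserting at the center only 4.
    print(migration_cost(12, 3, 0), migration_cost(12, 3, 1))   # 6 4

Even in this toy setting, edge insertion shifts every downstream block while center insertion pulls data from both sides, matching the paper's qualitative conclusion.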
Energy consumption in datacenters has recently become a major concern due to rising operational costs and scalability issues. Recent solutions to this problem propose the principle of energy proportionality, i.e., the amount of energy consumed by the server nodes must be proportional to the amount of work performed. For data-parallelism and fault-tolerance purposes, most common file systems used in MapReduce-type clusters maintain a set of replicas for each data block. A covering subset is a group of nodes that together contain at least one replica of each data block needed for performing computing tasks. In this work, we develop and analyze algorithms to maintain energy proportionality by discovering a covering subset that minimizes energy consumption while placing the remaining nodes in low-power standby mode in a data-parallel computing cluster. Our algorithms can also discover covering subsets in heterogeneous computing environments. In order to allow more data parallelism, we generalize our algorithms so that they can discover a k-covering subset, i.e., a set of nodes that contains at least k replicas of the data blocks. Our experimental results show that we can achieve substantial energy savings without significant performance loss in diverse cluster configurations and working environments. (C) 2013 Elsevier Inc. All rights reserved.
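The covering-subset idea can be pictured with a standard greedy set-cover heuristic. This is a sketch under assumed inputs (a node-to-blocks map), not the paper's algorithm, which additionally accounts for energy heterogeneity and k-coverage.

    def covering_subset(replicas):
        """Greedy sketch: choose nodes until every block has at least one
        replica in the chosen set.  `replicas` maps node id -> set of
        block ids stored on that node (hypothetical layout)."""
        needed = set().union(*replicas.values())
        chosen, covered = [], set()
        while covered != needed:
            # Pick the node covering the most still-uncovered blocks.
            node = max(replicas, key=lambda n: len(replicas[n] - covered))
            chosen.append(node)
            covered |= replicas[node]
        return chosen

    nodes = {"n1": {1, 2}, "n2": {2, 3}, "n3": {3, 4}, "n4": {1, 4}}
    print(covering_subset(nodes))   # e.g. ['n1', 'n3']

Nodes outside the returned subset are the candidates for low-power standby.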
Associative computation is characterized by the intertwining of search by content and data-parallel computation. An algebra for associative computation is described. A compilation-based model and a novel abstract machine for associative logic programming are presented. The model uses loose coupling of the left-hand side of the program, treated as data, and the right-hand side of the program, treated as low-level code. This representation achieves efficiency through associative computation and data alignment during goal reduction and during the execution of low-level abstract instructions; data alignment reduces the overhead of data movement. Novel schemes are presented for the associative manipulation of aliased uninstantiated variables and for data-parallel goal reduction in the presence of multiple occurrences of the same variable in a goal. The architecture, behavior, and performance evaluation of the model are presented.
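The intertwining of search by content and data-parallel computation can be sketched as follows, using a hypothetical tuple encoding of clause heads; the paper's abstract machine operates on compiled low-level code, not on Python structures.

    def associative_select(goal, heads):
        """Search by content, data-parallel in spirit: the goal is
        compared against every clause head 'at once'.  None stands for
        an unbound variable and matches anything."""
        def matches(head):
            return len(head) == len(goal) and all(
                h is None or g is None or h == g
                for h, g in zip(head, goal))
        return [i for i, head in enumerate(heads) if matches(head)]

    heads = [("append", "nil", None, None), ("append", None, None, None)]
    print(associative_select(("append", "nil", "Xs", "Ys"), heads))  # [0, 1]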
The modeling and simulation (M&S) of large crowds has become increasingly important in the domain of public security, for applications such as facility planning, disaster response, and anti-terrorism operations. The behavior of a large crowd is highly complex, and the M&S of a large crowd at the individual level therefore demands the support of a scalable and efficient computing technology. In this study, a method was proposed to formulate crowd behavior with cellular automata and multi-agent models, which were successfully mapped onto the MapReduce programming model. A simulation framework was developed upon Hadoop to simulate large-crowd scenarios over a cluster, transforming the simulation process into a series of parallel operations on data streams. Simulation studies on a large-scale evacuation scenario indicated that the framework preserves the logical correctness of the simulation process. Experimental results also showed that the Hadoop-based simulation framework could complete five times more tasks while consuming only 19% of the CPU time in comparison with conventional simulation technology.
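One simulated timestep under such a mapping might look like the following map/reduce pair. The agent layout (dicts with "id", "pos", and a desired "target" cell) is an assumption for illustration; the actual framework runs steps of this shape as Hadoop jobs over data streams.

    def map_step(agent):
        """Map: key each agent by the cell it wants to enter."""
        yield agent["target"], agent

    def reduce_step(cell, agents):
        """Reduce: at most one agent may occupy a cell (a cellular-
        automaton-style exclusion rule); the rest stay where they are."""
        agents = sorted(agents, key=lambda a: a["id"])   # deterministic winner
        moved = [dict(agents[0], pos=cell)]
        stayed = [dict(a, target=a["pos"]) for a in agents[1:]]
        return moved + stayed

Because each reduce key is an independent cell, conflict resolution across the whole crowd parallelizes trivially.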
This work presents a shared-memory parallel version of the hybrid classification algorithm IGSCR (iterative guided spectral class rejection) to facilitate the transition from serial to parallel processing. This transition is motivated by a demonstrated need for more computing power, driven by the increasing size of remote sensing data sets due to higher resolution sensors, larger study regions, and the like. Parallel IGSCR was developed to produce fast and portable code using Fortran 95, OpenMP, and the Hierarchical Data Format version 5 (HDF5) with its accompanying data access library. The intention of this work is to provide an efficient implementation of the established IGSCR classification algorithm. The applicability of the faster parallel IGSCR algorithm is demonstrated by classifying Landsat data covering most of Virginia, USA into forest and non-forest classes with approximately 90% accuracy. Parallel results are given using the SGI Altix 3300 shared memory computer and the SGI Altix 3700 with as many as 64 processors, reaching speedups of almost 77. Parallel IGSCR allows an analyst to perform and assess multiple classifications to refine parameters. As an example, parallel IGSCR was used for a factorial analysis consisting of 42 classifications of a 1.2 GB image to select the number of initial classes (70) and the class purity (70%) used for the remaining two images. (C) 2007 Elsevier Ltd. All rights reserved.
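The rejection step at the core of IGSCR can be sketched as below. This is a simplified illustration: the published algorithm uses a statistical homogeneity test rather than a raw purity fraction, and the real code runs these loops in parallel with OpenMP in Fortran.

    from collections import Counter

    def igscr_reject(clusters, labels, purity=0.70):
        """One IGSCR-style pass: accept spectral classes whose dominant
        informational class reaches the purity threshold; pixels of the
        rejected classes are re-clustered on the next iteration.
        `clusters` maps cluster id -> pixel indices; `labels` gives a
        training label per pixel (hypothetical layout)."""
        accepted, rejected = {}, []
        for cid, pixels in clusters.items():
            counts = Counter(labels[p] for p in pixels)
            label, hits = counts.most_common(1)[0]
            if hits / len(pixels) >= purity:
                accepted[cid] = label        # pure enough: class -> label
            else:
                rejected.extend(pixels)      # re-cluster these next pass
        return accepted, rejected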
A smart city's efficiency depends on mining large amounts of data generated by cyber-physical systems and electronic platforms using large-scale data processing frameworks in the cloud. Many cloud services rely on data-parallel computing frameworks that run on hundreds of interconnected nodes. These frameworks divide computationally intensive and data-intensive jobs into smaller tasks and run them concurrently on different nodes to improve performance. But providing consistent performance in such an environment is a challenge due to runtime variability: owing to various internal and external factors, some nodes running these tasks perform poorly, delaying the execution of the whole job. Because of the inherent complexity of runtime variability, preventive measures against such stragglers have proved inadequate, and the problem continues to affect compute workloads even after the measures are taken. Several researchers have therefore proposed dynamic straggler identification approaches based on historical log analysis. This paper analyzes the relationships between several parameters obtained during job execution that aid in formulating and detecting stragglers. Using data analysis, we developed a straggler identification approach and labeled the generated dataset. To achieve high performance using statistical features of historical resource usage, the proposed approach trains a distributed XGBoost classifier, which showed the highest accuracy of 88.57%. Furthermore, we empirically show that blacklisting the predicted stragglers leads to a significant reduction in the execution times of CPU-intensive, I/O-intensive, and mixed applications.
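A minimal sketch of such a classifier follows, using synthetic stand-in features; the paper trains a distributed XGBoost model on real log-derived statistics, and the feature set and labeling rule below are assumptions for illustration.

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for historical resource-usage logs: one row per
    # task with hypothetical features (e.g. CPU use, I/O wait, memory,
    # progress rate); y = 1 marks a task labeled as a straggler.
    rng = np.random.default_rng(0)
    X = rng.random((5000, 4))
    y = (X[:, 1] + 0.3 * rng.random(5000) > 0.9).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    model = xgb.XGBClassifier(n_estimators=200, max_depth=6,
                              learning_rate=0.1)
    model.fit(X_tr, y_tr)
    print("held-out accuracy:", model.score(X_te, y_te))

Tasks the trained model flags would then be candidates for blacklisting or speculative re-execution.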
The prospect of a single internal market in 1992 has created a new spirit of adventure in the European Community. It represents a great challenge in all fields of trade and industry, especially information technology, and offers a unique opportunity to cut costs, expand markets, and exploit cooperation across the EC.
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and run up to seven times faster than their CPU counterparts.
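The stream-kernel style Brook adds to C has the same elementwise shape as this NumPy sketch of SAXPY; this is an analogy only, since Brook kernels are compiled to GPU shader programs rather than executed as array expressions.

    import numpy as np

    def saxpy(a, x, y):
        """SAXPY as a data-parallel kernel: one multiply-add per element,
        applied uniformly across the whole stream (here, a NumPy array)."""
        return a * x + y

    x = np.arange(1_000_000, dtype=np.float32)
    y = np.ones_like(x)
    print(saxpy(2.0, x, y)[:4])   # [1. 3. 5. 7.]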
ISBN (print): 9781467345651; 9780769549033
Analysis of neural signals such as the electroencephalogram (EEG) is one of the key technologies for detecting and diagnosing various brain disorders. As neural signals are non-stationary and non-linear in nature, it was almost impossible to understand their true physical dynamics until the recent advent of the Ensemble Empirical Mode Decomposition (EEMD) algorithm. Neural signal processing with EEMD is highly compute-intensive due to the high complexity of the EEMD algorithm. It is also data-intensive because (1) EEG signals comprise massive data sets and (2) EEMD must run a large number of trials to ensure precision. The MapReduce programming model is a promising parallel computing paradigm for data-intensive computing. To increase the efficiency and performance of neural signal analysis, this research develops parallel EEMD neural signal processing with MapReduce. In this paper, we implement parallel EEMD with Hadoop in a modern cyberinfrastructure. Test results and performance evaluation show that parallel EEMD can significantly improve the performance of neural signal processing.
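The trial structure that makes EEMD a natural fit for MapReduce can be sketched as follows; `emd` is a placeholder for an empirical mode decomposition routine, assumed to return a fixed-shape array of intrinsic mode functions (IMFs) for every trial.

    import numpy as np

    def eemd_map(signal, trial_seed, noise_std, emd):
        """Map task: one EEMD trial = decompose the signal plus one
        white-noise realization.  Trials are independent, so the map
        phase is embarrassingly parallel."""
        rng = np.random.default_rng(trial_seed)
        noisy = signal + rng.normal(0.0, noise_std, signal.shape)
        return emd(noisy)   # placeholder EMD routine -> array of IMFs

    def eemd_reduce(imf_sets):
        """Reduce: ensemble-average the IMFs across all trials
        (assumes every trial yields IMFs of the same shape)."""
        return np.mean(np.stack(imf_sets), axis=0)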
ISBN (print): 9780769546766
The training procedure of Hidden Markov Model (HMM) based speech recognition is often very time consuming because of its high computational complexity. New parallel hardware such as the GPU provides multi-threaded processing and very high floating-point capability. We take advantage of the GPU to accelerate a popular HMM-based speech recognition package, HTK. Based on the sequential code of HTK, we design "paraTraining", a parallel training model for HTK, and develop several optimizations to improve its performance on the GPU: unrolling the nested loops and using "reduction add" to maximize the number of threads per block; using the GPU's warp mechanism to reduce synchronization latency; and building different thread indices to address data efficiently. Experimental results show that a speedup of about 20x can be achieved without loss in accuracy. We also discuss a multi-GPU implementation of our method, which achieved around a twofold speedup over the single-GPU version.
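"Reduction add" here refers to the standard pairwise tree reduction; the following sketch shows the access pattern, with each inner-loop iteration standing in for one GPU thread (the paper's version runs across the threads of a CUDA block).

    def reduction_add(values):
        """Pairwise tree reduction: halve the array each step so the
        additions within a step are independent and can run in parallel."""
        values = list(values)
        while len(values) > 1:
            half = (len(values) + 1) // 2
            for i in range(len(values) - half):
                values[i] += values[i + half]   # one add per 'thread'
            values = values[:half]
        return values[0]

    assert reduction_add(range(10)) == sum(range(10))

Replacing a serial accumulation loop with this pattern is what lets the training kernels keep many more threads busy per block.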