ISBN: 9783642400476 (print)
Distributed computing is increasingly driven by technological and application advances. Many works consider new computing models that depart from the classical closed model with a fixed number of participants and strong hypotheses on communication and structuring. Indeed, it is hard to imagine an application or computational activity that falls outside distributed computing. The Internet and the web (e.g. social networks, clouds) are becoming the main application field for distributed computing. In addition to the classical challenges that developers have to face (asynchrony and failures), they have to deal with load balancing, malicious and selfish behaviors, mobility, heterogeneity and the dynamic nature of participating processes.
ISBN: 9781479920815 (print)
Natural scientists use large-scale sensor networks for gathering and analyzing environmental data. However, the implementation work requires expert programmers. The problem is complicated by the limited battery lifetime, processing power and memory capacity of the nodes, which call for a low-level programming language. Since scientists are used to analyzing data with spreadsheets, researchers have studied the possibility of applying spreadsheet-based programming to sensor networks. The approaches so far either require a central server to execute the spreadsheet, or they execute a spreadsheet run-time on each node. The first approach causes higher communication cost, since all data has to be routed to the central server, and the second causes computational overhead, because evaluating a spreadsheet is slower than executing handcrafted NesC code. Hence, we present a spreadsheet-driven tool chain that can create efficient NesC code and allows for simulation in the spreadsheet itself. The nodes have to recompute the spreadsheet formulas when new data arrives. However, we can avoid a large fraction of this recomputation by applying several optimization strategies during code generation. In our example scenario, sensor nodes compute the variance across a series of sensor readings. We show that the optimizations save 65% of the CPU cycles and reduce the code size by 12% compared to non-optimized execution of the spreadsheet. Thus, our approach offers an easy way of developing sensor network programs while yielding very efficient code.
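To illustrate why recomputation on new data can often be avoided in the variance scenario, the following is a minimal Python sketch of an incrementally updated variance (Welford's method). It is an assumption for illustration only: the abstract does not say which optimizations the tool chain applies, and the class name `RunningVariance` and its methods are hypothetical, not part of the generated NesC code.

```python
class RunningVariance:
    """Mean and population variance updated one reading at a time.

    Hypothetical helper for illustration; not the paper's generated code.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, reading):
        # Constant-time update: no earlier reading is revisited.
        self.count += 1
        delta = reading - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reading - self.mean)

    def variance(self):
        # Population variance; defined as 0.0 before any reading arrives.
        return self.m2 / self.count if self.count else 0.0
```

Each new reading costs a fixed number of arithmetic operations and three stored values, whereas re-evaluating a spreadsheet-style variance formula would touch the whole series again.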
ISBN: 9781479920815 (print)
In a perfect world, code would only be written once and would run on different devices with high efficiency. A programmer's time would primarily be spent on thinking about the algorithms and data structures, not on implementing them. To a degree, that used to be the case in the era of frequency scaling on a single core. However, due to power limitations, parallel programming has become necessary to obtain performance gains. But parallel architectures differ substantially from each other, often require specialized knowledge, and typically necessitate reimplementation and fine-tuning of application code. These slow tasks frequently result in situations where most of the time is spent reimplementing old code rather than writing new code. The goal of our research is to find new programming techniques that increase productivity, maintain high performance, and provide abstraction to free the programmer from these unnecessary and time-consuming tasks. However, such techniques usually come at the cost of substantial performance degradation. This paper investigates current approaches to portable accelerator programming, seeking to answer whether they make it possible to combine high efficiency with sufficient algorithm abstraction. It discusses OpenCL as a potential solution and presents three approaches to writing portable code: GPU-centric, CPU-centric and combined. By applying the three approaches to a real-world program, we show that it is at least sometimes possible to run exactly the same code on many different devices with minimal performance degradation using parameterization. The main contributions of this paper are an extensive review of the current state of the art regarding the stated problem and our original approach of addressing this problem with a generalized excessive-parallelism approach.
ISBN: 9783642400469 (print)
The proceedings contain 73 papers. The topics discussed include: alignment-based metrics for trace comparison; validation and uncertainty assessment of extreme-scale HPC simulation through Bayesian inference; energy-efficient scheduling with time and processors eligibility restrictions; scheduling jobs with multiple non-uniform tasks; workflow fairness control on online and non-clairvoyant distributed computing platforms; how to be a successful thief: feudal work stealing for irregular divide-and-conquer applications on heterogeneous distributed systems; scheduling HPC workflows for responsiveness and fairness with networking delays and inaccurate estimates of execution times; enhancing concurrency in distributed transactional memory through commutativity; adaptive granularity control in task parallel programs using multiversioning; and online dynamic dependence analysis for speculative polyhedral parallelization.
ISBN: 9780769549804 (print)
This paper presents an application of the analogical modeling and numerical simulation of technical processes through the matrix of partial derivatives of the state vector (M-pdx) associated with Taylor series. The residual water blunting process is modeled as a distributed-parameter process using partial differential equations. In this approach, the variation of the main output signal of the process (the chemical's pH value) is considered with respect to both time and the length of the tanks.
ISBN: 9781479932177 (print)
In recent decades, many kinds of task execution models, such as grid and cloud computing, have been developed. In such distributed systems, each task is processed by a processor in a multicore computer, e.g., a household PC, a resource that has become easy to harness in recent years. Given a policy that automatically decides the "best" combination and number of processors (and computers), those computational resources can be used effectively, so that a large number of jobs can be executed in parallel. In this paper, we propose a method for mapping execution units in such environments. The method applies a remapping technique after the processor-execution unit mapping [11] is finished. Experimental comparisons by simulation show the advantages of the proposed method.
ISBN: 9783642400476 (print)
Distributed parallel applications executed in heterogeneous and dynamic environments need to adapt their configuration (in terms of parallelism degree and parallelism form for each component) in response to unpredictable factors related to the physical platform and the application semantics. In emerging Cloud computing scenarios, reconfigurations induce economic costs and performance degradations on the execution. In this context, it is of paramount importance to define smart adaptation strategies able to achieve properties like control optimality (optimizing the application's global QoS) and reconfiguration stability, expressed in terms of the number of reconfigurations and the average time for which a configuration is not modified. In this paper we introduce a methodology to address this issue, based on Control Theory and Optimal Control foundations. We present a first validation of our approach in a simulation environment, outlining its effectiveness and feasibility.
ISBN: 9783642400476 (print)
We study a simple parallel algorithm for computing matchings in a graph. A variant for unweighted graphs finds a maximal matching using linear expected work and O(log^2 n) expected running time in the CREW PRAM model. Similar results also apply to External Memory, MapReduce and distributed memory models. In the maximum weight case the algorithm guarantees a 1/2-approximation. Although the parallel execution time is linear for worst-case weights, an experimental evaluation indicates good scalability on distributed memory machines and on GPUs. Furthermore, the solution quality is very good in practice.
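The abstract does not spell out the algorithm, so the following Python sketch is only an assumption about its general flavor. It simulates, sequentially, a round-based "locally dominant edge" scheme: in each round, every edge that is the heaviest at both of its endpoints joins the matching, and all edges touching a matched vertex are dropped. This scheme is known to yield a 1/2-approximation of the maximum weight matching; the function name `locally_dominant_matching` and the tie-breaking rule are hypothetical choices, and a simple graph without self-loops is assumed.

```python
def locally_dominant_matching(edges):
    """Round-based matching; edges is a list of (u, v, weight) tuples.

    Sequential simulation for illustration: the per-vertex and per-edge
    loops inside one round are the parts that could run in parallel.
    """
    remaining = list(edges)
    matching = []
    while remaining:
        # For every vertex, find its best incident edge. Ties on weight
        # are broken by the endpoint ids so that the order is total.
        best = {}
        for e in remaining:
            u, v, w = e
            key = (w, min(u, v), max(u, v))
            for x in (u, v):
                if x not in best or key > best[x][0]:
                    best[x] = (key, e)
        # An edge is locally dominant if it is the best at both endpoints.
        chosen = [e for e in remaining
                  if best[e[0]][1] is e and best[e[1]][1] is e]
        matched = {x for u, v, _ in chosen for x in (u, v)}
        matching.extend(chosen)
        # Drop every edge that touches a newly matched vertex.
        remaining = [e for e in remaining
                     if e[0] not in matched and e[1] not in matched]
    return matching
```

With all weights equal, the tie-breaking alone decides the rounds and the result is a maximal matching. The globally heaviest remaining edge is always locally dominant, so every round matches at least one edge and the loop terminates.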
ISBN: 9783642400476 (print)
This paper introduces FLEX-MPI, a novel runtime approach for the dynamic load balancing of MPI-based SPMD applications running on heterogeneous platforms in the presence of dynamic external loads. To effectively balance the workload, FLEX-MPI monitors the actual performance of applications via hardware counters and the MPI profiling interface, with negligible overhead and minimal code modifications. Our results show that by using this approach the execution time of an application may be significantly reduced.
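As a rough illustration of performance-driven load balancing, the sketch below splits a fixed number of work items across processes in proportion to their measured processing rates. This is an assumption for illustration only: the abstract does not describe FLEX-MPI's actual balancing policy, and the function `rebalance` and its parameters are hypothetical, not part of FLEX-MPI or MPI. Rates are assumed to be positive numbers, such as items processed per second in the last monitoring interval.

```python
def rebalance(total_items, rates):
    """Return per-process item counts proportional to measured rates.

    Hypothetical helper; rates must be positive and non-empty.
    """
    total_rate = sum(rates)
    # Proportional share for each process, rounded down.
    shares = [int(total_items * r / total_rate) for r in rates]
    # Rounding down can leave a few items unassigned; give them to the
    # fastest processes first so the counts still add up to total_items.
    leftover = total_items - sum(shares)
    fastest_first = sorted(range(len(rates)),
                           key=lambda i: rates[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares
```

For example, `rebalance(1000, [1.0, 1.0, 2.0])` assigns half of the items to the process that was measured to be twice as fast as the others.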
ISBN: 9781450321747 (print)
Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph-structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. To address this problem, a disk-based parallel graph engine called GraphChi has recently been proposed. Although GraphChi significantly outperforms all representative (disk-based) distributed graph engines, we observe that GraphChi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism, including multi-core parallelism and FlashSSD I/O parallelism, and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, called pin-and-slide. TurboGraph also provides engine-level operators such as BFS, which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms GraphChi by up to four orders of magnitude! Our implementation of TurboGraph is available at "http://***/turbograph" as executable files.