Data partitioning is one of the main problems in parallel and distributed simulation. The distribution of data over the architecture directly influences the efficiency of the simulation. The partitioning strategy becomes a complex problem because it depends on several factors. In an Individual-oriented Model, for example, partitioning is related to the interactions between individuals and the environment. Therefore, parallel and distributed simulation should allow the partitioning strategy to be changed dynamically, so that the most appropriate strategy can be chosen for a specific context. In this paper, we propose a strip partitioning strategy for a spatially dependent problem in Individual-oriented Model applications. This strategy avoids sharing resources and, as a result, decreases the communication volume among processes. In addition, we develop an objective function that calculates the best partitioning for a specific configuration and gives the computing cost of each partition, allowing the computing load to be balanced through a mapping policy. The results obtained are supported by statistical analysis and experimentation with an Ant Colony application. As a main contribution, we developed a solution in which the partitioning strategy can be chosen dynamically and always yields the lowest total execution time.
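As an illustration only, strip partitioning combined with a cost-driven mapping can be sketched as follows. This is not the authors' objective function: it assumes equal-width horizontal strips, a hypothetical quadratic cost per strip (every pair of individuals inside a strip interacts), and a greedy longest-processing-time mapping of strips to processes. The names `strip_costs` and `map_strips` are illustrative.

```python
import heapq
from collections import Counter


def strip_costs(ys, height, num_strips):
    """Return an assumed computing cost for each equal-width horizontal strip.

    Hypothetical cost model: every pair of individuals inside a strip
    interacts, so a strip holding n individuals costs n * n.
    """
    width = height / num_strips
    # Individuals exactly on the upper border are kept in the last strip.
    counts = Counter(min(int(y // width), num_strips - 1) for y in ys)
    return [counts.get(s, 0) ** 2 for s in range(num_strips)]


def map_strips(costs, num_procs):
    """Greedy longest-processing-time mapping of strips to processes.

    The most expensive remaining strip always goes to the process with the
    smallest accumulated cost.
    """
    heap = [(0, p) for p in range(num_procs)]  # (accumulated load, process id)
    heapq.heapify(heap)
    assignment = {}
    for strip in sorted(range(len(costs)), key=lambda s: -costs[s]):
        load, proc = heapq.heappop(heap)
        assignment[strip] = proc
        heapq.heappush(heap, (load + costs[strip], proc))
    return assignment


costs = strip_costs([0.5, 1.5, 1.6, 2.5, 3.5, 3.6, 3.7], height=4.0, num_strips=4)
print(costs)                          # [1, 4, 1, 9]
print(map_strips(costs, num_procs=2))  # {3: 0, 1: 1, 0: 1, 2: 1}
```

Here the single expensive strip (cost 9) gets a process to itself, while the three cheaper strips share the other one; a real objective function would also weigh the communication between neighbouring strips.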
Parallel discrete event simulation (PDES) has proven to be a useful paradigm for simulating complex and large-scale models. An individual-oriented approach allows modelers to capture complex emergent global behaviors generated by simple local interactions, as observed in self-organized systems. These simulations are usually highly expensive in terms of computation and communication. On one hand, we can reduce the computation involved in individual interactions by developing a robust partitioning method. On the other hand, we must be able to efficiently handle a huge number of individuals interacting with other individuals stored in the memory of remote processors. In this work we analyze and compare three communication strategies for our distributed cluster-based individual-oriented fish school simulator: synchronous and asynchronous message passing (via MPI), and bulk-synchronous parallel (BSP). For this type of simulation, the main contributions of our work are: (a) we show that distributed time-driven simulations do not always improve performance when synchronous communication strategies are used, and (b) we show that asynchronous communication strategies are more efficient. In addition, we verify that the bulk-synchronous parallel method is scalable.
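To make the bulk-synchronous parallel pattern concrete, here is a toy sketch in which Python threads stand in for MPI processes. Each superstep has a communication phase (every process deposits a value for its right-hand neighbour on a ring) and a computation phase, separated by barriers. The ring exchange and the summation are hypothetical placeholders for the simulator's real exchange of individuals; `run_bsp` is an illustrative name, not part of any MPI or BSP library.

```python
import threading


def run_bsp(initial, supersteps):
    """Toy bulk-synchronous parallel run with one thread per 'process'.

    In every superstep each process sends its value to its right-hand
    neighbour on a ring, all processes synchronize at a barrier, and then
    each one adds the received value to its own.
    """
    num_procs = len(initial)
    values = list(initial)
    inbox = [None] * num_procs
    barrier = threading.Barrier(num_procs)

    def worker(rank):
        for _ in range(supersteps):
            # Communication phase: deposit a message for the neighbour.
            inbox[(rank + 1) % num_procs] = values[rank]
            barrier.wait()  # every message of this superstep has been delivered
            # Computation phase: use only local state and received data.
            values[rank] += inbox[rank]
            barrier.wait()  # nobody starts the next superstep early

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values


print(run_bsp([1, 2, 3], supersteps=2))  # [9, 7, 8]
```

The two barriers are what make the result deterministic: no process reads its inbox before all messages of the superstep are written, and no process overwrites a neighbour's inbox before that neighbour has consumed it.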
In scientific simulations, the generated results usually come from a stochastic process. New solutions aimed at improving these simulations have been proposed, but the problem is how to compare them, since the results are not deterministic, and consequently how to guarantee that the output results are statistically trustworthy. In this work we apply a statistical approach to define the transient and steady states in discrete event distributed simulation. We use linear regression and the batch method to find the optimal simulation size. The contributions of our work are: (a) we apply and adapt a simple statistical approach to define the optimal simulation length; (b) we propose approximating a normal distribution instead of generating a sufficiently large number of replications; and (c) the method can be used in other kinds of non-terminating scientific simulations where the data either follow a normal distribution or can be approximated by one.
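A minimal sketch of a batch-based steady-state analysis, under stated assumptions: the output series is grouped into batches, the transient period is taken to end at the first window of batch means whose least-squares slope is within a tolerance of zero, and the steady-state mean is reported with a confidence interval from the normal approximation. The window size, tolerance, and function names are hypothetical choices, not the paper's parameters.

```python
import statistics


def batch_means(data, batch_size):
    """Split the output series into non-overlapping batches and average each."""
    n = len(data) // batch_size
    return [statistics.fmean(data[i * batch_size:(i + 1) * batch_size]) for i in range(n)]


def slope(ys):
    """Least-squares slope of ys against the indices 0, 1, 2, ... (needs len >= 2)."""
    mx = (len(ys) - 1) / 2
    my = statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(len(ys)))
    return num / den


def end_of_transient(means, window, tol):
    """First batch index whose next `window` batch means show a slope within
    `tol` of zero (assumed steady-state criterion); None if never reached."""
    for start in range(len(means) - window + 1):
        if abs(slope(means[start:start + window])) < tol:
            return start
    return None


def steady_state_ci(means, confidence=0.95):
    """Confidence interval for the steady-state mean using the normal approximation."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    center = statistics.fmean(means)
    half = z * statistics.stdev(means) / len(means) ** 0.5
    return center - half, center + half


series = list(range(10)) + [10] * 30       # short ramp, then a flat steady state
means = batch_means(series, batch_size=5)  # [2.0, 7.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
start = end_of_transient(means, window=4, tol=0.1)
print(start, steady_state_ci(means[start:]))  # 2 (10.0, 10.0)
```

Batch means are used rather than raw observations because averaging within batches reduces autocorrelation, which makes the normal approximation behind the interval more reasonable.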
Individual-oriented simulation allows us to represent the global behavior of a system through local interactions in discrete time steps. As we face close-to-reality models and large-scale workloads, we focus on moving from traditional approaches towards distributed simulation in order to obtain more accurate results in less time. One of the main problems in distributed simulation is how to distribute individuals efficiently across the distributed architecture. Individual-oriented systems can be implemented in a distributed fashion using either a grid-based or a cluster-based approach. On one hand, grid-based approaches assign to each node a portion of the simulation space, together with the set of individuals currently residing in that area. On the other hand, cluster-based approaches assign to each node a fixed set of individuals. In this work we present a cluster-based method based on Voronoi diagrams and a covering radius criterion in order to avoid unnecessary interactions between individuals. We show experimentally that our proposal significantly reduces communication and computing times, increasing simulation efficiency.
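The following sketch shows one way a Voronoi-based cluster partition and a covering radius can prune interaction checks. Assigning each individual to its nearest seed is equivalent to placing it in that seed's Voronoi cell; the covering radius of a cluster is assumed here to be the largest distance from its seed to any member. By the triangle inequality, two clusters whose seeds are farther apart than the sum of their covering radii plus the interaction range cannot contain interacting individuals. Seeds, the interaction range, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math


def voronoi_clusters(points, seeds):
    """Assign each individual to its nearest seed, i.e. to that seed's Voronoi cell."""
    clusters = {i: [] for i in range(len(seeds))}
    for p in points:
        nearest = min(range(len(seeds)), key=lambda i: math.dist(p, seeds[i]))
        clusters[nearest].append(p)
    return clusters


def covering_radii(clusters, seeds):
    """Covering radius of a cluster: largest seed-to-member distance (0 if empty)."""
    return {i: max((math.dist(seeds[i], p) for p in pts), default=0.0)
            for i, pts in clusters.items()}


def may_interact(i, j, seeds, radii, interaction_range):
    """False only if no member of cluster i can be within interaction_range of
    a member of cluster j (triangle inequality); True means 'check them'."""
    return math.dist(seeds[i], seeds[j]) <= radii[i] + radii[j] + interaction_range


seeds = [(0, 0), (10, 0), (100, 0)]
points = [(1, 0), (0, 1), (9, 0), (11, 1), (100, 2)]
clusters = voronoi_clusters(points, seeds)
radii = covering_radii(clusters, seeds)
print(may_interact(0, 1, seeds, radii, interaction_range=8))  # True
print(may_interact(0, 2, seeds, radii, interaction_range=8))  # False
```

The test is conservative: a True answer only means the pair of clusters must be examined, while a False answer safely skips all pairwise distance checks between them.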
Accurate indirect jump prediction is critical for some applications. Previously proposed methods are not efficient in terms of chip area. Our proposal evaluates a mechanism called target encoding that provides a better ratio bet...
Authors:
Jorba, Josep; Margalef, Tomás; Luque, Emilio
Estudis d'Informatica Multimedia i Telecomunicacio, Rambla del Poblenou 156, ES-08018 Barcelona, Spain
Computer Architecture and Operating Systems Department, ES-08193 Bellaterra, Spain
Performance is a crucial issue in parallel/distributed applications. In this context, one useful kind of tool is the automatic performance analysis tool, which helps developers in some of the phases of the performan...
Automatic analysis and tuning is a key strategy that helps to exploit the potential of high performance systems. However, for parallel applications with long running times, dynamic behaviour or highly data-dependent performance patterns, it is necessary to make use of the strength of dynamic auto-tuning. An important factor in large-scale dynamic auto-tuning is the number of additional resources the tuning system itself requires in order to limit its impact on application performance. A tradeoff must be made between the loss of effectiveness of a tuning system that uses too few resources and the loss of efficiency of one that uses too many. Most automatic analysis or tuning systems do not help define how many additional resources are required. In this work, we address this problem by proposing a method for calculating the structure of hierarchical tuning networks. The resulting topology is composed of the minimum number of non-saturated resources. The experimental evaluation covers different use cases, each of which shows that tuning networks built according to our proposal make efficient use of resources while providing a high-quality analysis and tuning environment.
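As a rough illustration of sizing a hierarchical tuning network, the sketch below assumes (hypothetically) that a single tuning node saturates when it serves more than `capacity` children, and computes the minimum number of nodes needed at each level of a tree built bottom-up over the application processes. The paper's actual method for deriving the topology may use a different saturation model.

```python
def tuning_tree_levels(num_app_processes, capacity):
    """Number of tuning nodes per level, bottom-up, for a tree in which no
    node serves more than `capacity` children (hypothetical saturation limit)."""
    if num_app_processes < 1:
        raise ValueError("need at least one application process")
    if capacity < 2:
        raise ValueError("capacity must be at least 2 to build a hierarchy")
    levels = []
    width = num_app_processes
    while True:
        # Integer ceiling division: fewest nodes that can serve `width` children.
        width = (width + capacity - 1) // capacity
        levels.append(width)
        if width == 1:  # a single root node closes the hierarchy
            return levels


levels = tuning_tree_levels(1001, capacity=10)
print(levels, sum(levels))  # [101, 11, 2, 1] 115
```

Under this model, adding nodes beyond these counts only costs resources, while removing any of them would push at least one node past its assumed saturation limit.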
The growing processing power of parallel computing systems requires interconnection networks of higher complexity and higher performance, which consume more energy. Link components contribute a substantial proportion of the total energy consumption of these networks. Many researchers have proposed approaches that judiciously adjust the link speed as a function of traffic in order to save energy when traffic is light. However, reducing the link speed increases the average packet latency and thus degrades network performance. This paper addresses that issue with a performance-aware energy-saving mechanism. Simulation results show that the proposed mechanism outperforms the energy-saving mechanisms in the literature.
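For context, a simple traffic-driven link-rate policy of the kind the abstract refers to can be sketched as follows. It is a generic threshold policy, not the paper's mechanism: the available rates and the utilization ceiling are assumed values, and capping utilization is only a crude stand-in for limiting the latency penalty of slower links.

```python
def select_link_rate(offered_gbps, rates_gbps, max_utilization=0.7):
    """Pick the slowest link rate whose utilization stays at or below
    max_utilization; if none qualifies, run the link at its fastest rate."""
    for rate in sorted(rates_gbps):
        if offered_gbps <= max_utilization * rate:
            return rate
    return max(rates_gbps)


RATES = [2.5, 5.0, 10.0]  # assumed available link speeds in Gb/s
for load in (1.0, 3.0, 9.0):
    print(load, select_link_rate(load, RATES))
# 1.0 2.5
# 3.0 5.0
# 9.0 10.0
```

Lowering `max_utilization` trades energy for latency: links stay at higher rates longer, so queues remain shorter when traffic grows.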
Partitioning and load balancing are highly important issues in distributed individual-oriented simulation. Choosing how to distribute individuals across the distributed environment can be a crucial factor when executing the simulation. Partitioning an individual-oriented system should be efficient in order to reduce the communication involved in interactions between individuals belonging to different logical processes. Furthermore, if the individual-oriented model exhibits mobility patterns, we should be able to maintain load balance in order to preserve global application performance. In this work, we present a proximity load balancing strategy for a distributed cluster-based individual-oriented fish school simulator. On one hand, we implement a robust cluster-based partitioning method by means of a covering radius criterion and Voronoi diagrams, and we use a proximity criterion to distribute individuals on the distributed architecture. On the other hand, we propose a proximity load balancing strategy in order to maintain application performance as the simulation progresses.
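A minimal sketch of a proximity-driven rebalancing step, assuming clusters are represented by seed positions and member lists as in a Voronoi-style partition. In one greedy pass, each overloaded cluster hands over the member lying closest to the seed of a cluster that can take one more individual without exceeding the average load. The tolerance and the name `rebalance` are hypothetical; this is not the authors' strategy.

```python
import math


def rebalance(clusters, seeds, tolerance=1.1):
    """One greedy proximity-based rebalancing pass (hypothetical policy).

    While a cluster holds more than tolerance * average individuals, move the
    member that is closest to the seed of a cluster able to receive one more
    individual without exceeding the average load.
    """
    target = sum(len(m) for m in clusters.values()) / len(clusters)
    for i in clusters:
        while len(clusters[i]) > tolerance * target:
            candidates = [(math.dist(p, seeds[j]), p, j)
                          for p in clusters[i]
                          for j in clusters
                          if j != i and len(clusters[j]) + 1 <= target]
            if not candidates:
                break  # no other cluster can absorb more load in this pass
            _, p, j = min(candidates)
            clusters[i].remove(p)
            clusters[j].append(p)
    return clusters


seeds = {0: (0, 0), 1: (10, 0)}
clusters = {0: [(1, 0), (2, 0), (3, 0), (6, 0)], 1: [(9, 0)]}
print(rebalance(clusters, seeds))
# {0: [(1, 0), (2, 0), (3, 0)], 1: [(9, 0), (6, 0)]}
```

Moving the member nearest to the receiving seed keeps clusters spatially compact, so their covering radii, and hence the number of cluster pairs that must exchange data, grow as little as possible. Receivers are never filled above the average, which prevents individuals from bouncing back and forth within a pass.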
Current parallel scientific applications generate a huge amount of data that must be managed efficiently by HPC storage systems. However, I/O performance depends on the application's I/O behavior and the config...