The fast adaptation of Cloud computing has led to an increase in novel information technology threats. The targets of these new threats range from large-scale distributed systems, such as the Large Hadron Collider by t...
Parallel discrete event simulation (PDES) has been shown to be a useful paradigm for simulating complex and large-scale models. An individual-oriented approach allows modelers to capture complex emergent global behaviors generated by simple local interactions, like those observed in self-organized systems. Usually, this type of simulation is highly expensive in terms of computing and communications. On the one hand, we can reduce the computing involved in individual interactions by developing a robust partitioning method. On the other hand, we have to be able to efficiently handle a huge number of individuals interacting with other individuals stored in the memory of remote processors. In this work we analyze and compare three communication strategies: synchronous and asynchronous message passing (via MPI) and bulk-synchronous parallel (BSP) for our distributed cluster-based individual-oriented fish school simulator. For this type of simulation, the main contributions of our work are: a) we show that distributed time-driven simulations do not always improve performance when using synchronous communication strategies, and b) we show that asynchronous communication strategies are more efficient. In addition, we have verified that the bulk-synchronous parallel method is scalable.
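To illustrate what a bulk-synchronous parallel superstep looks like, the following is a minimal sketch and not the authors' simulator. It assumes Python threads as stand-ins for MPI processes, a ring of workers, and a hypothetical update rule (each worker averages its value with the one received from its left neighbour). The names `bsp_ring_average`, `state` and `inbox` are illustrative only.

```python
import threading


def bsp_ring_average(values, supersteps):
    """Run BSP-style supersteps on a ring of workers (threads stand in for processes)."""
    n = len(values)
    state = list(values)   # one slot per worker, written only by its owner
    inbox = [0.0] * n      # one message slot per worker
    barrier = threading.Barrier(n)

    def worker(rank):
        for _ in range(supersteps):
            # Communication phase: send the local value to the right-hand neighbour.
            inbox[(rank + 1) % n] = state[rank]
            barrier.wait()  # every message of this superstep has been delivered
            # Computation phase: use only local data and the received message.
            state[rank] = (state[rank] + inbox[rank]) / 2
            barrier.wait()  # nobody overwrites an inbox before it has been read

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


if __name__ == "__main__":
    print(bsp_ring_average([0.0, 4.0, 8.0, 12.0], supersteps=1))
```

The two barriers per superstep make the global synchronization explicit. A synchronous or asynchronous message-passing variant would instead pair each send with a blocking or non-blocking receive, which is the difference the abstract compares.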
Automatic analysis and tuning is a key strategy that helps to exploit the potential of high performance systems. However, for parallel applications with long running times, dynamic behaviour or highly data-dependent performance patterns, it is necessary to make use of the strength of dynamic auto-tuning. An important factor in dynamic auto-tuning on a large scale is the number of additional resources required by the tuning system itself in order to reduce the impact on application performance. A tradeoff must be made between the loss of effectiveness of a tuning system that uses too few resources and the loss of efficiency of one that uses too many. Most automatic analysis or tuning systems do not provide assistance for defining how many additional resources are required. In this work, we address this problem by proposing a method focused on calculating the structure of hierarchical tuning networks. The topology is composed of the minimum number of non-saturated resources. The experimental evaluation covers different use cases, each one showing that tuning networks built according to our proposal make efficient use of resources while providing a high-quality analysis and tuning environment.
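As a rough illustration of sizing a hierarchical tuning network, the sketch below counts the nodes needed per level when each node may serve at most a fixed number of children. This is a simplification under stated assumptions: the abstract does not say how saturation is determined, so `max_fanout` is a hypothetical, externally supplied capacity, and `tuning_network_levels` is an illustrative name rather than the authors' method.

```python
import math


def tuning_network_levels(num_tasks, max_fanout):
    """Return node counts per level, bottom-up, ending with the single root.

    max_fanout is an assumed per-node capacity: the largest number of
    children a node can serve without becoming saturated.
    """
    if num_tasks < 1 or max_fanout < 2:
        raise ValueError("need num_tasks >= 1 and max_fanout >= 2")
    levels = []
    children = num_tasks
    while True:
        # Fewest nodes that keep every node at or below its capacity.
        nodes = math.ceil(children / max_fanout)
        levels.append(nodes)
        if nodes == 1:
            return levels
        children = nodes


if __name__ == "__main__":
    print(tuning_network_levels(1000, 10))
```

With 1000 tasks and a capacity of 10 children per node, this yields three levels of 100, 10 and 1 nodes. A real system would presumably derive the capacity from measured load instead of fixing it.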
In the biotechnology field, the deployment of the Multiple Sequence Alignment (MSA) problem, which is a high performance computing demanding process, is one of the new challenges to address on the new parallel systems...
ISBN: 9781479914449 (print)
When running parallel applications on HPC clusters, the primary objectives are usually almost linear speedup, efficient resource utilization, scalability and successful completion. Hence, application executions now face a multiobjective problem focused on improving performance while providing Fault Tolerance (FT) support; this combination is defined as Performability. The performance of Single Program Multiple Data (SPMD) applications written using a message-passing library (MPI) may be seriously affected when applying a message logging approach, because they are tightly coupled and involve a huge amount of communication. We have therefore proposed a novel method for SPMD applications which allows us to obtain the maximum speedup under a defined efficiency threshold, considering the impact of a fault tolerance strategy when executing on multicore clusters. This method is based on four phases: characterization, tile distribution, mapping and scheduling. The idea is to manage the added overhead of FT techniques, which seriously affects MPI application performance; specifically, our method manages the overheads of message logging by overlapping them with computation. The main objective of the method is thus to determine the approximate number of computational cores and the ideal number of tiles that give a suitable balance between speedup, efficiency and dependability. The results obtained illustrate that we can find the maximum speedup under a defined efficiency using an FT strategy with a small error rate of 5.4% in the worst case. By using our method, we can also determine the ideal problem size for a given number of computational cores (weak scalability) using FT with an error of around 5.8%. Results also show that our message logging approach could be tuned to introduce a constant overhead percentage when scaling the size of the problem.
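The notion of "maximum speedup under a defined efficiency threshold" can be made concrete with a small sketch. It uses the standard definitions speedup = T_serial / T_parallel and efficiency = speedup / cores. The timings in the usage example are hypothetical, invented only for illustration, and the function `max_speedup_under_threshold` is not part of the authors' method, which predicts these values analytically instead of searching measured timings.

```python
def max_speedup_under_threshold(serial_time, times_by_cores, min_efficiency):
    """Pick the core count with the highest speedup whose efficiency meets the threshold.

    times_by_cores maps a core count to an execution time (for example, a
    time that already includes fault tolerance overhead).
    Returns (cores, speedup, efficiency), or None if no configuration qualifies.
    """
    best = None
    for cores, parallel_time in sorted(times_by_cores.items()):
        speedup = serial_time / parallel_time
        efficiency = speedup / cores
        if efficiency >= min_efficiency and (best is None or speedup > best[1]):
            best = (cores, speedup, efficiency)
    return best


if __name__ == "__main__":
    # Hypothetical timings in seconds, not taken from the paper.
    timings = {2: 52.0, 4: 28.0, 8: 16.0, 16: 12.5}
    print(max_speedup_under_threshold(100.0, timings, min_efficiency=0.75))
```

In this made-up data set, 16 cores give the highest raw speedup but only 50% efficiency, so the 8-core configuration is selected when the threshold is 75%.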
ISBN: 9781479914449 (print)
The risk of having a program execution corrupted by transient faults is growing as computer processors use more transistors, become denser and operate at lower voltages. This risk is multiplied when we take into account High Performance Computing, with its hundreds or thousands of processors working together to solve a single problem. To evaluate how program executions behave in the presence of transient faults, we have proposed the concept of robustness against transient faults. This concept can be used to determine the most significant parts of a program with respect to the risk of misbehavior caused by transient faults, for further study or improvement. The robustness concept can also be used as a metric to compare different approaches applied to a program to make it less likely to produce corrupted results. In this work we present why and how it is possible to simplify the analysis of a fraction of a program's robustness by taking into account the repetition of sequences of instructions. The simplified analysis obtains exactly the same result as a full program robustness evaluation (exhaustively and without estimations). By simplifying the analysis we were able to reduce our previously published robustness analysis time by up to 192 times, and were also able to evaluate larger programs in feasible time (unimaginable when using executions in a fault-injection-capable environment).
The accurate prediction of forest fire propagation is crucial to minimizing its effects. Several models have been developed to determine forest fire propagation. Simulators implementing such models require diverse input parameters to deliver predictions about fire propagation. However, the data describing the actual scenario where the fire is taking place are usually subject to high levels of uncertainty. This input-data uncertainty is a serious drawback for the correctness of the prediction. A two-stage methodology was therefore developed that calibrates the input parameters in an adjustment stage, so that the calibrated parameters can be used in the prediction stage to improve the quality of the predictions. In this way, we mitigate the effects of such uncertainty. In this work, we take advantage of this two-stage methodology, applying Genetic Algorithms as the calibration technique. However, the use of Genetic Algorithms requires the execution of many simulations. This fact, added to the potentially long executions of the underlying simulator (due to its inherent complexity), raises another serious problem: the time needed to deliver the predictions. To be useful, the prediction must be provided much faster than real time, so it is necessary to exploit all available computing resources. In this work, we present a two-stage forest fire spread prediction hybrid MPI-OpenMP application based on the Master-Worker paradigm and the parallelization of the FARSITE simulator in order to minimize the response time. We report the results regarding the improvement in the quality of the predictions, as well as the time savings obtained by this hybrid application.
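The two-stage idea (calibrate on an observed interval, then predict with the calibrated input) can be sketched with a toy genetic algorithm. Everything here is an assumption made for illustration: `simulate` is a hypothetical linear spread model standing in for a real simulator such as FARSITE, a single parameter is calibrated, and the selection, crossover and mutation choices are generic rather than those used by the authors.

```python
import random


def simulate(rate, hours):
    """Toy stand-in for a fire simulator: the front advances linearly in time."""
    return rate * hours


def calibrate(observed, hours, generations=30, pop_size=20, seed=0):
    """Adjustment stage: search for the rate that best reproduces the observation."""
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 10.0) for _ in range(pop_size)]

    def error(rate):
        return abs(simulate(rate, hours) - observed)

    history = []  # best error found in each generation
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        parents = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Arithmetic crossover followed by a small Gaussian mutation.
            children.append((a + b) / 2 + rng.gauss(0.0, 0.1))
        population = parents + children
    return min(population, key=error), history


if __name__ == "__main__":
    # Adjustment stage: the front was observed at distance 6.0 after 2 hours.
    rate, history = calibrate(observed=6.0, hours=2.0)
    # Prediction stage: reuse the calibrated rate for a later instant.
    print(rate, simulate(rate, hours=4.0))
```

Each fitness evaluation calls the simulator once, which is why the abstract stresses parallel execution: with a real simulator, the evaluations within a generation are independent and are natural units to hand out to workers.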
ISBN: 9781479913725 (print)
Current performance analysis and tuning tools must be able to improve the performance of large-scale parallel applications. To be effective, such analysis and tuning tools must be scalable and be able to manage the dynamic behaviour of parallel applications. This work presents a scalable solution for dynamic tuning. This approach is based on a hierarchical performance analysis architecture that uses a novel information abstraction mechanism to solve local and global performance problems. We have developed a prototype implementation of the proposed analysis architecture making use of the MRNet framework. Scalability experiments have been performed using this prototype with up to 6400 application tasks. The results obtained show that the proposed analysis architecture will provide the scalability required to carry out dynamic tuning of large-scale parallel applications.
Forest fires cause significant losses around the world every year. A good prediction of fire propagation is crucial to minimizing the devastating effects of these hazards. Several models that represent this phenomenon and provide a prediction of its spread have been developed. These models need input parameters which are usually difficult to know or even estimate. A two-stage prediction methodology was proposed to improve the quality of these parameters. In this methodology, the parameters are calibrated according to real observations and then used in the prediction step. However, several parameters are not uniform across the map, but vary according to the topography of the terrain. Moreover, these parameters are not constant over time, but strongly dynamic. In such cases, it is necessary to introduce complementary models that overcome both restrictions. In the former case, a spatial distribution model is needed to provide the distribution of a given variable over the whole terrain, starting from the values of that parameter measured at certain points of the terrain. In the case of time variability, a complementary model such as a weather forecasting model makes it possible to deal with the dynamic behavior of these parameters over time. In this paper, we describe an enhanced two-stage prediction scheme in which both types of complementary model, a wind field model and a weather prediction model, are coupled to the prediction scheme, enabling the system to adapt dynamically to complex terrains and dynamic conditions.
Fast pattern matching is a requirement for many problems, especially for bioinformatics sequence analysis such as short read mapping applications. This work presents a variation of the FM-index method, denoted n-step FM-index, that is applied to exact-match genome search. We propose an alternative two-dimensional FM-index structure that allows backward-search navigation in steps of n symbols at a time. The main advantages of this arrangement are the reduction of the computational work and, most importantly, the reduction by a factor of n of the chain of dependent data accesses, together with the increase in the temporal locality of the data access pattern. This benefit comes at the expense of increasing the total amount of data required for the index. We present an in-depth performance analysis of a multi-core implementation of the algorithm using large references (up to 1.5G). We identify memory latency as the major performance limiter for single-thread execution and memory bandwidth for multi-thread execution. Our proposal provides speedups ranging from 1.4× to 2.4× when there is no limitation on DRAM capacity. We also analyse the trade-off of compacting the proposed data structure in order to reduce memory capacity requirements, now at the expense of increasing execution time. An extra 33% of DRAM space allows our proposal to improve performance by 1.2×, while doubling DRAM size enables an additional 1.5×. Our n-step algorithm provides an alternative for redesigning pseudo-random memory access algorithms so that they scale on current and future computer systems.
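For readers unfamiliar with the baseline that the n-step variant modifies, here is a minimal sketch of the classic one-symbol-per-step FM-index backward search. It is not the authors' two-dimensional structure: the index is built naively by sorting suffixes, which is only suitable for tiny texts, and the names `build_fm_index` and `count_matches` are illustrative. The sentinel `$` is assumed to be absent from the text and to sort before every other symbol.

```python
def build_fm_index(text):
    """Build a naive FM-index (C table and Occ table) for a small text."""
    text += "$"  # sentinel, assumed smaller than any symbol in the text
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)  # Burrows-Wheeler transform
    alphabet = sorted(set(text))
    # c_table[c]: number of symbols in the text that are strictly smaller than c.
    c_table, total = {}, 0
    for symbol in alphabet:
        c_table[symbol] = total
        total += text.count(symbol)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {symbol: [0] * (len(bwt) + 1) for symbol in alphabet}
    for i, current in enumerate(bwt):
        for symbol in alphabet:
            occ[symbol][i + 1] = occ[symbol][i] + (1 if symbol == current else 0)
    return c_table, occ, len(bwt)


def count_matches(pattern, c_table, occ, size):
    """Count exact occurrences of pattern, consuming one symbol per step."""
    low, high = 0, size  # half-open range of matching suffix-array rows
    for symbol in reversed(pattern):
        if symbol not in c_table:
            return 0
        # Each step depends on the previous range: a chain of dependent lookups.
        low = c_table[symbol] + occ[symbol][low]
        high = c_table[symbol] + occ[symbol][high]
        if low >= high:
            return 0
    return high - low


if __name__ == "__main__":
    index = build_fm_index("banana")
    print(count_matches("ana", *index))
```

The loop makes one dependent pair of table lookups per pattern symbol. The abstract's proposal is to consume n symbols per iteration instead, shortening this dependency chain by a factor of n at the cost of a larger index.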