A challenge for parallel programmers is to efficiently execute traditional MPI applications, designed to run on clusters of single-core nodes, on multicore clusters. Multicore clusters exhibit communication heterogeneities that must be handled carefully to improve efficiency and speedup. This research presents an execution tool for SPMD applications that focuses on managing communication heterogeneities, distributing the workload among cores and enhancing parallel performance on multicore clusters. The tool is built on an execution methodology that includes mapping and scheduling strategies, and it integrates five modules that give programmers a method to execute their applications efficiently. It targets SPMD applications that use MPI for communication. These applications were selected because they are the most commonly used in parallel computing, and because their data synchronization and communication volumes can generate communication imbalance. The novel contribution of this tool is that it lets programmers find a minimum execution time while keeping efficiency above a defined threshold. The tool has been tested on different multicore clusters with a set of scientific applications. The results show a considerable improvement in application efficiency when the tool is applied.
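The trade-off stated in this abstract, minimum execution time subject to an efficiency threshold, can be illustrated with a minimal sketch. It assumes the usual definition of parallel efficiency, E(p) = T(1) / (p · T(p)); the function name `pick_core_count` and the timing values are hypothetical and are not taken from the tool itself.

```python
def pick_core_count(times, threshold):
    """Return (cores, time, efficiency) for the fastest run whose
    efficiency is at least `threshold`, or None if no run qualifies.

    times: dict mapping core count -> measured execution time in seconds.
           It must contain the key 1 (the single-core baseline).
    """
    t1 = times[1]
    best = None
    for cores, t in sorted(times.items()):
        efficiency = t1 / (cores * t)  # speedup divided by core count
        if efficiency >= threshold and (best is None or t < best[1]):
            best = (cores, t, efficiency)
    return best


# Hypothetical measurements, for illustration only.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 12.0}
choice = pick_core_count(timings, threshold=0.85)
print(choice)
```

With these made-up timings, the 8-core and 16-core runs are faster but fall below 85% efficiency, so the 4-core run is selected.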
The need to teach students parallel programming has become an important issue due to the rapid growth of the parallel computing field. The topic has been included in computer science curricula, but students have difficulty designing efficient MPI parallel applications. We present a novel methodology for teaching parallel programming, centered on improving the parallel applications written by students through the experience they gain during classes. The methodology integrates theoretical and practical sections focused on teaching two parallel paradigms, Master/Worker and SPMD. These paradigms were selected because of their different communication and computation behaviors, which pose challenges for students when they try to improve application performance metrics. Our methodology allows students to discover their own errors and learn how to correct them. In addition, students analyze the weaknesses and strengths of the applications they design in order to enhance the performance metrics. Applying this methodology produced significant progress in the parallel applications designed by students: we observed an improvement of around 47% in the students' parallel programming skills when designing parallel applications.
Predicting the performance of parallel applications is becoming increasingly complex. The best performance predictor is the application itself, but the time required to run it in full is an onerous requirement. We seek to characterize the behavior of message-passing applications on different systems by extracting a signature that allows us to predict which system will let the application perform best. To achieve this goal, we have developed a method called Parallel Application Signatures for Performance Prediction (PAS2P), which strives to describe an application based on its behavior. From the application's message-passing activity, we identify and extract representative phases, and from these we build a Parallel Application Signature that has allowed us to predict the application's performance. We have experimented with different signature-extraction algorithms and found a reduction in the prediction error using different scientific applications on different clusters. We were able to predict execution times with an average accuracy of over 98%.
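As a rough illustration of how representative phases could yield a time prediction, the sketch below assumes that each phase has a measured duration and a repetition count, and that the predicted execution time is the sum of duration times repetitions. This weighting scheme, the function names, and all numbers are assumptions made for illustration; the abstract does not specify the actual prediction formula.

```python
def predict_execution_time(phases):
    """phases: list of (phase_time_seconds, repetitions) pairs.
    Assumed model: total time = sum of phase time x repetition count."""
    return sum(phase_time * reps for phase_time, reps in phases)


def prediction_accuracy(predicted, actual):
    """Accuracy as 1 minus the relative prediction error."""
    return 1.0 - abs(predicted - actual) / actual


# Hypothetical signature: three phases measured on the target system.
signature = [(0.5, 120), (1.25, 40), (0.25, 300)]
predicted = predict_execution_time(signature)

# Hypothetical full-run time, used only to show the accuracy calculation.
actual = 188.0
accuracy = prediction_accuracy(predicted, actual)
print(predicted, accuracy)
```

Running only the short phases on a candidate system, rather than the whole application, is what would keep the cost of such a prediction low.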
This paper describes ongoing work on measuring the performance of an application running on a machine, where the measurement takes a fraction of the time required to run the application itself in full. We call this approach the Performance Software Probe. The objective is to know the machine/application performance prior to execution, without even needing to install the application on the machine being characterized. Our goal is to enhance the efficiency of master/worker applications on highly heterogeneous multiclusters, where the available machines, and their respective performance indexes, are not known until they become available for execution.
In Grid environments, many different resources are intended to work in a coordinated manner, each resource having its own features and complexity. As the number of resources grows, simplifying automation and managemen...
Distributed video-on-demand servers (DVS) are proposed as a solution to the limited streaming capacity and lack of scalability of large-scale centralized systems. Server interconnection topology plays an important role in the performance of video-on-demand systems. This paper presents an analysis of different topologies and their influence on storage management and distribution, the performance of delivery policies, the occurrence of rejected requests, network consumption and scalability. To carry out the proposed study, we have designed a complete simulation framework for DVS systems. Experimental results obtained under different workload conditions allow us to draw two important conclusions. First, better connectivity implies a lower mean request service distance and lower network requirements, improving the efficiency of multicast policies. Second, topology regularity is essential, as it allows better traffic balancing and provides more alternative routing paths. The analysis of global results shows that the hypercube offers the best trade-off among all the evaluated metrics, providing gradual and unlimited scalability for the DVS system.
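The link between connectivity and mean service distance can be checked on small graphs. The sketch below is not the paper's simulation framework; it only compares the mean shortest-path hop count of a 16-node ring with that of a 16-node (4-dimensional) hypercube using breadth-first search. The helper names and the choice of 16 nodes are assumptions made for illustration.

```python
from collections import deque


def ring(n):
    """Adjacency lists for a ring of n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hypercube(dim):
    """Adjacency lists for a hypercube with 2**dim nodes.
    Two nodes are linked when their labels differ in exactly one bit."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}


def mean_distance(adj):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring_mean = mean_distance(ring(16))
cube_mean = mean_distance(hypercube(4))
print(ring_mean, cube_mean)
```

For the same number of servers, the hypercube roughly halves the mean hop count of the ring in this example, at the cost of four links per node instead of two.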