ISBN: (Print) 0769511651
Testing is a difficult and time-consuming part of the software development cycle, because an error may surface in an unexpected way at an unexpected spot. Testing and debugging parallel and distributed software are much more difficult than testing and debugging sequential software: errors are usually reproducible in sequential programs, while they may not be reproducible in parallel and distributed programs. In addition, parallel and distributed programs introduce new types of errors and anomalies, race conditions and deadlocks, that do not exist in sequential software. In this paper I present a survey and a taxonomy of existing approaches for detecting race conditions and deadlocks in parallel and distributed programs. These approaches can be classified into two main classes: static analysis techniques and dynamic analysis techniques. I have further subdivided the static analysis techniques into three subgroups: concurrency analysis methods, data-flow analysis methods, and formal proof methods. A brief discussion highlighting the main problems of the best-known approaches is given, and the paper concludes with tables comparing the surveyed approaches.
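The non-reproducibility the abstract describes can be made concrete with a minimal sketch (this example is illustrative and not drawn from the paper): two threads perform an unsynchronized read-modify-write on a shared counter, the textbook form of a race condition. Because the interleaving of the read and write differs from run to run, the final count varies, which is exactly what makes such errors hard to detect dynamically.

```python
import threading

counter = 0  # shared state, deliberately unprotected

def unsafe_increment(n):
    """Increment the shared counter n times without a lock."""
    global counter
    for _ in range(n):
        tmp = counter       # read
        counter = tmp + 1   # write: another thread may have run in between

threads = [threading.Thread(target=unsafe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 4 threads x 100,000 increments the correct total is 400,000,
# but lost updates typically leave the counter short, and the exact
# value changes from run to run.
print(counter)
```

A static concurrency-analysis tool would flag the unsynchronized access to `counter`; a dynamic detector would have to catch an interleaving in which an update is actually lost.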
Writing large-scale parallel and distributed scientific applications that make optimal use of multiprocessors is a challenging problem. Typically, computational resources are underused due to performance failures in the application being executed. Performance-tuning tools are essential for exposing these performance failures and for suggesting ways to improve program performance. In this paper, we first address fundamental issues in building useful performance-tuning tools and then describe our experience with the AIMS toolkit for tuning parallel and distributed programs on a variety of platforms. AIMS supports source-code instrumentation, run-time monitoring, graphical execution profiles, performance indices, and automated modeling techniques as ways to expose performance problems in programs. Using several examples representing a broad range of scientific applications, we illustrate AIMS' effectiveness in exposing performance problems in parallel and distributed programs.
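The idea behind source-code instrumentation can be sketched in a few lines (this is a generic illustration, not AIMS's actual interface; the decorator name and report format are assumptions): each instrumented function records call counts and wall-clock time, and a later report summarizes where time was spent, which is the raw material a tuning tool turns into execution profiles.

```python
import time
from functools import wraps

profile = {}  # function name -> (call count, total seconds)

def instrument(fn):
    """Wrap fn so every call records its elapsed wall-clock time."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            calls, total = profile.get(fn.__name__, (0, 0.0))
            profile[fn.__name__] = (calls + 1, total + elapsed)
    return wrapper

@instrument
def work(n):
    """A stand-in compute kernel."""
    return sum(i * i for i in range(n))

for _ in range(3):
    work(10_000)

# Dump the collected profile: name, call count, total time.
for name, (calls, total) in profile.items():
    print(f"{name}: {calls} calls, {total:.6f}s total")
```

Real tools such as AIMS insert equivalent probes automatically at the source level and correlate the collected timings across processes, rather than relying on manual decoration.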