In the simultaneous localization of multiple software faults, a parallel debugging approach is commonly used. The effectiveness of a parallel debugging approach depends critically on the clustering algorithm and the distance metric it uses. However, clustering algorithms that group failed tests by the similarity of their execution profiles, using distance metrics such as Euclidean, Jaccard, and Hamming distance, are considered problematic and inappropriate. In this paper, we conducted an investigative study of the usefulness of this problematic parallel debugging approach. The approach combines the k-means clustering algorithm, which groups failed tests by execution profile similarity, with the Euclidean distance metric. We evaluated its localization effectiveness on three similarity coefficient-based fault localization techniques. Second, we compare the effectiveness of the problematic parallel debugging approach with the one-bug-at-a-time debugging approach (OBA) and with a state-of-the-art parallel debugging approach named MSeer. The empirical evaluation is conducted on 540 multiple-fault versions of eight medium-sized to large-sized subject programs, with versions containing two, three, four, and five faults. Our results suggest that clustering failed tests by execution profile similarity with distance metrics such as Euclidean distance is indeed problematic and reduces the effectiveness of localizing multiple faults.
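The following Python sketch makes the distance metrics concrete. It assumes that an execution profile is a binary statement-coverage vector; real profiles may instead record execution counts. It groups three hypothetical failed tests with a small Lloyd's k-means that uses Euclidean distance. This is an illustration of the general technique under study, not the authors' implementation, and all names and data are hypothetical.

```python
import math
from typing import List, Sequence

Profile = Sequence[int]  # hypothetical: 1 if the failed test executed the statement


def euclidean(a: Profile, b: Profile) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a: Profile, b: Profile) -> int:
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard(a: Profile, b: Profile) -> float:
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def kmeans(points: List[Profile], centroids: List[List[float]], iterations: int = 10) -> List[int]:
    """Plain Lloyd's k-means with Euclidean distance; returns a cluster index per point."""
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each failed test joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean profile of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


failed_tests = [[1, 1, 0, 1, 0, 0], [1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 1, 1]]
print(kmeans(failed_tests, [list(failed_tests[0]), list(failed_tests[2])]))  # [0, 0, 1]
```

For 0/1 coverage vectors, the squared Euclidean distance equals the Hamming distance. The two metrics therefore order pairs of failed tests identically.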
ISBN: 9781450327848 (print)
Debugging large-scale parallel applications is challenging. In most HPC applications, parallel tasks progress in a coordinated fashion. A fault in one task can therefore quickly propagate to other tasks, which makes it difficult to debug. Finding the least-progressed tasks can significantly reduce the effort needed to identify the task where the fault originated. However, existing approaches for detecting them suffer from low accuracy and large overheads: either they use imprecise static analysis, or they are unable to infer progress dependence inside loops. We present a loop-aware progress-dependence analysis tool, PRODOMETER, which determines relative progress among parallel tasks via dynamic analysis. Our fault-injection experiments suggest that its accuracy and precision are over 90% for most cases and that it scales well up to 16,384 MPI tasks. Further, our case study shows that it significantly helped diagnose a perplexing error in MPI that manifested only at large scale.
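PRODOMETER's loop-aware analysis is more general than this, but a simplified Python sketch can illustrate the goal of finding least-progressed tasks. The sketch makes a hypothetical assumption: every MPI rank is inside the same loop nest and reports its position as (outer iteration, inner iteration, statement id), with statement ids numbered in execution order. Under that assumption, positions are totally ordered and compare lexicographically. In real runs, tasks can be in different code regions where no such total order exists, and that is the case a progress-dependence analysis has to handle.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[int, int, int]  # hypothetical: (outer iter, inner iter, statement id)
Snapshot = Dict[int, Position]   # MPI rank -> reported position


def progress_classes(snapshot: Snapshot) -> List[List[int]]:
    """Group ranks at the same position, ordered from least to most progressed."""
    classes: Dict[Position, List[int]] = defaultdict(list)
    for rank, position in snapshot.items():
        classes[position].append(rank)
    # Tuples compare lexicographically: an earlier outer iteration means less
    # progress regardless of the inner counters.
    return [sorted(classes[pos]) for pos in sorted(classes)]


def least_progressed(snapshot: Snapshot) -> List[int]:
    return progress_classes(snapshot)[0] if snapshot else []


snapshot = {0: (5, 2, 10), 1: (5, 2, 10), 2: (4, 7, 3), 3: (5, 0, 1)}
print(least_progressed(snapshot))  # [2]
```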
ISBN: 1932415262 (print)
Debugging helps programmers locate the causes of incorrect program behavior. The executions of parallel programs are much more complex than those of serial ones, which makes parallel programs difficult to debug. In contrast with traditional parallel and distributed computing environments, computational grids exhibit new characteristics, such as high heterogeneity, dynamism, and security requirements. These new features make grid applications harder to debug. In this paper we design and implement a grid-enabled parallel debugging environment that simplifies the debugging of grid applications. We present the concept of an ad hoc computing environment in grids and a method for building this environment automatically, restricted to MPI-G2 applications. Capabilities such as user identification, automatic task submission, and resource registration and collection are easily accessed through the portal. The debugging functionalities include consistent global state detection and race detection.
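The abstract does not say how race detection is implemented. One common technique uses vector clocks: two accesses to the same variable form a race when at least one is a write and neither access happens before the other. The hypothetical Python sketch below shows that check. It is a generic illustration, not the paper's method.

```python
from typing import List, NamedTuple


class Access(NamedTuple):
    process: int
    variable: str
    is_write: bool
    clock: List[int]  # vector clock of the process at the time of the access


def happens_before(a: List[int], b: List[int]) -> bool:
    """a -> b iff every component of a is <= the matching component of b and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def is_race(e1: Access, e2: Access) -> bool:
    concurrent = (not happens_before(e1.clock, e2.clock)
                  and not happens_before(e2.clock, e1.clock))
    return e1.variable == e2.variable and (e1.is_write or e2.is_write) and concurrent


w = Access(0, "x", True, [1, 0])
r_unsynced = Access(1, "x", False, [0, 1])
r_after_msg = Access(1, "x", False, [1, 2])  # P1 received a message sent after the write
print(is_race(w, r_unsynced), is_race(w, r_after_msg))  # True False
```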
Traditional debuggers are of limited value for modern scientific codes that manipulate large, complex data structures. This paper discusses a novel debug-time assertion, called a "Statistical Assertion", that allows a user to reason about large data structures. Its primitives are parallelised to provide an efficient solution. We present the design and implementation of statistical assertions and illustrate the debugging technique with a molecular dynamics simulation. We evaluate the performance of the tool on a 12,000-core Cray XE6.
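A statistical assertion checks a summary of a large data structure instead of its individual elements. The minimal Python sketch below assumes the data is split into per-process chunks and that the assertion concerns the mean. Each chunk is reduced to (count, mean, M2), and the partial results are merged with the pairwise update of Chan et al. That update is associative, so it suits a tree-shaped parallel reduction. The function names and the choice of statistic are hypothetical; they are not the paper's primitives.

```python
import math
from functools import reduce
from typing import List, Sequence, Tuple

Stats = Tuple[int, float, float]  # (count, mean, M2 = sum of squared deviations)


def local_stats(chunk: Sequence[float]) -> Stats:
    """Summary computed independently by each process on its own chunk."""
    n = len(chunk)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(chunk) / n
    return (n, mean, sum((x - mean) ** 2 for x in chunk))


def combine(a: Stats, b: Stats) -> Stats:
    """Merge two partial summaries (pairwise update of Chan et al.)."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    return (n, mean_a + delta * nb / n, m2_a + m2_b + delta * delta * na * nb / n)


def assert_mean_between(chunks: List[Sequence[float]], low: float,
                        high: float) -> Tuple[float, float]:
    """Hypothetical statistical assertion: fail unless low <= mean <= high."""
    n, mean, m2 = reduce(combine, map(local_stats, chunks), (0, 0.0, 0.0))
    std = math.sqrt(m2 / n) if n else 0.0  # population standard deviation
    if not low <= mean <= high:
        raise AssertionError(f"statistical assertion failed: mean={mean:.6g}, std={std:.6g}")
    return mean, std
```

In use, `assert_mean_between([[1, 2, 3], [4, 5], [6]], 3.0, 4.0)` passes. It returns the mean and population standard deviation of 1..6 as if the data had never been split.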
The clustering technique has attracted considerable attention as a promising strategy for parallel debugging in multi-fault scenarios. This heuristic approach (i.e., failure indexing or fault isolation) divides failed test cases into several disjoint groups, which enables developers to perform multiple debugging tasks simultaneously. When the statement ranking representation is used to model failures for better clustering, several factors influence clustering effectiveness. These include the risk evaluation formula (REF), the number of faults (NOF), the fault type (FT), and the number of successful test cases paired with each individual failed test case (NSP1F). In this paper, we present the first comprehensive empirical study of how these four factors influence clustering effectiveness. We conduct extensive controlled experiments on 1060 faulty versions of 228 simulated faults and 141 real faults. The results reveal that: (1) GP19 is highly competitive across all REFs, (2) clustering effectiveness decreases as NOF increases, (3) higher clustering effectiveness is easier to achieve when a program contains only predicate faults, and (4) clustering effectiveness is maintained when the scale of NSP1F is reduced to 20%. © 2022 Elsevier Inc. All rights reserved.
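In a statement ranking representation, each failed test is paired with some successful tests. A risk evaluation formula then scores every statement, and the resulting ranking stands in for that failure during clustering. The hypothetical Python sketch below uses Ochiai as the REF only because it is widely known; GP19, which the study found highly competitive, would replace the scoring function. The `passed` list plays the role of the NSP1F paired successful tests.

```python
import math
from typing import List, Sequence


def ochiai(ef: int, ep: int, nf: int) -> float:
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep))."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def statement_ranking(failed: Sequence[int], passed: List[Sequence[int]]) -> List[int]:
    """Rank statements for ONE failed test paired with the given successful tests.
    Coverage vectors hold 1 if the test executed the statement."""
    scores = []
    for s, covered in enumerate(failed):
        ef = covered       # the single failed test executed s (1) ...
        nf = 1 - covered   # ... or did not (1 here)
        ep = sum(p[s] for p in passed)
        scores.append(ochiai(ef, ep, nf))
    # Most suspicious first; ties broken by statement index for determinism.
    return sorted(range(len(failed)), key=lambda s: (-scores[s], s))


print(statement_ranking([1, 1, 0, 1], [[1, 0, 0, 1], [1, 1, 1, 0]]))  # [1, 3, 0, 2]
```

The abstract does not specify how these rankings are compared during clustering, so that step is not shown.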
The Eclipse Parallel Tools Platform (PTP) is an open-source Integrated Development Environment (IDE) that aids the development of supercomputer applications. The PTP parallel debugger is used by a growing community of developers in scientific and engineering fields. This paper proposes a method of improving the communication infrastructure of the PTP debugger by taking advantage of a Scalable Parallel Debugging Library (SPDL). The present communication framework of PTP is the Scalable Debug Manager (SDM). Unlike SDM, SPDL provides a pluggable architecture that allows developers to select a communication protocol suitable for a targeted supercomputer. It currently supports a number of scalable protocols, including MRNet and SCI. The advanced features provided by these communication trees, such as programmable filters and configurable topologies, allow developers to create more flexible solutions for efficient reduction and aggregation operations in parallel debugging. In particular, they allow parallel debuggers to handle the large volume of back-end messages in peta-scale environments more efficiently. The architecture of the PTP debugger is extended to support SPDL. The extended architecture combines the advantages of the PTP debugger at the front end and SPDL at the back end. It improves the scalability and performance of the PTP debugger. Consequently, it provides a flexible option for using the PTP debugger with pluggable communication protocols to address debugging challenges in peta-scale environments.
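The kind of aggregation an MRNet-style filter performs can be illustrated in plain Python. Back-end messages are merged level by level up a tree with a fixed fan-out, and identical messages collapse into one entry with a compact rank list. This is a hypothetical emulation of the idea only. It does not use or reproduce the actual MRNet, SCI, or SPDL APIs, and the message format is assumed.

```python
from typing import Dict, List

Aggregate = Dict[str, List[int]]  # hypothetical: message text -> ranks that sent it


def merge(children: List[Aggregate]) -> Aggregate:
    """Filter run at an internal tree node: union the children's rank lists."""
    merged: Aggregate = {}
    for child in children:
        for msg, ranks in child.items():
            merged.setdefault(msg, []).extend(ranks)
    for ranks in merged.values():
        ranks.sort()
    return merged


def reduce_tree(messages: List[str], fanout: int) -> Aggregate:
    """messages[r] is the message from back-end rank r; reduce up a fanout-ary tree."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not messages:
        return {}
    level = [{msg: [rank]} for rank, msg in enumerate(messages)]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


def to_ranges(ranks: List[int]) -> str:
    """Compress a sorted rank list, e.g. [0, 1, 2, 4] -> '0-2,4'."""
    if not ranks:
        return ""
    parts: List[str] = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r == prev + 1:
            prev = r
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = r
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


msgs = ["stopped at line 42"] * 3 + ["signal SIGSEGV"] + ["stopped at line 42"] * 4
for text, ranks in reduce_tree(msgs, fanout=2).items():
    print(to_ranges(ranks), text)
```

The front end receives one entry per distinct message rather than one per rank. This is why tree-based reduction helps at peta-scale.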
The behaviour of programs for multiprocessors may be indeterminate because of processor timing variations. This poses a problem for cyclic debugging, since a bug may disappear from one execution to the next. Replay is an elegant solution to this problem, in which 'sufficient' information is recorded in a log. This information is then used to control subsequent executions of the same program so that repeatability is guaranteed. Interrupts are another source of non-determinism, even in sequential programs. This paper presents an extension of the well-known Instant Replay method, termed Interrupt Replay, for replaying programs in the presence of interrupts. The correctness of Interrupt Replay is based on the assumption that there are no interrupt races: an interrupt service routine must not access data that is also accessed by the foreground process whenever the interrupt is enabled. If such races are present, replay may fail to produce deterministic results. This assumption is similar to the basic assumption of Instant Replay that shared variables are properly protected by mutual exclusion. Also, as in Instant Replay, it is assumed that the behaviour of the environment (input data, external interrupts) is replayed by some other tracing mechanism.
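The base Instant Replay idea can be sketched briefly. Each shared object carries a version number. During recording, every access logs the version it observed. During replay, each access waits until the object reaches the logged version, so the original order of accesses is reproduced. The hypothetical Python sketch below simplifies the protocol by treating every access as an exclusive write, which drops the reader/writer (CREW) distinction of the real method. It does not model the paper's Interrupt Replay extension.

```python
import threading
from collections import defaultdict
from typing import Dict, List


class SharedObject:
    """Shared object with a version number (simplified Instant Replay sketch:
    every access is exclusive, so there is no concurrent-reader case)."""

    def __init__(self) -> None:
        self.version = 0
        self.cond = threading.Condition()
        self.trace: List[int] = []  # thread ids in the order they accessed the object

    def access(self, tid: int, mode: str, log: Dict[int, List[int]]) -> None:
        with self.cond:
            if mode == "record":
                # Record phase: log which version this thread observed.
                log[tid].append(self.version)
            else:
                # Replay phase: block until the object reaches the logged version.
                expected = log[tid].pop(0)
                self.cond.wait_for(lambda: self.version == expected)
            self.trace.append(tid)  # stands in for the real read/modify of the object
            self.version += 1
            self.cond.notify_all()


def run(obj: SharedObject, mode: str, log: Dict[int, List[int]],
        thread_ids: List[int], accesses_each: int) -> List[int]:
    def worker(tid: int) -> None:
        for _ in range(accesses_each):
            obj.access(tid, mode, log)

    threads = [threading.Thread(target=worker, args=(t,)) for t in thread_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.trace


log: Dict[int, List[int]] = defaultdict(list)
recorded = run(SharedObject(), "record", log, [0, 1, 2], 3)
replay_log = {tid: list(versions) for tid, versions in log.items()}
replayed = run(SharedObject(), "replay", replay_log, [2, 1, 0], 3)
print(replayed == recorded)  # True: same interleaving despite a different start order
```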
ISBN: 9781450337236 (print)
Relative debugging traces software errors by concurrently comparing two executions of a program: one of a reference version and one of a faulty version. Relative debugging is particularly effective when code is migrated from one platform to another. This is of significant interest for hybrid computer architectures containing CPUs, accelerators, or coprocessors. In this paper we extend relative debugging to support porting stencil computations to a hybrid computer. We describe a generic data model that allows programmers to examine the global state across different types of applications, including MPI/OpenMP, MPI/OpenACC, and UPC programs. We present two case studies. The first uses a hybrid version of the 'stellarator' particle simulation DELTA5D on Titan at ORNL. The second uses the UPC version of the Shallow Water Equations on Crystal, an internal Cray supercomputer. These case studies used up to 5,120 GPUs and 32,768 CPU cores to illustrate that the debugger is effective and practical.
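The core comparison step of relative debugging can be illustrated in plain Python. The same data structure is taken from a reference version and a ported version at matching points, and the two are compared element by element. A tolerance is used because CPU and accelerator floating-point results may legitimately differ in low-order bits. The stencil kernels and `compare_arrays` below are hypothetical. The "ported" kernel contains a deliberate loop-bound bug, and the sketch is not the paper's debugger or data model.

```python
import math
from typing import List, Sequence, Tuple


def stencil_step(u: Sequence[float]) -> List[float]:
    """Reference 3-point averaging stencil; boundary values are kept fixed."""
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)] + [u[-1]]


def buggy_step(u: Sequence[float]) -> List[float]:
    """Hypothetical port with an off-by-one loop bound: skips the last interior point."""
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 2)] + [u[-2], u[-1]]


def compare_arrays(reference: Sequence[float], suspect: Sequence[float],
                   rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> List[Tuple[int, float, float]]:
    """Return (index, reference value, suspect value) for every element that differs."""
    if len(reference) != len(suspect):
        raise ValueError("reference and suspect arrays differ in length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, suspect))
            if not math.isclose(r, s, rel_tol=rel_tol, abs_tol=abs_tol)]


u = [0, 0, 0, 3, 0, 0]
print(compare_arrays(stencil_step(u), buggy_step(u)))  # [(4, 1.0, 0)]
```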
During program testing, software programs may be found to contain multiple faults. Multiple faults in a program may reduce the effectiveness of existing fault localization techniques because of the complex relationship between faults and failures when several faults are present. Ideally, faults are isolated into fault-focused clusters, each targeting a single fault, so that developers can localize them simultaneously in parallel. However, the relationship between faults and failures is not easily identified and depends solely on the accuracy of clustering. Existing clustering algorithms cannot effectively isolate failed tests according to their causative faults, which hinders localization effectiveness. This paper proposes a new approach that uses a divisive network community clustering algorithm to isolate faults into separate fault-focused communities, each targeting a single fault. It also proposes a community weighting and selection mechanism. This mechanism prioritizes the most important fault-focused communities for the available developers, so that they can debug the faults simultaneously in parallel. The approach is evaluated on eight medium-sized to large-sized subject programs (tcas, replace, gzip, sed, flex, grep, make, and ant). Overall, 540 multiple-fault versions of these programs were generated, each containing two to five faults. The experimental results demonstrate that the proposed approach performs significantly better in localization effectiveness than two other parallel debugging approaches for locating multiple faults in parallel.
This paper describes the design, application, and evaluation of the Advanced Educational Parallel (AdEPar) digital signal processing (DSP) system. The system is user-friendly, flexible, scalable, and inexpensive, and it is based on TMS320C25 digital signal processors for implementing DSP algorithms. It will be used in the DSP laboratory by graduate students working on advanced topics such as developing parallel DSP algorithms. Graduating senior students who have gained some experience in DSP can also use the system. The DSP laboratory has proved to be a useful tool for the instructor in teaching the mathematically oriented topics of DSP that students often find difficult to grasp. The DSP laboratory, with its assigned projects, has greatly improved students' ability to understand complex topics. These topics include the fast Fourier transform algorithm, linear and circular convolution, and the theory and design of infinite impulse response (IIR) and finite impulse response (FIR) filters. The user-friendly PC software support of the AdEPar system makes it easy for students to develop DSP programs. This paper presents the architecture of the AdEPar DSP system. Interprocessor communication and PC-to-DSP-processor communication are explained. The parallel debugger kernels and the restrictions of the system are described. Programming in AdEPar is explained, and two benchmarks (parallel FFT and DES) are presented to show the system's performance.
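The parallel FFT benchmark rests on the decomposition of the radix-2 decimation-in-time FFT. A length-n transform splits into two independent length-n/2 transforms of the even- and odd-indexed samples, which a parallel implementation can assign to different processors. The sketch below runs sequentially in Python and only marks where that split occurs. It does not model AdEPar's processors or interprocessor communication, and a direct DFT is included only to check the result.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # half-size FFT of even-indexed samples
    odd = fft(x[1::2])   # independent half-size FFT: could run on another processor
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


x = [1, 2, 3, 4, 0, 0, 0, 0]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))) < 1e-9)  # True
```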