Search results - Inner Mongolia University Library

Parallel debugging: An investigative study

JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2019, vol. 31, no. 11, p. e2178

Authors: Zakari, Abubakar; Lee, Sai Peck. Affiliations: Univ Malaya, Fac Comp Sci & Informat Technol, Dept Software Engn, Kuala Lumpur 50603, Malaysia; Kano Univ Sci & Technol, Dept Comp Sci, PMB 3244, Kano, Nigeria

Parallel debugging approaches have consistently been used for the simultaneous localization of multiple software faults. The effectiveness of a parallel debugging approach is critically determined by the type of clustering algorithm and the distance metric used. However, clustering algorithms that group failed tests based on their execution profile similarity, using distance metrics such as Euclidean distance, Jaccard distance, and Hamming distance, are considered problematic and inappropriate. In this paper, we first conduct an investigative study on the usefulness of the problematic parallel debugging approach that uses the k-means clustering algorithm (which groups failed tests based on their execution profile similarity) with the Euclidean distance metric, applied to three similarity coefficient-based fault localization techniques, in terms of localization effectiveness. Second, we compare the effectiveness of the problematic parallel debugging approach with the one-bug-at-a-time (OBA) debugging approach and a state-of-the-art parallel debugging approach named MSeer. The empirical evaluation is conducted on 540 multiple-fault versions of eight medium-sized to large-sized subject programs with two, three, four, and five faulty versions. Our results suggest that clustering failed tests based on their execution profile similarity and using distance metrics such as Euclidean distance is indeed problematic and reduces effectiveness in localizing multiple faults.

Keywords: multiple faults; parallel debugging; program debugging; program spectra; software fault localization
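
As a rough illustration of the three distance metrics named in the abstract above, the sketch below computes them for two binary execution profiles (1 = statement executed by the failed test, 0 = not executed). The vectors `t1` and `t2` are made-up examples, not data from the paper, and the function names are illustrative only.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    # Number of positions where the two profiles differ.
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a, b):
    # 1 - |intersection| / |union| over the executed statements.
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 0.0  # two empty profiles are treated as identical
    inter = sum(1 for x, y in zip(a, b) if x and y)
    return 1.0 - inter / union

# Hypothetical coverage vectors of two failed tests over five statements.
t1 = [1, 1, 0, 0, 1]
t2 = [1, 0, 0, 1, 1]

print(euclidean(t1, t2), hamming(t1, t2), jaccard_distance(t1, t2))
```

On binary profiles the squared Euclidean distance equals the Hamming distance, so both metrics order pairs of tests in the same way.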

Accurate Application Progress Analysis for Large-Scale Parallel Debugging

35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Authors: Mitra, Subrata; Laguna, Ignacio; Ahn, Dong H.; Bagchi, Saurabh; Schulz, Martin; Gamblin, Todd. Affiliations: Purdue Univ, W Lafayette, IN 47907, USA; Lawrence Livermore Natl Lab, Livermore, CA, USA

ISBN: (print) 9781450327848

Debugging large-scale parallel applications is challenging. In most HPC applications, parallel tasks progress in a coordinated fashion, and thus a fault in one task can quickly propagate to other tasks, making it difficult to debug. Finding the least-progressed tasks can significantly reduce the effort to identify the task where the fault originated. However, existing approaches for detecting them suffer from low accuracy and large overheads: either they use imprecise static analysis or they are unable to infer progress dependence inside loops. We present a loop-aware progress-dependence analysis tool, PRODOMETER, which determines relative progress among parallel tasks via dynamic analysis. Our fault-injection experiments suggest that its accuracy and precision are over 90% for most cases and that it scales well up to 16,384 MPI tasks. Further, our case study shows that it significantly helped in diagnosing a perplexing error in MPI, which only manifested at large scale.

Keywords: parallel debugging; high-performance computing; dynamic analysis; MPI; Performance; Algorithms; Reliability; Measurement

Grid-enabled parallel debugging environment: A portal solution

International Conference on Parallel and Distributed Processing Techniques and Applications

Authors: Wang, W; Fang, BX; Zhang, HL. Affiliation: Harbin Inst Technol, Dept Comp Sci, Res Ctr Comp Network & Informat Secur Technol, Harbin, Peoples R China

ISBN: (print) 1932415262

Debugging can help programmers locate the reason for incorrect program behavior. Parallel programs' executions are much more complex than those of serial ones, which makes parallel programs difficult to debug. In contrast with traditional parallel and distributed computing environments, computational grids exhibit new characteristics, such as being largely heterogeneous and dynamic, and raising security concerns. These new features make debugging grid applications challenging. In this paper we design and implement a grid-enabled parallel debugging environment to simplify the debugging of grid applications. We present the concept of an ad hoc computing environment in grids and the method by which this environment can be built automatically, constrained to MPI-G2 applications. Some capabilities, including user identification, automatic task submission, and resource registering and collection, can be accomplished easily through the portal. The debugging functionalities include consistent global state and race detection.

Keywords: parallel debugging; grid portal; consistent global states; race condition

Scalable Parallel Debugging with Statistical Assertions

17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Minh Ngoc Dinh; Abramson, David; Jin, Chao; Gontarek, Andrew; Moench, Bob; DeRose, Luiz. Affiliations: Monash Univ, Clayton, Vic 3800, Australia; Cray Inc, St Paul, MN 55101, USA

Traditional debuggers are of limited value for modern scientific codes that manipulate large complex data structures. This paper discusses a novel debug-time assertion, called a "Statistical Assertion", that allows a user to reason about large data structures; its primitives are parallelised to provide an efficient solution. We present the design and implementation of statistical assertions, and illustrate the debugging technique with a molecular dynamics simulation. We evaluate the performance of the tool on a 12,000-core Cray XE6.

Keywords: Performance; Verification; parallel debugging; statistic assertion
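
The idea of asserting on a statistic of a large data structure, rather than on individual elements, can be sketched as follows. This is only a serial toy under stated assumptions: the function `statistical_assertion`, the `velocities` data, and the chosen bounds are hypothetical and do not reflect the tool's actual interface or its parallel primitives.

```python
import statistics

def statistical_assertion(values, statistic, low, high):
    # Reduce the whole data structure to one number and check that it
    # falls inside the expected interval [low, high].
    observed = statistic(values)
    return (low <= observed <= high), observed

# Hypothetical particle velocities whose mean is expected to stay near zero.
velocities = [-1.0, 0.5, 0.25, 0.25]
ok, mean = statistical_assertion(velocities, statistics.mean, -0.1, 0.1)
print(ok, mean)

# A single corrupted element shifts the statistic and trips the assertion.
corrupted = velocities + [1000.0]
ok_bad, mean_bad = statistical_assertion(corrupted, statistics.mean, -0.1, 0.1)
print(ok_bad, mean_bad)
```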

A comprehensive empirical investigation on failure clustering in parallel debugging

JOURNAL OF SYSTEMS AND SOFTWARE, 2022, vol. 193

Authors: Song, Yi; Xie, Xiaoyuan; Liu, Quanming; Zhang, Xihao; Wu, Xi. Affiliation: Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China

The clustering technique has attracted a lot of attention as a promising strategy for parallel debugging in multi-fault scenarios. This heuristic approach (i.e., failure indexing or fault isolation) enables developers to perform multiple debugging tasks simultaneously by dividing failed test cases into several disjoint groups. When using the statement ranking representation to model failures for better clustering, several factors influence clustering effectiveness, including the risk evaluation formula (REF), the number of faults (NOF), the fault type (FT), and the number of successful test cases paired with one individual failed test case (NSP1F). In this paper, we present the first comprehensive empirical study of how these four factors influence clustering effectiveness. We conduct extensive controlled experiments on 1060 faulty versions of 228 simulated faults and 141 real faults, and the results reveal that: (1) GP19 is highly competitive across all REFs, (2) clustering effectiveness decreases as NOF increases, (3) higher clustering effectiveness is easier to achieve when a program contains only predicate faults, and (4) clustering effectiveness remains when the scale of NSP1F is reduced to 20%. (c) 2022 Elsevier Inc. All rights reserved.

Keywords: Failure clustering; Fault isolation; Multiple-fault parallel debugging
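
To make the statement ranking representation concrete, the sketch below scores statements with a risk evaluation formula and sorts them by suspiciousness. It uses the widely known Ochiai formula as a stand-in, not GP19 or any other formula from the study, and the coverage vectors are invented for illustration.

```python
import math

def ochiai(ef, ep, nf):
    # ef / sqrt((ef + nf) * (ef + ep)); defined as 0 when the denominator is 0.
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def rank_statements(failed, passed):
    # failed / passed: lists of binary coverage vectors, one per test case.
    n = len(failed[0])
    scores = []
    for s in range(n):
        ef = sum(t[s] for t in failed)   # failed tests that execute statement s
        nf = len(failed) - ef            # failed tests that do not execute s
        ep = sum(t[s] for t in passed)   # passed tests that execute s
        scores.append(ochiai(ef, ep, nf))
    # Most suspicious statement first; ties keep their original order.
    ranking = sorted(range(n), key=lambda s: -scores[s])
    return ranking, scores

# Hypothetical data: one failed test paired with two passed tests.
failed = [[1, 1, 0]]
passed = [[1, 0, 1], [1, 0, 0]]
ranking, scores = rank_statements(failed, passed)
print(ranking, scores)
```

Each failed test paired with a set of passed tests yields one such ranking, and failed tests whose rankings are similar are candidates for the same cluster.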

Extending the Eclipse Parallel Tools Platform Debugger with Scalable Parallel Debugging Library

Procedia Computer Science, 2013, vol. 18, pp. 1774-1783

Authors: Chao Jin; Liang Ding; David Abramson. Affiliation: Faculty of Information Technology, Monash University, Clayton, VIC 3168, Australia

The Eclipse Parallel Tools Platform (PTP) is an open source Integrated Development Environment (IDE) aiding the development of supercomputer applications. The PTP parallel debugger is used by a growing community of developers in scientific and engineering fields. This paper proposes a method of improving the communication infrastructure of the PTP debugger by taking advantage of a Scalable Parallel Debugging Library (SPDL). Unlike the present communication framework of PTP, the Scalable Debug Manager (SDM), SPDL provides a pluggable architecture that allows developers to select a communication protocol suitable for a targeted supercomputer. It currently supports a number of scalable protocols, including MRNet and SCI. The advanced features provided by these communication trees, like programmable filters and configurable topologies, allow developers to create more flexible solutions for efficient reduction and aggregation operations in parallel debugging. In particular, they allow parallel debuggers to handle the large amounts of back-end messages in peta-scale environments with better efficiency. The architecture of the PTP debugger is extended to support SPDL. The extended architecture combines the advantages of the PTP debugger at the front-end and SPDL at the back-end. It improves the scalability and performance of the PTP debugger. Consequently, it provides a flexible option of using the PTP debugger with pluggable communication protocols to address the debugging challenges in peta-scale environments.

Keywords: parallel debugging; Scalability

INTERRUPT REPLAY - A DEBUGGING METHOD FOR PARALLEL PROGRAMS WITH INTERRUPTS

MICROPROCESSORS AND MICROSYSTEMS, 1994, vol. 18, no. 10, pp. 601-612

Authors: AUDENAERT, KMR; LEVROUW, LJ. Affiliations: STATE UNIV GHENT; BELGIAN FUND SCI RES, B-9000 GHENT, BELGIUM

The behaviour of programs for multiprocessors may be indeterminate, due to processor timing variations. This poses a problem for cyclic debugging, since a bug may disappear from one execution to another. Replay is an elegant solution to this problem, in which 'sufficient' information is recorded in a log. This information is then used to control subsequent executions of the same program so that repeatability is guaranteed. Interrupts are another source of non-determinism, even in sequential programs. This paper presents an extension of the well-known Instant Replay method, termed Interrupt Replay, for replaying programs in the presence of interrupts. The correctness of Interrupt Replay is based on the assumption that there are no interrupt races: an interrupt service routine must not access data that is also accessed by the foreground process whenever the interrupt is enabled. If such races are present then replay may fail to produce deterministic results. This assumption is similar to the basic assumption of Instant Replay that shared variables are properly protected by mutual exclusion. Also as in Instant Replay, it is assumed that the behaviour of the environment (input data, external interrupts) is replayed by some other tracing mechanism.

Keywords: PARALLEL DEBUGGING; INTERRUPTS; REPLAY
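
The general record-and-replay idea described in the abstract (log the order of accesses to shared data, then force later runs to follow the log) can be sketched with threads. This is a simplified illustration, not the Instant Replay or Interrupt Replay algorithm itself; `ReplayGate` and `run_once` are hypothetical names, and the sketch assumes each thread enters the critical section exactly once.

```python
import threading

class ReplayGate:
    """Serializes a critical section; records entry order or enforces a logged one."""

    def __init__(self, schedule=None):
        self._cond = threading.Condition()
        self._schedule = list(schedule) if schedule is not None else None
        self._pos = 0
        self.log = []  # order in which threads entered during this run

    def enter(self, thread_id, critical_section):
        with self._cond:
            if self._schedule is not None:
                # Replay mode: block until the log says it is this thread's turn.
                self._cond.wait_for(lambda: self._schedule[self._pos] == thread_id)
            self.log.append(thread_id)
            critical_section()
            self._pos += 1
            self._cond.notify_all()

def run_once(schedule=None, n_threads=4):
    gate = ReplayGate(schedule)
    shared = []  # shared data whose final content depends on access order
    threads = [
        threading.Thread(target=gate.enter, args=(i, lambda i=i: shared.append(i)))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, gate.log

recorded, log = run_once()               # record mode: order set by the scheduler
replayed, _ = run_once(schedule=log)     # replay mode: same order is enforced
forced, _ = run_once(schedule=[2, 0, 3, 1])
print(recorded, replayed, forced)
```

Replaying with the recorded log reproduces the recorded result, which is the repeatability property that cyclic debugging relies on.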

Relative Debugging for a Highly Parallel Hybrid Computer System

International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Authors: DeRose, Luiz; Gontarek, Andrew; Vose, Aaron; Moench, Robert; Abramson, David; Minh Ngoc Dinh; Jin, Chao. Affiliations: Cray Inc, Cray Plaza, 380 Jackson St, St Paul, MN 55101, USA; Univ Queensland, Brisbane, Qld, Australia

ISBN: (print) 9781450337236

Relative debugging traces software errors by comparing two executions of a program concurrently, one being a reference version and the other faulty. Relative debugging is particularly effective when code is migrated from one platform to another, and this is of significant interest for hybrid computer architectures containing CPUs and accelerators or coprocessors. In this paper we extend relative debugging to support porting stencil computation on a hybrid computer. We describe a generic data model that allows programmers to examine the global state across different types of applications, including MPI/OpenMP, MPI/OpenACC, and UPC programs. We present case studies using a hybrid version of the 'stellarator' particle simulation DELTA5D, on Titan at ORNL, and the UPC version of Shallow Water Equations on Crystal, an internal supercomputer of Cray. These case studies used up to 5,120 GPUs and 32,768 CPU cores to illustrate that the debugger is effective and practical.

Keywords: parallel debugging; Hybrid Programming; Scalability
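
The core comparison step of relative debugging, checking the same data from a reference run against a suspect run, can be sketched as below. The function `first_divergence` and the sample arrays are hypothetical; a real relative debugger compares live state across two running programs, which this serial toy does not attempt.

```python
import math

def first_divergence(reference, suspect, rel_tol=1e-9, abs_tol=0.0):
    # Return (index, reference_value, suspect_value) for the first element
    # that differs beyond the tolerance, or None if the data agree.
    for i, (r, s) in enumerate(zip(reference, suspect)):
        if not math.isclose(r, s, rel_tol=rel_tol, abs_tol=abs_tol):
            return i, r, s
    return None

# Hypothetical snapshots of one array taken at the same point in both runs.
reference_run = [1.0, 2.0, 3.0, 4.0]
ported_run = [1.0, 2.0, 3.5, 4.0]

print(first_divergence(reference_run, ported_run))
print(first_divergence(reference_run, reference_run))
```

A tolerance matters in practice because a port to different hardware can change floating-point rounding without indicating a real fault.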

A Community-Based Fault Isolation Approach for Effective Simultaneous Localization of Faults

IEEE ACCESS, 2019, vol. 7, pp. 50012-50030

Authors: Zakari, Abubakar; Lee, Sai Peck; Hashem, Ibrahim Abaker Targio. Affiliations: Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia; Kano Univ Sci & Technol, Dept Comp Sci, Wudil, Nigeria; Taylors Univ, Sch Comp & IT, Subang Jaya 47500, Malaysia

During program testing, software programs may be discovered to contain multiple faults. Multiple faults in a program may reduce the effectiveness of existing fault localization techniques because of the complex relationship between faults and failures in the presence of multiple faults. In an ideal case, faults are isolated into fault-focused clusters, each targeting a single fault, so that developers can localize them simultaneously in parallel. However, the relationship between faults and failures is not easily identified and depends solely on the accuracy of clustering; existing clustering algorithms are not able to isolate failed tests to their causative faults effectively, which hinders localization effectiveness. This paper proposes a new approach that uses a divisive network community clustering algorithm to isolate faults into separate fault-focused communities that each target a single fault. A community weighting and selection mechanism is also proposed, which helps prioritize highly important fault-focused communities for the available developers so that they can debug the faults simultaneously in parallel. The approach is evaluated on eight subject programs ranging from medium-sized to large-sized programs (tcas, replace, gzip, sed, flex, grep, make, and ant). Overall, 540 multiple-fault versions of these programs were generated with 2-5 faulty versions. The experimental results demonstrate that the proposed approach performs significantly better in terms of localization effectiveness than two other parallel debugging approaches for locating multiple faults in parallel.

Keywords: Complex network; multiple faults; fault localization; fault isolation; program debugging; parallel debugging

ADVANCED EDUCATIONAL PARALLEL DSP SYSTEM BASED ON TMS320C25 PROCESSORS

MICROPROCESSORS AND MICROSYSTEMS, 1995, vol. 19, no. 3, pp. 147-156

Authors: KURUGOLLU, F; PALAZ, H; GUMUSKAYA, H; HARMANCI, E; ORENCIK, B. Affiliations: ULUDAG UNIV, DEPT ELECTR ENGN, GORUKLE, TURKEY; ISTANBUL TECH UNIV, DEPT CONTROL & COMP ENGN, MASLAK, TURKEY

This paper describes the design, application, and evaluation of a user-friendly, flexible, scalable and inexpensive Advanced Educational Parallel (AdEPar) digital signal processing (DSP) system based on TMS320C25 digital processors to implement DSP algorithms. This system will be used in the DSP laboratory by graduate students to work on advanced topics such as developing parallel DSP algorithms. The graduating senior students who have gained some experience in DSP can also use the system. The DSP laboratory has proved to be a useful tool in the hands of the instructor to teach the mathematically oriented topics of DSP that are often difficult for students to grasp. The DSP laboratory with assigned projects has greatly improved the ability of the students to understand such complex topics as the fast Fourier transform algorithm, linear and circular convolution, and the theory and design of infinite impulse response (IIR) and finite impulse response (FIR) filters. The user-friendly PC software support of the AdEPar system makes it easy for students to develop DSP programs. This paper gives the architecture of the AdEPar DSP system. The communication between processors and the PC-DSP processor communication are explained. The parallel debugger kernels and the restrictions of the system are described. The programming in the AdEPar is explained, and two benchmarks (parallel FFT and DES) are presented to show the system performance.

Keywords: DIGITAL SIGNAL PROCESSING; PARALLEL PROCESSING; PARALLEL DEBUGGING
