Large Language Models have significantly advanced software engineering, enabling tasks like code comprehension and fault detection. However, their ability to detect complex bugs, such as data races in parallel program...
详细信息
Large Language Models have significantly advanced software engineering, enabling tasks like code comprehension and fault detection. However, their ability to detect complex bugs, such as data races in parallel programming, remains uncertain. Fault detection in parallel programming (Pthreads) requires a deep understanding of thread-based logic, as data races occur when threads access shared data concurrently without proper synchronization. This paper explores ChatGPT's potential in Pthreads fault detection by addressing three questions: (1) Can ChatGPT effectively debug parallel programming threads? (2) How can dialogue assist with the detection of faults? (3) How can prompt engineering help to improve ChatGPT's fault detection performance?. We examine advanced prompt engineering techniques, such as Zero-Shot, Few-Shot, Chain-of-Thought, and Retrieval-Augmented Generation prompts. Additionally, we introduce three hybrid prompting techniques to enhance performance, including Chain-of-Thought with Few-Shot Prompting, Retrieval-Augmented Generation with Few-Shot Prompting, and Prompt Chaining with Few-Shot Prompting, while evaluating ChatGPT's strengths and limitations for data race detection.
The increase of the number of multiprocessor cores gives the great opportunity for improvement of the parallel application performance. However, the programmer faces significant challenges. The raise of the work distr...
详细信息
Matrix multiplication is a fundamental operation in engineering computations. With the widespread use of modern multi-core processors, this operation can be significantly accelerated through parallel programming. Cons...
详细信息
Failures in Task-based parallel programming (TBPP) can severely degrade performance and result in incomplete or incorrect outcomes. Existing failure-handling approaches, including reactive, proactive, and resilient me...
详细信息
This paper contributes to the solution of several open problems with parallel programming tools and their integration with performance evaluation environments. First, we propose interactive compilation scenarios inste...
详细信息
This paper contributes to the solution of several open problems with parallel programming tools and their integration with performance evaluation environments. First, we propose interactive compilation scenarios instead of the usual black-box-oriented use of compiler tools. In such scenarios, information gathered by the compiler and the compiler's reasoning are presented to the user in meaningful ways and on-demand. Second, a tight integration of compilation and performance analysis tools is advocated. Many of the existing, advanced instruments for gathering performance results are being used in the presented environment and their results are combined in integrated views with compiler information and data from other tools. Initial instruments that assist users in "data mining" this information are presented and the need for much stronger facilities is explained. The URSA Family provides two tools addressing these issues. URSA MINOR supports a group of users at a specific site, such as a research or development project. URSA MAJOR complements this tool by making available the gathered results to the user community at large via the World-wide Web. This paper presents objectives, functionality, experience, and next development steps of the URSA tool family. Two case studies are presented that illustrate the use of the tools for developing and studying parallel applications and for evaluating parallelizing compilers.
Mobile devices and handheld systems, such as the smartphones and tablets universally extended, are becoming increasingly powerful. Their basic hardware configuration is usually state-of-the-art heterogeneous architect...
详细信息
Mobile devices and handheld systems, such as the smartphones and tablets universally extended, are becoming increasingly powerful. Their basic hardware configuration is usually state-of-the-art heterogeneous architectures consisting of multi-core processors and some kind of accelerator such as GPUs or DSPs. Specific code adapted to the architecture is mandatory if high-performance computation is required and low-level libraries and parallelism are needed, which constitutes an important barrier for the usual developer in such devices. In this context, we propose the FancyJCL framework. It provides a high-level abstraction layer that hides implementation details and allows to develop parallel programs for mobile devices. The target platform for FancyJCL is mainly Android and Java developers due to their high market penetration. A very simple, seemingly sequential encoding results in parallel efficient OpenCL code. FancyJCL is itself based on the Fancier framework, which enables optimal memory management across memory spaces on unified memory systems. Benchmarks of FancyJCL code developed for a wide range of image processing algorithms show good performance with low development effort.
The two major design approaches taken to build distributed and parallel computer systems, multiprocessing and multicomputing, are discussed. A model that combines the best properties of both multiprocessor and multico...
详细信息
The two major design approaches taken to build distributed and parallel computer systems, multiprocessing and multicomputing, are discussed. A model that combines the best properties of both multiprocessor and multicomputer systems, easy-to-build hardware, and a conceptually simple programming model is presented. Using this model, a programmer defines and invokes operations on shared objects, the runtime system handles reads and writes on these objects, and the reliable broadcast layer implements indivisible updates to objects using the sequencing protocol. The resulting system is easy to program, easy to build, and has acceptable performance on problems with a moderate grain size in which reads are much more common than writes. Orca, a procedural language whose sequential constructs are roughly similar to languages like C or Modula 2 but which also supports parallel processes and shared objects and has been used to develop applications for the prototype system, is described
The authors discuss methods for expressing and tuning the performance of parallel programs, using two programming models in the same program: distributed and shared memory. Such methods are important for anyone who us...
详细信息
The authors discuss methods for expressing and tuning the performance of parallel programs, using two programming models in the same program: distributed and shared memory. Such methods are important for anyone who uses these large machines for parallel programs as well as for those who study combinations of the two programming models.
Distributed systems are an alternative to shared-memory multiprocessors for the execution of parallel applications. PANDA is a run-time system that provides architectural support for efficient parallel and distributed...
详细信息
Distributed systems are an alternative to shared-memory multiprocessors for the execution of parallel applications. PANDA is a run-time system that provides architectural support for efficient parallel and distributed programming. It supplies fast user-level threads and a means for transparent and coordinated sharing of objects across a homogeneous network. The paper motivates the major architectural choices that guided our design. The problem of sharing data in a distributed environment is discussed, and the performance of the mechanisms provided by the PANDA prototype implementation is assessed.
Computing capabilities are continuing to increase with the availability of multi core and many core processors. The wide availability of multi core processors has made parallel programming possible for end user applic...
详细信息
Computing capabilities are continuing to increase with the availability of multi core and many core processors. The wide availability of multi core processors has made parallel programming possible for end user applications running on desktops, workstations, and mobile devices. While parallel hardware has become common, software that exploits parallel capabilities is just beginning to take hold. Multimedia applications, with their data parallel nature and large computing requirements will benefit significantly from parallel programming. In this paper an overview of parallel programming is presented and languages and tools for parallel programming such as OpenMP and CUDA are introduced within the scope of multimedia applications.
暂无评论