Mobile devices and handheld systems, such as the now ubiquitous smartphones and tablets, are becoming increasingly powerful. Their hardware is typically a state-of-the-art heterogeneous architecture that combines multi-core processors with an accelerator such as a GPU or DSP. High-performance computation on these devices requires architecture-specific code built on low-level libraries and explicit parallelism, which is a significant barrier for typical developers. In this context, we propose the FancyJCL framework. It provides a high-level abstraction layer that hides implementation details and allows developers to write parallel programs for mobile devices. FancyJCL mainly targets Android and Java developers because of their high market penetration. Very simple, seemingly sequential code yields efficient parallel OpenCL code. FancyJCL is itself based on the Fancier framework, which enables optimal memory management across memory spaces on unified memory systems. Benchmarks of FancyJCL code developed for a wide range of image processing algorithms show good performance with low development effort.
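FancyJCL's own API is not shown in this summary. The following is therefore a generic sketch, under stated assumptions, of the idea behind "seemingly sequential" data-parallel image code. A per-pixel kernel that depends only on its own input can be mapped over an image concurrently without changing the result. The function names and the grayscale kernel are hypothetical illustrations, not FancyJCL or OpenCL calls.

```python
from concurrent.futures import ThreadPoolExecutor

def to_gray(pixel):
    """Hypothetical per-pixel kernel: reads only its own pixel, like one OpenCL work-item."""
    r, g, b = pixel
    # Integer luma approximation using ITU-R BT.601 weights (0.299, 0.587, 0.114).
    return (299 * r + 587 * g + 114 * b) // 1000

def grayscale_sequential(pixels):
    # The "seemingly sequential" form a developer would naturally write.
    return [to_gray(p) for p in pixels]

def grayscale_parallel(pixels, workers=4):
    # No output element depends on another, so the same kernel can be mapped
    # concurrently; a GPU framework would assign each pixel to a work-item.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(to_gray, pixels))

pixels = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
print(grayscale_parallel(pixels))  # [255, 0, 76]
```

In CPython, threads will not speed up this pure-Python arithmetic because of the global interpreter lock. The sketch only shows why the absence of cross-pixel dependencies makes a parallel mapping safe.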
Large Language Models have significantly advanced software engineering, enabling tasks such as code comprehension and fault detection. However, their ability to detect complex bugs, such as data races in parallel programs, remains uncertain. Fault detection in parallel programs written with Pthreads requires a deep understanding of thread-based logic, because data races occur when threads access shared data concurrently without proper synchronization. This paper explores ChatGPT's potential for Pthreads fault detection by addressing three questions: (1) Can ChatGPT effectively debug parallel programming threads? (2) How can dialogue assist with the detection of faults? (3) How can prompt engineering help to improve ChatGPT's fault detection performance? We examine advanced prompt engineering techniques, such as Zero-Shot, Few-Shot, Chain-of-Thought, and Retrieval-Augmented Generation prompts. Additionally, we introduce three hybrid prompting techniques to enhance performance: Chain-of-Thought with Few-Shot Prompting, Retrieval-Augmented Generation with Few-Shot Prompting, and Prompt Chaining with Few-Shot Prompting. We also evaluate ChatGPT's strengths and limitations for data race detection.
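A minimal sketch of the data-race pattern described above, written with Python's `threading` module instead of C Pthreads to keep one example language. A shared counter is updated by several threads. The increment is a read-modify-write, so without a lock two threads can read the same old value and one update is lost. The lock plays the role that a `pthread_mutex_lock`/`pthread_mutex_unlock` pair plays in Pthreads code. The function names are illustrative only.

```python
import threading

def add_many(counter, lock, n):
    """Increment the shared counter n times, holding the lock for each update."""
    for _ in range(n):
        # counter["value"] += 1 reads, adds, and writes back. Without the lock,
        # this unsynchronized access to shared data is a data race and updates can be lost.
        with lock:
            counter["value"] += 1

def run(num_threads=4, n=100_000):
    counter = {"value": 0}      # shared state accessed by every thread
    lock = threading.Lock()     # protects counter["value"]
    threads = [
        threading.Thread(target=add_many, args=(counter, lock, n))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # wait for all workers before reading the result
    return counter["value"]

print(run())  # 400000
```

Removing the `with lock:` line turns this into the racy version. Whether updates are actually lost then depends on thread scheduling, which is part of why such bugs are hard to detect by testing alone.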
Research on the mental model representations that programmers develop during parallel program comprehension is important for informing and advancing teaching methods, including model-based learning and visualizations. The goals of the research presented here were to determine how programmers' mental models change and develop as they learn parallel programming, the quality of their mental models after learning parallel programming, and what type of information is part of their mental models when they examine code for the presence of data races. Participants were experienced C programmers and included both university students and professionals. Participants' mental models were analyzed through a code tracing task in which they externalized their mental models by drawing diagrams while tracing the execution of parallel code. We also analyzed their mental models by having participants determine whether data races were present in parallel code and then answer multiple-choice and open-ended questions about the code. The results presented in this article indicate that programmers' mental models progress from a weaker execution model and a stronger situation model before learning parallel programming to a stronger execution model and a weaker situation model afterward. A thematic analysis of the open-ended responses, which show which code components programmers used to decide whether a data race was present, provides insight into the topics that should be emphasized when teaching parallel programming.
ISBN (print): 9798350311990
Given the ubiquity of parallel computing hardware, we introduced parallel programming with pictures to the block-based Snap! environment and called it pSnap!, short for parallel Snap! We then created an accessible curriculum that teaches students of all ages to program serially first and then with explicit parallelism. This paper presents a new extension to our curriculum on parallel programming with pSnap!. The extension broadens the curriculum's appeal by teaching the application of parallel programming as a "choose your own learning adventure" activity, inspired by the Choose Your Own Adventure book series of the 1980s and 1990s. Specifically, after students learn the basics of parallel programming with pictures, they choose their next learning adventure. In it, they apply their new parallel programming skills to create a video game of their choice: either Missile Command or Do You Want to Build a Snowman?
Parallel programming offers the ability to simultaneously improve performance and reduce the energy consumption of software running on heterogeneous computing systems. Software developers have long preferred to avoid parallel programming when possible. Their reasons include its perceived difficulty, its lack of portability between systems, and the pace of improvement in computer hardware. However, generational changes in computer hardware now focus on specialized components and more computational cores, and the continued evolution of these systems increasingly makes such components the route to further improvement. This thesis investigates parallel programming techniques that use components common to modern heterogeneous systems. It argues that difficulty and lack of portability need not be barriers to large improvements. On a variety of heterogeneous systems, algorithms were implemented and then transformed to use multiple cores, SIMD execution units, and GPUs. Reductions of 71-94% in execution time and 76-98% in energy consumption were observed, demonstrating the effectiveness of specialized components for improving performance and reducing energy consumption.
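A percentage reduction is easier to compare with other results when restated as a speedup factor. The following is plain arithmetic on the ranges stated above, not additional figures from the thesis. It assumes $r_T$ is the fractional reduction in execution time, $r_E$ the fractional reduction in energy, and $S$ and $G$ the corresponding improvement factors.

```latex
\begin{align*}
S &= \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{1}{1 - r_T}, &
r_T = 0.71 &\Rightarrow S = \tfrac{1}{0.29} \approx 3.4, &
r_T = 0.94 &\Rightarrow S = \tfrac{1}{0.06} \approx 16.7,\\
G &= \frac{E_{\text{before}}}{E_{\text{after}}} = \frac{1}{1 - r_E}, &
r_E = 0.76 &\Rightarrow G = \tfrac{1}{0.24} \approx 4.2, &
r_E = 0.98 &\Rightarrow G = \tfrac{1}{0.02} = 50.
\end{align*}
```

The reported ranges therefore correspond to roughly 3.4x to 16.7x faster execution and 4.2x to 50x lower energy use.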
The development of real-time systems is one of the most relevant areas in computer science, and the number of critical systems has increased significantly. These systems consider several applications run...
ISBN (print): 9798350387032; 9798350387025
In this work, we implement an efficient multi-threaded algorithm that calculates the power flow in electricity distribution networks using recursion and parallel programming. With the integration of renewable energy, energy storage systems, and distributed generation, the speed of power flow simulation becomes crucial for finding the best solution in the shortest possible time. We propose using graph theory directly to represent distribution network topologies. Traversal algorithms on this data structure are inherently recursive, which makes it possible to parallelize them and compute the power flow faster and more efficiently. Results on an 809-bus test system show computational efficiency gains of 32% from recursion techniques and 27% from parallel programming. Because of the cost of thread allocation, the combined gain reaches 50%.
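A minimal sketch, under simplifying assumptions, of the recursive-traversal idea. A radial feeder is stored as a tree, a recursive backward sweep accumulates the load drawn through each bus, and the subtrees below the root are processed concurrently because they share no buses. The abstract does not specify the authors' power flow method. This sketch ignores line losses and voltages, and the network, loads, and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 6-bus radial feeder: each bus maps to its downstream buses.
CHILDREN = {0: [1, 4], 1: [2, 3], 2: [], 3: [], 4: [5], 5: []}
# Hypothetical load at each bus, in kW.
LOAD = {0: 0.0, 1: 10.0, 2: 5.0, 3: 7.5, 4: 12.0, 5: 3.0}

def subtree_flows(bus, children, load):
    """Recursive backward sweep over the subtree rooted at `bus`.

    Returns {bus: power drawn through that bus}, i.e. the bus's own load plus
    everything downstream of it. Losses and voltages are ignored.
    """
    flows = {}

    def sweep(b):
        total = load[b] + sum(sweep(c) for c in children[b])
        flows[b] = total
        return total

    sweep(bus)
    return flows

def parallel_flows(root, children, load, workers=4):
    """Sweep each subtree below the root in its own task; subtrees share no buses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: subtree_flows(c, children, load), children[root]))
    flows = {}
    for part in parts:              # merge in the main thread, so no shared writes
        flows.update(part)
    flows[root] = load[root] + sum(flows[c] for c in children[root])
    return flows

print(parallel_flows(0, CHILDREN, LOAD)[0])  # 37.5
```

Each task builds its own dictionary, so threads never write to shared state. In CPython, real speedups for this kind of work would need processes or native code. Python's default recursion limit (about 1000) also caps the depth of feeder this naive recursion can handle.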
ISBN (print): 9798350364613; 9798350364606
Understanding the performance behavior of parallel applications is important in many ways, but it is not easy. Most open source analysis tools are written for the command line. We build on these proven tools to provide an interactive performance analysis experience within Jupyter Notebooks when developing parallel code with MPI, OpenMP, or both. Our solution makes it possible to measure execution time, perform profiling and tracing, and visualize the results within the notebooks. For ease of use, it provides both a graphical JupyterLab extension and a C++ API. The JupyterLab extension shows a dialog in which the user selects the type of analysis and its parameters. Internally, the tool uses Score-P, Scalasca, and Cube to generate profiling and tracing data. This tight integration gives students easy access to profiling tools and helps them better understand concepts such as benchmarking, scalability, and performance bottlenecks. In addition to the technical development, the article presents hands-on exercises from our well-established parallel programming course. We conclude with a qualitative and quantitative evaluation with 19 students, which shows a positive effect of the tools on the students' perceived competence.
ISBN (print): 9789819735556; 9789819735563
Modern browsers are becoming increasingly essential worldwide. Features such as Web Workers are increasingly supported by the most widely used browsers, enabling performance improvements in web applications. As a consequence, tasks with higher computational demands can be executed inside the browser. This work applies task parallelization with Web Workers, using a crossword generation algorithm running in a browser context as a case study. The results show speedups for the parallel version of the algorithm that are in some cases even superlinear.
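For reference, the standard definitions behind "superlinear speedup" are given below. $T_1$ is the serial execution time, $T_p$ the time with $p$ workers (here, Web Workers), $S_p$ the speedup, and $E_p$ the parallel efficiency.

```latex
\begin{align*}
S_p &= \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p},\\
\text{superlinear speedup} &\iff S_p > p \iff E_p > 1.
\end{align*}
```

Superlinear speedups usually come from effects the serial run does not benefit from. Examples are more aggregate cache or, in search problems such as crossword generation, one worker reaching a valid solution after exploring fewer candidates than the serial search would. The summary does not say which cause applies here.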
Matrix multiplication is a fundamental operation in engineering computations. With the widespread use of modern multi-core processors, this operation can be significantly accelerated through parallel programming. Cons...
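A common way to parallelize matrix multiplication is row-block partitioning. Each worker computes a contiguous block of rows of $C = AB$, and no two workers write the same output element. The sketch below is a generic illustration, not the method of the truncated paper above. It uses pure-Python lists and a thread pool, and its function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(a, b, rows):
    """Compute the given rows of C = A x B (row-major lists of lists)."""
    n_inner, n_cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in rows
    ]

def parallel_matmul(a, b, workers=4):
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    n = len(a)
    # Split the row indices into contiguous blocks, at most one per worker.
    chunk = max(1, (n + workers - 1) // workers)
    blocks = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: matmul_rows(a, b, rows), blocks))
    # pool.map preserves block order, so concatenation restores the row order.
    c = []
    for part in parts:
        c.extend(part)
    return c

print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because each block writes only its own rows, no synchronization is needed. In CPython, real speedups require processes or a native library, since threads running pure-Python arithmetic are serialized by the global interpreter lock.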