Data races are among the most difficult bugs to find in concurrent multithreaded systems. Detecting them accurately in complex, large-scale programs requires significant time and cost. Although many race detection techniques have been proposed, none of them is effective in all respects. In this paper, we compare the performance of five recent dynamic race detection techniques: FastTrack, Acculock, Multilock-HB, SimpleLock+, and causally precedes (CP) detection. We experimentally demonstrate the strengths and weaknesses of these techniques in terms of their detection capability, running time, and runtime overhead, using 20 benchmark programs with different characteristics. The comparison results show that the detection capability of CP detection does not differ from that of FastTrack, and that SimpleLock+ generates the lowest overhead among the hybrid detection techniques (Acculock, SimpleLock+, and Multilock-HB) for all benchmark programs. SimpleLock+ is 1.2 times slower than FastTrack on average, but misses one true data race reported by Multilock-HB on the large-scale benchmark programs.
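To make the happens-before idea behind dynamic detectors of this kind concrete, here is a minimal vector-clock sketch in Python. It is not the implementation of FastTrack or of any other tool compared in the paper: the class name HappensBeforeDetector, its methods, and the event model (explicit acquire, release, and write calls fed from a recorded trace) are all assumptions made for illustration, and only write-write races are checked.

```python
class HappensBeforeDetector:
    """Illustrative vector-clock race check over a recorded event trace.

    Hypothetical API; only write-write races are detected.
    """

    def __init__(self):
        self.clocks = {}      # thread -> {thread: int} vector clock
        self.released = {}    # lock -> vector clock snapshot at last release
        self.last_write = {}  # variable -> (writer thread, writer's own clock value)
        self.races = []       # (variable, earlier writer, later writer)

    def _clock(self, t):
        # Each thread starts with its own component set to 1.
        return self.clocks.setdefault(t, {t: 1})

    def acquire(self, t, lock):
        # Join: the acquirer learns everything the last releaser knew.
        mine = self._clock(t)
        for u, n in self.released.get(lock, {}).items():
            mine[u] = max(mine.get(u, 0), n)

    def release(self, t, lock):
        mine = self._clock(t)
        self.released[lock] = dict(mine)
        # Events of t after this release are not ordered before the next acquirer.
        mine[t] += 1

    def write(self, t, var):
        mine = self._clock(t)
        prev = self.last_write.get(var)
        if prev is not None:
            u, n = prev
            # The earlier write happens-before this one only if t has
            # already observed u's clock value n.
            if u != t and mine.get(u, 0) < n:
                self.races.append((var, u, t))
        self.last_write[var] = (t, mine[t])
```

Two writes to the same variable that are both protected by one lock leave races empty, while two writes from different threads with no common synchronization are reported as a tuple (variable, earlier writer, later writer).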
A computational model is a computer program that attempts to simulate an abstract model of a particular system. Computational models require enormous amounts of calculation and often need supercomputer speed. As personal computers become more and more powerful, more laboratory experiments can be converted into computer models that scientists and students can examine interactively, without the risk and cost of the actual experiments. The future of programming is concurrent programming. The threaded programming model provides application programmers with a useful abstraction of the concurrent execution of multiple tasks. The objective of this release is to address the design of an architecture for scientific applications that may run as multiple threads, as well as the implementation of the related shared data structures.
New version program summary
Program title: GrowthCP
Catalogue identifier: ADVL_v4_0
Program summary URL: http://***/summaries/ADVL_v4_***
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://***/***
No. of lines in distributed program, including test data, etc.: 32 269
No. of bytes in distributed program, including test data, etc.: 8 234 229
Distribution format: ***
Programming language: Free Object Pascal
Computer: multi-core x64-based PC
Operating system: Windows XP, Vista, 7
Has the code been vectorised or parallelized?: No
RAM: More than 1 GB. The program requires a 32-bit or 64-bit processor to run the generated code. Memory is addressed using 32-bit (on 32-bit processors) or 64-bit (on 64-bit processors with 64-bit addressing) pointers. The amount of addressed memory is limited only by the available amount of virtual memory.
Supplementary material: The figures mentioned in the "Summary of revisions" section can be obtained here.
Classification: 4.3, 7.2, 6.2, 8, 14
External routines: Lazarus [1]
Catalogue
Multithreaded programming is difficult and error prone. It is easy to make a mistake in synchronization that produces a data race, yet it can be extremely hard to locate this mistake during debugging. This article describes a new tool, called Eraser, for dynamically detecting data races in lock-based multithreaded programs. Eraser uses binary rewriting techniques to monitor every shared-memory reference and verify that consistent locking behavior is observed. We present several case studies, including undergraduate coursework and a multithreaded Web search engine, that demonstrate the effectiveness of this approach.
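The "consistent locking behavior" check can be illustrated with a basic lockset refinement: for each shared variable, keep the set of locks that were held on every access so far, and warn once that set becomes empty. The Python sketch below is a simplification under stated assumptions, not Eraser's implementation. The class LocksetChecker and its methods are hypothetical names, events are fed in by hand instead of through binary rewriting, and the refinements that a practical tool needs for initialization and read-shared data are omitted.

```python
class LocksetChecker:
    """Illustrative lockset refinement over a recorded event trace.

    Hypothetical API; C(v) is the set of locks held on every access to v.
    """

    def __init__(self):
        self.held = {}        # thread -> set of locks currently held
        self.candidates = {}  # variable -> candidate lock set C(v)
        self.warnings = []    # variables whose candidate set became empty

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.get(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = frozenset(self.held.get(thread, set()))
        if var not in self.candidates:
            # First access: every lock held now is still a candidate.
            self.candidates[var] = locks
        else:
            # Keep only the locks that were held on all accesses so far.
            self.candidates[var] = self.candidates[var] & locks
        if not self.candidates[var] and var not in self.warnings:
            self.warnings.append(var)
```

A variable that is always accessed under the same lock never triggers a warning. A variable accessed under lock a by one thread and under lock b by another ends up with an empty candidate set and is reported, even if the two accesses never overlapped in the observed run.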
Many image processing applications need real-time performance under restrictions on size, weight, and power consumption. Common solutions, including hardware/software co-designs, are based on Field Programmable Gate Arrays (FPGAs). Their main drawback is long development time. In this work, a co-design methodology for processor-centric embedded systems with hardware acceleration using FPGAs is proposed. The goal of this methodology is to achieve real-time embedded solutions that use hardware acceleration but have a development time similar to that of software projects. Well-established methodologies, techniques, and languages from the software domain (such as object-oriented design, the Unified Modelling Language, and multithreaded programming) are applied, and semiautomatic C-to-HDL translation tools and methods are used and compared. The methodology is applied to achieve an embedded implementation of a global vision algorithm for the localization of multiple robots in an e-learning robotic laboratory. The algorithm is specifically developed to work reliably 24/7 and to detect the robots' positions and headings even in the presence of the partial occlusions and varying lighting conditions to be expected in a normal classroom. The co-designed implementation of this algorithm processes 1,600 × 1,200 pixel images at a rate of 32 fps with an estimated energy consumption of 17 mJ per frame. It achieves a 16× acceleration and a 92% energy saving, which compares favorably with the most optimized embedded software solutions. This case study shows the usefulness of the proposed methodology for embedded real-time image processing applications.
Parallel algorithms are difficult to develop because of the negative influence of synchronisation and the complicated behaviour of threads competing for computing resources. Experimental results show that performance time depends strongly on algorithm parameters such as the number of subtasks and the complexity of each subtask. An optimal value of subtask complexity is found for the particular algorithm; it stays the same for different complexities of the parallelised task on the same computing resource. To guarantee algorithm speed-up, it is important to have a method for investigating the efficiency of a parallel algorithm before it is implemented on specified computing resources. A stochastic Petri net could potentially be a high-accuracy tool for this purpose. However, a huge number of elements is needed to compose a model of a non-trivial algorithm, which limits the application of this tool in practice. The Petri-object simulation method allows Petri nets to be replicated with specified parameters and a model to be created from a list of linked Petri-objects. Basic templates for creating a model of a multithreaded algorithm are developed. Using these templates, a model of the parallel discrete event simulation algorithm is developed and investigated. From the model results, the algorithm parameters that give the least performance time can be determined.
The paper discusses several theoretical and implementation problems of interval branch-and-bound methods. An attempt is made to define the class of problems that can be solved with such methods. Features and variants of the method are presented. Useful data structures and shared-memory parallelization issues are considered.
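For readers unfamiliar with the method, the following Python sketch shows the basic shape of an interval branch-and-bound search on a one-dimensional toy problem. It is an illustration only and is not taken from the paper: the objective f(x) = x*x - 2*x, the helper names, and the best-first priority queue are assumptions, and the interval arithmetic uses ordinary floating point without the outward rounding that a rigorous implementation would need.

```python
import heapq

def interval_mul(a, b):
    """Product of two intervals given as (lo, hi) tuples."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def f(x):
    """Toy objective (an assumption for this sketch)."""
    return x * x - 2 * x

def f_bounds(box):
    """Natural interval extension of f: encloses the range of f over box."""
    square = interval_mul(box, box)
    twice = (2 * box[0], 2 * box[1])
    return (square[0] - twice[1], square[1] - twice[0])

def minimize(lo, hi, tol=1e-6):
    """Return (lower, upper) bounds on the global minimum of f over [lo, hi]."""
    best_upper = f((lo + hi) / 2)
    heap = [(f_bounds((lo, hi))[0], lo, hi)]  # ordered by lower bound
    while heap:
        lower, a, b = heapq.heappop(heap)
        if lower > best_upper:
            continue  # bound: this box cannot contain the global minimum
        if b - a < tol:
            # Best-first order: no remaining box has a smaller lower bound.
            return lower, best_upper
        mid = (a + b) / 2
        best_upper = min(best_upper, f(mid))
        for box in ((a, mid), (mid, b)):  # branch: bisect the box
            box_lower = f_bounds(box)[0]
            if box_lower <= best_upper:
                heapq.heappush(heap, (box_lower, box[0], box[1]))
    raise RuntimeError("search space exhausted without a result")
```

Calling minimize(-3.0, 3.0) returns a narrow enclosure of the true minimum value of -1, attained at x = 1. The priority queue is the shared data structure that a shared-memory parallel variant would have to protect or partition.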
Symmetric multi-processor (SMP) systems, or multiple-CPU servers, are suitable for implementing parallel algorithms because they employ dedicated communication devices to enhance the inter-processor communication bandwidth, so that better performance can be obtained. However, the cost of a multiple-CPU server is high, and the server is therefore usually shared among many users. The workload due to other users will certainly affect the performance of parallel programs, so it is desirable to derive a method to optimize parallel programs under different loading conditions. In this paper, we present a simple method, applicable to SPMD-type parallel programs, that improves the speedup by controlling the number of threads within the programs. (C) 2001 Elsevier Science B.V. All rights reserved.
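One way to picture "controlling the number of threads" on a shared server is to size the worker pool from the current system load. The Python sketch below is a hypothetical heuristic, not the method from the paper: the rule "CPUs minus rounded load average, at least one" and the function names are assumptions made for illustration.

```python
import os

def choose_thread_count(cpu_count, load_average):
    """Hypothetical heuristic: leave one CPU for each unit of existing load."""
    busy = int(round(load_average))
    return max(1, cpu_count - busy)

def current_thread_count():
    """Apply the heuristic to the machine this code runs on."""
    cpus = os.cpu_count() or 1
    try:
        load = os.getloadavg()[0]  # 1-minute load average (Unix only)
    except (AttributeError, OSError):
        load = 0.0  # load unavailable: assume an idle machine
    return choose_thread_count(cpus, load)
```

On an 8-CPU server with a load average of 3.2, choose_thread_count(8, 3.2) yields 5 threads. An SPMD program would call current_thread_count() before each parallel phase and split its data into that many parts.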
ISBN: 9781612844695 (print)
This paper presents the course development activities on multicore programming at the Electrical and Computer Engineering Department of Virginia Commonwealth University (VCU). As multicore processors have become the mainstream computing platform, it has become necessary to teach undergraduates how to program them. This paper gives detailed information about the multicore programming course developed at VCU, including the course modules and a brief introduction to the labs.
We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many false positive datarace reports, or maintained precision but incurred significant overheads in the range of 3x to 30x. In contrast, our approach results in very few false positives and runtime overhead in the 13% to 42% range, making it both efficient and precise. This performance improvement is the result of a unique combination of complementary static and dynamic optimization techniques.
ISBN: 9781450390620 (print)
The Transactional Memory (TM) model was proposed as a mechanism offering a higher-level programming interface that abstracts some of the complexities associated with simultaneous access to shared data. Although modern tools for multithreaded programming offer resources, such as programming interfaces and scheduling facilities, for efficient hardware exploitation, their support for shared data synchronization still reflects classic models based on critical sections. This work proposes an extension to OpenMP, a de facto standard for multithreaded programming, that offers the Transactional Memory model. Unlike other approaches in the literature that extend OpenMP with Transactional Memory, we propose an interface that not only provides access to a Transactional Memory but also reflects the OpenMP programming style. A specification of the OpenMP extension is presented, and a prototype implementation is evaluated with the help of software transactional memory tools: the TinySTM library and the TM support offered by the GNU C Compiler (GCC). The proposed interface and its prototype are presented in the form of an intermediate language, Vanilla-TM, and the interface was validated by analyzing the results obtained. These results point to the viability of incorporating the proposed extension into an OpenMP dialect, and the analysis of the experiments allowed us to conclude that the policies applied for TM management are decisive for good program performance.