This article reports the use of case studies to evaluate the performance degradation caused by the kernel-level lock. We define the lock ratio as the ratio of the execution time spent in critical sections to the total execution time of a parallel program. The kernel-level lock ratio determines how effectively programs run on symmetric multiprocessor (SMP) systems. We have measured the lock ratios and the performance of three types of parallel programs on SMP systems running Linux 2.0: matrix multiplication, parallel make, and WWW server programs. Experimental results show that the higher the lock ratio of a parallel program, the worse its performance becomes. Copyright (C) 2001 John Wiley & Sons, Ltd.
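The lock ratio defined above is plain arithmetic over measured intervals. The sketch below computes it and then, as an illustrative assumption not taken from the article, feeds it into Amdahl's law as if all time under the kernel lock were fully serialized. This gives one way to see why a higher lock ratio caps SMP speedup. Both function names are hypothetical.

```python
def lock_ratio(critical_section_times, total_time):
    """Fraction of a program's total execution time spent inside critical sections."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return sum(critical_section_times) / total_time


def serialized_speedup_bound(ratio, processors):
    """Amdahl-style speedup bound, assuming (hypothetically) that the locked
    fraction `ratio` runs serially while the remainder scales perfectly."""
    if not 0.0 <= ratio <= 1.0 or processors < 1:
        raise ValueError("ratio must be in [0, 1] and processors >= 1")
    return 1.0 / (ratio + (1.0 - ratio) / processors)


if __name__ == "__main__":
    r = lock_ratio([1.0, 2.0, 1.0], total_time=10.0)
    print(r)  # 0.4
    print(round(serialized_speedup_bound(r, processors=4), 2))  # 1.82
```

Under this simplified model, a program with a lock ratio of 0.4 cannot exceed roughly 1.8x speedup on four processors, no matter how well the unlocked part parallelizes.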
A new notion of input/output equivalence of distributed imperative programs with synchronous communications is introduced. It preserves the input/output relation, encompassing both initial/final states and communication channel values. For its mathematical justification, the semantic framework of Manna and Pnueli, based on finite transition systems and reduced behaviors, is extended with the notion of input/output behavior. A set of laws for the equivalence is surveyed. A deduction rule for substituting references to input/output-equivalent procedures is defined and justified in the new semantics. The rule is applied to decompose distributed program simplification proofs, introduced in prior work, which use the laws to establish the equivalence between a sequential program and a parallel communicating program. These proofs include communication elimination as one of their steps. An outline of one such proof, for a pipelined processor model, is included.
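The idea behind communication elimination can be illustrated informally: a send of `x + 1` on channel `c` matched by a receive into `y` behaves, from the outside, like the assignment `y = x + 1`. The sketch below compares a hypothetical sequential program with a two-process version connected by a channel. A bounded `queue.Queue` stands in for the synchronous channel, and equality is only checked on sample inputs, which is testing rather than the proof the paper develops. Unlike the paper's relation, this sketch observes only final values, not channel values.

```python
import queue
import threading


def sequential(x):
    """Sequential program: the send/receive pair is replaced by an assignment."""
    y = x + 1
    return y * 2


def parallel(x):
    """Two processes joined by channel c: P1 sends x + 1, P2 receives it and doubles it."""
    c = queue.Queue(maxsize=1)  # bounded queue standing in for a synchronous channel
    result = {}

    def p1():
        c.put(x + 1)  # send on c

    def p2():
        y = c.get()  # receive from c
        result["z"] = y * 2

    threads = [threading.Thread(target=p1), threading.Thread(target=p2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["z"]


def io_equivalent_on(inputs):
    """Compare the input/output behaviour of both programs on sample inputs."""
    return all(sequential(x) == parallel(x) for x in inputs)


if __name__ == "__main__":
    print(io_equivalent_on(range(10)))  # True
```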
Most control systems of flexible production cells have a hierarchical structure. They become very complicated and difficult to maintain and modify as the underlying production cells grow in size and complexity. Moreover, they are relatively sensitive to failures. In contrast, heterarchical control systems are flexible, modular, easy to modify and, to some extent, fault-tolerant. In this paper, a heterarchical control system of a flexible production cell is formally specified in the CSP-based language χ. This language is well suited to describing autonomous components that cooperate by exchanging information.
A new approach to parallel partitioning and placement of standard cells for ULSI has been proposed. It is based on the well-known min-cut algorithm and uses a partitioning strategy aimed at minimising the number of nets crossing the cut lines while distributing cells evenly over the chip area. It was implemented as a CAD software tool, "SOCRAT", running on a SUN SPARCstation and a PARSYTEC powerXplorer. Evaluation of the tool on the MCNC international standard benchmarks showed that, for VLSI, SOCRAT runs up to 4-8 times faster than CADENCE's tool based on a simulated annealing algorithm, and for high-complexity ULSI it is expected to be much faster.
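Min-cut placement works by repeatedly bisecting the netlist so that few nets cross each cut line while each side keeps an equal share of cells. The sketch below shows a single bisection step using a simplified, Kernighan-Lin-flavoured greedy swap heuristic. It is not SOCRAT's algorithm (the abstract does not give its details), and the names `cut_size` and `min_cut_bisect` are hypothetical.

```python
from itertools import product


def cut_size(nets, side):
    """Count nets whose cells lie on both sides of the cut line."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def min_cut_bisect(cells, nets, max_passes=10):
    """Balanced two-way partition improved by greedy pairwise swaps.

    Swapping one cell from each side keeps the partition balanced; a swap is
    kept only if it strictly reduces the number of nets crossing the cut.
    """
    half = len(cells) // 2
    side = {c: 0 if i < half else 1 for i, c in enumerate(cells)}
    best = cut_size(nets, side)
    for _ in range(max_passes):
        improved = False
        left = [c for c in cells if side[c] == 0]
        right = [c for c in cells if side[c] == 1]
        for a, b in product(left, right):
            if side[a] != 0 or side[b] != 1:
                continue  # one of the pair was already moved in this pass
            side[a], side[b] = 1, 0
            cost = cut_size(nets, side)
            if cost < best:
                best, improved = cost, True
            else:
                side[a], side[b] = 0, 1  # undo the swap
        if not improved:
            break
    return side, best


if __name__ == "__main__":
    nets = [("a", "c"), ("b", "d")]
    side, crossing = min_cut_bisect(["a", "b", "c", "d"], nets)
    print(crossing)  # 0
```

A full placer would apply this recursively to each half, alternating horizontal and vertical cut lines, and the parallel version would hand independent sub-partitions to different processors.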
Switched reluctance motors (SRMs) are an integral part of robotics and automation systems where energy and cost efficiency is required. This motor type has neither windings nor permanent magnets on the rotor, which results in a simple and robust structure. However, SRMs require a complex electronic control system to generate a specified number of voltage pulses for each motor phase. This paper presents the signal generation for multiple phases using only one current sensor in an asymmetric half bridge (AHB). In addition to maintaining the predetermined phase voltages, sufficient current measurement windows and minimal current ripple for the individual phases are further optimization criteria for signal generation. Generating a state vector that controls the individual semiconductor switches of each motor phase to achieve a required phase voltage, while simultaneously fulfilling the multi-objective optimization criteria, is challenging. Because of the vast number of possible solutions, a genetic algorithm (GA) was used to find state combinations that satisfy the formulated optimization criteria. The results are discussed, and recommendations on the genotype representation and the genetic operators used are given. Interested readers will find detailed information about the technical software implementation using the Global Optimization Toolbox from MATLAB.
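To make the search concrete, here is a minimal GA sketch in Python rather than the MATLAB toolbox the paper uses. Everything in it is a hypothetical simplification: the genotype is a flat bit string with one bit per phase per switching slot, the phase-voltage criterion is approximated by each phase's conducting fraction, and the single sensor is assumed to measure a phase only in slots where that phase conducts alone. The current-ripple criterion is omitted.

```python
import random

PHASES, SLOTS = 3, 20  # hypothetical: 3 motor phases, 20 switching slots per period


def decode(genome):
    """Split a flat bit list into SLOTS tuples of per-phase switch states (1 = conducting)."""
    return [tuple(genome[i * PHASES:(i + 1) * PHASES]) for i in range(SLOTS)]


def fitness(genome, target_duty, min_windows=2):
    """Lower is better: duty-cycle error plus a penalty for missing measurement windows."""
    slots = decode(genome)
    # Simplified stand-in for the phase-voltage criterion: fraction of slots a phase conducts.
    error = sum(abs(sum(s[p] for s in slots) / SLOTS - target_duty[p]) for p in range(PHASES))
    # Assumption: the single shared sensor can read phase p only when p conducts alone.
    windows = [sum(1 for s in slots if s[p] == 1 and sum(s) == 1) for p in range(PHASES)]
    penalty = sum(max(0, min_windows - w) for w in windows)
    return error + penalty


def evolve(target_duty, pop_size=30, generations=100, p_mut=0.02, seed=0):
    """Generational GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n = PHASES * SLOTS
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(g):
        return fitness(g, target_duty)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    for _ in range(generations):
        children = [min(pop, key=score)]  # elitism: keep the best genome unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            children.append([1 - bit if rng.random() < p_mut else bit for bit in child])
        pop = children
    best = min(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    best, cost = evolve([0.3, 0.3, 0.3])
    print(cost)
```

Because the best genome is always carried over, the best score never gets worse from one generation to the next; the choice of encoding and operators is exactly what the paper's recommendations address.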
This paper presents a method for detecting deadlocks in parallel systems through a special class of Petri nets that we call E-S³PR. First, a compositional method is illustrated for modeling the concurrent execution of sequential programs in a parallel system through E-S³PR. Then the analysis of a class of E-S³PR called nonerror E-S³PR leads us to characterize deadlock situations in terms of a zero marking of certain structural objects called siphons. Finally, an on-line algorithm is given for detecting deadlocks in a nonerror parallel system by detecting the presence of unmarked siphons in the corresponding E-S³PR.
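A siphon is a set of places S such that every transition that puts tokens into S also takes tokens from S; once a siphon is empty, it can never be refilled. The sketch below uses the standard fixpoint computation of the largest siphon contained in a given set of places, applied to the currently unmarked places. It is a generic check, not necessarily the authors' on-line algorithm, and the dictionary-of-sets net representation is an assumption.

```python
def max_siphon_within(places, pre, post):
    """Largest siphon contained in `places` (empty set if there is none).

    pre[t] / post[t] are the sets of input / output places of transition t.
    S is a siphon if every transition that outputs into S also inputs from S.
    """
    siphon = set(places)
    changed = True
    while changed:
        changed = False
        for t in pre:
            if post[t] & siphon and not pre[t] & siphon:
                # t can put tokens into S without taking any from S: drop its outputs.
                siphon -= post[t]
                changed = True
    return siphon


def unmarked_siphon(marking, pre, post):
    """Maximal siphon among the currently unmarked places; non-empty signals a deadlock."""
    empty_places = {p for p, tokens in marking.items() if tokens == 0}
    return max_siphon_within(empty_places, pre, post)


if __name__ == "__main__":
    # Two places passing a token back and forth; with no tokens left, nothing can fire.
    pre = {"t1": {"a"}, "t2": {"b"}}
    post = {"t1": {"b"}, "t2": {"a"}}
    print(sorted(unmarked_siphon({"a": 0, "b": 0}, pre, post)))  # ['a', 'b']
```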
ISBN (print): 9780819487513
A software system has been developed for high-performance Computed Tomography (CT) reconstruction, simulation and other X-ray image processing tasks, utilizing remote computer clusters optionally equipped with multiple Graphics Processing Units (GPUs). The system has a streamlined Graphical User Interface for interaction with the cluster. Apart from extensive functionality related to X-ray CT in plane-wave and cone-beam forms, the software includes multiple functions for X-ray phase retrieval and simulation of phase-contrast imaging (propagation-based, analyzer-crystal-based and Talbot interferometry). Other features include several methods for image deconvolution, simulation of various phase-contrast microscopy modes (Zernike, Schlieren, Nomarski, dark-field, interferometry, etc.) and a large number of conventional image processing operations (such as FFT, algebraic and geometrical transformations, pixel value manipulations, simulated image noise, various filters, etc.). The architectural design of the system is described, as well as the two-level parallelization of the most computationally intensive modules, utilizing both the multiple CPU cores and the multiple GPUs available in a local PC or a remote computer cluster. Finally, some results on the current system performance are presented. This system can potentially serve as the basis for a flexible toolbox for X-ray image analysis and simulation that can efficiently utilize modern multiprocessor hardware for advanced scientific computations.
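As a rough illustration of two-level parallelism, the sketch below ramp-filters a stack of sinograms, which is the filtering step of filtered backprojection. It assumes a parallel-beam geometry where slices are independent: the outer level hands slices to a worker pool, and the inner level relies on NumPy's vectorized FFT within each slice. This is a hypothetical stand-in; the described system distributes work across cluster nodes and GPUs, which this sketch does not attempt, and thread-level speedup here depends on whether NumPy releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ramp_filter(sinogram):
    """Ramp-filter each projection row of one slice's sinogram (angles x detector pixels)."""
    n = sinogram.shape[-1]
    ramp = np.abs(np.fft.fftfreq(n))  # |f| frequency response, zero at DC
    return np.real(np.fft.ifft(np.fft.fft(sinogram, axis=-1) * ramp, axis=-1))


def filter_volume(sinograms, workers=4):
    """Outer level: slices handed to a worker pool; inner level: NumPy vectorizes each slice."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.stack(list(pool.map(ramp_filter, sinograms)))


if __name__ == "__main__":
    volume = np.random.default_rng(0).random((8, 180, 256))  # slices x angles x pixels
    print(filter_volume(volume).shape)  # (8, 180, 256)
```

Because the ramp has zero gain at DC, a constant projection filters to zero, which is a quick sanity check for the filter.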
Petri nets have proved to be an efficient tool for representing complicated systems. Nevertheless, it is generally not easy to implement a technical system given as a Petri net on a multiprocessor system. This contribution presents a new approach to this procedure. The main difference from other methods is the effective use of message-passing communication during the implementation.
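The abstract does not describe how transitions are mapped to processors, but one natural first step in any message-passing implementation is to find the places shared between processors. Tokens in those places must travel as messages, while places used by a single processor can stay in local memory. The sketch below computes that set for a hypothetical transition-to-processor assignment; it is not the contribution's method.

```python
from collections import defaultdict


def communication_places(pre, post, assignment):
    """Places touched by transitions mapped to different processors.

    pre[t] / post[t]: input / output places of transition t.
    assignment[t]: processor that executes transition t (hypothetical mapping).
    Tokens in these places would have to be exchanged as messages.
    """
    touching = defaultdict(set)
    for t in pre:
        for p in pre[t] | post[t]:
            touching[p].add(assignment[t])
    return {p for p, procs in touching.items() if len(procs) > 1}


if __name__ == "__main__":
    pre = {"t1": {"p0"}, "t2": {"p1"}}
    post = {"t1": {"p1"}, "t2": {"p2"}}
    print(communication_places(pre, post, {"t1": 0, "t2": 1}))  # {'p1'}
```

A mapping that minimizes the size of this set reduces message traffic, which is the kind of trade-off an implementation scheme has to weigh against load balance.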
This paper discusses a new debugging strategy for parallel programs, called parallel relative debugging. Relative debugging allows a user to compare the execution of one program with that of another, which can be used to trace errors. This technique has been found to significantly aid problem determination. A prototype sequential relative debugger called Guard has already been constructed and used in a number of real-world situations. However, the control logic it uses is not powerful enough to support the debugging of parallel applications. In this paper, we describe how dataflow can be used to provide a very rich control mechanism that is well suited to the parallel environment. We illustrate the system with a worked example.
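At its core, relative debugging asserts that a variable in a reference program should match the corresponding variable in a suspect program at corresponding points, and reports where they first disagree. The sketch below shows that comparison over recorded traces. The trace format and `first_divergence` are hypothetical and do not reflect Guard's interface; the dataflow-based control needed for parallel programs is not modelled.

```python
import math


def first_divergence(reference, suspect, tol=1e-9):
    """Return the first (checkpoint, variable, expected, actual) where traces differ, else None.

    Each trace is a list of (checkpoint_name, {variable: value}) pairs recorded
    in execution order by the reference and the suspect program.
    """
    for (ref_cp, ref_vars), (sus_cp, sus_vars) in zip(reference, suspect):
        if ref_cp != sus_cp:
            return (ref_cp, None, ref_cp, sus_cp)  # control flow diverged
        for name, expected in ref_vars.items():
            actual = sus_vars.get(name)
            if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
                same = math.isclose(expected, actual, rel_tol=tol, abs_tol=tol)
            else:
                same = expected == actual
            if not same:
                return (ref_cp, name, expected, actual)
    if len(reference) != len(suspect):
        return ("<end of trace>", None, len(reference), len(suspect))
    return None


if __name__ == "__main__":
    ref = [("after_init", {"n": 4}), ("after_sum", {"total": 10.0})]
    sus = [("after_init", {"n": 4}), ("after_sum", {"total": 10.5})]
    print(first_divergence(ref, sus))  # ('after_sum', 'total', 10.0, 10.5)
```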
ISBN (print): 9781581134636
We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many false positive datarace reports, or maintained precision but incurred significant overheads in the range of 3x to 30x. In contrast, our approach results in very few false positives and runtime overhead in the 13% to 42% range, making it both efficient and precise. This performance improvement is the result of a unique combination of complementary static and dynamic optimization techniques.
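The abstract does not spell out its algorithm, so for orientation, here is the classic lockset refinement popularized by the Eraser dynamic race detector. It is a different and simpler technique than the combined static and dynamic approach described above. Each shared variable keeps the set of locks held on every access so far, and an empty set means no single lock consistently protects it. This simplified version omits Eraser's initialization and thread-exclusive states, so it reports more false positives. The class name is hypothetical.

```python
from collections import defaultdict


class LocksetDetector:
    """Eraser-style lockset refinement: report a variable once no single lock
    has protected every access to it seen so far."""

    def __init__(self):
        self.held = defaultdict(set)  # thread -> locks currently held
        self.candidates = {}          # variable -> locks held on every access so far
        self.races = set()            # variables whose candidate lockset became empty

    def acquire(self, thread, lock):
        self.held[thread].add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, var):
        held = self.held[thread]
        if var not in self.candidates:
            self.candidates[var] = set(held)  # first access: start from locks held now
        else:
            self.candidates[var] &= held      # keep only locks held on every access
        if not self.candidates[var]:
            self.races.add(var)


if __name__ == "__main__":
    d = LocksetDetector()
    d.acquire("T1", "L"); d.access("T1", "y"); d.release("T1", "L")
    d.acquire("T2", "M"); d.access("T2", "y"); d.release("T2", "M")
    print(d.races)  # {'y'}
```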