A proper initialization requires starting the process in a state close to the expected steady state. In Web caching, the initialization problem arises each time a new document enters the cache. Whatever method is used to order the documents in the cache, the newly referenced document is inserted into a so-called "removal list", from which documents are removed when storage space is needed. Undesirable documents are often assigned a high priority; consequently, they remain in the cache for a long time, degrading cache server performance. We investigate one category of undesirable documents that pass the filters commonly used to control cache processing.
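The abstract leaves the ordering policy open. As a minimal sketch, assuming an LRU-ordered removal list and a cache bounded only by total bytes, the Python below shows the initialization step: every newcomer enters at the protected end of the list, so a document requested only once can outlive one that has already been hit. The class and method names are hypothetical.

```python
from collections import OrderedDict


class RemovalListCache:
    """Byte-capacity cache whose removal list is kept in LRU order.

    Hypothetical sketch: a new document enters at the most-recently-used
    end, i.e. with the highest retention priority, whether or not it will
    ever be requested again.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        # Keys in removal order: the first entry is evicted first.
        self.removal_list: "OrderedDict[str, int]" = OrderedDict()

    def request(self, url: str, size: int) -> bool:
        """Return True on a hit, False on a miss (the document is then inserted)."""
        if url in self.removal_list:
            self.removal_list.move_to_end(url)  # a hit refreshes priority
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        # Free space by removing documents from the head of the list.
        while self.used + size > self.capacity:
            _, evicted_size = self.removal_list.popitem(last=False)
            self.used -= evicted_size
        # Initialization step: the newcomer starts at the protected end.
        self.removal_list[url] = size
        self.used += size
        return False
```

For example, with a 100-byte cache, requesting `/a` (40 bytes), `/b` (40), `/a` again, then a one-off `/once` (60) and finally `/c` (30) leaves `/once` and `/c` cached: `/a`, despite its hit, is evicted before the one-off document.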
We present the design and implementation of MultiKron PCI, a hardware performance monitor that can be plugged into any computer with a free PCI bus slot. The monitor provides a set of high-resolution timers and the ability to monitor PCI bus utilization. We also demonstrate how the monitor can be integrated with online performance monitoring tools, such as the Paradyn parallel performance measurement tools, to reduce the overhead of key timer operations by a factor of 25. In addition, we present a series of case studies that use the MultiKron hardware performance monitor to measure and tune high-performance parallel computing applications. Using the monitor, we found and corrected a performance bug in a popular implementation of the MPI message-passing library that caused some communication primitives to run at half their potential speed.
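The factor of 25 refers to the cost of a timer read issued by the monitoring tool. As a rough illustration of how such per-call overhead can be estimated in software, the sketch below times a loop of back-to-back reads of Python's `time.perf_counter_ns()`. It measures the host's software clock, not the MultiKron hardware, and the function name is hypothetical.

```python
import time


def timer_read_overhead_ns(samples: int = 100_000) -> float:
    """Estimate the mean cost, in nanoseconds, of one software timer read.

    Loop bookkeeping is included in the measurement, so the result is an
    upper-bound style estimate rather than an exact per-call cost.
    """
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.perf_counter_ns()
    end = time.perf_counter_ns()
    return (end - start) / samples


if __name__ == "__main__":
    print(f"~{timer_read_overhead_ns():.1f} ns per timer read")
```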
By combining properties of fuzzy systems and neural networks, neurofuzzy modelling is ideally suited to many system identification and data modelling applications. Recently, data-driven model construction algorithms have been developed to identify these models, and they have proved essential for producing accurate, parsimonious models. However, because of sparse data and restricted model structures, they often produce models with high variance, which generalise poorly. In this paper, local Bayesian inference techniques are applied to neurofuzzy models: multiple prior probability density functions are placed on the weights, and superfluous model variance is controlled. This gives a form of regularisation in which Bayesian estimation produces simple re-estimation formulae that identify a suitable bias/variance balance from the data. The approach is considered a post-processing step to model construction, and its merits are demonstrated by application to a real-world data set.
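The abstract does not state its re-estimation formulae. One standard form of the idea is MacKay's evidence-framework update, here with one prior precision `alpha` per group of weights to mimic "multiple priors". The sketch assumes a model that is linear in its weights (as a neurofuzzy model with fixed basis functions is) and Gaussian noise with precision `beta`; the function name and grouping scheme are hypothetical, not the authors' method.

```python
import numpy as np


def bayes_reestimate(Phi, y, groups, n_iter=200, tol=1e-8):
    """Evidence-framework re-estimation with one prior precision per weight group.

    Phi    : (N, M) design matrix, e.g. basis-function activations
    y      : (N,) targets
    groups : (M,) integer group label for each weight
    Returns posterior mean weights m, per-group precisions alpha, noise precision beta.
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    N = Phi.shape[0]
    labels = [int(g) for g in np.unique(groups)]
    alpha = {g: 1.0 for g in labels}  # prior precision per weight group
    beta = 1.0                        # noise precision
    for _ in range(n_iter):
        a_diag = np.array([alpha[int(g)] for g in groups])
        A = beta * Phi.T @ Phi + np.diag(a_diag)  # posterior precision matrix
        A_inv = np.linalg.inv(A)
        m = beta * A_inv @ Phi.T @ y              # posterior mean weights
        # gamma_i in [0, 1]: how well weight i is determined by the data.
        gamma = np.clip(1.0 - a_diag * np.diag(A_inv), 0.0, 1.0)
        new_alpha = {}
        for g in labels:
            mask = groups == g
            ss = max(float(np.sum(m[mask] ** 2)), 1e-12)
            # Clip to keep A well conditioned when a group is switched off.
            new_alpha[g] = float(np.clip(gamma[mask].sum() / ss, 1e-10, 1e10))
        resid = y - Phi @ m
        new_beta = (N - gamma.sum()) / max(float(resid @ resid), 1e-12)
        done = abs(new_beta - beta) <= tol * beta and all(
            abs(new_alpha[g] - alpha[g]) <= tol * alpha[g] for g in labels)
        alpha, beta = new_alpha, new_beta
        if done:
            break
    return m, alpha, beta
```

Weights in a group that the data do not support end up with a large `alpha` and are shrunk toward zero, which is one way superfluous variance can be controlled without a separate validation set.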
Optimization at the early stages of design is crucial. However, because of the overwhelming number of design and optimization options, design exploration is often conducted in a qualitative, ad hoc manner. This paper presents a methodology and interactive environment for guiding the exploration process. A prototype targeting behavioral-level optimization of datapath-intensive ASIC implementations has been developed. The key to the approach is encapsulated knowledge about the various optimizations, together with a set of techniques that automatically extract the "essence" of a design description. At each stage of the exploration process, the system suggests and ranks potential optimizations in terms of both immediate and longer-term impact. It also evaluates the design and the likely effects of each optimization on metrics such as power and performance. In this approach, the designer remains responsible for making the actual optimization selections; however, with the provided guidance, designers can make better-informed decisions and therefore explore the design space more effectively. The effectiveness of the approach is demonstrated on a number of designs.
In this paper, novel methods for condition monitoring of power station turbine shafts are presented. The objective of this work is to investigate methods for producing accurate turbine vibration fault alarms during turbine shaft rundowns. Wavelet packet analysis is employed to extract spectral features from healthy vibration signals, and the probability density functions of these features are estimated. Both Gaussian models, using Bayesian inference, and mixture models are employed. Preliminary results show that the more computationally expensive mixture models produce more accurate density estimates and hence more reliable fault alarms.
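The pipeline above (wavelet packet features, a density model of healthy data, an alarm on low density) can be sketched in Python under simplifying assumptions: a Haar wavelet packet, subband energies as the features, and a single Gaussian fitted with the sample mean and covariance of the log energies rather than the Bayesian Gaussian or mixture models the paper compares. The alarm threshold (lowest density seen on healthy data) and all names are hypothetical choices.

```python
import numpy as np


def haar_packet_energies(x, level=3):
    """Energies of the 2**level Haar wavelet-packet subbands of x.

    Both the low-pass and high-pass branches are split at every level
    (a full packet tree). Subbands are returned in natural tree order,
    not frequency order. len(x) must be divisible by 2**level.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        children = []
        for s in nodes:
            even, odd = s[0::2], s[1::2]
            children.append((even + odd) / np.sqrt(2.0))  # low-pass half
            children.append((even - odd) / np.sqrt(2.0))  # high-pass half
        nodes = children
    return np.array([np.sum(c ** 2) for c in nodes])


class GaussianNoveltyDetector:
    """Single multivariate Gaussian over log subband energies of healthy signals."""

    def fit(self, healthy_energies):
        F = np.log(np.asarray(healthy_energies, dtype=float) + 1e-12)
        self.mean = F.mean(axis=0)
        cov = np.cov(F, rowvar=False) + 1e-6 * np.eye(F.shape[1])
        self.prec = np.linalg.inv(cov)
        self.logdet = np.linalg.slogdet(cov)[1]
        # Simple threshold choice: the lowest density seen on healthy data.
        self.threshold = min(self.log_density(e) for e in healthy_energies)
        return self

    def log_density(self, energies):
        d = np.log(np.asarray(energies, dtype=float) + 1e-12) - self.mean
        k = d.shape[0]
        return float(-0.5 * (k * np.log(2 * np.pi) + self.logdet + d @ self.prec @ d))

    def alarm(self, energies):
        """True if the feature vector is less likely than any healthy training sample."""
        return self.log_density(energies) < self.threshold
```

Swapping the single Gaussian for a mixture model would change only `fit` and `log_density`; the feature extraction and alarm rule stay the same.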
Coordinated thread scheduling is a critical factor in achieving good performance for tightly-coupled parallel jobs on workstation clusters. We are building a coordinated scheduling system that coexists with the Window...
ISBN (print): 9780897919074
Object-oriented languages like Java and Smalltalk provide a uniform object model that simplifies programming by providing a consistent, abstract model of object behavior. But direct implementations introduce overhead, and removing it requires aggressive implementation techniques (e.g., type inference and function specialization). In this paper, we introduce object inlining, an optimization that automatically inline-allocates objects within containers (as is done by hand in C++) within a uniform model. We present our technique, which includes novel program analyses that track how inlinable objects are used throughout the program. We evaluated object inlining on several object-oriented benchmarks. It produces code up to three times as fast as a dynamic model without inlining and roughly as fast as manually inlined code.
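Python cannot express memory layout, but the source-level effect of object inlining can be pictured as follows: a container that reaches its parts through references, and the same container with the parts' fields stored directly in it, as a C++ programmer would do by hand. The classes are hypothetical. An object-inlining compiler performs this rewrite automatically, and only where its analyses show it is safe (for example, when an inner object is not shared with another container).

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Rect:
    # Uniform object model: each corner is a separate object reached
    # through a reference, costing an allocation and an indirection.
    lo: Point
    hi: Point

    def area(self) -> float:
        return (self.hi.x - self.lo.x) * (self.hi.y - self.lo.y)


@dataclass
class RectInlined:
    # Inlined layout: the Point fields live directly in the container,
    # so building a rectangle needs one object instead of three.
    lo_x: float
    lo_y: float
    hi_x: float
    hi_y: float

    def area(self) -> float:
        return (self.hi_x - self.lo_x) * (self.hi_y - self.lo_y)
```

Both versions compute the same result; only the object structure differs, which is why the transformation can be applied without changing program behavior when the analysis succeeds.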
High performance on distributed-memory machines for programming models with dynamic thread creation and multithreading requires efficient thread management and communication. Traditional multithreading runtimes, which consist of a small number of general-purpose, bundled mechanisms that assume minimal compiler and hardware support, are suitable for computations involving coarse-grained threads but are inefficient in the presence of fine-grained threads and irregular communication behavior. We describe two mechanisms of the Illinois Concert runtime system that address this shortcoming. The first, hybrid stack-heap execution, exploits close coupling with the compiler to form coarse-grained execution units dynamically; threads are created lazily, only when runtime conditions require them. The second, pull messaging, exploits hardware support to implement a distributed message queue with receiver-initiated data transfer, delivering robust performance across a wide range of dynamic communication characteristics. We measure their performance impact on a Cray T3D implementation of the Concert system. Individually, the mechanisms increase absolute execution efficiency by up to 50%. Together, they enlarge the space of computations that can be executed efficiently, enabling compute granularities an order of magnitude smaller. Performance results for two large irregular applications demonstrate that expressing programs using dynamic multithreading need not compromise performance. (C) Academic Press, Inc.
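The lazy thread creation behind hybrid stack-heap execution can be modeled in a few lines of Python, with generators standing in for stack frames that can be captured and resumed later. This is a toy sketch, not the Concert implementation: a spawned body first runs inline like an ordinary call, and only if it blocks is it promoted to a task on a ready queue. Python allocates a generator object either way, so the sketch models the control decision rather than the memory savings. `Scheduler`, `spawn`, and `work` are hypothetical names.

```python
from collections import deque


class Scheduler:
    """Toy runtime showing lazy (on-demand) thread creation."""

    def __init__(self):
        self.ready = deque()
        self.heap_threads = 0  # tasks that had to be materialized as threads

    def spawn(self, body, *args):
        """Run body inline; return True if it finished without blocking."""
        gen = body(*args)
        try:
            next(gen)  # optimistic inline execution on the caller's "stack"
        except StopIteration:
            return True  # completed: no thread object is kept
        self.heap_threads += 1
        self.ready.append(gen)  # blocked: keep the suspended frame
        return False

    def run(self):
        """Resume suspended tasks round-robin until all have finished."""
        while self.ready:
            gen = self.ready.popleft()
            try:
                next(gen)
            except StopIteration:
                continue
            self.ready.append(gen)


def work(n, results):
    """Hypothetical thread body: every fourth call waits for 'remote' data."""
    if n % 4 == 0:
        yield  # simulate blocking on a value that is not yet available
    results.append(n * n)
```

Spawning `work` for `n` in `range(8)` completes six calls inline and materializes only two threads (for `n = 0` and `n = 4`), which `run()` then finishes.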
This paper presents a methodology for predicting the performance of parallel algorithms and illustrates its use on a large-scale computational chemistry application. The performance prediction uses a component time characterization technique, which splits the sequential code into computational components and measures the time of each. The parallel algorithm is built from these components by adding communication routines. A "Processor Activity Graph" (PAG), which provides a graphical representation of the parallel algorithm's runtime behaviour, is used to predict the execution time. As a case study, a Self Consistent Field (SCF) computation, which forms the basis of many computational chemistry packages [4, 5], was selected. A performance model of the SCF computation was built, and its predictions were compared with measured results. The measurements were performed on a mesh-connected distributed-memory parallel computer (a Parsytec SuperCluster with 128 T800 processors). The prediction error is less than 10%. The application's performance was optimised by reducing the communication overhead and changing the data representation.
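One plausible reading of PAG-based prediction, sketched in Python under stated assumptions: each activity (a computational component on a given processor) carries its measured sequential time, each dependency edge carries a communication time (zero between consecutive activities on the same processor), and the predicted runtime is the length of the longest path through the graph. This encoding and the names `predict_runtime` and `relative_error` are hypothetical; the paper's PAG may carry more detail.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def predict_runtime(durations, edges):
    """Predict parallel execution time from an activity graph.

    durations : {activity: measured time of that computational component}
    edges     : {(src, dst): communication time from src to dst}
    The prediction is the finish time of the longest (critical) path.
    """
    preds = {a: set() for a in durations}
    for src, dst in edges:
        preds[dst].add(src)
    finish = {}
    for a in TopologicalSorter(preds).static_order():
        # An activity starts once every predecessor has finished and its
        # message (if any) has arrived.
        start = max((finish[p] + edges[(p, a)] for p in preds[a]), default=0.0)
        finish[a] = start + durations[a]
    return max(finish.values())


def relative_error(predicted, measured):
    """Relative prediction error, as used to judge a performance model."""
    return abs(predicted - measured) / measured
```

For two processors that compute for 4.0 and 5.0 time units before a 1.0-unit combine step on processor 0, with a 0.5-unit message from processor 1, the predicted runtime is 6.5; against a hypothetical measurement of 7.0 that is an error of about 7%.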