ISBN: (print) 9781467329255; 9781467329224
Weather Forecast Models (WFMs) are often implemented as sequential programs. Because WFMs involve computationally intensive tasks that process large amounts of weather forecast data, this usually means longer execution times, greater use of computer resources and higher power consumption. These become performance problems for weather forecast companies. Companies have already tried to use multi-core systems to overcome them, but this does not always work because of poor selection and implementation of programming strategies. To address these problems, a research project was conducted as a case study for the weather production company Weather2 Ltd. The case study applied multi-threaded programming on multi-core systems as an alternative implementation strategy for Weather2's WFM, as a solution to the problems the company faced with sequential programs. The results showed that this new strategy could improve WFM performance significantly by reducing execution time and using fewer computer resources and less power. This paper presents the case study and its results.
The paper investigates the performance of the finite element numerical integration algorithm for first-order approximations on three processor architectures popular in scientific computing: the classical x86_64 CPU, Intel Xeon Phi and NVIDIA Kepler GPU. We base the discussion on theoretical performance models and on our own implementations, for which we perform a range of computational experiments. For the latter, we consider a unifying programming model and a portable OpenCL implementation for all architectures. Variations of the algorithm due to different problems solved and different element types are investigated, and several optimizations aimed at properly mapping the algorithm to computer architectures are demonstrated. The experimental results show varying levels of performance for different architectures, but indicate that the algorithm can be effectively ported to all of them. The conclusions indicate the factors that limit performance for different problems and types of approximation, and the performance ranges that can be expected for FEM numerical integration on different processor architectures. (C) 2016 Elsevier B.V. All rights reserved.
This article presents a novel CPU-based parallel algorithm (P-SURF) that computes the Speeded-Up Robust Features (SURF), a local descriptor that can find point correspondences between images in spite of scaling and rotation. The algorithm parallelises all seven major steps of the original serial computation. The task in each step is decomposed and the parts are assigned to running threads bound to distinct processors. The implementation was tested on randomly selected images with regard to performance, scalability and stability. The results showed that its performance on mid-level Intel Core Duo processors was comparable to that of some fast GPU-based SURF implementations. For example, on a test system equipped with an Intel Core Duo P8600 at 2.4 GHz, P-SURF was able to extract and represent features from a 640 x 480 image at a rate of 33 frames per second. The experimental results also revealed that having the algorithm assign hard processor affinity, instead of leaving processor assignment of the threads to the kernel, produced better performance and stability.
In this article we present a technique for reducing the memory overhead while performing data race detection. Data races occur when multiple threads modify the same memory location without proper synchronization. In order to detect data races, we need to check all read and write operations performed by the threads. We describe a method for efficiently storing these read and write operations called "merging of segment histories". This method improves upon known techniques by ensuring an upper limit to the amount of memory consumed for storing the read and write operations while maintaining the full accuracy of the data race detection. The method has been implemented in an existing data race detection tool called RecPlay for Solaris binaries. We show that it enables us to perform data race detection on benchmarks which were previously beyond our grasp. (C) 2002 Elsevier Science B.V. All rights reserved.
CSP# is a formal modeling language that emphasizes the design of communication in concurrent systems. The PAT framework provides a model checking environment for the simulation and verification of CSP# models. Although the desired properties can be formally verified at the design level, it is not always straightforward to ensure that the system's implementation conforms to the behaviors of the formal design model. To avoid human error and enhance productivity, it would be beneficial to have tool support for automatically generating executable programs from their corresponding formal models. In this paper, we propose such a solution for translating verified CSP# models into C# programs in the PAT framework. We encoded the CSP# operators in a C# library-"***", where the event synchronization is based on the "Monitor" class in C#. The precondition and choice layers are built on top of the CSP event synchronization to support language-specific features. We further developed a code generation tool to automatically transform CSP# models into multi-threaded C# programs. We proved that the generated C# program and the original CSP# model are equivalent under trace semantics. This equivalence guarantees that the verified properties of the CSP# models are preserved in the generated C# programs. Furthermore, based on the existing implementation of the choice operator, we improved the synchronization mechanism by pruning unnecessary communications among the choice operators. The experimental results showed that the improved mechanism notably outperforms the standard JCSP library.
Author:
Holzmann, Gerard J.
NASA JPL Lab Reliable Software 4800 Oak Grove Dr Pasadena CA 91109 USA
The broad availability of multi-core chips on standard desktop PCs provides strong motivation for the development of new algorithms for logic model checkers that can take advantage of the additional processing power. With a steady increase in the number of available processing cores, we would like the performance of a model checker to increase as well - ideally linearly. The new trend implies a change of focus away from cluster computers towards shared memory systems. In this paper we discuss the multi-core algorithms that are in development for the SPIN model checker.
Due to increasing demands for processing power on the one hand, and the physical limit on CPU clock speed on the other, multi-threaded programming is becoming more important in current applications. Unfortunately, multi-threaded programs are prone to programming mistakes that result in hard-to-find defects, mainly race conditions and deadlocks. The need for tools that help find these faults is imminent, but currently available tools are either difficult to use because of the need for annotations, unable to cope with more than a few tens of kLOC, or issue too many false warnings. This paper describes experiments with the freely available tool Helgrind and results obtained by using it for debugging a server application comprising 500 kLOC. We present improvements to the runtime analysis of C++ programs that result in a dramatic reduction of false warnings.
ISBN: (print) 9781728105543
We propose a method for parallel multi-view graph matrix completion for the prediction of ratings in recommender systems. The missing ratings are computed from both a similarity matrix and a rating matrix. The rating matrix is sparse, and some items might not have any rating information available. The similarity matrix can be calculated from different item attributes available from e-commerce websites. As the input matrix becomes large, the need for more computationally efficient matrix completion increases. The main contribution of this paper is to show a speed-up in calculating the missing ratings by using multi-threaded programming. Simulation results are based on a large input matrix and show a reduction in RMSE for the case of cold-start prediction.
ISBN: (print) 9783662480960; 9783662480953
Large-scale network and graph analysis has received considerable attention recently. Graph mining techniques often involve an iterative algorithm, which can be implemented in a variety of ways. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling. We investigate the impact of different algorithm design choices. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms are able to achieve more than 28x the performance of standard PageRank implementations (e.g., those in GraphLab). The design choices affect both single-threaded performance and parallel scalability. The implementation lessons not only guide efficient implementations of many graph mining algorithms, but also provide a framework for designing new scalable algorithms.
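To make the "data-driven, push-based" design point in the abstract above more concrete, the following is a minimal single-threaded sketch in Python. It is not the authors' implementation: the function name `push_pagerank`, the adjacency-list input, the damping factor 0.85, the threshold `eps`, and the FIFO worklist are all assumptions chosen for illustration, and the sketch computes unnormalized scores and simply drops the residual of nodes without out-edges.

```python
from collections import deque


def push_pagerank(out_edges, alpha=0.85, eps=1e-8):
    """Residual-based PageRank sketch (hypothetical helper, not from the paper).

    out_edges[v] is the list of nodes that v links to.
    Returns unnormalized scores x solving x = alpha * P^T x + (1 - alpha).
    """
    n = len(out_edges)
    rank = [0.0] * n
    residual = [1.0 - alpha] * n      # rank mass not yet applied at each node
    worklist = deque(range(n))        # data-driven: only "active" nodes are queued
    queued = [True] * n

    while worklist:
        v = worklist.popleft()
        queued[v] = False
        r_v = residual[v]
        residual[v] = 0.0
        rank[v] += r_v                # absorb the pending residual locally

        targets = out_edges[v]
        if not targets:
            continue                  # assumption: dangling mass is dropped
        share = alpha * r_v / len(targets)
        for w in targets:
            residual[w] += share      # push-based: write to out-neighbors
            if residual[w] >= eps and not queued[w]:
                worklist.append(w)    # activate w only when it has enough work
                queued[w] = True
    return rank


# A three-node cycle: by symmetry every node should score close to 1.0.
cycle_ranks = push_pagerank([[1], [2], [0]])
```

In this sketch, work activation is the worklist, the data access pattern is the push to out-neighbors, and scheduling is the FIFO order; a parallel version would additionally need atomic updates to the residuals, which this sketch leaves out.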
ISBN: (print) 9781424437511
Finding synchronization defects is difficult due to non-deterministic orderings of parallel threads. Current tools for detecting synchronization defects tend to miss many data races or produce an overwhelming number of false alarms. In this paper, we describe Helgrind+, a dynamic race detection tool that incorporates correct handling of condition variables and a combination of the lockset algorithm and the happens-before relation. We compare our techniques with Intel Thread Checker and the original Helgrind tool on two substantial benchmark suites. Helgrind+ reduces the number of both false negatives (missed races) and false positives. The additional accuracy incurs almost no performance overhead.
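As background for the lockset algorithm mentioned in the abstract above, here is a minimal Python sketch of the basic idea: each shared variable keeps a candidate set of locks, which is intersected with the locks held at every access, and a warning is raised once the set is empty. This is not Helgrind+'s implementation. The trace format, the function name `lockset_races`, and the rule of reporting only after two different threads have touched the variable are assumptions for illustration; the happens-before relation and condition-variable handling described in the abstract are not modelled.

```python
def lockset_races(trace):
    """Simplified lockset check over a recorded trace (hypothetical format).

    trace: iterable of (thread, op, target), where op is one of
    "acquire", "release", "read", "write"; target is a lock name for
    acquire/release and a variable name for read/write.
    Returns the variables flagged as potentially racy, in order of detection.
    """
    held = {}        # thread -> set of locks currently held
    candidates = {}  # variable -> locks held at every access so far
    accessors = {}   # variable -> threads that have accessed it
    flagged = []

    for thread, op, target in trace:
        locks = held.setdefault(thread, set())
        if op == "acquire":
            locks.add(target)
        elif op == "release":
            locks.discard(target)
        else:  # "read" or "write"
            if target not in candidates:
                candidates[target] = set(locks)   # first access: start from held locks
            else:
                candidates[target] &= locks       # keep only locks common to all accesses
            accessors.setdefault(target, set()).add(thread)
            # Assumption: only report once a second thread is involved,
            # so single-threaded initialisation is not flagged.
            if (not candidates[target]
                    and len(accessors[target]) > 1
                    and target not in flagged):
                flagged.append(target)
    return flagged


protected = [
    ("T1", "acquire", "m"), ("T1", "write", "x"), ("T1", "release", "m"),
    ("T2", "acquire", "m"), ("T2", "write", "x"), ("T2", "release", "m"),
]
unprotected = [
    ("T1", "acquire", "m"), ("T1", "write", "x"), ("T1", "release", "m"),
    ("T2", "write", "x"),
]
```

A pure lockset check like this one can raise false alarms for accesses that are ordered by other synchronization, which is one reason tools combine it with a happens-before analysis.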