We describe a novel, systematic approach to efficiently parallelizing data mining algorithms: starting with the representation of an algorithm as a sequential composition of functions, we formally transform it into a parallel form using higher-order functions for specifying parallelism. We implement the approach as an extension of the industrial-strength Java-based library Xelopes, and we illustrate its use by developing a multi-threaded Java program for the popular naive Bayes classification algorithm. In comparison with the popular MapReduce programming model, our resulting programs enable not only data-parallel but also task-parallel implementations, and a combination of both. Our experiments demonstrate efficient parallelization and good scalability on multi-core processors.
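As a rough illustration of the data-parallel pattern described in this abstract, the sketch below trains a categorical naive Bayes model in plain Java by counting class/feature occurrences per data partition in parallel and then reducing the partial counts. It is not based on Xelopes or on the authors' higher-order-function transformations; all class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Minimal sketch: data-parallel training of a categorical naive Bayes model.
// Per-partition counts are computed in parallel and then reduced; class priors
// and conditional probabilities follow directly from the merged counts.
// All names and shapes are illustrative, not from Xelopes.
public class ParallelNaiveBayes {

    // counts[c][f][v] = number of rows with class c and value v for feature f
    static long[][][] countPartition(int[][] x, int[] y, int from, int to,
                                     int classes, int features, int values) {
        long[][][] counts = new long[classes][features][values];
        for (int i = from; i < to; i++)
            for (int f = 0; f < features; f++)
                counts[y[i]][f][x[i][f]]++;
        return counts;
    }

    static void mergeInto(long[][][] acc, long[][][] part) {
        for (int c = 0; c < acc.length; c++)
            for (int f = 0; f < acc[c].length; f++)
                for (int v = 0; v < acc[c][f].length; v++)
                    acc[c][f][v] += part[c][f][v];
    }

    public static void main(String[] args) {
        int classes = 2, features = 3, values = 4, n = 1_000_000;
        int[][] x = new int[n][features];
        int[] y = new int[n];
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < n; i++) {
            y[i] = rnd.nextInt(classes);
            for (int f = 0; f < features; f++) x[i][f] = rnd.nextInt(values);
        }

        int parts = Runtime.getRuntime().availableProcessors();
        int chunk = (n + parts - 1) / parts;

        // Data-parallel phase: count each partition independently.
        List<long[][][]> partials = IntStream.range(0, parts).parallel()
                .mapToObj(p -> countPartition(x, y, p * chunk, Math.min(n, (p + 1) * chunk),
                                              classes, features, values))
                .collect(Collectors.toList());

        // Reduction phase: merge the partial count tables.
        long[][][] total = partials.get(0);
        for (int p = 1; p < partials.size(); p++) mergeInto(total, partials.get(p));

        System.out.println("counts for class 0, feature 0: " + Arrays.toString(total[0][0]));
    }
}
```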
Game semantics is a denotational semantics presenting compositionally the computational behaviour of various kinds of effectful programs. One of its celebrated achievements is to have obtained full abstraction results ...
The PVS search function, as a current mainstream and efficient algorithm, has been widely used in various kinds of chess programs. We applied a parallel search function based on PVS and improved the running speed of the search. At the same time, we also did some research and experiments on the evaluation function of Amazon chess, which provided a set of available Amazon evaluation functions and parameter adjustment results for reference.
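For readers unfamiliar with PVS (Principal Variation Search), the following is a minimal, single-threaded Java sketch of the algorithm: the first move at each node is searched with a full alpha-beta window and the remaining moves with a zero-width window, re-searching on a fail-high. The GameState interface is a placeholder; the parallelization and the Amazons-specific evaluation studied in the paper are not shown.

```java
import java.util.List;

// Minimal sketch of Principal Variation Search (PVS): the first move at each
// node gets a full alpha-beta window, the remaining moves a zero-width window,
// with a re-search only when the zero-window probe fails high.
// GameState is a placeholder interface, not taken from the paper.
public class Pvs {

    interface GameState {
        List<Integer> legalMoves();   // encoded moves
        GameState play(int move);     // successor position
        int evaluate();               // static evaluation from the side to move
        boolean isTerminal();
    }

    static int pvs(GameState s, int depth, int alpha, int beta) {
        if (depth == 0 || s.isTerminal()) {
            return s.evaluate();
        }
        boolean first = true;
        for (int move : s.legalMoves()) {
            GameState child = s.play(move);
            int score;
            if (first) {
                score = -pvs(child, depth - 1, -beta, -alpha);      // full window
                first = false;
            } else {
                score = -pvs(child, depth - 1, -alpha - 1, -alpha); // zero window
                if (score > alpha && score < beta) {
                    score = -pvs(child, depth - 1, -beta, -alpha);  // re-search
                }
            }
            if (score > alpha) alpha = score;
            if (alpha >= beta) break;                               // beta cutoff
        }
        return alpha;
    }
}
```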
Program verification consists in developing a proof system for the program and proving the soundness of that proof system with respect to a trusted operational semantics of the program. However, many practical program verifiers are not...
ISBN: 9781665445993 (print)
Cash flow prediction for a bank is an important task, as it is not only related to liquidity risk but is also regulated by financial authorities. To improve the prediction, a graph analysis of bank transaction data is promising, while its size, scale-free nature, and various attributes make the task challenging. In this paper, we propose a graph-based machine learning method for the cash flow prediction task. Our contributions are as follows. (i) We introduce an extensible and scalable shared-memory parallel graph analysis platform that supports the vertex-centric, bulk synchronous parallel programming paradigm. (ii) We introduce two novel graph features on top of the platform: (ii-a) an internal money flow feature based on a Markov process approximation, and (ii-b) an anomaly score feature derived from other graph features. The proposed method is evaluated on real bank transaction data. The proposed graph features reduce the error of a long-term (31-day) cash flow prediction by 56% compared with a non-graph-based time-series prediction model. The graph analysis platform can compute graph features from a graph with 10 × 10^6 nodes and 593 × 10^6 edges in 2 hours and 20 minutes.
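As a simple illustration of the vertex-centric, bulk synchronous parallel style mentioned in (i), and of a Markov-chain-like money flow propagation as in (ii-a), here is a small Java sketch: in each superstep every vertex scatters a weighted fraction of its value along its outgoing edges, with a barrier between supersteps. The CSR-style arrays and class names are assumptions, not the authors' platform.

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.stream.IntStream;

// Minimal sketch of the vertex-centric, bulk synchronous parallel (BSP) style:
// in each superstep every vertex scatters a fraction of its current value along
// its weighted out-edges, a barrier is reached, and the accumulated incoming
// values become the next state (a Markov-chain-like propagation).
// The CSR-style layout below is illustrative, not the paper's platform.
public class VertexCentricBsp {

    public static double[] propagate(int numVertices,
                                     int[] outStart,    // out-edges of v are [outStart[v], outStart[v+1])
                                     int[] outDst,      // destination vertex of each out-edge
                                     double[] outWeight,
                                     double[] initial,
                                     int supersteps) {
        double[] value = initial.clone();
        for (int step = 0; step < supersteps; step++) {
            DoubleAdder[] inbox = new DoubleAdder[numVertices];
            for (int v = 0; v < numVertices; v++) inbox[v] = new DoubleAdder();

            // Compute phase: every vertex scatters its messages in parallel.
            final double[] cur = value;
            IntStream.range(0, numVertices).parallel().forEach(v -> {
                double total = 0;
                for (int e = outStart[v]; e < outStart[v + 1]; e++) total += outWeight[e];
                if (total == 0) return;                  // sink vertex: nothing to send
                for (int e = outStart[v]; e < outStart[v + 1]; e++) {
                    inbox[outDst[e]].add(cur[v] * outWeight[e] / total);
                }
            });
            // Barrier: the parallel stream completes before the inbox is read.

            double[] next = new double[numVertices];
            for (int v = 0; v < numVertices; v++) next[v] = inbox[v].sum();
            value = next;
        }
        return value;
    }
}
```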
ISBN: 9781728167206 (print)
With the spread of multi-core systems, parallel programming has increased in popularity. However, parallelizing algorithms can in some cases yield negative results due to overhead. Additionally, implementing parallel algorithms is not always an easy or achievable task. Therefore, finding out to what extent a multi-core architecture can aid in improving an algorithm's speedup can be extremely beneficial. This paper studies and measures the execution time and speedup of three of the most popular divide-and-conquer algorithms (merge sort, quick sort, and matrix multiplication), with experiments conducted against various array sizes. The experiments take place on three different multi-core machines, ranging from a dual-core CPU to a hexa-core CPU. The obtained results show that speedup is directly proportional to the number of CPU cores, such that using a hexa-core CPU instead of a dual-core CPU can achieve up to twice the speedup. Thus, utilizing powerful multi-core CPUs can rival the use of parallelism on a standard CPU.
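As a concrete example of the kind of divide-and-conquer parallelization measured in such experiments, the following Java sketch implements merge sort on the fork/join framework and reports the speedup over a sequential baseline. The threshold, array size, and timing harness are illustrative and are not the code used in the paper.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Minimal sketch: fork/join parallel merge sort plus a speedup measurement
// against Arrays.sort as a sequential baseline. Threshold and sizes are
// illustrative only.
public class ParallelMergeSort extends RecursiveAction {
    private static final int THRESHOLD = 1 << 13;    // below this, sort sequentially
    private final int[] a, buf;
    private final int lo, hi;                         // sorts a[lo, hi)

    ParallelMergeSort(int[] a, int[] buf, int lo, int hi) {
        this.a = a; this.buf = buf; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(a, lo, hi);
            return;
        }
        int mid = (lo + hi) >>> 1;
        invokeAll(new ParallelMergeSort(a, buf, lo, mid),
                  new ParallelMergeSort(a, buf, mid, hi));   // conquer both halves in parallel
        merge(mid);
    }

    private void merge(int mid) {
        System.arraycopy(a, lo, buf, lo, hi - lo);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) a[k++] = (buf[i] <= buf[j]) ? buf[i++] : buf[j++];
        while (i < mid) a[k++] = buf[i++];
        while (j < hi)  a[k++] = buf[j++];
    }

    public static void main(String[] args) {
        int n = 20_000_000;
        int[] data = new java.util.Random(1).ints(n).toArray();

        int[] seq = data.clone();
        long t0 = System.nanoTime();
        Arrays.sort(seq);
        long tSeq = System.nanoTime() - t0;

        int[] par = data.clone();
        long t1 = System.nanoTime();
        ForkJoinPool.commonPool().invoke(new ParallelMergeSort(par, new int[n], 0, n));
        long tPar = System.nanoTime() - t1;

        System.out.printf("sequential %.2f s, parallel %.2f s, speedup %.2fx%n",
                tSeq / 1e9, tPar / 1e9, (double) tSeq / tPar);
    }
}
```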
Many task models have been proposed to express and analyze the behavior of real-time applications at different levels of precision. Most of them target sequential applications with no support for parallelism. The digraph task model is one of the most general ones, as it allows modeling arbitrary directed graphs (digraphs) of sequential job releases. In this paper, we extend the digraph task model to support intra-task parallelism. For the proposed parallel multi-mode digraph model, we derive sufficient schedulability tests and a dichotomic search that reduces the pessimism of the tests for a set of n tasks on a heterogeneous single-ISA multi-core platform. To reduce the computational complexity of the schedulability test, we also propose heuristics for (i) partitioning parallel digraph tasks onto the heterogeneous cores and (ii) assigning core operating frequencies to reduce the overall energy consumption while meeting real-time constraints. The effectiveness of the proposed approach is validated with an exhaustive set of simulations.
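To make the extended model more tangible, the sketch below shows one possible Java representation of a digraph task with intra-task parallelism: vertices are job types whose workload consists of segments that may run in parallel, and edges carry minimum inter-release separations. The field names are illustrative and do not follow the paper's notation; no schedulability test is implemented here.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a digraph task with intra-task parallelism: each vertex is
// a job type whose workload is a list of segments that may run in parallel;
// each directed edge carries a minimum inter-release separation. Names are
// illustrative and no schedulability analysis is performed.
public class DigraphTask {

    /** One job type (a vertex of the digraph). */
    public static class Vertex {
        final String name;
        final long[] segmentWcet;   // WCETs of segments that may execute in parallel
        final long deadline;        // relative deadline of the job
        final List<Edge> out = new ArrayList<>();

        Vertex(String name, long[] segmentWcet, long deadline) {
            this.name = name; this.segmentWcet = segmentWcet; this.deadline = deadline;
        }

        /** Total work of the job (all segments). */
        long work() { long s = 0; for (long c : segmentWcet) s += c; return s; }

        /** Critical-path length when all segments can run fully in parallel. */
        long span() { long m = 0; for (long c : segmentWcet) m = Math.max(m, c); return m; }
    }

    /** Directed edge: the successor may be released at least `separation` time units later. */
    public static class Edge {
        final Vertex to;
        final long separation;
        Edge(Vertex to, long separation) { this.to = to; this.separation = separation; }
    }

    public static void main(String[] args) {
        Vertex sense = new Vertex("sense", new long[]{2}, 10);
        Vertex fuse  = new Vertex("fuse",  new long[]{3, 3, 4}, 20);  // three parallel segments
        sense.out.add(new Edge(fuse, 10));
        fuse.out.add(new Edge(sense, 20));
        System.out.println("fuse: work=" + fuse.work() + ", span=" + fuse.span());
    }
}
```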
ISBN: 9781665408790 (print)
Checksums are used to detect errors that might occur while storing or communicating data. Checking the integrity of data is well established, but only for smaller data sets. In contrast, supercomputers have to deal with huge amounts of data, which introduces failures that may remain undetected. Therefore, additional protection becomes a necessity at large scale. However, checking the integrity of larger data sets, especially in the case of distributed data, clearly requires parallel approaches. We show how popular checksums, such as CRC-32 or Adler-32, can be parallelized efficiently. This also disproves a widespread belief that parallelizing the aforementioned checksums, especially in a scalable way, is not possible. The mathematical properties behind these checksums enable a method to combine partial checksums such that the result corresponds to the checksum of the concatenated partial data. Our parallel checksum algorithm utilizes this combination idea in a scalable hierarchical reduction scheme to combine the partial checksums from an arbitrary number of processing elements. Although this reduction scheme can be implemented manually using most parallel programming interfaces, we use the Message Passing Interface, which supports such functionality directly via non-commutative user-defined reduction operations. In conjunction with the efficient checksum capabilities of the zlib library, our algorithm can be implemented not only conveniently and portably, but also very efficiently. Additional shared-memory parallelization within compute nodes completes our hybrid parallel checksum solutions, which show a high scalability of up to 524,288 threads. At this scale, computing the checksums of 240 TiB of data took only 3.4 seconds for CRC-32 and 2.6 seconds for Adler-32. Finally, we discuss the APES application as a representative of dynamic supercomputer applications. Thanks to our scalable checksum algorithm, even such applications are now able to detect many errors within ...
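The combination property is easiest to see for Adler-32. The following Java sketch computes partial Adler-32 checksums over chunks in parallel and folds them left to right with a combine step (ported from the well-known formula used by zlib's adler32_combine), then checks the result against the checksum of the whole buffer. It only illustrates the mathematical idea behind the reduction; the MPI-based hierarchical implementation described in the abstract is not reproduced here.

```java
import java.util.stream.IntStream;
import java.util.zip.Adler32;

// Minimal sketch of the "combine partial checksums" idea for Adler-32:
// chunk checksums are computed independently (here via a parallel stream) and
// folded left to right with a combine step; the result must equal the checksum
// of the concatenated data. The combine formula mirrors zlib's adler32_combine.
public class ParallelAdler32 {
    private static final long BASE = 65521L;   // largest prime below 2^16

    /** Adler-32 of data1||data2, given adler32(data1), adler32(data2) and len(data2). */
    static long combine(long adler1, long adler2, long len2) {
        long rem = len2 % BASE;
        long sum1 = adler1 & 0xffff;
        long sum2 = (rem * sum1) % BASE;
        sum1 += (adler2 & 0xffff) + BASE - 1;
        sum2 += ((adler1 >> 16) & 0xffff) + ((adler2 >> 16) & 0xffff) + BASE - rem;
        if (sum1 >= BASE) sum1 -= BASE;
        if (sum1 >= BASE) sum1 -= BASE;
        if (sum2 >= (BASE << 1)) sum2 -= (BASE << 1);
        if (sum2 >= BASE) sum2 -= BASE;
        return sum1 | (sum2 << 16);
    }

    static long adlerOf(byte[] data, int off, int len) {
        Adler32 a = new Adler32();
        a.update(data, off, len);
        return a.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 << 20];                 // 64 MiB of pseudo-random data
        new java.util.Random(7).nextBytes(data);

        int chunks = Runtime.getRuntime().availableProcessors();
        int chunkLen = (data.length + chunks - 1) / chunks;

        // Partial checksums, computed independently (and here in parallel).
        long[] partial = IntStream.range(0, chunks).parallel()
                .mapToLong(i -> {
                    int off = Math.min(i * chunkLen, data.length);
                    return adlerOf(data, off, Math.min(chunkLen, data.length - off));
                })
                .toArray();

        // Order-preserving (non-commutative) fold of the partial checksums.
        long combined = partial[0];
        for (int i = 1; i < chunks; i++) {
            int off = Math.min(i * chunkLen, data.length);
            combined = combine(combined, partial[i], Math.min(chunkLen, data.length - off));
        }

        System.out.printf("combined = %08x, direct = %08x%n",
                combined, adlerOf(data, 0, data.length));
    }
}
```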
ISBN: 9781450388368 (print)
This paper introduces the principles of three classical and widely applied threshold methods: the Otsu method, the maximum entropy method, and the iterative method. They are run, compared, and analyzed on the VS2010 (Microsoft Visual Studio 2010) platform. The Otsu method, which gives relatively good results, is then ported to standard C on the CCS (Code Composer Studio) platform, and a multi-core DSP (Digital Signal Processor) environment based on the TMS320C6678 is established. The OpenMP framework is then used, in fork-join mode, to parallelize and optimize the Otsu method. Two, four, and eight cores are used for fast processing of the Otsu method, and the resulting speedup behavior is summarized. The results show that the parallel implementation of the digital image processing algorithm on the multi-core DSP presented in this paper can effectively improve the running speed while preserving the accuracy of the Otsu method.
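To sketch the parallelizable part of the Otsu method, here is a short Java example (not the paper's C/OpenMP code for the TMS320C6678): per-chunk gray-level histograms are built in parallel and merged, fork-join style, and the threshold maximizing the between-class variance is then selected sequentially.

```java
import java.util.stream.IntStream;

// Minimal sketch of the Otsu method with a data-parallel histogram step:
// per-chunk histograms are built in parallel and merged, then the threshold
// maximizing the between-class variance is selected sequentially.
// Pixel generation below is illustrative only.
public class ParallelOtsu {

    /** Histogram of 8-bit gray values, built from per-chunk partial histograms. */
    static long[] histogram(int[] pixels) {
        int chunks = Runtime.getRuntime().availableProcessors();
        int chunkLen = (pixels.length + chunks - 1) / chunks;
        return IntStream.range(0, chunks).parallel()
                .mapToObj(c -> {
                    long[] h = new long[256];
                    int from = c * chunkLen;
                    int to = Math.min(pixels.length, from + chunkLen);
                    for (int i = from; i < to; i++) h[pixels[i]]++;
                    return h;
                })
                .reduce(new long[256], (a, b) -> {
                    long[] m = new long[256];
                    for (int i = 0; i < 256; i++) m[i] = a[i] + b[i];
                    return m;
                });
    }

    /** Otsu threshold: the gray level maximizing the between-class variance. */
    static int otsuThreshold(long[] hist, long total) {
        double sumAll = 0;
        for (int t = 0; t < 256; t++) sumAll += (double) t * hist[t];
        double sumB = 0, wB = 0, bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];                       // weight of the background class
            if (wB == 0) continue;
            double wF = total - wB;              // weight of the foreground class
            if (wF == 0) break;
            sumB += (double) t * hist[t];
            double mB = sumB / wB, mF = (sumAll - sumB) / wF;
            double varBetween = wB * wF * (mB - mF) * (mB - mF);
            if (varBetween > bestVar) { bestVar = varBetween; best = t; }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] pixels = new java.util.Random(3).ints(4_000_000, 0, 256).toArray();
        long[] hist = histogram(pixels);
        System.out.println("Otsu threshold = " + otsuThreshold(hist, pixels.length));
    }
}
```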
An important challenge in parallel computing is the mapping of parallel algorithms to parallel computing platforms. This requires several activities, such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, and the implementation and deployment of the algorithm on the computing platform. However, current parallel computing approaches very often use only conceptual and idiosyncratic models, which fall short in supporting the communication and analysis of the design decisions. In this article, we present ParDSL, a domain-specific language framework that provides explicit models to support the activities for mapping parallel algorithms to parallel computing platforms. The language framework includes a coherent set of four domain-specific languages, each of which focuses on one activity of the mapping process. We use the domain-specific languages for modeling the design as well as for generating the required platform-specific models and the code of the selected parallel algorithm. In addition to the languages, a library is defined to support systematic reuse. We discuss the overall architecture of the language framework, the separate DSLs, the corresponding model transformations, and the toolset. The framework is illustrated on four different parallel computing algorithms.