This paper presents a number of novel metaheuristic approaches that can efficiently map stream graphs on multicores. A stream graph consists of a set of actors that perform different functions and communicate through edges. Orchestrating stream graphs on multicores can be formulated as an Integer Linear Programming (ILP) problem, but an ILP solver can take exponential time to find an optimal solution. We propose metaheuristic algorithms that achieve near-optimal solutions within a reasonable amount of time. We employ six variants of the Hill-Climbing (HC) algorithm with different tweak operators that produce excellent results extremely quickly. We also propose six variants of the Genetic Algorithm (GA) to examine how effectively these variants escape local optima. Finally, we combine HC and GA techniques (an approach also known as a 'memetic algorithm') to produce hybrid techniques that outperform HC and GA individually. We compare our results with those generated by the CPLEX optimization tool. Our best technique achieved a geometric mean speedup of 7.42x across a range of StreamIt benchmarks on an eight-core processor. (C) 2016 Elsevier Ltd. All rights reserved.
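The hill-climbing idea above can be sketched as follows. This is a minimal illustration, not the authors' method. It assumes a hypothetical cost model in which each actor has an estimated amount of work and a mapping's cost is the load of the busiest core. Communication over edges is ignored. Only one tweak operator is used, moving a single actor to another core, in place of the six variants the abstract mentions. The names `makespan` and `hill_climb` and the work values are illustrative.

```python
import random


def makespan(assignment, work, num_cores):
    """Cost of a mapping: the load of the busiest core (communication ignored)."""
    loads = [0.0] * num_cores
    for actor, core in enumerate(assignment):
        loads[core] += work[actor]
    return max(loads)


def hill_climb(work, num_cores, iterations=10_000, seed=0):
    """Map actors (with estimated work) onto cores by simple hill climbing."""
    rng = random.Random(seed)
    # Start from a random mapping: current[i] is the core assigned to actor i.
    current = [rng.randrange(num_cores) for _ in work]
    current_cost = makespan(current, work, num_cores)
    for _ in range(iterations):
        # Tweak operator: move one randomly chosen actor to a random core.
        candidate = current.copy()
        candidate[rng.randrange(len(work))] = rng.randrange(num_cores)
        cost = makespan(candidate, work, num_cores)
        # Keep the tweak if it is no worse; accepting ties lets the
        # search move across plateaus of equal cost.
        if cost <= current_cost:
            current, current_cost = candidate, cost
    return current, current_cost


mapping, cost = hill_climb(work=[4, 3, 3, 2, 2, 2], num_cores=2)
print(mapping, cost)
```

Because the search only accepts moves that do not worsen the cost, it can stall in a local optimum. Escaping such optima is the motivation the abstract gives for the GA and hybrid (memetic) variants.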
Protein secondary structure describes protein construction in terms of regular spatial shapes, including alpha-helices, beta-strands, and loops, which a protein's amino acid chain can adopt in some of its regions. This information supports protein classification, functional annotation, and 3D structure prediction. The relevance of this information and the scope of its practical applications create a need for its effective storage and processing. Relational databases, widely used in commercial systems in recent years, are a serious alternative: they are honed by years of experience, enriched with mature technologies, equipped with the declarative SQL query language, and accepted by a large community of programmers. Unfortunately, relational database management systems are not designed for efficient storage and processing of biological data such as protein secondary structures. In this paper, we present a new search method implemented in the search engine of the PSS-SQL language. PSS-SQL allows users to formulate queries against a relational database to find proteins whose secondary structures are similar to a user-specified structural pattern. We show how the search process can be accelerated by multiple scanning of the Segment Index and by a parallel implementation of the alignment procedure that uses multiple threads on multicore CPUs.
Over the past two decades, many concurrent data structures have been designed and implemented. Nearly all such work analyzes concurrent data structures empirically, omitting asymptotic bounds on their efficiency, part...
ISBN (print): 9781450342797
Designing programming environments for physical simulation is challenging because simulations rely on diverse algorithms and geometric domains. These challenges are compounded when we try to run efficiently on heterogeneous parallel architectures. We present Ebb, a Domain-Specific Language (DSL) for simulation that runs efficiently on both CPUs and GPUs. Unlike previous DSLs, Ebb uses a three-layer architecture to separate (1) simulation code, (2) definition of data structures for geometric domains, and (3) runtimes supporting parallel architectures. Different geometric domains are implemented as libraries that use a common, unified, relational data model. By structuring the simulation framework in this way, programmers implementing simulations can focus on the physics and algorithms of each simulation without worrying about their implementation on parallel computers. Because the geometric domain libraries are all implemented using a common runtime based on relations, new geometric domains can be added as needed without specifying the details of memory management or mapping to different parallel architectures, and without expanding the runtime's interface. We evaluate Ebb by comparing it to several widely used simulations, demonstrating performance comparable to handwritten GPU code where such code is available, and surpassing existing CPU performance optimizations by up to 9x when no GPU code exists.
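The layering described above can be illustrated with a toy sketch. The domain layer provides a relation, meaning a set of elements with named fields. The simulation layer writes a per-element kernel without knowing how fields are stored or where the loop runs. The `Relation` class, `new_field`, `map`, and `advance` are hypothetical names, and the example does not reproduce Ebb's actual API, which the abstract does not describe. It is also purely sequential.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical relation: `size` elements, each with named fields.

    Fields are stored column-wise (one list per field), a layout that a
    runtime could place in CPU or GPU memory without the simulation code
    knowing which one is used.
    """
    size: int
    fields: dict = field(default_factory=dict)

    def new_field(self, name, init):
        self.fields[name] = [init] * self.size

    def map(self, kernel):
        # Each kernel call reads and writes only element i, so a parallel
        # runtime could execute these iterations concurrently.
        for i in range(self.size):
            kernel(self.fields, i)


# Domain layer: a set of particles is just a relation with fields.
particles = Relation(size=4)
particles.new_field("pos", 0.0)
particles.new_field("vel", 1.0)


# Simulation layer: physics written per element, unaware of storage.
def advance(f, i, dt=0.5):
    f["pos"][i] += dt * f["vel"][i]


particles.map(advance)
print(particles.fields["pos"])
```

Swapping in a different domain, such as a mesh whose vertices, edges, and faces are separate relations, would leave `advance`-style kernels unchanged. That is the separation the three-layer architecture aims for.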
Numerical linear algebra is one of the most important forms of scientific computation. Its basic computations are matrix computations and the solution of linear systems, which are used as kernels in many computational problems. This study demonstrates the parallelisation of these scientific computations using multicore programming frameworks, specifically Pthreads, OpenMP, Intel Cilk Plus, Intel TBB, SWARM, and FastFlow. A unified, exploratory performance evaluation and a qualitative study of these frameworks are also presented for parallel scientific computations under several parameters. The OpenMP and SWARM models produce good results when running in parallel with compiler optimisation for matrix operations at large and medium scales, whereas the remaining models do not perform as well for some matrix operations. The qualitative results show that the OpenMP, Cilk Plus, TBB, and SWARM frameworks require minimal programming effort, whereas the other models require advanced programming skills and experience. Finally, based on an extended study, general conclusions are drawn regarding the programming models and matrix operations for some parameters. (C) 2014 IMACS. Published by Elsevier B.V. All rights reserved.
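A common way to parallelise matrix multiplication in these frameworks is to split the rows of the result across threads. The sketch below shows that decomposition with Python's `concurrent.futures`, assuming a static row partitioning similar to an OpenMP `parallel for` with `schedule(static)`. It is illustrative only and is not the code evaluated in the study. In CPython, threads executing pure-Python arithmetic do not run simultaneously, so this shows the work split rather than a real speedup. The compiled frameworks listed above run such blocks on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a, b, rows):
    """Compute the listed rows of C = A * B with plain loops."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in rows]


def parallel_matmul(a, b, workers=4):
    n = len(a)
    if n == 0:
        return []
    # Static row partitioning: contiguous blocks of rows, one per task.
    chunk = -(-n // workers)  # ceiling division
    blocks = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: matmul_rows(a, b, rows), blocks))
    # pool.map returns results in submission order, so concatenating
    # the blocks rebuilds C row by row.
    return [row for part in parts for row in part]


print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each task writes a disjoint set of result rows, so no locking is needed. This independence is also why row-partitioned loops need little programming effort in models such as OpenMP.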
This paper describes the performance of the Brain Project, a distributed software tool for the formal modeling of numerical data using a hybrid neural-genetic programming technique. One of the most interesting characteristics of the Brain Project is its distributed implementation. Unlike many other parallel and/or distributed solutions, the only requirement of the Brain Project is that the collaborating personal computers be 64-bit Linux machines connected to the Internet via the Transmission Control Protocol/Internet Protocol (TCP/IP). The performance of the Brain Project is clearly enhanced by the very simple parallelization scheme illustrated in the paper. Although the Brain Project presents many innovative solutions for genetic programming research, this paper focuses mainly on its behavior in the distributed environment. (C) 2015 Elsevier B.V. All rights reserved.
Biomedical systems have been using ontology matching as a primary technique for heterogeneity resolution. However, the natural intricacy and vastness of biomedical data have compelled biomedical ontologies to become large-scale and complex; consequently, biomedical ontology matching has become a computationally intensive task. Our parallel heterogeneity resolution system, SPHeRe, is built to meet the performance needs of ontology matching by exploiting the multicore parallelism of today's desktop PCs and cloud infrastructure. In this paper, we present the execution and evaluation results of SPHeRe over large-scale biomedical ontologies. We evaluate our system by integrating it with the interoperability engine of a clinical decision support system (CDSS), which generates matching requests for the large-scale NCI, FMA, and SNOMED-CT biomedical ontologies. Results demonstrate that our methodology provides performance speedups of 4.8 and 9.5 times on a quad-core desktop PC and a four-virtual-machine (VM) cloud platform, respectively.
Multicore hardware and software are becoming increasingly complex. The programmability problem of multicore software has led to the use of parallel patterns. Parallel patterns reduce the effort and time required to develop multicore software by effectively capturing its thread communication and data sharing characteristics. Hence, detecting the parallel pattern used in a multi-threaded application is crucial for performance improvements and enables many architectural optimizations; however, this topic has not been widely studied. We apply machine learning techniques in a novel approach to automatically detect parallel patterns and compare these techniques in terms of accuracy and speed. We experimentally validate the detection ability of our techniques on benchmarks including PARSEC and Rodinia. Our experiments show that the k-nearest neighbor, decision tree, and naive Bayes classifiers are the most accurate techniques. Overall, decision trees are the fastest technique, with the lowest characterization overhead, producing the best combination of detection results. We also show the usefulness of the proposed techniques for synthetic benchmark generation.
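A k-nearest-neighbor classifier, one of the techniques named above, can be sketched as follows. This is a minimal illustration, assuming each application has already been characterized as a numeric feature vector. The two features and all training values below are toy, made-up numbers, not data from PARSEC, Rodinia, or the paper. The pattern labels are likewise only examples.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up feature vectors: (fraction of data passed between
# consecutive threads, fraction of data read by all threads).
train = [
    ((0.90, 0.10), "pipeline"),
    ((0.80, 0.20), "pipeline"),
    ((0.85, 0.15), "pipeline"),
    ((0.10, 0.90), "data-parallel"),
    ((0.20, 0.80), "data-parallel"),
    ((0.15, 0.85), "data-parallel"),
]
print(knn_predict(train, (0.80, 0.25)))
```

k-NN has no training phase but compares each query against every stored sample. That cost is one plausible reason a decision tree, which classifies with a few threshold tests, can come out faster in a comparison like the one reported.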
A notorious class of concurrency bugs is race conditions on correlated variables, which make up about 30% of all non-deadlock concurrency bugs. One way to prevent this problem is the automatic generation of parallel unit tests. This paper presents an approach to generate parallel unit tests for variable correlations in multithreaded code. We introduce a hybrid approach for identifying correlated variables. Furthermore, we estimate the number of potentially violated correlations for methods executed in parallel. In this way, we can create unit tests that are suited for race detectors that consider correlated variables. We were able to identify more than 85% of all race conditions on correlated variables in eight applications after applying our parallel unit tests. At the same time, we reduced the number of unnecessarily generated unit tests: compared with a test generator unaware of variable correlations, redundant unit tests are reduced by up to 50%, while the same precision and accuracy are maintained in terms of the number of detected races.
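The kind of bug described above can be made concrete with a small sketch. Below, `items` and `count` are correlated because `count` must always equal `len(items)`. Protecting each variable with its own lock would make every single access atomic. A concurrent reader could still see the state between the two updates. The fix is one critical section covering both variables. The parallel unit test runs several writers at once while a reader checks the correlation. The class and function names are hypothetical, and this hand-written test only illustrates the idea. It is not the paper's test generator, and unlike a race detector it can only report violations that an actual interleaving happens to expose.

```python
import threading


class Buffer:
    """`items` and `count` are correlated: count must equal len(items)."""

    def __init__(self):
        self.items = []
        self.count = 0
        self._lock = threading.Lock()

    def append(self, x):
        # One critical section updates both correlated variables, so no
        # thread can observe items and count out of step.
        with self._lock:
            self.items.append(x)
            self.count += 1

    def snapshot(self):
        # Read both variables under the same lock for a consistent view.
        with self._lock:
            return len(self.items), self.count


def parallel_unit_test(writers=4, per_writer=1000):
    """Run Buffer.append concurrently while a reader checks the correlation."""
    buf = Buffer()
    violations = []
    done = threading.Event()

    def writer():
        for i in range(per_writer):
            buf.append(i)

    def reader():
        while not done.is_set():
            n_items, count = buf.snapshot()
            if n_items != count:
                violations.append((n_items, count))

    threads = [threading.Thread(target=writer) for _ in range(writers)]
    checker = threading.Thread(target=reader)
    checker.start()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    done.set()
    checker.join()
    return not violations and buf.snapshot() == (writers * per_writer,) * 2


print(parallel_unit_test())
```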
Genetic programming (GP) (Koza, Genetic programming, MIT Press, Cambridge, 1992) is well known as a computationally intensive technique. Consequently, faster parallel versions have been implemented that harness the highly parallel hardware of graphics cards, achieving significant gains in GP performance. However, extracting the maximum performance from a graphics card for the purposes of GP is difficult. A key reason is that, in addition to the processor resources, the fast on-chip memory of graphics cards needs to be fully exploited. We present techniques that improve the performance of a graphics card implementation of tree-based GP by better exploiting this faster memory, and we demonstrate that both the L1 cache and shared memory must be considered to extract maximum performance. Better GP program representation and use of the register file are also explored to further boost performance. Using an NVIDIA Kepler GTX 670 GPU, a maximum performance of 36 billion Genetic Programming Operations per Second is demonstrated.
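GPU implementations of tree-based GP often flatten each program tree into a linear prefix or postfix array. Such an array fits into fast on-chip memory and can be interpreted with a small stack. The abstract does not say which representation the paper uses, so the sketch below is only an illustration. It assumes a postfix encoding and a toy function set of `+`, `-` and `*`. On a GPU, each thread would typically run this loop for a different fitness case. The throughput count assumes, as is common, that one GP operation is one node evaluated on one fitness case. `eval_postfix` and `gp_operations` are hypothetical names.

```python
import operator

# Function set (binary operators); terminals are "x" or numeric constants.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_postfix(program, x):
    """Evaluate a GP tree linearized in postfix order on one fitness case."""
    stack = []
    for token in program:
        if token in FUNCTIONS:
            right = stack.pop()
            left = stack.pop()
            stack.append(FUNCTIONS[token](left, right))
        elif token == "x":
            stack.append(x)
        else:
            stack.append(float(token))
    return stack.pop()


def gp_operations(program, fitness_cases):
    """Count one GP operation per node evaluated per fitness case."""
    return len(program) * len(fitness_cases)


program = ["x", "x", "*", "1", "+"]  # the tree (+ (* x x) 1), i.e. x*x + 1
cases = [0.0, 1.0, 2.0, 3.0]
outputs = [eval_postfix(program, x) for x in cases]
print(outputs, gp_operations(program, cases))
```

Dividing the operation count by the measured evaluation time gives a GPop/s figure. The evaluation stack is small and reused for every node, so keeping it in registers or shared memory rather than global memory matters for performance.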