ISBN: 9783319499567; 9783319499550 (print)
Relational database management systems (RDBMS) are still widely required by numerous business applications. Boosting performance without compromising functionality is a major challenge. To achieve this goal, we propose to boost an existing RDBMS by enabling it to use hardware architectures with high memory bandwidth, such as GPUs. In this paper we present a solution named CuDB. We compare the performance and energy efficiency of our approach across different GPU ranges. We focus on the technical specificities of GPUs that are most relevant for designing highly energy-efficient solutions for database processing.
ISBN: 9781509038671 (print)
The proceedings contain 14 papers. The topics discussed include: highly scalable near memory processing with migrating threads on the Emu system architecture; parallel interval stabbing on the automata processor; an optimized multicolor point-implicit solver for unstructured grid applications on graphics processing units; optimizing sparse tensor times matrix on multi-core and many-core architectures; compiler transformation to generate hybrid sparse computations; an OpenCL framework for distributed apps on a multidimensional network of FPGAs; fast parallel cosine K-nearest neighbor graph construction; performance evaluation of parallel sparse tensor decomposition implementations; implementation and evaluation of data-compression algorithms for irregular-grid iterative methods on the PEZY-SC processor; dynamic load balancing for high-performance graph processing on hybrid CPU-GPU platforms; a fast level-set segmentation algorithm for image processing designed for parallel architectures; HISC/R: an efficient hypersparse-matrix storage format for scalable graph processing; optimized distributed work-stealing; and fine-grained parallelism in probabilistic parsing with Habanero Java.
ISBN: 9781467387767 (print)
In this paper a parallel algorithm for branch and bound applications is proposed. The algorithm is general purpose and can be used to parallelize effortlessly any sequential branch-and-bound-style algorithm that is written in a certain format. It is a distributed dynamic scheduling algorithm, i.e. each node schedules the load of its own cores; it can be used with different programming platforms and architectures, and it is a hybrid algorithm (OpenMP, MPI). To demonstrate its validity and efficiency, the proposed algorithm has been implemented and tested on numerous examples, which are described in detail in this paper. A speed-up of about 9 has been achieved for the tested examples on a cluster of three nodes with four cores each.
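To put the reported speed-up in context, the short sketch below converts it into parallel efficiency using the standard definition (speed-up divided by the number of workers). It assumes the speed-up of about 9 is measured against a single-core sequential run and that all 3 × 4 = 12 cores were in use; the abstract does not state either point explicitly, and the function name is illustrative.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Standard parallel efficiency: speed-up divided by worker count."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures taken from the abstract: three nodes with four cores each.
nodes, cores_per_node = 3, 4
reported_speedup = 9.0  # "about 9"; assumed relative to one core

efficiency = parallel_efficiency(reported_speedup, nodes * cores_per_node)
print(f"efficiency = {efficiency:.2f}")  # prints "efficiency = 0.75"
```

Under these assumptions the cluster runs at roughly 75% efficiency.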
ISBN: 9783319499567; 9783319499550 (print)
In this paper we consider educational and research systems that can be used to estimate the efficiency of parallel computing. ParaLab allows parallel computation methods to be studied. With the ParaLib library, we can compare parallel programming languages and technologies. The Globalizer Lab system is capable of estimating the efficiency of algorithms for solving computationally intensive global optimization problems. These systems can build models of various high-performance systems, formulate the problems to be solved, perform computational experiments in simulation mode, and analyze the results. Crucially, the described systems support a visual representation of the parallel computation process. Combined, these systems can be useful for developing high-performance parallel programs that take the specific features of modern supercomputing systems into account.
ISBN: 9781509055869 (print)
In this paper, a General Purpose Graphics Processing Unit (GPGPU) based concurrent implementation of a handwritten digit classifier is presented. Different handwriting styles make it difficult to recognize a pattern, but with a neural network the task is not difficult. Software packages such as Torch and MATLAB support multiple training algorithms for training a network. By choosing an appropriate training algorithm for a specific application, the speed of training can be increased. Furthermore, by using the computational power of GPUs, the training and classification speed of a neural network can be significantly improved. In this work, the Modified National Institute of Standards and Technology (MNIST) database of handwritten digits is used to train the network. The accuracy and training time of the digit classifier are evaluated for different algorithms, and concurrent training is then performed by exploiting the power of the GPU. The trained parameters are imported and used for concurrent classification with the Compute Unified Device Architecture (CUDA) computing language, which can be useful in numerous practical applications. Finally, the results of sequential and concurrent training and classification are compared.
ISBN: 9783319298160 (print)
The proceedings contain 13 papers. The special focus in this conference is on Mathematical and Engineering Methods in Computer Science. The topics include: programming support for future parallel architectures; flexible interpolation for efficient model checking; understanding transparent and complicated users as instances of preference learning for recommender systems; span-program-based quantum algorithms for graph bipartiteness and connectivity; fitting aggregation operators; practical exhaustive generation of small multiway cuts in sparse graphs; self-adaptive architecture for multi-sensor embedded vision system; exceptional configurations of quantum walks with Grover’s coin; performance analysis of distributed stream processing applications through colored Petri nets; GPU-accelerated real-time mesh simplification using parallel half edge collapses; classifier ensemble by semi-supervised learning; the challenge of increasing safe response of antivirus software users; and weak memory models as LLVM-to-LLVM transformations.
ISBN: 9781509028238 (print)
The k-center problem is a classic NP-hard clustering question. For contemporary massive data sets, RAM-based algorithms become impractical. Although there exist good algorithms for k-center, they are all inherently sequential. In this paper, we design and implement parallel approximation algorithms for k-center. We observe that Gonzalez's greedy algorithm can be efficiently parallelized in several MapReduce rounds; in practice, we find that two rounds are sufficient, leading to a 4-approximation. We also find that this parallel scheme is about 100 times faster than the sequential Gonzalez algorithm and barely compromises solution quality. We contrast this with an existing parallel algorithm for k-center that offers a 10-approximation. Our analysis reveals that this scheme is often slow, and that its sampling procedure only runs if k is sufficiently small relative to the input size. In practice, it is slightly more effective than Gonzalez's approach, but is slow. To trade off runtime for approximation guarantee, we parameterize this sampling algorithm. We prove a lower bound on the parameter for effectiveness, and find experimentally that with values even lower than the bound, the algorithm is not only faster, but sometimes more effective.
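As a rough illustration of the ideas in this abstract, the sketch below implements Gonzalez's greedy algorithm (farthest-first traversal) and one plausible two-round scheme built on it. The greedy routine is the standard textbook form. The two-round function is only an assumption about how such rounds could be organized (run the greedy step per partition, then again on the collected centers); it runs in a single process and is not the authors' MapReduce implementation. All function and parameter names are hypothetical.

```python
import math


def gonzalez_k_center(points, k):
    """Farthest-first traversal. Returns (centers, covering radius)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [points[0]]  # arbitrary first center
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[far])
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers, max(dist)


def two_round_k_center(points, k, num_parts):
    """Assumed two-round scheme, simulated sequentially.

    Round 1: run the greedy algorithm independently on each partition.
    Round 2: run it again on the union of the per-partition centers.
    """
    parts = [points[i::num_parts] for i in range(num_parts)]
    candidates = []
    for part in parts:
        if part:
            local, _ = gonzalez_k_center(part, min(k, len(part)))
            candidates.extend(local)
    centers, _ = gonzalez_k_center(candidates, min(k, len(candidates)))
    # Evaluate the final centers against the full data set.
    radius = max(min(math.dist(p, c) for c in centers) for p in points)
    return centers, radius


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(gonzalez_k_center(data, 2))
    print(two_round_k_center(data, 2, num_parts=2))
```

In a real MapReduce setting each partition in round 1 would be handled by a separate mapper, with a single reducer performing round 2 on the much smaller candidate set.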
ISBN: 9783319499567; 9783319499550 (print)
This article describes an approach to scalability analysis of parallel applications, which is a major part of the algorithm description used in AlgoWiki, the Open Encyclopedia of Parallel Algorithmic Features. The proposed approach is based on the suggested definition of generalized scalability of a parallel application. This study uses joined and structured data on an application's execution and supercomputing co-design technologies. Parallel application properties are studied by analyzing data collected from all available sources of its dynamic characteristics and information about the hardware and software platforms corresponding with the features of an algorithm and its implementation. This allows reasonable conclusions to be drawn regarding potential reasons for changes in the execution quality of any parallel application, and allows the scalability of various programs to be compared.
ISBN: 9783319495835; 9783319495828 (print)
The A* algorithm is generally used to solve combinatorial optimization problems, but it requires high computing power and a large amount of memory; hence, efficient parallel A* algorithms are needed. In this regard, Hash Distributed A* (HDA*) parallelizes A* by applying a decentralized strategy and a hash-based node distribution scheme. However, this distribution scheme results in frequent node transfers among processors. In this paper, we present Optimized AHDA*, a version of HDA* for shared memory architectures that uses an abstraction-based node distribution scheme and a technique to group several nodes before transferring them to the corresponding thread. Both methods reduce the number of node transfers and mitigate communication and contention. We assess the effect of each technique on algorithm performance. Finally, we evaluate the scalability of the proposed algorithm when it is run on a multicore machine, using the 15-puzzle as a case study.
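The two ideas named in this abstract, abstraction-based node distribution and grouping nodes before transfer, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Optimized AHDA* implementation: the choice of abstraction (tracking the positions of tiles 1, 2 and 3 only), the use of CRC32 as the hash, and the `BatchedOutbox` class are all hypothetical.

```python
import zlib


def hash_owner(state, num_workers):
    """HDA*-style distribution: hash the full state to pick an owner."""
    return zlib.crc32(bytes(state)) % num_workers


def abstraction_owner(state, num_workers, tracked_tiles=(1, 2, 3)):
    """Abstraction-based distribution (assumed abstraction).

    Only the positions of the tracked tiles are hashed, so states that
    differ elsewhere on the board map to the same owner.
    """
    abstract = bytes(state.index(t) for t in tracked_tiles)
    return zlib.crc32(abstract) % num_workers


class BatchedOutbox:
    """Groups nodes per destination and hands them over in batches."""

    def __init__(self, num_workers, batch_size):
        self.buffers = [[] for _ in range(num_workers)]
        self.batch_size = batch_size
        self.sent = []  # stands in for pushes to the owners' queues

    def push(self, owner, node):
        self.buffers[owner].append(node)
        if len(self.buffers[owner]) >= self.batch_size:
            self.flush(owner)

    def flush(self, owner):
        if self.buffers[owner]:
            self.sent.append((owner, self.buffers[owner]))
            self.buffers[owner] = []


if __name__ == "__main__":
    # 15-puzzle states as 16-tuples; 0 denotes the blank.
    goal = tuple(range(1, 16)) + (0,)
    moved = goal[:14] + (0, 15)  # blank slides left, tile 15 moves right
    print(hash_owner(goal, 4), hash_owner(moved, 4))
    print(abstraction_owner(goal, 4), abstraction_owner(moved, 4))
```

With this abstraction, a move that does not touch a tracked tile keeps the successor on the same worker, so no transfer is needed; the outbox then amortizes the remaining transfers by sending them in groups.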
ISBN: 9781467387767 (print)
Electronic system-level design plays an important role in multi-processor embedded system-on-chip design. Two important steps in this process are the evaluation of a single design configuration and design space exploration. In the first part of the design process, simple high-level analytical models for application mapping and evaluation are used and modified to accelerate the evaluation of a single design configuration. Using the analytical model, the design space is pruned and explored at high speed with low accuracy. In the second part of the design process, two multi-objective optimization algorithms, based on Particle Swarm Optimization and Simulated Annealing, are proposed to explore the pruned design space with higher accuracy by taking advantage of low-level architectural simulation engines. The results obtained by the proposed algorithms provide the designer with more accurate solutions within an acceptable time. With the MJPEG application as the case study, each of these methods produces a set of near-optimal points. Simulation results show that the proposed methods can lead to near-optimal design configurations with acceptable accuracy in reasonable time.