ISBN: (print) 9783319499567; 9783319499550
Relational database management systems (RDBMS) are still widely required by numerous business applications. Boosting performance without compromising functionality is a major challenge. To achieve this goal, we propose to boost an existing RDBMS by enabling it to use hardware architectures with high memory bandwidth, such as GPUs. In this paper we present a solution named CuDB. We compare the performance and energy efficiency of our approach across different GPU ranges. We focus on the technical specificities of GPUs that are most relevant for designing highly energy-efficient solutions for database processing.
ISBN: (print) 9781509028603
Convolutional Neural Network (CNN) is the state-of-the-art deep learning approach employed in various applications due to its remarkable performance. Convolutions in CNNs generally dominate the overall computational complexity and thus consume most of the computational power in real implementations. In this paper, efficient hardware architectures incorporating the parallel fast finite impulse response (FIR) algorithm (FFA) for CNN convolution implementations are discussed. The theoretical derivation of 3-parallel and 5-parallel FFAs is presented, and the corresponding 3-parallel and 5-parallel fast convolution units (FCUs) are proposed for the most commonly used 3 x 3 and 5 x 5 convolutional kernels in CNNs, respectively. Compared to conventional CNN convolution architectures, the proposed FCUs significantly reduce the number of multiplications used in convolutions. Additionally, the FCUs minimize the number of reads from the feature map memory. Furthermore, a reconfigurable FCU architecture that suits convolutions with both 3 x 3 and 5 x 5 kernels is proposed. Based on this, an efficient top-level architecture for processing a complete convolutional layer in a CNN is developed. To quantify the benefits of the proposed FCUs, the design of an FCU is coded in RTL and synthesized with TSMC 90 nm CMOS technology. The implementation results demonstrate that 30% and 36% of the computational energy can be saved compared to conventional solutions with 3 x 3 and 5 x 5 kernels in CNNs, respectively.
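To illustrate why a 3-parallel FFA reduces the multiplication count, the following minimal sketch applies the textbook 3-parallel fast FIR decomposition to a one-dimensional 3-tap filter. It is an assumption-laden illustration, not the paper's FCU design: the function names `fir_direct` and `fir_ffa3` are hypothetical, the input length is assumed to be a multiple of 3, and the two-dimensional kernel handling described in the abstract is not modeled. The direct form needs 9 multiplications per block of 3 outputs, whereas this decomposition needs 6 (with the coefficient sums precomputed).

```python
def fir_direct(x, h):
    """Reference 3-tap FIR: y[n] = h0*x[n] + h1*x[n-1] + h2*x[n-2]."""
    h0, h1, h2 = h
    y = []
    for n in range(len(x)):
        xm1 = x[n - 1] if n >= 1 else 0
        xm2 = x[n - 2] if n >= 2 else 0
        y.append(h0 * x[n] + h1 * xm1 + h2 * xm2)
    return y


def fir_ffa3(x, h):
    """3-parallel fast FIR sketch: 6 multiplications per 3 outputs."""
    assert len(x) % 3 == 0, "sketch assumes whole blocks of 3 samples"
    h0, h1, h2 = h
    # Coefficient sums are fixed, so they are computed once up front.
    h01, h12, h012 = h0 + h1, h1 + h2, h0 + h1 + h2
    prev_p22 = 0  # h2*x2 from the previous block (the delayed term)
    prev_b = 0    # (h1+h2)*(x1+x2) - h1*x1 from the previous block
    y = []
    for k in range(0, len(x), 3):
        x0, x1, x2 = x[k], x[k + 1], x[k + 2]
        # The six multiplications of the 3-parallel FFA.
        p00 = h0 * x0
        p11 = h1 * x1
        p22 = h2 * x2
        p01 = h01 * (x0 + x1)
        p12 = h12 * (x1 + x2)
        p012 = h012 * (x0 + x1 + x2)
        # Post-additions that recover the three outputs of this block.
        a = p01 - p11
        b = p12 - p11
        c = p00 - prev_p22
        y.extend([c + prev_b, a - c, p012 - a - b])
        prev_p22, prev_b = p22, b
    return y


samples = [1, 2, 3, 4, 5, 6]
taps = (1, 2, 3)
print(fir_ffa3(samples, taps) == fir_direct(samples, taps))
```

The saving comes at the cost of extra additions before and after the multipliers, which is the usual trade-off in fast FIR structures.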
ISBN: (print) 9783319495835; 9783319495828
In order to run Computational Fluid Dynamics (CFD) codes on large-scale infrastructures, parallel computing has to be used because of the computationally intensive nature of the problems. In this paper we investigate the ADAPT platform, where we couple flow Partial Differential Equations and a Poisson equation. This leads to a linear system, which we solve using direct methods. The implementation uses the MUMPS parallel multi-frontal direct solver and mesh partitioning methods based on METIS to improve the performance of the framework. We also investigate how the mesh partitioning methods are able to optimize the mesh cell distribution for the ADAPT solver. The experience gained in this paper facilitates the move to a Service Oriented view of ADAPT as future work.
ISBN: (print) 9781509038671
The proceedings contain 14 papers. The topics discussed include: highly scalable near memory processing with migrating threads on the emu system architecture; parallel interval stabbing on the automata processor; an optimized multicolor point-implicit solver for unstructured grid applications on graphics processing units; optimizing sparse tensor times matrix on multi-core and many-core architectures; compiler transformation to generate hybrid sparse computations; an OpenCL framework for distributed apps on a multidimensional network of FPGAs; fast parallel cosine K-nearest neighbor graph construction; performance evaluation of parallel sparse tensor decomposition implementations; implementation and evaluation of data-compression algorithms for irregular-grid iterative methods on the PEZY-SC processor; dynamic load balancing for high-performance graph processing on hybrid CPU-GPU platforms; a fast level-set segmentation algorithm for image processing designed for parallel architectures; HISC/R: an efficient hypersparse-matrix storage format for scalable graph processing; optimized distributed work-stealing; and fine-grained parallelism in probabilistic parsing with Habanero Java.
ISBN: (print) 9783319495835; 9783319495828
In the last few years, we have seen a significant increase in research on the energy efficiency of hardware and software components in both centralized and parallel platforms. In data centers, DBMSs are among the major energy consumers, since large amounts of data are queried daily by complex queries. Having green nodes is a precondition for designing an energy-aware parallel database cluster. Generally, most existing DBMSs focus on high performance during the query optimization phase, while usually ignoring the energy consumption of the queries. In this paper, we propose a methodology, supported by a tool called EnerQuery, that enables the nodes of parallel database clusters to save energy when optimizing queries. To show its effectiveness, we implement our proposal on top of the PostgreSQL DBMS query optimizer. A mathematical cost model based on a machine learning technique is defined and used to estimate the energy consumption of SQL queries.
ISBN: (print) 9781467387767
In this paper a parallel algorithm for branch and bound applications is proposed. The algorithm is general purpose and can be used to parallelize effortlessly any sequential branch and bound style algorithm that is written in a certain format. It is a distributed dynamic scheduling algorithm, i.e., each node schedules the load of its own cores; it can be used with different programming platforms and architectures, and it is a hybrid algorithm (OpenMP, MPI). To prove its validity and efficiency, the proposed algorithm has been implemented and tested with numerous examples, which are described in detail in this paper. A speed-up of about 9 has been achieved for the tested examples on a cluster of three nodes with four cores each.
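The reported speed-up can be put in context with the standard definition of parallel efficiency (speed-up divided by the number of cores). The short sketch below works this out for the figures quoted in the abstract; the helper name `parallel_efficiency` is hypothetical, and since the abstract gives the speed-up only as "about 9", the result is approximate as well.

```python
def parallel_efficiency(speedup, nodes, cores_per_node):
    """Efficiency = speed-up / total number of cores used."""
    return speedup / (nodes * cores_per_node)


# Figures taken from the abstract: speed-up of about 9 on 3 nodes x 4 cores.
efficiency = parallel_efficiency(9, 3, 4)
print(efficiency)  # 0.75
```

Under this definition, the tested examples use the 12 cores at roughly 75% efficiency.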
ISBN: (print) 9783319499567; 9783319499550
In this paper we consider educational and research systems that can be used to estimate the efficiency of parallel computing. ParaLab allows parallel computation methods to be studied. With the ParaLib library, we can compare parallel programming languages and technologies. The Globalizer Lab system is capable of estimating the efficiency of algorithms for solving computationally intensive global optimization problems. These systems can build models of various high-performance systems, formulate the problems to be solved, perform computational experiments in simulation mode, and analyze the results. Crucially, the described systems support a visual representation of the parallel computation process. If combined, these systems can be useful for developing high-performance parallel programs that take the specific features of modern supercomputing systems into account.
ISBN: (print) 9781467396806
Placement is considered one of the most arduous and time-consuming processes in physical implementation flows for reconfigurable architectures, and it strongly affects the quality of the derived application implementation because it is tightly bound to the total wirelength and hence to the maximum operating frequency. This problem becomes more acute for three-dimensional (3-D) architectures, since the complexity of such architectures imposes additional challenges that have to be sufficiently addressed. In this paper we introduce a novel placement algorithm, targeting 3-D reconfigurable architectures, based on Ant Colony Optimization (ACO). Experimental results validate the effectiveness of our algorithm, since it achieves a 10% reduction in the critical path delay on average. Additionally, in contrast to relevant approaches that are executed sequentially, the proposed algorithm exhibits inherent parallelism and can take full advantage of today's multi-core architectures.
ISBN: (print) 9783319499567; 9783319499550
This article describes an approach to scalability analysis of parallel applications, which is a major part of the algorithm description used in AlgoWiki, the Open Encyclopedia of Parallel Algorithmic Features. The proposed approach is based on the suggested definition of generalized scalability of a parallel application. This study uses joined and structured data on an application's execution and supercomputing co-design technologies. Parallel application properties are studied by analyzing data collected from all available sources of its dynamic characteristics and information about the hardware and software platforms corresponding to the features of an algorithm and its implementation. This allows reasonable conclusions to be drawn about the potential reasons for changes in the execution quality of any parallel application, and allows the scalability of various programs to be compared.
ISBN: (print) 9781467387767
Electronic system-level design plays an important role in multi-processor embedded system-on-chip design. Two important steps in this process are the evaluation of a single design configuration and design space exploration. In the first part of the design process, simple high-level analytical models for application mapping and evaluation are used and modified with the aim of accelerating the evaluation of a single design configuration. Using the analytical model, the design space is pruned and explored at high speed with low accuracy. In the second part of the design process, two multi-objective optimization algorithms, based on Particle Swarm Optimization and Simulated Annealing, are proposed to explore the pruned design space with higher accuracy, taking advantage of low-level architectural simulation engines. The results obtained by the proposed algorithms provide the designer with more accurate solutions within an acceptable time. Considering the MJPEG application as the case study, each of these methods produces a set of near-optimal points. Simulation results show that the proposed methods can lead to near-optimal design configurations with acceptable accuracy in reasonable time.
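In multi-objective design space exploration, a "set of near-optimal points" is commonly understood as a set of non-dominated (Pareto) configurations. The sketch below shows a plain non-dominated filter under the assumption that every objective is to be minimized. It does not reproduce the Particle Swarm Optimization or Simulated Annealing algorithms of the paper, and the function names and sample design points are hypothetical.

```python
def dominates(a, b):
    """True if point a is no worse than b in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical design points as (execution time, cost) pairs.
designs = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
print(pareto_front(designs))  # [(1, 5), (2, 3), (4, 1)]
```

This quadratic filter is adequate for small candidate sets; larger design spaces would typically call for an incremental archive instead.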