The main contribution of this paper is to show optimal algorithms computing the sum and the prefix-sums on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and...
ISBN: 9780769547688 (print)
In many application domains, data are represented using large graphs involving millions of vertices and billions of edges. Graph exploration algorithms, such as breadth-first search (BFS), are largely dominated by memory latency and are challenging to process efficiently. In this paper, we present a reconfigurable hardware methodology for efficient parallel processing of large-scale graph exploration problems. Our methodology is based on a reconfigurable hardware architecture which decouples computation and communication while keeping multiple memory requests in flight at any given time, taking advantage of the hardware capabilities of both FPGAs and the parallel memory subsystem. To validate our methodology, we provide a detailed design description of the Breadth-First Search algorithm on an FPGA-based high-performance computing system. Using graph data based on the power-law graphs found in real-world problems, we are able to achieve performance results that are superior to those of high-performance multi-core systems in the recent literature for large graph instances, and a throughput in excess of 2.5 billion traversed edges per second on RMAT graphs with 16 million vertices and over a billion edges. Using four Virtex-5 LX330 FPGAs based on 65 nm technology and running at 75 MHz, our BFS design achieves more than twice the speed of a 32-core Xeon X7560 based on 45 nm technology and running at 2.26 GHz.
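For readers unfamiliar with the metric quoted above, the following is a minimal software sketch of a level-synchronous BFS that counts traversed edges, together with a traversed-edges-per-second helper. It is not the authors' FPGA design; the function names `bfs_levels` and `teps` and the adjacency-dictionary graph representation are assumptions made for illustration only.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency dict {vertex: [neighbors]}.

    Returns (level, edges_traversed), where level maps each reachable
    vertex to its distance from source. Illustrative sketch only.
    """
    level = {source: 0}
    frontier = [source]
    edges_traversed = 0
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                edges_traversed += 1      # every inspected edge counts
                if v not in level:        # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level, edges_traversed


def teps(edges_traversed, seconds):
    """Traversed edges per second, the throughput metric quoted above."""
    return edges_traversed / seconds


if __name__ == "__main__":
    # Small undirected 4-cycle, each edge stored in both directions.
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    levels, edges = bfs_levels(graph, 0)
    print(levels, edges)
```

Each frontier is expanded in full before the next one starts, which is the property that lets a parallel implementation issue many independent memory requests at once.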
In order to improve the processing speed of highly complex video coding that involves high data rate and large amount of computation, a parallel video coding system optimization (PVCSO) method is proposed in this pape...
ISBN: 9783642297403 (digital)
ISBN: 9783642297403; 9783642297397 (print)
In this paper, we study two hierarchical N-Body methods for Network-on-Chip (NoC) architectures. Modern Chip Multiprocessor (CMP) designs are mainly based on the shared-bus communication architecture. As the number of cores increases, this architecture suffers from high communication delays. Therefore, NoC-based architectures have been proposed. The N-Body problem is a classical problem of approximating the motion of bodies. Two methods, namely Barnes-Hut (Barnes) and Fast Multipole (FMM), have been developed for fast simulation. The two algorithms have been implemented and studied in conventional computer systems and Graphics Processing Units (GPUs). However, the evaluation of N-Body methods on a NoC platform, a promising unconventional multicore architecture, has not been well addressed. We define a NoC model based on state-of-the-art systems. Evaluation results are presented using a cycle-accurate full-system simulator. Experiments show that Barnes scales better (53.7x for Barnes and 36.6x for FMM with 64 processing elements) and requires less cache than FMM. However, we observe hot-spot traffic in Barnes. Our analysis and experimental results provide a guideline for studying N-Body methods on a NoC platform.
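The scaling figures above can be turned into parallel efficiency with a short calculation. This is a sketch under the assumption that 53.7x and 36.6x are speedups relative to a single processing element, which the abstract does not state explicitly; the helper name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, processing_elements):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processing_elements


if __name__ == "__main__":
    # Figures taken from the abstract; interpreting them as speedups
    # over one processing element is an assumption.
    barnes = parallel_efficiency(53.7, 64)
    fmm = parallel_efficiency(36.6, 64)
    print(f"Barnes: {barnes:.3f}, FMM: {fmm:.3f}")  # Barnes: 0.839, FMM: 0.572
```

Under that assumption, Barnes retains roughly 84% of ideal scaling at 64 processing elements, against roughly 57% for FMM.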
ISBN: 9781467317443 (print)
A radix-2 16-bit CORDIC (CoOrdinate Rotation DIgital Computer) architecture that combines pipelining and parallelism is presented in this paper. A full-custom design of the CORDIC datapath, used in the proposed architecture for 16-bit precision, improves throughput and reduces area. As a result, the silicon area of the datapath is 11699.877 μm² in a 45 nm CMOS technology library, and the critical path delay is 875 ps at the SS (Slow-Slow) corner, with voltage and temperature of 1.1 V and 75 °C respectively. Layout-level simulation results show that the design achieves high speed and small area in full-custom technology.
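As background on the algorithm itself, the following is a minimal floating-point software model of radix-2 rotation-mode CORDIC with 16 iterations. It is only a sketch of the general technique, not the paper's pipelined full-custom datapath; the names `ITERATIONS`, `ANGLES`, `GAIN`, and `cordic_sin_cos` are assumptions, and the multiplications by `2.0 ** -i` stand in for the shifts a hardware implementation would use.

```python
import math

ITERATIONS = 16  # one micro-rotation per bit of precision (assumed)

# Elementary rotation angles atan(2^-i) and the accumulated CORDIC gain.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Rotation-mode CORDIC for theta in [-pi/2, pi/2].

    Returns (sin(theta), cos(theta)) approximations.
    """
    # Pre-scale x by 1/GAIN so the final vector has unit length.
    x, y, z = 1.0 / GAIN, 0.0, theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0   # rotate toward zero residual angle
        shift = 2.0 ** -i             # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(s, c)
```

Each iteration needs only additions, a sign test, and a shift, which is why the algorithm maps well onto a pipelined datapath with one stage per iteration.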
With the ever-increasing size of data sets, traditional parallel relational database solutions can be prohibitively expensive and may suffer from limited scalability. To perform large-scale data processing in a cost-effecti...
We show that developing an optimal parallelization of the two-list algorithm is much easier than we once thought. All it takes is to observe that the steps of the search phase of the two-list algorithm are closely rel...
Phase computation based on Wavelet Transform Profilometry (WTP) involves a heavy, time-consuming workload and therefore cannot meet the needs of real-time three-dimensional (3D) measurement. Fortunately the pixels which in situ nee...
In this paper, we propose a parallel algorithm to solve a class of nonlinear network optimization problems. The proposed parallel algorithm is a combination of the successive quadratic programming and the dual method,...
ISBN: 9783642334993 (print)
The proceedings contain 8 papers. The topics discussed include: Blink: not your father's database!; MemcacheSQL - a scale-out SQL cache engine; a cost-aware strategy for merging differential stores in column-oriented in-memory DBMS; Microsoft SQL Server Parallel Data Warehouse: architecture overview; relax and let the database do the partitioning online; adaptive processing of multi-criteria decision support queries; scalable social graph analytics using the Vertica analytic platform; and a near real-time personalization for ecommerce platform.