This paper discusses the parallel solution of symmetric eigenproblems, including standard and generalized eigenvalue problems. For standard eigenvalue problem and tridiagonal eigenvalue problem is not the key point...
ISBN: 9781509038671 (print)
The proceedings contain 14 papers. The topics discussed include: highly scalable near memory processing with migrating threads on the emu system architecture; parallel interval stabbing on the automata processor; an optimized multicolor point-implicit solver for unstructured grid applications on graphics processing units; optimizing sparse tensor times matrix on multi-core and many-core architectures; compiler transformation to generate hybrid sparse computations; an OpenCL framework for distributed apps on a multidimensional network of FPGAs; fast parallel cosine K-nearest neighbor graph construction; performance evaluation of parallel sparse tensor decomposition implementations; implementation and evaluation of data-compression algorithms for irregular-grid iterative methods on the PEZY-SC processor; dynamic load balancing for high-performance graph processing on hybrid CPU-GPU platforms; a fast level-set segmentation algorithm for image processing designed for parallel architectures; HISC/R: an efficient hypersparse-matrix storage format for scalable graph processing; optimized distributed work-stealing; and fine-grained parallelism in probabilistic parsing with Habanero Java.
ISBN: 9781479919994 (print)
In this paper, we propose a new topology-core reconfigurable architecture for networks-on-chip (NoC). The majority of existing research on reconfigurable architectures focuses on reconfiguring either the topology of the NoC or the processing elements of the network. Our approach uses a hybrid algorithm to take advantage of both of these methods by using field programmable gate arrays (FPGAs) as reconfigurable processing elements. Moreover, programmable switches among routers are applied for topology reconfigurability. The experimental results show a substantial improvement over the state-of-the-art research in terms of power consumption, network performance and application execution time.
The proceedings contain 47 papers. The special focus in this conference is on Parallel Architectures, Algorithms and Programming. The topics include: On a Coexisting Scheme for Multiple Flows in Multi-radio Multi-channel Wireless Mesh Networks; Non-linear K-Barrier Coverage in Mobile Sensor Network; Interrupt Responsive Spinlock Mechanism Based on MCS for Multi-core RTOS; A Novel Speedup Evaluation for Multicore Architecture Based Topology of On-Chip Memory; Improving the Performance of Collective Communication for the On-Chip Network; A Survey of Multicast Communication in Optical Network-on-Chip (ONoC); Virtual Network Embedding Based on Core and Coritivity of Graph; Non-time-Sharing Full-Duplex SWIPT Relay System with Energy Access Point; Recent Developments in Content Delivery Network: A Survey; Weighted Mean Deviation Similarity Index for Objective Omnidirectional Video Quality Assessment; Tire X-ray Image Defects Detection Based on Adaptive Thresholding Method; Halftone Image Reconstruction Based on SLIC Superpixel Algorithm; Study on the Method of Extracting Diabetes History from Unstructured Chinese Electronic Medical Record; Deep Residual Optimization for Stereoscopic Image Color Correction; Old Man Fall Detection Based on Surveillance Video Object Tracking; Electric Bicycle Violation Automatic Detection in Unconstrained Scenarios; Building a Lightweight Container-Based Experimental Platform for HPC Education; Automatic Generation and Assessment of Student Assignments for Parallel Programming Learning; Heuristic Load Scheduling Algorithm for Stateful Cloud BPM Engine.
ISBN: 9781728172019 (print)
In this paper, we present a highly parallel and area-efficient constant-time inversion algorithm over the r-th degree polynomial ring, derived from Schroeppel's Almost Inverse algorithm. We first propose a constant-time version, from which we derive a faster, highly parallel algorithm that still preserves the constant-time property. This constitutes an alternative and relatively unexplored approach to inversion, compared to the more common multiplicative approach by Itoh and Tsujii, and has extensive application in algorithms such as the BIKE proposal for quantum-resistant cryptography. Our approach is extremely area-efficient, with a constant area with respect to the polynomial degree r.
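For orientation, the classic Almost Inverse loop that this work starts from can be sketched as follows. This is the textbook variable-time form, not the constant-time or parallel variants proposed in the paper. It assumes the ring is GF(2)[x]/(x^r + 1), as used in BIKE, and stores polynomials as Python integers whose bits are the coefficients. The function names are illustrative only.

```python
def reduce_cyclic(p, r):
    """Reduce a GF(2) polynomial (bit i = coefficient of x^i) modulo x^r + 1."""
    mask = (1 << r) - 1
    while p >> r:
        p = (p & mask) ^ (p >> r)  # x^r is congruent to 1, so fold high bits down
    return p


def mul_cyclic(a, b, r):
    """Carry-less (XOR) product of a and b, reduced modulo x^r + 1."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        a <<= 1
        b >>= 1
    return reduce_cyclic(acc, r)


def almost_inverse(a, r):
    """Return (b, k) such that a * b is congruent to x^k modulo x^r + 1."""
    f = reduce_cyclic(a, r)
    if f == 0:
        raise ValueError("zero is not invertible")
    g = (1 << r) | 1          # the modulus x^r + 1
    b, c, k = 1, 0, 0         # invariants: a*b = x^k * f and a*c = x^k * g (mod g0)
    while True:
        while f & 1 == 0:     # strip factors of x from f
            f >>= 1
            c <<= 1
            k += 1
        if f == 1:
            return reduce_cyclic(b, r), k
        if f.bit_length() < g.bit_length():
            f, g = g, f
            b, c = c, b
        f ^= g                # both have constant term 1, so the sum is divisible by x
        b ^= c
        if f == 0:            # f equalled g: a shares a factor with the modulus
            raise ValueError("polynomial is not invertible")


def invert(a, r):
    """Full inverse: multiply the almost-inverse by x^(-k), a cyclic shift."""
    b, k = almost_inverse(a, r)
    return mul_cyclic(b, 1 << ((-k) % r), r)
```

For example, with r = 5, `invert(0b111, 5)` yields a polynomial whose product with 1 + x + x^2 reduces to 1. The number of loop iterations depends on the input, which is exactly the data-dependent behaviour a constant-time variant has to remove.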
ISBN: 9781467387767 (print)
In this paper, a parallel algorithm for branch-and-bound applications is proposed. The algorithm is general purpose and can be used to parallelize, with little effort, any sequential branch-and-bound style algorithm that is written in a certain format. It is a distributed dynamic scheduling algorithm, i.e., each node schedules the load of its own cores; it can be used with different programming platforms and architectures; and it is a hybrid algorithm (OpenMP, MPI). To demonstrate its validity and efficiency, the proposed algorithm has been implemented and tested on numerous examples, which are described in detail. A speed-up of about 9 was achieved for the tested examples on a cluster of three nodes with four cores each.
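As a rough reading of the reported figure, assume that all 3 × 4 = 12 cores take part in the computation and that the speed-up S is measured against a single-core sequential run (the abstract does not state the baseline). The parallel efficiency E on p cores then works out to about 75%:

```latex
E = \frac{S}{p} \approx \frac{9}{3 \times 4} = 0.75
```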
ISBN: 9783319499567; 9783319499550 (print)
In this paper, we consider educational and research systems that can be used to estimate the efficiency of parallel computing. ParaLab allows parallel computation methods to be studied. With the ParaLib library, we can compare parallel programming languages and technologies. The Globalizer Lab system is capable of estimating the efficiency of algorithms for solving computationally intensive global optimization problems. These systems can build models of various high-performance systems, formulate the problems to be solved, perform computational experiments in simulation mode and analyze the results. Crucially, the described systems support a visual representation of the parallel computation process. If combined, these systems can be useful for developing high-performance parallel programs that take the specific features of modern supercomputing systems into account.
ISBN: 9781450365109 (print)
The deluge of genomics data is incurring prohibitively high computational costs. As an important building block for genomic data processing algorithms, FM-index search occupies most of the execution time in sequence alignment. Due to massive random streaming memory references relative to only a small amount of computation, the FM-index search algorithm exhibits extremely low efficiency on conventional architectures. This paper proposes Niubility, an accelerator for FM-index search in genomic sequence alignment. Based on our algorithm-architecture co-design analysis, we found that conventional architectures exploit little memory-level parallelism, so the available memory bandwidth cannot be fully utilized. The Niubility accelerator customizes bit-wise operations and exploits data-level parallelism, which produces maximal concurrent memory accesses to saturate memory bandwidth. We implement an accelerator ASIC in an ST 28nm process that achieves up to 990x speedup over the state-of-the-art software.
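To make the memory-access pattern concrete, here is a minimal software sketch of textbook FM-index backward search. It is not the Niubility design, and the function names are illustrative. It assumes a small text over an alphabet that sorts above the `$` sentinel, and it builds the suffix array naively, which is only practical for short inputs.

```python
from collections import Counter


def build_fm_index(text):
    """Build the C table and occurrence table for text plus a '$' sentinel."""
    text += "$"
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    counts = Counter(text)
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(bwt)


def count_occurrences(pattern, first, occ, n):
    """Count matches of pattern by backward search over the FM-index."""
    lo, hi = 0, n  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        # Two table lookups at data-dependent positions, then one addition each.
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

Each pattern character costs two `occ` lookups at positions that depend on the previous step, followed by a single addition. On a genome-scale index those lookups land at effectively random addresses, which illustrates the high ratio of memory references to computation that the abstract describes.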