Two hardware architectures are developed via an improved parameterized efficient FPGA implementation method for parallel 1-D real-time signal filtering algorithms to provide higher performance per Watt and minimum log...
ISBN: (print) 9781905088416
In this paper we present a parallel library that serves as a high-level interface for solving nonlinear systems. The work follows the usability guidelines of a high-level interface, but it is in fact a library developed with a mixed-language model. We chose Python to create the high-level interfaces, while the developed Fortran routines provide the full performance of a low-level language. The developed library, PyPANCG, consists of two modules, PySParNLCG and PySParNLPCG. The PySParNLCG module parallelizes the conjugate gradient method for solving the nonlinear system Ax = phi(x), and the PySParNLPCG module implements a preconditioning technique based on block two-stage methods. Experimental results report the numerical accuracy and the parallel performance of our approach on different parallel computers.
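The abstract does not give PyPANCG's algorithm or API, so the following is only a minimal serial sketch of one common way to couple conjugate gradient with a nonlinear system Ax = phi(x): a Picard (fixed-point) outer iteration whose linear subproblems A x_new = phi(x) are solved by CG. This scheme is a swapped-in illustration, not necessarily the method in PySParNLCG. It assumes A is symmetric positive definite and phi is contractive enough for the outer iteration to converge; the names `cg_solve`, `solve_nonlinear` and `tridiag_matvec` are hypothetical.

```python
import math


def cg_solve(matvec, b, x0, tol=1e-12, max_iter=500):
    """Linear conjugate gradient for A x = b, with A symmetric positive definite."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]  # initial residual b - A x0
    d = list(r)
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        ad = matvec(d)
        alpha = rr / sum(di * adi for di, adi in zip(d, ad))  # step length
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rr_new = sum(ri * ri for ri in r)
        beta = rr_new / rr  # keeps the new direction A-conjugate to the old one
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return x


def solve_nonlinear(matvec, phi, x0, tol=1e-10, max_outer=100):
    """Picard iteration for A x = phi(x): each step solves A x_new = phi(x) by CG."""
    x = list(x0)
    for _ in range(max_outer):
        x_new = cg_solve(matvec, phi(x), x)  # warm start from the previous iterate
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


def tridiag_matvec(x):
    """Example SPD operator: A = tridiag(-1, 4, -1)."""
    n = len(x)
    return [4 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    phi = lambda x: [1.0 + 0.1 * math.sin(v) for v in x]
    x = solve_nonlinear(tridiag_matvec, phi, [0.0] * 20)
    residual = max(abs(a - b) for a, b in zip(tridiag_matvec(x), phi(x)))
    print(f"max residual: {residual:.2e}")
```

In a distributed version, each process would own a block of x; the matrix-vector product would need neighbour (halo) exchanges and every dot product a global reduction, which is where a parallel library spends its communication.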
ISBN: (print) 9780769542164
The proceedings contain 30 papers. The topics discussed include: flexible error protection for energy efficient reliable architectures; characterizing energy consumption in hardware transactional memory systems; performance debugging of GPGPU applications with the divergence map; mixed-precision parallel linear programming solver; mapping pipelined applications with replication to increase throughput and reliability; improving in-memory column-store database predicate evaluation performance on multi-core systems; a comparative analysis of load balancing algorithms applied to a weather forecast model; sharing resources for performance and energy optimization of concurrent streaming applications; a cache replacement policy using adaptive insertion and re-reference prediction; the dynamic block remapping cache; towards a peer-to-peer framework for parallel and distributed computing; and on the worst case of scheduling with task replication on computational grids.
FFT is a widely used algorithm, and its parallelization is an important topic. Much work has been done in this field, and many parallel algorithms have been published over several decades. In this paper, an algorithm named Computation Oriented parallel FFT (COPF) is proposed. COPF, which derives from the classic parallel radix-2 FFT, focuses on the butterfly structure of the FFT and adopts a suitable strategy for the parallel phase on distributed systems. In the serial phase, COPF uses FFTW3 to accelerate the serial computation and to extend its applicability.
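To make the butterfly structure concrete, here is a minimal sketch of the classic iterative radix-2 decimation-in-time FFT that COPF is said to derive from. It is not COPF itself and does not call FFTW3. The helper `stage_locality` (a hypothetical name) assumes the usual block distribution of N/P consecutive elements per process after the bit-reversal permutation; under that assumption the first log2(N/P) butterfly stages are local and the last log2(P) stages pair elements held by different processes.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    bits = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle factor
                u, t = a[start + k], w * a[start + k + half]
                # Butterfly: merge two half-length DFTs into one of length 2*half.
                a[start + k], a[start + k + half] = u + t, u - t
        half *= 2
    return a


def stage_locality(n, p):
    """For n points block-distributed over p processes (after bit reversal),
    list each butterfly span and whether its pairs stay on a single process."""
    block = n // p
    spans, half = [], 1
    while half < n:
        spans.append((half, 2 * half <= block))
        half *= 2
    return spans


if __name__ == "__main__":
    print(fft_radix2([1, 0, 0, 0]))
    print(stage_locality(16, 4))
```

For N = 16 and P = 4, spans 1 and 2 stay inside a process while spans 4 and 8 cross process boundaries; those remote stages are where a distributed strategy has to schedule communication, and the local stages are the part a serial library could accelerate.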
This paper presents a parallel refined Jacobi-Davidson method for computing extreme eigenpairs of quadratic eigenvalue problems. The method directly computes the refined Ritz pairs in the projection subspace and expands the subspace with the solution of the correction equation. Combined with a restarting scheme, the method can compute several eigenpairs of a quadratic eigenvalue problem. Numerical experiments on a parallel computer show that the parallel refined Jacobi-Davidson method is very effective for quadratic eigenvalue problems.
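The abstract names the ingredients without equations. The block below records the standard textbook formulation of one Jacobi-Davidson step for a quadratic eigenvalue problem with a refined Ritz vector; it is an assumption about the general framework, not the paper's exact equations. Here M, C, K are the coefficient matrices, V has orthonormal columns spanning the search subspace, θ is the selected Ritz value from the projected problem, and u is normalized so that u*u = 1.

$$
\begin{aligned}
&Q(\lambda)\,x = (\lambda^2 M + \lambda C + K)\,x = 0, \\
&\bigl(\theta^2 V^* M V + \theta\, V^* C V + V^* K V\bigr)\,y = 0, \\
&u = V z, \qquad z = \arg\min_{\|z\|_2 = 1} \bigl\| Q(\theta)\, V z \bigr\|_2, \\
&r = Q(\theta)\,u, \qquad p = Q'(\theta)\,u = (2\theta M + C)\,u, \\
&\Bigl(I - \frac{p\,u^*}{u^* p}\Bigr)\, Q(\theta)\, (I - u u^*)\, t = -r, \qquad t \perp u .
\end{aligned}
$$

The refined vector z is the right singular vector of Q(θ)V belonging to its smallest singular value, so the refinement costs one small SVD per step. The (approximate) solution t is orthonormalized against V and appended as a new basis vector; restarting then truncates V to a few of its best vectors.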
A novel sequence distribution strategy is presented that takes into account the communication startup overhead and the order in which processors are assigned, and applies a hashing technique. Based on the divisible load principle, a parallel local alignment algorithm for multiple sequences is designed for heterogeneous cluster systems whose computing nodes have different computing speeds and communication capabilities. Experimental results on a cluster of heterogeneous personal computers show that, compared with a parallel algorithm that distributes sequences evenly, the parallel local alignment algorithm with the presented sequence distribution strategy reduces execution time by 13% to 35% and achieves good speedup and scalability.
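As an illustration of the divisible load principle mentioned above, the following sketch uses a textbook linear model, not the paper's strategy (its hashing step is not modeled). A single master sends load to workers one after another in a fixed order; worker i pays a startup cost, a per-unit transfer time and a per-unit compute time. Requiring every worker to finish at the same instant gives the recurrence w[i+1] = (w[i]·comp[i] − startup[i+1]) / (comm[i+1] + comp[i+1]). The function names and cost parameters are hypothetical.

```python
def dlt_shares(total, startup, comm, comp):
    """Split `total` load units so all workers finish at the same time.

    Worker i receives its share only after workers 0..i-1 have received
    theirs (sequential distribution from one master).
    startup[i]: fixed communication startup cost for worker i
    comm[i]:    transfer time per load unit to worker i
    comp[i]:    compute time per load unit on worker i
    """
    n = len(comp)
    # Express every share as w_i = a[i] * w_0 + b[i].
    a, b = [1.0], [0.0]
    for i in range(n - 1):
        # Equal finish of workers i and i+1:
        # w_i * comp_i = startup_{i+1} + w_{i+1} * (comm_{i+1} + comp_{i+1})
        denom = comm[i + 1] + comp[i + 1]
        a.append(a[i] * comp[i] / denom)
        b.append((b[i] * comp[i] - startup[i + 1]) / denom)
    w0 = (total - sum(b)) / sum(a)  # enforce sum of shares == total
    return [ai * w0 + bi for ai, bi in zip(a, b)]


def finish_times(shares, startup, comm, comp):
    """Finish time of each worker under the same sequential-send model."""
    t, out = 0.0, []
    for w, s, c, p in zip(shares, startup, comm, comp):
        t += s + w * c  # master has finished sending to this worker
        out.append(t + w * p)
    return out


if __name__ == "__main__":
    startup, comm, comp = [1.0, 1.0, 1.0], [0.01, 0.01, 0.02], [0.5, 1.0, 0.25]
    w = dlt_shares(1000.0, startup, comm, comp)
    print([round(v, 2) for v in w])
    print(finish_times(w, startup, comm, comp))
```

Because the shares depend on the order in which workers are served, one simple way to account for distribution order is to evaluate candidate orders with `finish_times` and keep the fastest; a negative share would signal that a worker's startup cost outweighs its contribution and it should be dropped.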
Cryptosystems on conic curves, a newly developing branch of cryptography, have become increasingly widespread. It is important to explore fast parallel algorithms for both encryption and decryption in conic curve cryptosystems. Point multiplication is the key operation for constructing security protocols in conic curve cryptosystems, yet no existing research has focused on parallelizing point multiplication for conic curve cryptosystems. This paper presents parallel computation of point multiplication for conic curve cryptosystems over the finite field Fp and the ring Zn, building on our previous work on several parallel algorithms for conic curve cryptosystems. The parallel technique computes point addition and point doubling separately. The performance evaluation demonstrates that our methodology can improve the efficiency of conic curve cryptosystems over the finite field Fp and the ring Zn.
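One common way to separate point doubling from point addition is the right-to-left binary method: a doubling worker produces P, 2P, 4P, ... and hands 2^i·P to an addition worker whenever bit i of k is set. The sketch below assumes that scheme; it is not the paper's algorithm, and the conic curve formulas are not shown. The group is passed in as hypothetical `add`, `double` and `identity` placeholders, and a toy additive group mod n stands in for conic curve points in the usage example. Under CPython's GIL the two threads only illustrate the pipeline structure and will not give real speedup.

```python
import queue
import threading


def parallel_scalar_mult(k, p, add, double, identity):
    """Compute k*P (k >= 0) with a doubling worker and an addition worker.

    The doubling worker walks the bits of k from least significant upward,
    forwarding 2^i * P whenever bit i is set; the addition worker sums what
    it receives. `add`, `double`, `identity` describe the group.
    """
    pending = queue.Queue()
    done = object()  # sentinel marking the end of the doubling chain

    def doubler():
        q, bits = p, k
        while bits:
            if bits & 1:
                pending.put(q)  # this power of two contributes to k*P
            bits >>= 1
            if bits:
                q = double(q)  # next power: 2^(i+1) * P
        pending.put(done)

    result = [identity]

    def adder():
        acc = identity
        while True:
            item = pending.get()
            if item is done:
                break
            acc = add(acc, item)
        result[0] = acc

    workers = [threading.Thread(target=doubler), threading.Thread(target=adder)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result[0]


if __name__ == "__main__":
    n = 1009  # toy stand-in group: integers mod n under addition
    print(parallel_scalar_mult(12345, 5, lambda a, b: (a + b) % n,
                               lambda a: (2 * a) % n, 0))
```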
This paper introduces a new type of parallel computer based on N+1 programs (hereinafter, the N+1 computer) and its features. A new concept of parallel computing architecture based on N+1 programs is presented at the same time. We studied the essential problems and inherent laws of parallel systems, and analyzed the operability, computability, and compilability of the system, as well as the generality of this new parallel computing architecture. Finally, a prototype N+1 computer was trialed, which could serve as a new general platform for parallel processing studies.
As the proportion of video programs is expected to grow significantly, video services will require a huge amount of Internet bandwidth in the future. In this paper, we model the video program placement (VPP) problem in tree networks, where video programs are delivered to the requesting (demand) nodes by broadcast. The model considers both the cost of assigning programs to nodes and the cost of broadcasting video programs over links. The model is formulated as an integer program whose objective is to minimize the total VPP cost in a tree network. We develop a dynamic programming algorithm that solves this problem in O(NP) time, where N is the number of nodes and P is the number of video programs.
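The abstract does not state the recurrences, so the following is a minimal tree dynamic program for a simplified VPP model of my own choosing, meant only to show how an O(NP) bound can arise: programs are placed independently, a stored copy is broadcast only downward into its subtree, each link used pays its cost once per program, and there are no storage capacities. For every node v, g[v] is the cheapest cost inside v's subtree when the program is already available at v, and f[v] is the cost when it is not delivered from above. Each program takes one O(N) pass, giving O(NP) overall. All names are hypothetical.

```python
def vpp_cost(children, link_cost, assign_cost, demands):
    """Minimum total placement + broadcast cost over all programs.

    children[v]:    child nodes of v in a tree rooted at node 0
    link_cost[c]:   cost of broadcasting one program over link (parent(c), c)
    assign_cost[v]: cost of storing one program at node v
    demands:        one set of requesting nodes per video program
    """
    # Reversed preorder: every child is processed before its parent.
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    order.reverse()

    total = 0
    for demand in demands:  # simplified model: programs are placed independently
        g, f = {}, {}
        for v in order:
            # Program available at v: for each child, either forward it over
            # the link or let the child's subtree serve itself.
            g[v] = sum(min(g[c] + link_cost[c], f[c]) for c in children[v])
            store_here = assign_cost[v] + g[v]
            if v in demand:
                f[v] = store_here  # v must obtain the program itself
            else:
                f[v] = min(store_here, sum(f[c] for c in children[v]))
        total += f[0]
    return total


if __name__ == "__main__":
    children = [[1, 2], [3], [], []]
    link = [0, 1, 1, 1]  # link_cost[0] is unused (root has no parent link)
    assign = [1, 5, 10, 10]
    print(vpp_cost(children, link, assign, [{2, 3}, {1}]))
```

In this example the first program is cheapest when stored at the root and broadcast over three links, and the second when stored at the root and sent over one link.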
This paper proposes a fast reconfiguration algorithm for two-dimensional degradable mesh-connected processor arrays. The proposed algorithm simplifies the dynamic programming approach used to construct logical columns. For each processing element lying in the logical columns, the computation is reduced from the five operations (one assignment, two additions and two comparisons) required by the state-of-the-art algorithm to a single assignment in most cases, or three operations (one assignment, one comparison and one addition) in the worst case. Simulation results on the same benchmarks used for the state-of-the-art algorithm show that the simplified algorithm runs 28% faster without loss of harvest. Moreover, the increase in the total interconnection length of the target array is acceptable.