In this paper, we present two parallel LMS adaptive filtering algorithms with low hardware cost. The proposed parallel algorithm 1 does not alter the input-output behavior and saves a large amount of the hardware cost of previous designs, especially when the parallelism level is high. For example, it saves 68.4% of the multiplications and 4.7% of the additions required by prior fast parallel adaptive filtering algorithms when the parallelism level is 72 and the filter length N is large. The proposed parallel algorithm 2, while maintaining the same performance, can save a further 5.56% to 12.5% of the multipliers and 8.54% to 24.9% of the additions as the level of parallelism varies from 3 to 72.
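For orientation, the sketch below shows the standard sequential LMS update that parallel designs of this kind start from. It is not the paper's parallel algorithm 1 or 2, which the abstract does not describe. The function name `lms_filter` and the parameters `num_taps` and `mu` are illustrative assumptions.

```python
def lms_filter(x, d, num_taps, mu):
    """Textbook sequential LMS (illustrative baseline, not the paper's method).

    x: input samples, d: desired samples, mu: step size (assumed small
    enough for stability). Returns the final weights and the error sequence.
    """
    w = [0.0] * num_taps      # adaptive filter weights
    buf = [0.0] * num_taps    # most recent inputs, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                              # shift in the new sample
        y = sum(wi * bi for wi, bi in zip(w, buf))         # filter output: N multiplications
        e = dn - y                                         # estimation error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]   # weight update: about N more
        errors.append(e)
    return w, errors
```

Each sample costs roughly 2N multiplications for an N-tap filter in this form, which is the kind of per-sample cost that the hardware savings quoted above are measured against.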
This paper presents a new algorithm for parallel solution of security constrained optimal power flow (SCOPF) with a nonlinear interior point method (IPM). In the parallel algorithm proposed in this paper, the SCOPF problem is easily decomposed and distributed to a set of processors where the computations are done independently on a subset of the contingency states. Moreover, a full AC model is used in this parallel algorithm. The new parallel algorithm has been tested for accuracy and speedup with two systems: one with 57 buses, 80 branches and 9 contingencies, another with 3493 buses, 6689 branches and 79 contingencies. With less than 1% relative error and more than a factor of 10 speedup, the test results demonstrate that this new parallel algorithm with IPM is accurate and efficient for large-scale SCOPF problems.
This paper evaluates the performance of a novel view-oriented parallel programming style for cluster computers. View-oriented parallel programming is based on distributed shared memory, which is friendly and easy for programmers to use. It requires the programmer to divide shared data into views according to the memory access pattern of the parallel algorithm. One advantage of this programming style is that it gives the underlying distributed shared memory system the potential to optimize consistency maintenance. It also allows the programmer to participate in performance optimization of a program through wise partitioning of the shared data into views. Experimental results demonstrate a significant performance gain for programs based on the view-oriented parallel programming style.
This paper reports on the development of a novel island model for evolutionary algorithms, which is intrinsically parallel and intended to better utilise resources and outlier solutions encountered during search. Outliers serve as seeds for new islands using a similar evolutionary algorithm or a local search procedure. In this initial study, we examine a definition of outliers and demonstrate the ability to obtain improvements using outliers and a simple local search method.
This paper presents a method for computing all deadlocks and traps in a Petri net. The method is based on Thelen's method. Computing all deadlocks and traps in Petri nets is very time-consuming, so optimizing the computation is important. A parallel computation method is proposed to reduce the computation time. Experimental results for the presented method are also discussed.
This paper models road conditions from Floating Car Data (FCD) using principal curves. The relationship between time and the passing velocity through a road segment is represented by principal curves (PCs), which reflect the urban traffic situation, and the rules describing road conditions are built from the PCs. Moreover, a parallel algorithm is proposed to build the models because of the massive GPS data and the large road network. The task scheduling policy of the algorithm assigns tasks of different sizes to processors of different speeds so that, in theory, all subtasks complete at the same time. The experimental results indicate that the speedup grows roughly in proportion to the number of processors, and that the efficiency of each processor is about 93% on the heterogeneous computing platform with the algorithm.
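The scheduling idea can be illustrated with a speed-proportional split: if processor i receives a share of the work proportional to its speed, every processor needs about the same time. This is a minimal sketch under assumptions, not the paper's policy. The name `split_by_speed` is hypothetical, and speeds are assumed to be positive integers in arbitrary units.

```python
def split_by_speed(total_items, speeds):
    """Split total_items work units in proportion to integer processor speeds.

    Hypothetical helper: the abstract does not give the actual policy.
    """
    total_speed = sum(speeds)
    # Integer part of each processor's proportional share.
    shares = [total_items * s // total_speed for s in speeds]
    # Hand out the units lost to rounding, largest fractional part first.
    leftover = total_items - sum(shares)
    by_fraction = sorted(
        range(len(speeds)),
        key=lambda i: (total_items * speeds[i]) % total_speed,
        reverse=True,
    )
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares
```

For example, `split_by_speed(100, [1, 1, 2])` gives the processor that is twice as fast half of the work, so each processor's estimated time (its share divided by its speed) is the same.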
Finding frequent sequences in a large database is increasingly important in the data mining field. The paper briefly introduces the basic concept of frequent sequence mining and presents data-parallel and task-parallel formulations of a tree-projection-based algorithm. Moreover, the on-line LPT algorithm is used to solve the load imbalance problem of the task-parallel formulation. Our experiments show that these algorithms are capable of achieving good speedups; however, the task-parallel formulation is more scalable than the data-parallel one.
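As background for the load-balancing step, the sketch below shows the classic Longest Processing Time first (LPT) heuristic: sort tasks by decreasing cost and give each one to the currently least-loaded worker. The abstract does not specify its on-line variant, so this is the textbook offline form. The name `lpt_schedule` and the per-task `costs` list are illustrative assumptions.

```python
import heapq


def lpt_schedule(costs, workers):
    """Classic LPT list scheduling (textbook form, not the paper's on-line variant).

    costs: estimated cost of each task; workers: number of processors.
    Returns (task indices per worker, total load per worker).
    """
    heap = [(0, w) for w in range(workers)]   # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(workers)]
    # Visit tasks from most to least expensive.
    for task in sorted(range(len(costs)), key=lambda t: costs[t], reverse=True):
        load, w = heapq.heappop(heap)         # least-loaded worker so far
        assignment[w].append(task)
        heapq.heappush(heap, (load + costs[task], w))
    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads
```

With costs `[7, 5, 4, 3, 3, 2]` and two workers, both workers end with a load of 12, which is a perfectly balanced split of the total of 24.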
For JPEG2000 real-time applications, embedded block coding with optimized truncation (EBCOT) is time-consuming and becomes a bottleneck for the throughput of the entire system. Since the arithmetic encoder (AE) is one part of EBCOT, low AE performance can significantly degrade overall performance. AE is inherently a serial process with high data dependency, so parallelizing it is difficult. In this paper, a partially parallel algorithm for AE is proposed. One distinct characteristic of the proposed algorithm is that two contexts can be processed in one clock cycle. Based on the proposed algorithm, a pipelined architecture is implemented. Experimental results with standard test image benchmarks show that the proposed algorithm and architecture achieve about a 24% improvement in system throughput compared with the architecture based on the original AE algorithm.
In this paper, a parallel algorithm for computing the roots of a given polynomial of degree n on a ring of processors is proposed. The algorithm implements Durand-Kerner's method and consists of two phases: initialization and iteration. The initialization phase carries out all the preparation steps needed to start the parallel computation. It includes register initialization and initial approximation of the roots, requiring 3n-2 communications, 2 exponentiations, one multiplication, 6 divisions, and 4n-3 additions. In the iteration phase, these initial approximations are corrected repeatedly and converge to their accurate values. The iteration phase is composed of a number of iteration steps, each consisting of 3n communications, 4n+3 additions, 3n+1 multiplications, and one division.
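The two phases can be illustrated with a sequential sketch of the Durand-Kerner iteration. It does not model the ring of processors or the operation counts above. The starting points (powers of 0.4 + 0.9j, a common textbook choice), the name `durand_kerner`, and the stopping rule are assumptions, not details from the paper.

```python
def durand_kerner(coeffs, iterations=500, tol=1e-12):
    """Approximate all roots of a polynomial (sequential illustration only).

    coeffs: coefficients, highest degree first, e.g. [1, -6, 11, -6].
    """
    n = len(coeffs) - 1
    monic = [c / coeffs[0] for c in coeffs]   # normalize the leading coefficient to 1

    def p(x):
        acc = 0
        for c in monic:                        # Horner evaluation
            acc = acc * x + c
        return acc

    # Initialization phase: distinct complex starting points (assumed choice).
    roots = [(0.4 + 0.9j) ** k for k in range(n)]

    # Iteration phase: every root is corrected from the previous approximations,
    # so the n corrections within one step are independent of each other.
    for _ in range(iterations):
        new_roots = []
        for i, r in enumerate(roots):
            denom = 1
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            new_roots.append(r - p(r) / denom)
        shift = max(abs(a - b) for a, b in zip(new_roots, roots))
        roots = new_roots
        if shift < tol:
            break
    return roots
```

Because each correction within a step reads only the previous approximations, one plausible mapping gives each processor one root and circulates the other approximations around the ring, which would be consistent with a communication count that grows linearly in n.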
ISBN: (print) 0769522491
The low cost and wide availability of PC-based clusters have made them an excellent alternative for access to supercomputing. However, while networks of workstations may be readily available, there is an increasing need for performance tools that support these platforms in order to achieve even higher performance. One possible way to increase performance is parallel program restructuring. This paper introduces a toolkit that generates graphical charts for visualizing MPI parallel programs and their execution over time, using the DP*Graph representation, a parallel version of the timing graph. In other words, the charts show a parallel program's sequential code, dependencies and communication structures on a particular cluster system platform. The paper also discusses the implementation of this toolkit and presents some experimental results.