As FPGAs have become denser and faster, they are being utilized for many applications, including the implementation of neural networks. Ideally, FPGA implementations, being directly in hardware and offering parallelism, will have performance advantages over software on conventional machines. But a great deal remains to be done to make the most of FPGAs and to prove their worth in implementing neural networks, especially in view of past failures in the implementation of neurocomputers. This paper looks at some of the relevant issues.
Popular secure public key cryptosystems such as RSA and Diffie-Hellman are based on hard problems like factorization and discrete logarithms. These systems often require prime numbers on the order of 300 decimal digits to be secure. Generating such large primes is difficult, and even larger primes will be required as advances in parallel computing make the factorization of large numbers faster. In this paper, algorithms for digital signatures and public key cryptosystems using multilayer perceptrons (MLPs) are proposed. The security of the algorithms rests on the difficulty of solving systems of non-linear simultaneous equations. Instead of needing large prime numbers, the algorithms require multiple real numbers, which can be generated easily.
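To make the hardness argument concrete, here is a minimal sketch (not the paper's actual scheme; weights and dimensions are hypothetical) of why inverting even a tiny MLP amounts to solving non-linear simultaneous equations, while the forward direction needs only easily generated real numbers:

```python
import numpy as np

# Hypothetical, easily generated real-valued parameters (stand-ins for the key).
rng = np.random.default_rng(42)
W1 = rng.standard_normal((4, 3))   # hidden-layer weights
b1 = rng.standard_normal(4)
W2 = rng.standard_normal((3, 4))   # output-layer weights
b2 = rng.standard_normal(3)

def mlp_forward(x):
    """Easy direction: evaluate the MLP at x."""
    h = np.tanh(W1 @ x + b1)       # the nonlinearity is what makes inversion hard
    return W2 @ h + b2

x = rng.standard_normal(3)         # stand-in for a message block
y = mlp_forward(x)
# Recovering x from y means solving 3 non-linear simultaneous equations in
# 3 unknowns -- the hard problem the abstract's security claim appeals to.
print(y)
```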
The folding and unfolding techniques cannot be used to design pipelined digit adders because of the presence of feedback loops. In this paper, approaches for the design of digit multipliers that can be pipelined to the bit level are presented, including architectures obtained via unfolding, the high-radix approach, and the multi-pipe approach. The pipelining of these architectures is made possible by a new "pipelined digit adder". The presented architectures are scalable and systolic, and they offer great flexibility in finding the best trade-off between hardware cost and throughput rate by varying the level of pipelining and the digit size.
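A software sketch of the feedback loop in question: in least-significant-digit-first digit-serial multiplication, the residual carried from one digit cycle to the next is the recurrence that blocks straightforward bit-level pipelining of the digit adder. The digit size d is a free parameter, mirroring the cost/throughput trade-off the abstract mentions (illustrative code, not the paper's architecture):

```python
def digit_serial_multiply(a, b, d=4):
    """LSD-first digit-serial multiply: one radix-2**d digit of b per cycle.
    The residual fed back between iterations is the loop that makes
    bit-level pipelining of the digit adder non-trivial."""
    radix = 1 << d
    out_digits = []
    residual = 0                       # feedback value between digit cycles
    while b:
        bi = b & (radix - 1)           # next digit of b
        partial = a * bi + residual    # digit product plus fed-back residual
        out_digits.append(partial & (radix - 1))
        residual = partial >> d
        b >>= d
    while residual:                    # flush remaining high digits
        out_digits.append(residual & (radix - 1))
        residual >>= d
    return out_digits                  # least-significant digit first

# 13 * 11 = 143 = 0x8F -> [0xF, 0x8] for d = 4
print(digit_serial_multiply(13, 11))
```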
ISBN (Print): 9810475241
We propose a hardware-oriented genetic algorithm processor (GAP) with efficient exploration based on a subpopulation architecture, aimed at fast convergence and reduced computation time. The proposed GAP combines a steady-state model (rather than a continuous generation model), a modified tournament selection, a special survival condition, and coarse-grained parallelism. In addition, a crossover operator selection method driven by the convergence state of each subpopulation is newly employed. To obtain an efficient hardware structure, a pipelined organization is used. The proposed GAP is implemented on the AGENT2000 board with an EFP10K200SRC device.
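A minimal software sketch of the ingredients the abstract names: a steady-state GA with tournament selection run over several coarse-grained subpopulations. All parameters, the toy fitness function, and the survival rule are hypothetical stand-ins; the hardware aspects (pipelining, the AGENT2000 board) are of course not modeled:

```python
import random

GENOME_BITS, SUBPOPS, SUBPOP_SIZE = 16, 4, 8

def fitness(x):                       # toy objective: number of set bits
    return bin(x).count("1")

def tournament(pop, k=2):             # stand-in for the modified tournament selection
    return max(random.sample(pop, k), key=fitness)

def crossover(p1, p2):                # single-point crossover
    point = random.randrange(1, GENOME_BITS)
    mask = (1 << point) - 1
    return (p1 & mask) | (p2 & ~mask)

def step(pop):
    """One steady-state step: breed one child, conditionally replace the worst."""
    child = crossover(tournament(pop), tournament(pop))
    child ^= 1 << random.randrange(GENOME_BITS)     # bit-flip mutation
    worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
    if fitness(child) > fitness(pop[worst]):        # hypothetical survival condition
        pop[worst] = child

subpops = [[random.getrandbits(GENOME_BITS) for _ in range(SUBPOP_SIZE)]
           for _ in range(SUBPOPS)]
for _ in range(500):
    for pop in subpops:               # coarse-grained parallelism, serialized here
        step(pop)
print([max(map(fitness, pop)) for pop in subpops])
```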
We suggest a new approach to the analysis of Petri nets. It consists of extracting pairs of independent (parallel) transitions, constructing the set of auxiliary objects for those pairs, and treating it on the basis of natural and obvious rules. This allows one to extract all possible scenarios in the behavior of a Petri net and to evaluate their probability-like characteristics when several scenarios exist.
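A minimal sketch of the first step described above, extracting pairs of independent (parallel) transitions. The dict-based net encoding and the structural independence test (disjoint pre- and post-sets) are assumptions for illustration:

```python
from itertools import combinations

# transition -> (preset, postset), places named as strings
net = {
    "t1": ({"p1"}, {"p2"}),
    "t2": ({"p3"}, {"p4"}),
    "t3": ({"p2", "p4"}, {"p5"}),
}

def independent(t, u):
    """True if t and u touch disjoint sets of places, so they can fire in parallel."""
    pre_t, post_t = net[t]
    pre_u, post_u = net[u]
    return (pre_t | post_t).isdisjoint(pre_u | post_u)

pairs = [(t, u) for t, u in combinations(net, 2) if independent(t, u)]
print(pairs)   # [('t1', 't2')]: t1 and t2 share no places; t3 conflicts with both
```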
ISBN (Digital): 9783540445036
ISBN (Print): 9783540414568
This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks, which allows prefixes and sort orders to be shared between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor, supporting the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single-attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm-specific cost measures such as estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight-processor cluster, using a variety of data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors, and the observed run times show an optimal speedup of p.
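A sketch of the load-balancing idea: assign whole subcube (group-by) tasks to p processors so that estimated costs balance, rather than parallelizing each group-by individually. The greedy longest-processing-time heuristic below is a stand-in for the paper's weighted-tree partitioning, and the task weights are hypothetical estimated group-by sizes:

```python
import heapq

def partition_tasks(weights, p):
    """Greedy LPT: place each task (heaviest first) on the least-loaded processor."""
    loads = [(0, i, []) for i in range(p)]       # (load, processor id, assigned tasks)
    heapq.heapify(loads)
    for task, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, i, tasks = heapq.heappop(loads)    # least-loaded processor
        tasks.append(task)
        heapq.heappush(loads, (load + w, i, tasks))
    return sorted(loads, key=lambda t: t[1])

# hypothetical estimated sizes for the group-bys of a cube on dimensions A, B, C
est = {"ABC": 100, "AB": 60, "AC": 55, "BC": 50, "A": 10, "B": 8, "C": 7, "ALL": 1}
for load, proc, tasks in partition_tasks(est, p=3):
    print(f"processor {proc}: {tasks} (load {load})")
```

Because each processor receives whole group-bys, an existing sequential external-memory cube algorithm can be run unchanged on its share, which is the code-reuse point the abstract makes.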
Since neural networks (NNs) require an enormous amount of learning time, various kinds of dedicated parallel computers have been developed. In this paper a 2-D systolic array (SA) of dedicated processing elements (PEs), also called systolic cells (SCs), is presented as the heart of a multimodel neural-network accelerator. The instruction set of the SA allows the implementation of several neural algorithms, including error back-propagation and a self-organizing feature map algorithm. Several special architectural facilities that improve the 2-D SA's performance are presented. A swapping mechanism for the weight matrix allows the implementation of NNs larger than the 2-D SA. A systolically propagated instruction word accompanying each input vector inside the 2-D SA allows the operating mode to be changed progressively, avoiding intermediate inactive cycles inside the array. An FPGA implementation of the proposed 2-D SA is presented.
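A cycle-level sketch of the core operation such an array accelerates: a matrix-vector product computed by a grid of cells, each holding one weight and performing one multiply-accumulate per cycle as an input element passes by. The geometry is deliberately simplified and is not the paper's cell design:

```python
import numpy as np

W = np.array([[1., 2., 3.],
              [4., 5., 6.]])          # each entry lives in one systolic cell
x = np.array([10., 20., 30.])

rows, cols = W.shape
acc = np.zeros(rows)                  # running partial sums, one per row of cells
for cycle in range(cols):             # one input element enters per cycle
    for r in range(rows):             # all cells in that column fire in parallel
        acc[r] += W[r, cycle] * x[cycle]
print(acc, W @ x)                     # identical results: [140. 320.]
```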
ISBN (Print): 9810475241
Huang et al. (1996, 2002) proposed an architecture selection algorithm called SEDNN to find minimal architectures for feedforward neural networks, based on the Golden section search method and the upper bounds on the number of hidden neurons stated in Huang (2002) and Huang et al. (1998): 2√((m + 2)N) for a two-layered feedforward network (TLFN) and N for a single-layer feedforward network (SLFN), where N is the number of training samples and m is the number of output neurons. The SEDNN algorithm works well under the assumption that the time allowed for executing the algorithm is unbounded. This paper proposes an algorithm similar to SEDNN, but with an added time factor to cater for applications that require results within a specified period of time.
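A sketch of the search skeleton the abstract describes: a Golden section search over the number of hidden neurons, capped by the 2√((m + 2)N) bound, with an added wall-clock budget so the search returns an answer within a specified time. Here validation_error is a hypothetical stand-in for training and scoring a candidate network:

```python
import math, time

def validation_error(n_hidden):
    """Toy convex proxy for trained-network error; minimum at 37 hidden neurons."""
    return (n_hidden - 37) ** 2 + 5

def bounded_golden_search(N, m, time_budget_s=1.0):
    phi = (math.sqrt(5) - 1) / 2
    lo, hi = 1, int(2 * math.sqrt((m + 2) * N))    # Huang et al.'s TLFN upper bound
    deadline = time.monotonic() + time_budget_s    # the added time factor
    while hi - lo > 2 and time.monotonic() < deadline:
        a = round(hi - phi * (hi - lo))            # inner probe points
        b = round(lo + phi * (hi - lo))
        if validation_error(a) < validation_error(b):
            hi = b                                 # minimum lies in [lo, b]
        else:
            lo = a                                 # minimum lies in [a, hi]
    return min(range(lo, hi + 1), key=validation_error)

print(bounded_golden_search(N=1000, m=3))          # bound is 2*sqrt(5000), about 141
```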
As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow, requiring more aggressive techniques to increase the performanc...
Query cost models are widely used, both for performance analysis and for comparing execution plans during query optimization. In essence, a cost model predicts where time is being spent during query evaluation. Although many cost ...