With the dropping prices of multiprocessor desktop computers and high-performance clusters, these systems are increasingly available to the average user. Thus there is a growing need for applications that take advantage of multiprocessor systems. Automatic parallelization allows preexisting sequential programs to utilize multiple processors, but it requires thorough analysis of the sequential program. This paper briefly covers multiprocessor systems and the challenges of automatic parallelization, and introduces a framework for developing and running automatic parallelization algorithms.
ISBN: 3540406735 (print)
A large variety of problems are out of reach of single-processor computers. Many approaches are offered today to get around this limit. Each has its own strengths and weaknesses, so a compromise has to be found. We introduce a general parallel computing method for engineering problems, intended for all users. We sought an easy method for code development. A data selection technique (Selected Data Technique, SDT) determines the data dedicated to each processor. Several problems associated with communication times are posed, and solutions are proposed according to the number of processors. The method is applied to problems with very large CPU cost, particularly unsteady problems or steady problems solved with an iterative method, so the domain of potential applications is very wide. The SDT parallelization is performed by an expert system called AMS (Automatic Multi-grid System) included in the software. This new concept is a natural path toward the standardization of parallel codes. An example is presented hereafter.
Partial reconfiguration has opened the door to efficient implementation of large applications on area-constrained hardware. It requires a divide-and-map technique through which large applications are divided and mapped onto reconfigurable hardware. A technique is proposed for dividing the application that takes into account implementation and architectural constraints on swappable hardware processing elements (PEs). Each time a new task is mapped, an objective function weighs mapping the task onto an existing PE against loading a new configuration bitstream for an optimized PE. The decision is critical because minimizing configuration time can come at the cost of execution time.
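The abstract does not give the objective function, so the following is only a minimal sketch of the trade-off it describes, under the assumption that the cost is a weighted sum of configuration time and execution time. The names `Option`, `choose_mapping`, and `weight` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    config_time: float  # time to load a bitstream (0 if the PE is already configured)
    exec_time: float    # time to run the task on this PE


def choose_mapping(existing: Option, optimized: Option, weight: float = 1.0) -> Option:
    """Return the option with the lower assumed cost: weight * config_time + exec_time."""
    def cost(o: Option) -> float:
        return weight * o.config_time + o.exec_time

    # Ties go to the existing PE, which avoids a reconfiguration.
    return existing if cost(existing) <= cost(optimized) else optimized


reuse = Option("reuse existing PE", config_time=0.0, exec_time=10.0)
reconfigure = Option("load optimized PE", config_time=4.0, exec_time=5.0)
choice_default = choose_mapping(reuse, reconfigure)               # cost 10 vs 9
choice_costly = choose_mapping(reuse, reconfigure, weight=2.0)    # cost 10 vs 13
```

With equal weighting the optimized PE wins despite the reconfiguration; doubling the assumed penalty on configuration time flips the decision to the existing PE.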
This paper presents an FPGA implementation of a novel and highly scalable hardware architecture for fast inversion of triangular matrices. An integral part of modern signal processing and communications applications involves manipulation of large matrices. Therefore, scalable and flexible hardware architectures are increasingly sought after. The traditional triangular array architecture with n(n+1)/2 communicating processors, where n is the number of inputs, is mapped to a linear structure with only n processors. The linear and triangular architectures are compared in terms of area consumption, latency, and maximum clock speed. The paper also shows that the linear array structure avoids drawbacks such as non-scalability, large area, and large power consumption. The implementation is based on a numerically stable recurrence algorithm, which has excellent properties for hardware implementation.
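The abstract does not state the recurrence it uses. As an illustration only, the sketch below inverts a lower triangular matrix with the textbook column-wise forward-substitution recurrence, which may differ from the paper's algorithm. The function names are hypothetical. Each column of the inverse depends only on the input matrix, which is one way to see why a linear arrangement with one unit per column is plausible.

```python
def invert_lower_triangular(L):
    """Invert a lower triangular matrix given as a list of rows.

    Recurrence (standard forward substitution, assumed here):
        X[j][j] = 1 / L[j][j]
        X[i][j] = -(sum_{k=j}^{i-1} L[i][k] * X[k][j]) / L[i][i]   for i > j
    """
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):  # columns are mutually independent
        X[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            s = sum(L[i][k] * X[k][j] for k in range(j, i))
            X[i][j] = -s / L[i][i]
    return X


def processor_counts(n):
    """Processor counts quoted in the abstract: triangular array vs. linear array."""
    return n * (n + 1) // 2, n


inverse = invert_lower_triangular([[2.0, 0.0], [1.0, 4.0]])
triangular, linear = processor_counts(4)
```

For the 2x2 example the result is `[[0.5, 0.0], [-0.125, 0.25]]`, and for n = 4 the triangular array needs 10 processors against 4 for the linear one.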
ISBN: 9781581137422 (print)
This paper presents a software implementation of a very fast parallel Reed-Solomon decoder on the second generation of the MorphoSys reconfigurable computation platform, which targets streamed applications such as multimedia and DSP. Numerous modifications to the first-generation architecture have produced a scalable, computation- and communication-intensive architecture capable of extracting fine-grained, instruction-level parallelism. Many algorithms, as well as a complete digital video broadcasting base-band receiver, have been mapped onto the second-generation architecture with impressive performance. The mapping of the Reed-Solomon decoder proposed in the paper highly parallelizes all of its sub-algorithms, including syndrome computation, the Berlekamp algorithm, Chien search, and error value computation, in a SIMD fashion. The mapping is tested on a cycle-accurate simulator, "Mulate", and the performance is encouragingly better than that of other architectures. The decoding speed of the RS (255,239,16) decoder using two different methods of GF multiplication reaches 1.319 Gbps and 2.534 Gbps, respectively. Furthermore, since no functionality is specifically tailored to Reed-Solomon decoding, the result demonstrates the capability of the MorphoSys architecture to extract instruction-level parallelism from streamed applications.
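To make the first sub-algorithm concrete, here is a minimal sketch of syndrome computation over GF(2^8) for a code with 16 parity symbols. It is not the MorphoSys mapping. The field polynomial 0x11D, the choice of α^0 through α^15 as roots, and all function names are assumptions made for illustration; the abstract specifies none of them. GF multiplication here uses log/antilog tables, which is one common method and not necessarily either of the two the paper compares.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)

# Build antilog (EXP) and log (LOG) tables for GF(2^8) with alpha = 2.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a, b):
    """Multiply two field elements via table lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(poly, x):
    """Horner evaluation; poly[0] is the highest-degree coefficient."""
    acc = 0
    for c in poly:
        acc = gf_mul(acc, x) ^ c
    return acc


def syndromes(received, n_parity=16):
    """S_i = r(alpha^i) for i = 0..n_parity-1; each S_i is independent of the others."""
    return [poly_eval(received, EXP[i]) for i in range(n_parity)]


def generator_poly(n_parity=16):
    """g(x) = product of (x + alpha^i); used here only to obtain a valid codeword."""
    g = [1]
    for i in range(n_parity):
        nxt = g + [0]  # g(x) * x
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[i])  # plus g(x) * alpha^i
        g = nxt
    return g


codeword = generator_poly()   # g(x) is itself a codeword polynomial
corrupted = list(codeword)
corrupted[3] ^= 0x55          # inject a single symbol error
clean_syndromes = syndromes(codeword)
error_syndromes = syndromes(corrupted)
```

All 16 syndromes of the clean codeword are zero, while the corrupted word yields nonzero syndromes. Because the 16 evaluations share no intermediate state, they are a natural candidate for the SIMD treatment the abstract mentions.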
We have been investigating the determination of three-dimensional electric field distributions using an original method. We have already reported on measurements of symmetrical and nonsymmetrical electric field vector distributions. This reconstruction technique gives an electric field map, i.e., the strength and direction of the electric field vector, on any plane. The measurement would be more useful if the electric field distribution could be obtained continuously in time. We therefore propose continuous measurement of the electric field vector distribution in a liquid insulator. The measurement system is called a simultaneous three-directional optical measurement system with parallel processing. The term "parallel processing" means that the signals from all image sensors are processed simultaneously. With this system, the electric field distribution was measured at intervals of 1 ms while a voltage pulse was applied to an electrode system.
ISBN: 0769515126 (print)
The proceedings contain 78 papers. The topics discussed include: efficient weighted multiselection in parallel architectures; local block factorization and its parallelization to block tridiagonal matrices; parity declustering data layout for tolerating dependent disk failures in network RAID systems; an analysis of update ordering in a cluster of replicated servers; performance of dynamic load balancing algorithm on cluster of workstations and PCs; universal parallel numerical computing for 3D convection-diffusion equation with variable coefficients; efficient loop partitioning for parallel codes of irregular scientific computations; an evolutionary algorithm of contracting search space based on partial ordering relation for constrained optimization problems; a new divide and conquer algorithm for real symmetric band generalized eigenvalue problem; a framework of using cooperating mobile agents to achieve load sharing in distributed web server groups; and design and analysis of finite difference domain decomposition algorithms for the two-dimensional heat equation.
Parallel distributed computing systems provide mechanisms for exploiting the parallelism inherent in many scientific and engineering applications. One such programming environment, which has been successfully demonstrated on collections of heterogeneous computing elements connected by one or more networks, is the Parallel Virtual Machine (PVM). It has been used on high-end computing resources such as mainframe computers, multiprocessors, hypercubes, and the like. In Pakistan, the most common computing resource is a low-cost PC. The abundance of such machines provides an opportunity to develop and use a "poor man's supercomputer". In addition, research on PVM has focused on Unix or similar platforms, and no formal results evaluating benchmark applications are available for Windows-based environments. This work reports the results of the local PVM implementation and compares them with results from conventional implementations of PVM.
This paper describes two approaches to optimizing the performance of SoC architectures in the architecture exploration phase. Both solve the problem of mapping and scheduling a task graph on a target architecture with special consideration of on-chip communication. The first is a constructive algorithm that extends previous work by taking potential future data transfers into account. The second is a recursive procedure based on local search techniques in a specially defined neighborhood of the critical path, with simulated annealing and tabu search used as search algorithms. Both approaches find solutions with better performance than established methodologies. The recursive technique yields better results than the constructive approach but is limited to small and mid-sized problems, whereas the constructive algorithm has no such limitation.
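For readers unfamiliar with the underlying problem, the sketch below shows a plain greedy constructive scheduler for a task graph with communication costs. It is not the paper's algorithm: it has no look-ahead for future transfers and no local search. It assumes, for illustration, that a transfer costs nothing when producer and consumer share a processing element and that tasks are supplied in topological order. The function name `schedule` and the example graph are hypothetical.

```python
def schedule(tasks, edges, n_pe):
    """Greedy earliest-finish-time mapping.

    tasks: {name: duration}, listed in topological order (assumption).
    edges: {(src, dst): comm_cost}, charged only when src and dst sit on different PEs.
    Returns (mapping, finish_times, makespan).
    """
    pe_free = [0.0] * n_pe          # time at which each PE becomes idle
    mapping, finish = {}, {}
    for task, duration in tasks.items():
        best = None                 # (finish_time, pe) of the best candidate so far
        for pe in range(n_pe):
            ready = 0.0
            for (src, dst), cost in edges.items():
                if dst == task:
                    transfer = 0.0 if mapping[src] == pe else cost
                    ready = max(ready, finish[src] + transfer)
            start = max(ready, pe_free[pe])
            if best is None or start + duration < best[0]:
                best = (start + duration, pe)
        finish[task], mapping[task] = best
        pe_free[best[1]] = best[0]
    return mapping, finish, max(finish.values())


tasks = {"a": 2.0, "b": 3.0, "c": 1.0}
edges = {("a", "b"): 5.0, ("a", "c"): 1.0}
mapping, finish, makespan = schedule(tasks, edges, n_pe=2)
```

In this example the expensive a-to-b transfer keeps `b` on the same PE as `a`, while `c` moves to the second PE because its cheap transfer is outweighed by the earlier start, giving a makespan of 5.0.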