This paper presents the results from running five experiments with the Chime parallel processing system. The Chime system is an implementation of the CC++ programming language (parallel part) on a network of computers...
ISBN: 0769505716 (print)
Conflicts between jobs for shared resources happen daily on the manufacturing floor of a job shop production facility, which makes the floor difficult to manage. To solve this problem, we propose a system to plan and manage scheduling through the use of multi-PERT. We call this system the Concurrent Scheduling Method (CSM). Under CSM, the job structure of each individual order is clarified during the design stage to determine the production process. The result is detailed in CSM process charts similar to PERT diagrams. These multiplexed job structures are monitored to follow progress, and the schedule for all orders is planned and managed. A schedule planned in this way maximizes resource utilization and clearly identifies the critical path. The critical path information helps to solve the load problem. In this way, the production process of the site is optimized to meet delivery dates. Status changes caused by new orders, delivery date change requests or progress changes can also be responded to quickly.
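The critical path of a single PERT-style job network can be found with a forward and a backward pass over the precedence graph. The sketch below is a generic textbook computation, not the CSM system itself; the function name `critical_path` and the job names in the usage note are hypothetical.

```python
from collections import defaultdict, deque

def critical_path(durations, precedences):
    """Return (project_length, critical_jobs) for a PERT-style job network.

    durations:   dict mapping job -> processing time
    precedences: list of (before, after) pairs
    """
    successors = defaultdict(list)
    predecessors = defaultdict(list)
    indegree = {job: 0 for job in durations}
    for before, after in precedences:
        successors[before].append(after)
        predecessors[after].append(before)
        indegree[after] += 1

    # Topological order (Kahn's algorithm).
    queue = deque(job for job, degree in indegree.items() if degree == 0)
    order = []
    while queue:
        job = queue.popleft()
        order.append(job)
        for nxt in successors[job]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(durations):
        raise ValueError("precedence graph contains a cycle")

    # Forward pass: earliest finish time of each job.
    earliest_finish = {}
    for job in order:
        start = max((earliest_finish[p] for p in predecessors[job]), default=0)
        earliest_finish[job] = start + durations[job]
    length = max(earliest_finish.values())

    # Backward pass: latest finish time that does not delay the project.
    latest_finish = {}
    for job in reversed(order):
        latest_finish[job] = min(
            (latest_finish[s] - durations[s] for s in successors[job]),
            default=length,
        )

    # Jobs with zero slack form the critical path.
    critical = [j for j in order if earliest_finish[j] == latest_finish[j]]
    return length, critical
```

For example, with durations `{"cut": 3, "weld": 2, "paint": 4, "assemble": 1}` and precedences cut→weld, cut→paint, weld→assemble, paint→assemble, the zero-slack jobs are cut, paint and assemble, and weld has two time units of slack.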
ISBN: 3540679561 (print)
Sequential multi-constraint graph partitioners have been developed to address the load balancing requirements of multi-phase simulations. The efficient execution of large multi-phase simulations on high performance parallel computers requires that the multi-constraint partitionings are computed in parallel. This paper presents a parallel formulation of a recently developed multi-constraint graph partitioning algorithm. We describe this algorithm and give experimental results obtained on a 128-processor Cray T3E. We show that our parallel algorithm efficiently computes partitionings with edge-cuts similar to those of serial multi-constraint algorithms, and can scale to very large graphs. Our parallel multi-constraint graph partitioner is able to compute a three-constraint 128-way partitioning of a 7.5 million node graph in about 7 seconds on 128 processors of a Cray T3E.
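The two quantities a multi-constraint partitioner trades off, edge-cut and per-constraint load balance, can be evaluated for a given partitioning as sketched below. This only scores a partitioning; it is not the partitioning algorithm from the paper, and the function names and data layout are assumptions made for illustration.

```python
def edge_cut(edges, part):
    """Total weight of edges whose endpoints lie in different parts.

    edges: list of (u, v, weight) tuples
    part:  dict mapping vertex -> part index
    """
    return sum(w for u, v, w in edges if part[u] != part[v])

def load_imbalance(weights, part, nparts):
    """Per-constraint imbalance: maximum part load divided by average load.

    weights: dict mapping vertex -> tuple of weights, one per constraint
    A value of 1.0 means that constraint is perfectly balanced.
    """
    nconstraints = len(next(iter(weights.values())))
    loads = [[0.0] * nconstraints for _ in range(nparts)]
    for vertex, vector in weights.items():
        for c, w in enumerate(vector):
            loads[part[vertex]][c] += w
    result = []
    for c in range(nconstraints):
        column = [loads[p][c] for p in range(nparts)]
        result.append(max(column) * nparts / sum(column))
    return result
```

A partitioning is acceptable in the multi-constraint sense only if every entry of the imbalance vector stays below a chosen tolerance, which is what makes the problem harder than single-constraint partitioning.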
ISBN: 0780365429 (print)
The need for processing speed in digital imaging and multidimensional signal processing seems to grow ever faster as more and more power-hungry applications find their way into scientific and engineering applications. In this paper we evaluate the performance of a high-speed parallel system aimed at real-time applications. The parallel system utilises a 2-D hybrid decomposition algorithm that eliminates the problem of overlapping segments in the block convolution and the boundary conditions when parallelizing a 2-D filter algorithm. Finally, the implementation on parallel SHARC DSPs is investigated and some examples are given.
ISBN: 0780365429 (print)
A tool for software implementation of digital filter architectures is presented. The implementation is based on fixed-point arithmetic using bit-level logic modules to reflect the actual hardware. The tool can be used in academia as well as for hardware verification. Several finite-duration impulse response (FIR) and infinite-duration impulse response (IIR) filter architectures are implemented to illustrate the capabilities of the tool.
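As a rough illustration of what modelling a filter with bit-level modules can look like, the sketch below builds a ripple-carry adder from one-bit full adders and uses it as the accumulator of a direct-form FIR filter. The word width, the number of fractional bits, the truncating multiplier and all function names are assumptions for this example; the tool's actual modules are not described in the abstract.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR/AND/OR gates; returns (sum, carry)."""
    partial = a ^ b
    return partial ^ carry_in, (a & b) | (partial & carry_in)

def ripple_add(x, y, width=16):
    """Add two width-bit words bit by bit; the final carry is dropped (wraps)."""
    carry, total = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total

def to_signed(word, width=16):
    """Interpret a width-bit word as a two's-complement integer."""
    return word - (1 << width) if word & (1 << (width - 1)) else word

def fir_fixed(samples, coeffs, frac_bits=8, width=16):
    """Direct-form FIR on integers that carry frac_bits fractional bits."""
    mask = (1 << width) - 1
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                # Word-level multiply, then truncate back to the fixed-point format.
                product = (samples[n - k] * c) >> frac_bits
                acc = ripple_add(acc, product & mask, width)
        out.append(to_signed(acc, width))
    return out
```

With 8 fractional bits, the coefficients `[128, 128]` represent a two-tap averaging filter (0.5, 0.5), and the input `[256, 512, -256]` represents the samples 1.0, 2.0 and -1.0.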
Third generation mobile radio systems will employ TD-CDMA in their TDD mode. To increase the capacity and performance of this system, the receiver will contain a joint detector. Joint detection is equivalent to solving a least squares problem, which represents a significant computational effort because of the amount of data that is involved. Therefore, algorithms and implementations must be developed that lower this complexity as much as possible without degrading the performance of the joint detector. This paper presents an algorithm that is based on the idea of extending the system matrix of the least squares problem to a block-circulant matrix. It is then possible to block-diagonalize the matrix by Fast Fourier Transforms. In addition, overlap-save techniques are presented that reduce the computational complexity further. The resulting algorithm is well suited for implementation on parallel architectures. It has a lower computational complexity than existing methods while yielding a better bit error ratio performance.
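The reason a circulant structure helps is that the discrete Fourier transform diagonalizes any circulant matrix, so a linear system with such a matrix reduces to element-wise divisions in the frequency domain. The sketch below shows this for the simplest case, a square, nonsingular, scalar circulant matrix, and uses a naive O(n²) DFT to stay dependency-free. The paper works with block-circulant matrices, a least squares problem and FFTs; those are not reproduced here, and all function names are hypothetical.

```python
import cmath

def dft(values, inverse=False):
    """Naive discrete Fourier transform (stands in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
            for j, v in enumerate(values))
        for k in range(n)
    ]
    return [z / n for z in out] if inverse else out

def circulant_matvec(first_column, x):
    """Multiply the circulant matrix defined by its first column with x."""
    n = len(x)
    return [
        sum(first_column[(i - j) % n] * x[j] for j in range(n))
        for i in range(n)
    ]

def solve_circulant(first_column, b):
    """Solve C x = b, where C is circulant with the given first column.

    The eigenvalues of C are the DFT of its first column, so the system
    decouples into n scalar equations in the frequency domain.
    Assumes real input and that no eigenvalue is zero.
    """
    eigenvalues = dft(first_column)
    spectrum = dft(b)
    x_hat = [s / lam for s, lam in zip(spectrum, eigenvalues)]
    return [z.real for z in dft(x_hat, inverse=True)]
```

In the block-circulant case the same idea applies block-wise: the transform yields a block-diagonal matrix whose small blocks can be solved independently, which is what makes the approach attractive for parallel hardware.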
ISBN: 0780365429 (print)
This paper presents a new implementation of a 2D wavelet transform in a VLSI circuit, for real-time digital signal processing. The parallel algorithm of the 2D wavelet transform (2D-WT) used for designing and implementing this new architecture enhances the performance of computations. The proposed multi-elementary processor architecture of 2D-WT yields a very flexible hardware configuration. This approach offers a high processing speed, relative to other methods, for providing the wavelet coefficients. The 2D-WT is a powerful tool for several applications, the most important one being image processing.
ISBN: 9643600572 (print)
Five different 32-bit integer multipliers are compared on various performance measures. The multipliers included in the comparison are the array multiplier, modified Booth (radix-4) multiplier, optimized Wallace tree multiplier, combined modified Booth-Wallace tree multiplier and twin pipe serial parallel multiplier. The comparison is based on synthesis results obtained by synthesizing all multiplier architectures for an FPGA.
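Of the architectures listed, the modified Booth (radix-4) scheme is the easiest to illustrate in software: it recodes the multiplier into digits from {-2, -1, 0, 1, 2}, which halves the number of partial products compared with a plain array multiplier. The sketch below is a behavioural model of the standard recoding only, not of the synthesized hardware; the function names are hypothetical and `width` is assumed to be even.

```python
def booth_radix4_digits(multiplier, width=32):
    """Recode a width-bit two's-complement multiplier into radix-4 Booth digits.

    Digit i is y[2i-1] + y[2i] - 2*y[2i+1], with y[-1] taken as 0.
    Returns width/2 digits, least significant first.
    """
    bits = multiplier & ((1 << width) - 1)
    digits = []
    previous = 0  # implicit bit to the right of the least significant bit
    for i in range(0, width, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(previous + low - 2 * high)
        previous = high
    return digits

def booth_multiply(multiplicand, multiplier, width=32):
    """Sum the width/2 partial products selected by the Booth digits."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, width)):
        # Each digit selects 0, +/-1x or +/-2x the multiplicand, weighted by 4**i.
        product += (digit * multiplicand) << (2 * i)
    return product
```

In hardware, the 16 partial products of a 32-bit radix-4 Booth multiplier would be summed by an adder array or a Wallace tree, which is where the combined Booth-Wallace design in the comparison comes from.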
ISBN: 0780365429 (print)
Typical DSP algorithms require high memory bandwidth; thus unibus shared-memory systems can support only a handful of processors. The proposed architecture can effectively support 64 digital signal processors (DSPs), in contrast to a maximum of 4 DSPs supported by existing bus-interconnected systems. This significant enhancement is achieved by introducing two small programmable fast memories (Twins) between the processor and the shared bus interconnect. While one memory is transferring data from/to the shared memory, the other is supplying the core processor with data. The proposed architecture eliminates the traditional direct linkage of the shared bus and the processor data bus, thus making feasible the utilization of a wider shared bus. Simulation results show that the fast prefetching memories and the wider shared bus provide additional bus bandwidth to the system, which eliminates large memory latencies; such memory latencies constitute the major drawback for the performance of shared-memory multiprocessors.
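The benefit of alternating between two buffers can be seen in a simple timing model for one processor: with a single buffer, transfer and computation are serialized, whereas with twin buffers the transfer of the next block overlaps the computation on the current one. The model below is an idealized assumption made for illustration (fixed per-block times, no bus contention, no write-back traffic) and is not the simulation used in the paper.

```python
def single_buffer_time(blocks, transfer, compute):
    """Total time when every block is first transferred, then processed."""
    return blocks * (transfer + compute)

def twin_buffer_time(blocks, transfer, compute):
    """Total time when one buffer is filled while the other is being processed.

    The first transfer cannot be hidden; afterwards each step costs the
    slower of the two overlapped activities, and the last block still has
    to be processed after its transfer completes.
    """
    if blocks == 0:
        return 0
    return transfer + (blocks - 1) * max(transfer, compute) + compute
```

For 4 blocks with a transfer time of 3 and a compute time of 5 (arbitrary units), the single-buffer model gives 32 and the twin-buffer model gives 23, so the processor is stalled only during the initial fill.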