Frequency analysis using the DFT (discrete Fourier transform) or its faster computational technique, the FFT, is an obvious choice across the image and signal processing domain, where spectral leakage and the picket fence effect are major problems. Earlier works describe software and ROM-based implementations of windowing functions to overcome these problems during spectral analysis. In this work we propose a CORDIC (co-ordinate rotation digital computer)-based unified windowing architecture to remove the spectral leakage, picket fence effect and resolution problems with different tradeoffs between mainlobe and sidelobe in the frequency domain. A parallel-pipelined architecture has been adopted for the present design to ensure high throughput for real-time applications, with a latency equal to twice the CORDIC length plus three extra cycles. This unified architecture combines linear CORDIC and circular CORDIC with a FIFO and a few multiplexers, and the choice of window and its length are user defined. We have synthesised this architecture in 0.18 μm CMOS technology using Synopsys Design Analyser. The total estimated dynamic power was found to be 350 mW at an operating frequency of 125 MHz, with a total cell area of approximately 11 mm².
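To make the role of circular CORDIC in window generation concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes a Hann window, floating-point arithmetic instead of fixed-point shifts, and 24 iterations; the function names `cordic_cos_sin`, `cordic_cos` and `hann_window` are hypothetical. The linear CORDIC stage, FIFO and multiplexers mentioned in the abstract are not modelled.

```python
import math


def cordic_cos_sin(angle, iterations=24):
    """Circular CORDIC in rotation mode; angle must lie in [-pi/2, pi/2]."""
    # Elementary rotation angles atan(2^-i) and the constant gain correction.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start on the x-axis, pre-scaled so the final vector has unit length.
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y  # approximately (cos(angle), sin(angle))


def cordic_cos(theta):
    """cos(theta) for theta in [0, 2*pi], folded into the CORDIC range."""
    if theta <= math.pi / 2:
        return cordic_cos_sin(theta)[0]
    if theta <= 3 * math.pi / 2:
        return -cordic_cos_sin(theta - math.pi)[0]
    return cordic_cos_sin(theta - 2 * math.pi)[0]


def hann_window(length):
    """Hann window w[n] = 0.5 - 0.5*cos(2*pi*n/(length-1)), length >= 2."""
    return [0.5 - 0.5 * cordic_cos(2 * math.pi * n / (length - 1))
            for n in range(length)]
```

Multiplying a signal frame element-wise by `hann_window(len(frame))` before the FFT tapers the frame edges, which is the mechanism by which windowing reduces spectral leakage.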
Four basic algorithms for implementing distributed shared memory are compared. Conceptually, these algorithms extend local virtual address spaces to span multiple hosts connected by a local area network, and some of them can easily be integrated with the hosts' virtual memory systems. The merits of distributed shared memory and the assumptions made with respect to the environment in which the shared memory algorithms are executed are described. The algorithms are then described, and a comparative analysis of their performance in relation to application-level access behavior is presented. It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications. Two particularly interesting extensions of the basic algorithms are described, and some limitations of distributed shared memory are noted.
This paper investigates various processor management techniques for improving the performance of mesh-connected multicomputers. Unlike almost all prior work, where the focus was on improving the submesh recognition ability of the processor allocation algorithms, this research examines other alternatives to improve system performance beyond what is achievable with the usually assumed first-come-first-served (FCFS) scheduling and any allocation. First, we use the smallest job first (SJF) policy to improve the spatial parallelism in a mesh. Next, we introduce a generic processor management scheme called multitasking and multiprogramming (M²). Then, an M² policy for mesh-connected multicomputers called virtual mesh (VM) is proposed and analyzed. The proposed VM scheme allows multiprogramming of jobs on several VMs. Finally, a novel approach called limit allocation is used for job allocation. With this scheme, a job (submesh) size is reduced if the job cannot be allocated. The objective here is to reduce the job waiting time and hence improve the overall performance. While all three approaches are viable alternatives for reducing the average job response time under various workloads, the VM and limit allocation techniques are especially attractive because they provide some additional features. The VM scheme brings in the concept of time-sharing execution for better efficiency, and limit allocation shows how job size restriction can be beneficial for performance and fault tolerance in a mesh topology. Moreover, the limit allocation scheme using even the simplest allocation policy can outperform any other approach. © 2001 Published by Elsevier Science B.V.
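The limit allocation idea (shrink a submesh request when it cannot be placed) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the size is reduced, so the sketch assumes the larger side is halved on each failure, and it uses a plain first-fit scan as the "simplest allocation policy". The names `find_free_submesh` and `limit_allocate` are hypothetical.

```python
def find_free_submesh(busy, w, h):
    """First-fit scan: return (row, col) of a free h-by-w block, or None."""
    rows, cols = len(busy), len(busy[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(not busy[r + i][c + j]
                   for i in range(h) for j in range(w)):
                return r, c
    return None


def limit_allocate(busy, w, h):
    """Allocate a w-wide, h-tall submesh, shrinking the request on failure.

    Assumed reduction rule: halve the larger side until the request fits
    or has shrunk to 1 x 1. Returns (row, col, w, h) or None.
    """
    while True:
        pos = find_free_submesh(busy, w, h)
        if pos is not None:
            r, c = pos
            for i in range(h):
                for j in range(w):
                    busy[r + i][c + j] = True  # mark processors as allocated
            return r, c, w, h
        if w == 1 and h == 1:
            return None  # even a single processor is unavailable
        if w >= h:
            w = max(1, w // 2)
        else:
            h = max(1, h // 2)
```

A job that receives a smaller submesh than it requested starts sooner but runs on fewer processors, which is the waiting-time versus execution-time tradeoff the abstract alludes to.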
The application of discrete-time signal-processing techniques, such as windowing and filtering, to implementing accurate excitation schemes in the finite-difference time-domain (FDTD) method is demonstrated. The effects of smoothing windows of various lengths and digital lowpass filters of various bandwidths and characteristics on finite-source excitations of the FDTD computational domain are investigated. Both single-frequency sinusoidal signals and multifrequency arbitrary signals are considered.
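As a rough illustration of a smoothing window applied to a single-frequency excitation, the sketch below multiplies a sinusoid by a raised-cosine (Hann-type) ramp over its first samples, so the source turns on gradually rather than abruptly. The choice of window, the function name `ramped_sine` and all parameter names are assumptions made for illustration; the abstract does not specify which windows or lengths the authors used.

```python
import math


def ramped_sine(num_steps, dt, freq, ramp_steps):
    """Sinusoidal source samples with a raised-cosine turn-on ramp.

    num_steps  -- number of time steps to generate
    dt         -- time step in seconds
    freq       -- source frequency in hertz
    ramp_steps -- length of the smoothing window in time steps
    """
    samples = []
    for n in range(num_steps):
        if n < ramp_steps:
            # Envelope rises smoothly from 0 to 1 over the ramp.
            envelope = 0.5 * (1.0 - math.cos(math.pi * n / ramp_steps))
        else:
            envelope = 1.0
        samples.append(envelope * math.sin(2.0 * math.pi * freq * n * dt))
    return samples
```

A longer ramp narrows the spectrum injected around the carrier at the cost of a longer transient before steady state, which is the length tradeoff the abstract investigates.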
This article describes a protocol for a general-purpose cluster communication system that supports multiprogramming with virtual networks, direct and protected network access, reliable message delivery using message time-outs and retransmissions, a powerful return-to-sender error model for applications, and automatic network mapping. The protocols use simple, low-cost mechanisms that exploit properties of our interconnect without limiting flexibility, usability, or robustness. We have implemented the protocols in an active message communication system that runs on a network of 100+ Sun UltraSPARC workstations interconnected with 40 Myrinet switches. A progression of microbenchmarks demonstrates good performance (42 microsecond round-trip times and 31 MB/s node-to-node bandwidth) as well as scalability under heavy load and graceful performance degradation in the presence of high contention.
In the classical scheduling theory it is widely assumed that any task requires for its processing only one processor at a time. In this paper the problem of deterministic scheduling of tasks requiring for their processing more than one processor at a time, i.e., a constant set of dedicated processors, is analyzed. Schedule length is assumed to be a performance measure. Tasks are assumed to be preemptable and independent. Low order polynomial algorithms for simple cases of the problem are given. Then a method to solve the general version of the problem for a limited number of processors is presented, while the case of an arbitrary number of processors is known to be NP-hard. Finally, a version of the problem, where besides processors every task can also require additional resources, is considered.
Parallel execution of application programs on a multiprocessor system may lead to performance degradation if the workload of a parallel region is not large enough to amortize the overheads associated with the parallel execution. Furthermore, if too many processes are running on the system in a multiprogrammed environment, the performance of the parallel application may degrade due to resource contention. This work proposes a comprehensive dynamic processor allocation scheme that takes both program behavior and system load into consideration when dynamically allocating processors. This mechanism was implemented on the Solaris operating system to dynamically control the execution of parallel C and Java application programs. Performance results show the effectiveness of this scheme in dynamically adapting to the current execution environment and program behavior, and that it outperforms a conventional time-shared system. Copyright © 2002 John Wiley & Sons, Ltd.
This paper describes an advance in multiprocess cache system design called the process cache, in which the secondary cache has segments dedicated to each process. This approach is in contrast to a monolithic secondary cache, in which every process's data can be distributed throughout the secondary cache. The process cache system is equally applicable to instruction and data caches; however, its performance is evaluated only for an instruction cache in this paper. The goal of the process cache architecture is to improve program performance in environments that have high context switch rates and long tertiary memory access times. This paper presents an instruction-only trace-driven simulation demonstrating that, for long tertiary memory latencies (10 μs) and short context switch intervals (1 ms), the process cache architecture significantly (11-14%) outperforms monolithic secondary cache architectures independent of cache memory size, associativity, or cache line size. This improvement does not require significantly more hardware than current cache memory designs. Additionally, we present an architecture that allows the system to allocate different amounts of secondary cache memory per process. Furthermore, process caches provide a high degree of locality for process information in a multiple-processor configuration, allowing efficient task migration between processors. © 1998 Elsevier Science Ltd. All rights reserved.
This paper presents a comparison of Pascal and Modula-2 based on the implementation of the basic components of a multi-tasking kernel. The major issues involved in high-level language implementation of a stand-alone multi-tasking kernel on a microprocessor system are the transportation of the language support system and what may be termed software engineering considerations. The merits of Pascal and Modula-2 with respect to these issues are compared. Standard Pascal is a sequential language, and the development of the multi-tasking features of the kernel has to take place outside the scope of the language. The Modula-2 language (nucleus), however, allows the kernel to be built entirely using high-level constructs. Issues of language run-time support and portability are also covered. These topics and the high-level handling of interrupts have received little attention in the literature on Modula-2. The Modula-2 kernel also provides a possible implementation of the ‘MODULE processes’.
The performance of transaction processing systems is determined by the contention for hardware as well as software resources (database locks), due to the concurrency control mechanism of the database being accessed by transactions. The author considers a transaction processing system with a set of dominant transaction classes. Each class needs to acquire a certain subset of the locks in the database before it can be processed, i.e., predeclared lock requests with static locking. Straightforward application of the decomposition method requires the numerical solution of a two-dimensional Markov chain. Equivalently, a hierarchical simulation method, where the computer system is represented by a composite queue with exponential service rates, can be used to analyze the system.