ISBN: 0780375335 (print)
In this paper, we advocate that quality of service (QoS) routing protocols should use routes that are weaved and/or geographically pinned. This leads to a new paradigm for QoS routing called geographically pinned QoS routing (GPQR). Based on GPQR, we propose the embedded QoS routing (EQR) scheme for scalable routing with support for Differentiated Service and QoS guarantees in mobile ad hoc networks (MANET) and ad hoc cellular networks (ACENET). The QoS routing disciplines proposed for EQR fundamentally change the philosophy for selecting and provisioning QoS routes. Moreover, EQR combines several important advantages of localized routing and table-driven routing, including scalability and the capability for wireless traffic engineering.
This paper gives an overview of analogic cellular array architectures that can also be used to approximate partial differential equations (PDEs). Cellular arrays are massively parallel computing structures composed of cells placed on a regular grid. These cells interact locally, and the array can have both local and global dynamics. The software of this architecture is an analogic algorithm that builds on analog and logical spatio-temporal instructions of the underlying hardware, which is a locally connected cellular nonlinear network (CNN). Within this framework, two classes of PDEs, motivated also by image processing methodologies, are discussed: (i) reaction-diffusion (local) types and (ii) contrast modification (global) types. It is shown that, based on cellular diffusion and wave-computing formulations, these classes can be approximated on existing CNN Universal Machine (CNN-UM) chips. Thus, the latest generation of stored-program topographic array microprocessors with integrated sensing and computing could also be viewed as the first prototypes of analogic cellular PDE machines implemented on silicon.
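As a rough illustration of how purely local cell interactions can approximate a diffusion-type PDE, the sketch below performs explicit finite-difference updates on a regular grid. It is a software analogy only, not the CNN-UM programming model: the function names, the 4-neighbour coupling, the zero-flux boundary, and the `rate` parameter are all assumptions made for this example.

```python
def diffusion_step(grid, rate=0.2):
    """One synchronous update in which every cell reads only its 4 neighbours.

    Hypothetical helper: `rate` plays the role of D*dt/dx^2 and should stay
    at or below 0.25 for this explicit scheme to remain stable.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            centre = grid[r][c]
            laplacian = 0.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                # Assumed zero-flux boundary: a missing neighbour mirrors the cell.
                inside = 0 <= nr < rows and 0 <= nc < cols
                neighbour = grid[nr][nc] if inside else centre
                laplacian += neighbour - centre
            new[r][c] = centre + rate * laplacian
    return new


def diffuse(grid, steps, rate=0.2):
    """Apply `steps` local updates; global smoothing emerges from local rules."""
    for _ in range(steps):
        grid = diffusion_step(grid, rate)
    return grid
```

Because every cell update depends only on its immediate neighbours, all cells could in principle be updated at the same time, which is the property a cellular array exploits.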
ISBN: 9810475241 (print)
We propose a hardware-oriented genetic algorithm processor (GAP) with efficient exploration, based on a subpopulation architecture, for high-performance convergence and reduced computation time. We applied a steady-state model (one of the continuous generation models), modified tournament selection, a special survival condition, and coarse-grained parallelism to the proposed GAP. In addition, a crossover operator selection method that depends on the convergence state of each subpopulation was newly employed. A pipelined structure was used to obtain an efficient hardware implementation. The proposed GAP is implemented on the AGENT2000 board with an EFP10K200SRC device.
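For readers unfamiliar with the steady-state model, the following is a minimal single-population software sketch with plain tournament selection. It is not the proposed hardware design: the subpopulations, the modified selection, the adaptive crossover choice, and the pipelining are omitted, and the survival rule used here (a child replaces the worst individual only if it is strictly fitter) is an assumption made for illustration. All names and parameters are hypothetical.

```python
import random


def steady_state_ga(fitness, n_bits, pop_size=20, iterations=500, k=3, seed=0):
    """Steady-state GA on bit strings: one child is produced per iteration.

    Assumes n_bits >= 2 and k <= pop_size.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Pick k distinct individuals at random and keep the fittest one.
        return max(rng.sample(pop, k), key=fitness)

    for _ in range(iterations):
        a, b = tournament(), tournament()
        cut = rng.randrange(1, n_bits)      # one-point crossover
        child = a[:cut] + b[cut:]
        child[rng.randrange(n_bits)] ^= 1   # flip one randomly chosen bit
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        # Assumed survival condition: replace the worst only if the child is fitter.
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)
```

Calling `steady_state_ga(sum, 16)` runs the sketch on the OneMax problem, where fitness is simply the number of 1 bits.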
Association mining, a class of data mining techniques, is one of the most researched fields in data mining, where algorithms are designed to discover rules that reflect dependencies among values of an attribute. Becaus...
ISBN: 0769512607 (print)
In this paper we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster the scheduler can take advantage of this network's unique capabilities, including a network interface card-based processor and memory and efficient user-level communication libraries. We developed a micro-benchmark to test the scheduler's performance under various aspects of parallel job workloads: memory usage, bandwidth- and latency-bound communication, number of processes, timeslice quantum, and multiprogramming levels. Our experiments show that the gang scheduler performs relatively well under most workload conditions, is largely insensitive to the number of concurrent jobs in the system, and scales almost linearly with the number of nodes. On the other hand, the scheduler is very sensitive to the timeslice quantum, and values under 30 seconds can incur large overheads and fairness problems.
ISBN: 0769512607 (print)
Logic simulation of a complex processor model in VLSI design is very time consuming. One way to increase the simulation speed is to partition the processor model and assign the resulting parts to simulator instances that cooperate over a loosely coupled system. For the corresponding model partitioning processes, we have developed a distributed framework, parallelMAP, which implements a hierarchical partitioning strategy. It is intended to be used as a production environment in VLSI design as well as an experimental test bed for algorithm development. In this paper we describe the possibilities parallelMAP offers for the modular construction of partitioning processes starting from a set of basic sequential and parallel modules. Experimental results are reported for IBM processor models comprising from 1.5 × 10^5 to 2.5 × 10^6 elements at gate level.
ISBN: 0780370570 (print)
In this paper, we propose a flexible VLSI-based parallel processing architecture for an improved three-step search (ITSS) motion estimation algorithm that is superior to the existing three-step search (TSS) algorithm in all cases and also to the recently proposed new three-step search (NTSS) algorithm when used for low bit-rate video coding, as with the H.261 standard. Based on a VLSI tree processor and an FPGA addressing circuit, the architecture can successfully implement the ITSS algorithm on silicon with the minimum number of gates. Because of the flexibility of the architecture, it can also be extended to implement other three-step search algorithms.
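As background, the classic three-step search that ITSS improves on can be sketched in software as follows. This is the baseline TSS only, not the ITSS variant or the VLSI architecture. The sum of absolute differences (SAD) cost, the 8 × 8 block size, the initial step of 4, and all function names are assumptions made for this illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Return (dx, dy, cost) found by a coarse-to-fine search.

    Each stage tests the 3 x 3 pattern of candidates spaced `step` apart
    around the current best vector, then halves the step (4, 2, 1).
    """
    h, w = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best_dx, best_dy          # centre of this stage's pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cand_dx, cand_dy = cx + dx, cy + dy
                x0, y0 = bx + cand_dx, by + cand_dy
                # Skip candidates whose block would leave the reference frame.
                if not (0 <= x0 <= w - n and 0 <= y0 <= h - n):
                    continue
                cost = sad(cur, ref, bx, by, cand_dx, cand_dy, n)
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, cand_dx, cand_dy
        step //= 2
    return best_dx, best_dy, best_cost
```

The search evaluates at most 25 new candidates per block instead of the 225 positions of a full ±7 search, at the risk of settling in a local minimum of the cost surface.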
ISBN: 0769512968 (print)
Retrograde analysis is an efficient exhaustive search method. It is a powerful tool for solving problems where end states have known values but starting states do not. It has been widely used to solve mathematically precise games such as chess endgames, and is potentially usable in energy-minimization problems. With increasing computing power, both in speed and storage capacity, retrograde analysis will become more and more useful. This paper looks at successful applications to games, the challenges ahead, and the modifications that are required to utilize distributed hardware. The power and usefulness of retrograde analysis are still limited by the computing resources one has access to. Today, the best sequential retrograde algorithms are capable of solving problems with about 10^9 states in a few hours on a standard personal computer. Bigger problems need more powerful computers, take much longer to solve, or are simply out of reach of today's technologies. Introducing parallelism to retrograde analysis is a natural way to attack the bigger problems. There are three main architectures available today for parallel retrograde analysis: symmetric multiprocessor systems, high-speed network-based distributed systems, and Internet-based distributed systems. In this paper, we discuss some of the key issues in doing parallel retrograde analysis on these different architectures. Technical challenges are addressed in detail, along with some examples and proposals. These examples and proposals are drawn from various board games, but the ideas can be applied to other problem domains.
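To make the idea of working backwards from known end states concrete, here is a minimal sequential sketch for a two-player game given as an explicit state graph. It is a generic textbook-style formulation, not the parallel algorithms discussed in the paper. The function name, the WIN/LOSS labels, and the convention that a player with no legal move loses are assumptions made for this example.

```python
from collections import deque


def retrograde_solve(states, successors):
    """Label each state "WIN" or "LOSS" for the player to move.

    Assumption: a state with no successors is a loss for the player to move.
    States that never receive a label (possible when the graph has cycles)
    are left out of the result and can be read as draws.
    """
    states = list(states)
    succ = {s: list(successors(s)) for s in states}
    preds = {s: [] for s in states}
    for s, nxt in succ.items():
        for t in nxt:
            preds[t].append(s)
    # Number of successors not yet known to be wins for the opponent.
    remaining = {s: len(nxt) for s, nxt in succ.items()}

    value = {}
    queue = deque()
    for s in states:
        if not succ[s]:
            value[s] = "LOSS"
            queue.append(s)

    while queue:
        t = queue.popleft()
        for s in preds[t]:          # step backwards along each move
            if s in value:
                continue
            if value[t] == "LOSS":
                value[s] = "WIN"    # s can move to a position lost for the opponent
                queue.append(s)
            else:
                remaining[s] -= 1
                if remaining[s] == 0:
                    value[s] = "LOSS"   # every move from s hands the opponent a win
                    queue.append(s)
    return value
```

For a toy subtraction game (remove 1 to 3 stones, and a player facing 0 stones loses), `retrograde_solve(range(13), lambda n: [n - k for k in (1, 2, 3) if n - k >= 0])` labels exactly the multiples of 4 as losses.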
ISBN: 0769513638 (print)
Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reduction and makes it scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Experimental results based on simulations show that the proposed support is very effective. While conventional software-only reduction parallelization delivers average speedups of only 2.7 for 16 processors, our scheme delivers average speedups of 7.6.
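The sketch below illustrates one common software-only approach, privatization, in which each worker accumulates into its own copy of the reduction array and the copies are merged afterwards. It is meant to show the kind of baseline the abstract contrasts with, not the proposed directory-controller support, and whether it matches the exact baseline measured in the paper is an assumption. The function name and parameters are hypothetical, and Python threads are used only to show the structure, not to obtain a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def privatized_reduction(indices, values, size, workers=4):
    """Compute result[indices[i]] += values[i] for all i using private copies."""

    def partial(chunk):
        private = [0.0] * size          # per-worker copy, so no locking is needed
        for i in chunk:
            private[indices[i]] += values[i]
        return private

    n = len(indices)
    # Cyclic distribution of loop iterations over the workers.
    chunks = [range(w, n, workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial, chunks))

    # Merge step: the work grows with size * workers, regardless of how
    # sparse the accesses were.
    result = [0.0] * size
    for private in partials:
        for k in range(size):
            result[k] += private[k]
    return result
```

The merge loop touches every element of every private copy, which is one plausible reason such schemes scale poorly for large, sparsely accessed arrays.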
In this paper, we propose a new method for task decomposition based on output parallelism, in order to find appropriate architectures for large-scale real-world problems automatically and efficiently. Using this method, a problem can be divided flexibly into several sub-problems as chosen, each of which is composed of the whole input vector and a fraction of the output vector. Each module (one per sub-problem) is responsible for producing a fraction of the output vector of the original problem. In this way, the hidden structure for the original problem's output units is decoupled. These modules can be grown and trained in sequence or in parallel. Combined with the constructive learning algorithm, our method requires neither excessive computation nor prior knowledge concerning decomposition. The feasibility of output parallelism is analyzed and proved. Several benchmarks are implemented to test the validity of this method. The results show that this method can reduce computation time, increase learning speed, and improve generalization accuracy for both classification and regression problems.
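The decomposition itself can be sketched independently of the learner: each module sees the whole input vector but is trained on only a slice of the output vector, and the slices are stitched back together at prediction time. In the sketch below a nearest-neighbour lookup stands in for each module. This stand-in is an assumption chosen to keep the example short and is not the constructive neural network used in the paper; all names are hypothetical.

```python
def split_outputs(n_outputs, n_modules):
    """Partition output indices 0..n_outputs-1 into contiguous groups."""
    base, extra = divmod(n_outputs, n_modules)
    groups, start = [], 0
    for m in range(n_modules):
        size = base + (1 if m < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


class NearestNeighbourModule:
    """Stand-in learner (NOT the paper's constructive network)."""

    def fit(self, inputs, targets):
        self.inputs, self.targets = inputs, targets
        return self

    def predict(self, x):
        def dist(a):
            return sum((p - q) ** 2 for p, q in zip(a, x))
        best = min(range(len(self.inputs)), key=lambda i: dist(self.inputs[i]))
        return self.targets[best]


def train_output_parallel(inputs, targets, n_modules):
    """Train one module per output group; every module gets the full inputs."""
    groups = split_outputs(len(targets[0]), n_modules)
    modules = []
    for group in groups:
        sub_targets = [[t[j] for j in group] for t in targets]
        modules.append(NearestNeighbourModule().fit(inputs, sub_targets))
    return groups, modules


def predict_output_parallel(groups, modules, x, n_outputs):
    """Reassemble the full output vector from the per-module fractions."""
    out = [None] * n_outputs
    for group, module in zip(groups, modules):
        for j, v in zip(group, module.predict(x)):
            out[j] = v
    return out
```

Because the modules share no parameters, the loop in `train_output_parallel` could run sequentially or be distributed across processes without changing the result.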