Routers must perform packet classification at high speeds to efficiently implement functions such as firewalls and diffserv. Classification can be based on an arbitrary number of fields in the packet header. Performing classification quickly on an arbitrary number of fields is known to be difficult, and has poor worst-case complexity. In this paper, we re-examine two basic mechanisms that have been dismissed in the literature as being too inefficient: backtracking search and set pruning tries. We find using real databases that the time for backtracking search is much better than the worst-case bound; instead of Ω((log N)^(k-1)), the search time is only roughly twice the optimal search time. Similarly, we find that set pruning tries (using a DAG optimization) have much better storage costs than the worst-case bound. We also propose several new techniques to further improve the two basic mechanisms. Our major ideas are (i) backtracking search on a small memory budget, (ii) a novel compression algorithm, (iii) pipelining the search, (iv) the ability to trade off smoothly between backtracking and set pruning, and (v) algorithms to make effective use of hardware where hardware is available. We quantify the performance gain of each technique using real databases. We show that on real firewall databases our schemes, with the accompanying optimizations, are close to optimal in time and storage.
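For readers unfamiliar with the two mechanisms, the following is a minimal two-field sketch of a hierarchical trie with backtracking search; the class and rule names are illustrative, and the paper's DAG compression and other optimizations are not reproduced.

    # Minimal two-field (source, destination) hierarchical trie with
    # backtracking search. Illustrative only; not the paper's data structures.

    class Trie:
        def __init__(self):
            self.children = {}      # bit ('0'/'1') -> Trie
            self.rules = []         # rules whose prefix ends at this node
            self.subtrie = None     # second-dimension trie (source nodes only)

        def insert(self, prefix):
            node = self
            for bit in prefix:
                node = node.children.setdefault(bit, Trie())
            return node

    def add_rule(src_trie, src_prefix, dst_prefix, rule):
        node = src_trie.insert(src_prefix)
        if node.subtrie is None:
            node.subtrie = Trie()
        node.subtrie.insert(dst_prefix).rules.append(rule)

    def classify(src_trie, src_bits, dst_bits):
        """Backtracking search: every source-trie node on the path may hold a
        destination trie, so each one is searched in turn (the backtracking)."""
        matches = []
        node = src_trie
        while node is not None:
            if node.subtrie is not None:
                d = node.subtrie
                for bit in dst_bits:
                    matches.extend(d.rules)
                    d = d.children.get(bit)
                    if d is None:
                        break
                else:
                    matches.extend(d.rules)
            if not src_bits:
                break
            node = node.children.get(src_bits[0])
            src_bits = src_bits[1:]
        return matches

    # Example: rule "r1" matches source 10*, dest 0*; rule "r2" matches *, dest 1*.
    root = Trie()
    add_rule(root, "10", "0", "r1")
    add_rule(root, "", "1", "r2")
    print(classify(root, "101", "011"))   # ['r1']

A set pruning trie would instead copy each rule into every source path it covers, removing the backtracking at the cost of the higher storage discussed above.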
ISBN (print): 1581132972
This paper presents a high-level design methodology, called input space adaptive design, and new design automation algorithms for optimizing energy consumption and performance. An input space adaptive design exploits the well-known fact that the quality of hardware circuits and software programs can be significantly optimized by employing algorithms and implementation architectures that adapt to the input statistics. We propose a methodology for such designs which includes identifying parts of the behavior to be optimized, selecting appropriate input sub-spaces, transforming the behavior, and verifying the equivalence of the original and optimized designs. Experimental results indicate that such designs can reduce energy consumption by up to 70.6% (average of 55.4%), and simultaneously improve performance by up to 85.1% (average of 58.1%), leading to a reduction in the energy-delay product by up to 95.6% (average of 80.7%), compared to well-optimized designs that do not employ such techniques.
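As a rough software analogue of the idea (the paper itself works at the behavioral-synthesis level), the sketch below specializes a computation for a frequently occurring input sub-space and guards it so that functional equivalence with the general path is preserved; the sub-space and threshold are invented for illustration.

    # Illustrative analogue of input-space-adaptive design in software:
    # a specialized fast path for a common input sub-space, guarded so the
    # general path preserves functional equivalence. Example values only.

    def dot_general(a, b):
        return sum(x * y for x, y in zip(a, b))

    def dot_adaptive(a, b):
        # Profiled sub-space: 'b' is very sparse, so skip the zero terms.
        nonzero = [(i, y) for i, y in enumerate(b) if y != 0]
        if len(nonzero) <= len(b) // 4:       # input falls in the optimized sub-space
            return sum(a[i] * y for i, y in nonzero)
        return dot_general(a, b)              # fall back: equivalence preserved

    a = [1.0, 2.0, 3.0, 4.0]
    b = [0.0, 0.0, 5.0, 0.0]
    assert dot_adaptive(a, b) == dot_general(a, b)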
ISBN (print): 9780780372498
In this paper, we describe a software pipelining framework, CALiBeR (cluster aware load balancing retiming algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be effectively used by embedded system designers to explore different code optimization alternatives, i.e. it can assist the generation of high-quality customized retiming solutions for desired program memory size and throughput requirements, while minimizing register pressure. An extensive set of experimental results is presented, considering several representative benchmark loop kernels and a wide variety of clustered datapath configurations, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements.
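The sketch below is not CALiBeR; it only illustrates the software-pipelining idea the framework builds on, overlapping stages of successive loop iterations the way a retimed kernel would schedule them on different clusters.

    # Toy illustration of software pipelining: a loop body with three dependent
    # stages is overlapped so that stage A of iteration i, stage B of iteration
    # i-1 and stage C of iteration i-2 execute in the same "cycle".

    def A(i): return i * 2          # stand-in operations
    def B(x): return x + 1
    def C(y): return y * y

    def pipelined(n):
        out = []
        a = b = None
        for i in range(n + 2):                            # n iterations + 2 drain cycles
            c_in, b_in = b, a                             # values from earlier cycles
            a = A(i) if i < n else None                   # stage A of iteration i
            b = B(b_in) if b_in is not None else None     # stage B of iteration i-1
            if c_in is not None:
                out.append(C(c_in))                       # stage C of iteration i-2
        return out

    assert pipelined(4) == [C(B(A(i))) for i in range(4)]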
ISBN (print): 0780370163
Providing quality-of-service (QoS) guarantees in packet networks gives rise to several challenging issues. One of them is how to determine a feasible path that satisfies a set of constraints while maintaining high utilization of network resources. The latter objective implies the need to impose an additional optimality requirement on the feasibility problem. This can be done through a primary cost function (e.g., administrative weight, hop count) according to which the selected feasible path is optimal. In general, multi-constrained path selection, with or without optimization, is an NP-complete problem that cannot be solved exactly in polynomial time. Heuristics and approximation algorithms with polynomial and pseudo-polynomial-time complexities are often used to deal with this problem. However, existing solutions suffer either from excessive computational complexity that makes them unsuitable for online network operation or from low performance. Moreover, they only deal with special cases of the problem (e.g., two constraints without optimization, one constraint with optimization, etc.). For the feasibility problem under multiple constraints, some researchers have proposed a nonlinear cost function whose minimization provides a continuous spectrum of solutions ranging from a generalized linear approximation (GLA) to an asymptotically exact solution. We propose an efficient heuristic algorithm for the most general form of the problem. We first formalize the theoretical properties of the above nonlinear cost function. We then introduce our heuristic algorithm (H_MCOP), which attempts to minimize both the nonlinear cost function (for the feasibility part) and the primary cost function (for the optimality part). We prove that H_MCOP guarantees at least the performance of GLA and often improves upon it. H_MCOP has the same order of complexity as Dijkstra's algorithm. Using extensive simulations on random graphs with correlated and uncorrelated link weights, we show that und
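The sketch below is not H_MCOP itself; it only illustrates the kind of nonlinear path cost used for the feasibility part and a Dijkstra-like relaxation over it, with made-up link weights and constraints.

    # Sketch of a nonlinear path cost for K constraints c[0..K-1]:
    #   g_lambda(p) = sum_i (accumulated w_i along p / c_i) ** lam,
    # where lam = 1 gives the generalized linear approximation (GLA) and large
    # lam approaches an exact feasibility test. The search is only a heuristic.

    import heapq

    def g(acc, c, lam):
        return sum((a / ci) ** lam for a, ci in zip(acc, c))

    def heuristic_mcp(adj, src, dst, c, lam=8):
        """adj[u] = list of (v, (w1, ..., wK)). Returns (path, accumulated weights)."""
        start = (0.0,) * len(c)
        best = {src: start}
        pq = [(g(start, c, lam), src, start, [src])]
        while pq:
            _, u, acc, path = heapq.heappop(pq)
            if u == dst:
                return path, acc
            for v, w in adj.get(u, []):
                nacc = tuple(a + wi for a, wi in zip(acc, w))
                if v not in best or g(nacc, c, lam) < g(best[v], c, lam):
                    best[v] = nacc
                    heapq.heappush(pq, (g(nacc, c, lam), v, nacc, path + [v]))
        return None, None

    # Two constraints (delay <= 10, jitter <= 10); the direct edge violates delay.
    adj = {"s": [("t", (12, 1)), ("a", (4, 4))], "a": [("t", (4, 4))]}
    print(heuristic_mcp(adj, "s", "t", c=(10, 10)))   # (['s', 'a', 't'], (8.0, 8.0))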
The Walsh-Hadamard Transform (WHT) is an important algorithm in signal processing because of its simplicity. However, in computing large-size WHTs, non-unit stride access results in poor cache performance, leading to severe degradation in performance. This poor cache performance is also a critical problem in achieving high performance in other large-size signal transforms. We develop a cache-friendly technique that improves the performance of large-size WHTs. In our approach, data reorganization is performed between computation stages to reduce cache pollution. Furthermore, we develop an efficient search algorithm to determine the optimal factorization tree based upon problem size and stride access in the decomposition. Experimental results show that our approach achieves up to 180% performance improvement over the state-of-the-art package on the Alpha 21264 and MIPS R10000. In addition, the proposed optimization is applicable to other signal transforms and is portable across various platforms.
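For reference, the (unnormalized, natural-ordered) WHT of size 2^n is defined by the recursion below; large iterative, in-place variants touch the data at power-of-two strides, which is the source of the cache behavior discussed above. This reference code is deliberately unoptimized.

    # Reference Walsh-Hadamard Transform for a length-2^n input, natural order,
    # built from the butterfly H_2 = [[1, 1], [1, -1]].

    def wht(x):
        n = len(x)
        if n == 1:
            return list(x)
        half = n // 2
        a = wht(x[:half])            # WHT of the first half
        b = wht(x[half:])            # WHT of the second half
        return [ai + bi for ai, bi in zip(a, b)] + [ai - bi for ai, bi in zip(a, b)]

    print(wht([1, 0, 1, 0]))   # [2, 2, 0, 0]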
ISBN (print): 0780370449
This paper integrates our contributions in the domain of blind source separation and blind source deconvolution, in both static and dynamic environments. We focus on the use of the state space formulation and the development of a generalized optimization framework, using the Kullback-Leibler divergence as the performance measure subject to the constraints of a state space representation. Various special cases are subsequently derived from this general case and compared with material in the recent literature. Some of these reported works have also been implemented in dedicated hardware/software, and the experimental designs have been compared with their computer simulations.
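The performance measure named above, the Kullback-Leibler divergence, is typically applied in source separation between the joint density of the outputs and the product of their marginals (i.e., the mutual information), which vanishes exactly when the outputs are independent. A small discrete sketch, not the paper's state-space framework:

    # Kullback-Leibler-based independence measure for discrete outputs:
    # D( p(y1, y2) || p(y1) p(y2) ) is the mutual information, zero iff independent.

    from math import log

    def kl_independence(joint):
        """joint[(i, j)] = P(y1 = i, y2 = j); returns D(joint || product of marginals)."""
        p1, p2 = {}, {}
        for (i, j), p in joint.items():
            p1[i] = p1.get(i, 0.0) + p
            p2[j] = p2.get(j, 0.0) + p
        return sum(p * log(p / (p1[i] * p2[j])) for (i, j), p in joint.items() if p > 0)

    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    dependent   = {(0, 0): 0.5, (1, 1): 0.5}
    print(kl_independence(independent))   # 0.0
    print(kl_independence(dependent))     # log(2) ~= 0.693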
ISBN (print): 0780365763
We present the design and implementation of a real-time computer vision system for a rotorcraft unmanned aerial vehicle to land on a known landing target. This vision system consists of customized software and off-the-shelf hardware which perform image processing, segmentation, feature point extraction, camera pan/tilt control, and motion estimation. We introduce the design of a landing target which significantly simplifies the computer vision tasks such as corner detection and correspondence matching. Customized algorithms are developed to allow for real-time computation at a frame rate of 30 Hz. Such algorithms include certain linear and nonlinear optimization schemes for model-based camera pose estimation. We present results from an actual flight test which show that the vision-based state estimates are accurate to within 5 cm in each axis of translation and 5 degrees in each axis of rotation, making vision a viable sensor to be placed in the control loop of a hierarchical flight management system.
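The paper's customized pose-estimation algorithms are not reproduced here; purely as a stand-in for the same model-based step, the sketch below recovers camera pose from known target corners and their detected image locations using OpenCV's solvePnP, with invented corner coordinates and camera intrinsics.

    # Stand-in for model-based camera pose estimation (not the paper's algorithm):
    # known 3-D corner layout of the landing target + detected image corners
    # -> camera rotation/translation. All numeric values are made-up examples.

    import numpy as np
    import cv2

    # Known landing-target geometry in metres (planar square, Z = 0).
    object_pts = np.array([[0, 0, 0], [0.6, 0, 0], [0.6, 0.6, 0], [0, 0.6, 0]],
                          dtype=np.float64)
    # Corner locations extracted from the current frame (pixels) -- illustrative.
    image_pts = np.array([[310, 240], [400, 238], [402, 330], [308, 332]],
                         dtype=np.float64)
    # Calibrated pinhole intrinsics (fx, fy, cx, cy); distortion assumed negligible.
    K = np.array([[700, 0, 320], [0, 700, 240], [0, 0, 1]], dtype=np.float64)
    dist = np.zeros(4)

    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
    if ok:
        R, _ = cv2.Rodrigues(rvec)        # target orientation in the camera frame
        print("translation (m):", tvec.ravel())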
ISBN (print): 0769511538
This paper describes computer middleware called the Hyper Artificial Life (HAL) optimization system, which is based on artificial-life theories and is effective for almost all kinds of real-world combinatorial optimization problems. The middleware supports the efficient development of parallel application programs for combinatorial optimization by adopting a conventional evolution procedure. Applications built on this middleware have high autonomy and robustness, and their performance improves on a parallel computer. To verify and evaluate HAL, a supply-chain management (SCM) scheduling program that is in actual use by many users was ported to the middleware and run in parallel, and the evaluation showed a remarkable improvement in performance. The model has the characteristics of reproduction, mutation, and genetics, and a rare phenomenon, regarded as emergence, was observed in the actual results, going beyond the capabilities of many conventional optimization algorithms. Moreover, the model has a hyper-structure, which is why it is named the Hyper Artificial Life system.
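HAL itself is middleware; the "conventional evolution procedure" it parallelizes can be illustrated by a minimal generational loop with reproduction, mutation, and selection on a toy bit-string problem (none of this is HAL's actual code).

    # Minimal generational evolutionary loop on a toy combinatorial problem.

    import random

    TARGET = [1, 0, 1, 1, 0, 1, 0, 1]                 # toy objective: match this bit string

    def fitness(ind):
        return sum(1 for a, b in zip(ind, TARGET) if a == b)

    def mutate(ind, rate=0.1):
        return [b ^ 1 if random.random() < rate else b for b in ind]

    def crossover(p, q):
        cut = random.randrange(1, len(p))
        return p[:cut] + q[cut:]

    def evolve(pop_size=20, generations=50):
        pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_size // 2]                         # selection
            children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                        for _ in range(pop_size - len(parents))]  # reproduction + mutation
            pop = parents + children
        return max(pop, key=fitness)

    print(evolve())   # typically converges to TARGET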
Summary form only given, as follows. The complete presentation was not made available for publication as part of the conference proceedings. Real-time signal processing consumes the majority of the world's computing power. Increasingly, programmable parallel microprocessors are used to address a wide variety of signal processing applications (e.g. scientific, video, wireless, medical, communication, encoding, radar, sonar and imaging). In programmable systems the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in mapping (i.e., placement and routing) an algorithm onto a parallel computer in a general manner that preserves software portability. We have developed the Parallel Vector Library (PVL) to allow signal processing algorithms to be written using high-level MATLAB-like constructs that are independent of the underlying parallel mapping. Programs written using PVL can be ported to a wide range of parallel computers without sacrificing performance. Furthermore, the mapping concepts in PVL provide the infrastructure for enabling new capabilities such as fault tolerance, dynamic scheduling and self-optimization. This presentation discusses PVL with particular emphasis on quantitative comparisons with standard parallel signal programming practices.
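PVL's actual API is not shown in this summary; the sketch below only illustrates the underlying idea of map independence, where the algorithm is written against a distributed-vector abstraction and only a separate map object changes when the processor layout changes.

    # Sketch of map-independent programming (not the PVL API): the algorithm
    # sees a distributed vector; only the Map object encodes the layout.

    class Map:
        def __init__(self, n_procs):
            self.n_procs = n_procs

        def owned_slice(self, rank, length):
            """Block distribution: the contiguous chunk owned by this rank."""
            per = -(-length // self.n_procs)          # ceiling division
            return slice(rank * per, min((rank + 1) * per, length))

    class DistVector:
        def __init__(self, data, vmap, rank):
            self.local = data[vmap.owned_slice(rank, len(data))]

        def local_dot(self, other):
            return sum(a * b for a, b in zip(self.local, other.local))

    # Same "algorithm" (a dot product) under two different mappings; a real
    # system would combine the per-rank partial sums with a reduction.
    data = list(range(8))
    for vmap in (Map(n_procs=1), Map(n_procs=4)):
        partial = [DistVector(data, vmap, r).local_dot(DistVector(data, vmap, r))
                   for r in range(vmap.n_procs)]
        print(sum(partial))    # 140 in both cases: the result is independent of the map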
The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient- and nongradient-based methods; uncertainty quantification with sampling, analytic reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed-integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high-performance computers. This report serves as a reference manual for the commands specification of the DAKOTA software, providing input overviews, option descriptions, and example specifications.
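DAKOTA's real classes and input syntax are documented in the manual itself; purely to illustrate the object-oriented abstraction described above (an iterative method driving a model that wraps a simulation code), here is a toy sketch with invented names.

    # Toy illustration of the iterator/model abstraction, not DAKOTA's actual
    # classes or input syntax: an "iterator" (a coordinate parameter study here)
    # drives a "model" that wraps the user's simulation code.

    class Model:
        """Wraps a simulation: maps design variables to responses."""
        def __init__(self, simulation):
            self.simulation = simulation

        def evaluate(self, x):
            return self.simulation(x)

    class ParameterStudyIterator:
        """One simple iterative method; optimizers, UQ samplers, etc. would
        share the same Model interface."""
        def __init__(self, points):
            self.points = points

        def run(self, model):
            return [(x, model.evaluate(x)) for x in self.points]

    def rosenbrock(x):                      # stand-in for a simulation code
        return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2

    study = ParameterStudyIterator(points=[(-1.0, 1.0), (0.0, 0.0), (1.0, 1.0)])
    for x, f in study.run(Model(rosenbrock)):
        print(x, f)     # (1.0, 1.0) gives the minimum value 0.0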