An optimized software implementation of a high-quality MPEG AAC-LC (low complexity) audio encoder is presented in this paper. The standard reference encoder is improved by several algorithmic optimizations (fast psychoacoustic model, new tonality estimation, new time-domain block switching, optimized quantizer and Huffman coder) and very careful code optimizations for PC CPU architectures with SIMD (single-instruction-multiple-data) instruction sets. The psychoacoustic model uses the MDCT filterbank for energy estimation and peak detection as a measure of tonality. The block size decision is based on local perceptual entropies as well as LPC analysis of the time signal. Algorithmic optimizations in the quantizer include a modified loop control module and an optimized Huffman search. Code optimization is based on parallel processing: vector algebra and math functions are replaced with their optimized equivalents from the Intel® Signal Processing Library (SPL). The implemented codec outperforms consumer MP3 encoders at 30% lower bitrate while achieving encoding times several times faster than real time.
On most urban roads, and in similar environments such as theme parks, campus sites, industrial estates, science parks and the like, the painted lane markings that exist may not be easily discernible by CCD cameras due to poor lighting, bad weather conditions and inadequate maintenance. An important feature of roads in such environments is the existence of pavements or curbs on either side defining the road boundaries. These curbs, which are mostly parallel to the road, can be harnessed to extract useful features of the road for implementing autonomous navigation or driver assistance systems. However, extracting the curb or road edge feature from vision image data is a difficult task, as curbs are not conspicuous in the vision image. Extracting the curb from a camera image requires extensive image processing, heuristics and very favorable lighting. In our approach, road curbs are extracted speedily using range data provided by a 2D laser measurement system (LMS). Experimental results are presented to demonstrate the viability and effectiveness of the proposed methodology and its robustness to different obstacle, weather and lighting conditions.
This paper gives an overview of analogic cellular array architectures that can also be used to approximate partial differential equations (PDEs). Cellular arrays are massively parallel computing structures composed of cells placed on a regular grid. These cells interact locally, and the array can have both local and global dynamics. The software of this architecture is an analogic algorithm that builds on analog and logical spatio-temporal instructions of the underlying hardware, which is a locally connected cellular nonlinear network (CNN). Within this framework, two classes of PDEs, motivated also by image processing methodologies, will be discussed: (i) reaction-diffusion (local) types and (ii) contrast modification (global) types. It will be shown that, based on cellular diffusion and wave-computing formulations, these classes can be approximated on existing CNN Universal Machine (CNN-UM) chips. Thus, the latest generation of stored-program topographic array microprocessors with integrated sensing and computing could also be viewed as the first prototypes of analogic cellular PDE machines implemented on silicon.
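The idea of approximating a diffusion-type PDE with locally interacting cells on a regular grid can be illustrated in software. The following is only a minimal sketch, assuming a plain explicit finite-difference update with a 4-neighbour stencil and zero-flux boundaries; it is not the CNN-UM analogic algorithm of the paper, and the names `diffusion_step`, `diffuse` and the parameter `rate` are hypothetical.

```python
def diffusion_step(grid, rate=0.2):
    """One synchronous update of every cell from its 4 nearest neighbours.

    Assumption: explicit Euler step of the heat equation; rate <= 0.25
    keeps this 2D scheme stable.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            centre = grid[r][c]
            # Zero-flux boundary: a missing neighbour is replaced by the cell itself.
            up = grid[r - 1][c] if r > 0 else centre
            down = grid[r + 1][c] if r < rows - 1 else centre
            left = grid[r][c - 1] if c > 0 else centre
            right = grid[r][c + 1] if c < cols - 1 else centre
            laplacian = up + down + left + right - 4 * centre
            new[r][c] = centre + rate * laplacian
    return new


def diffuse(grid, steps, rate=0.2):
    """Apply diffusion_step repeatedly and return the final grid."""
    for _ in range(steps):
        grid = diffusion_step(grid, rate)
    return grid
```

Each cell reads only its immediate neighbours, so on a cellular array all cells could in principle be updated at once; the nested loops here merely serialize that local rule.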
ISBN: 0769514413 (print)
Various calculations on matrices and vectors are used in many digital signal processing systems. Although the calculation simply repeats multiplication and addition, the repeated processing is usually heavy. Therefore, to perform the calculations at high speed, it is necessary to apply parallel processing. Although increased circuit area is an issue for digital LSI, the proposed analog circuit can realize multiplication and addition simultaneously with a simple structure that arranges capacitors in a matrix form. Furthermore, a vowel speech recognition system is designed using this circuit.
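The repeated multiply-and-add pattern mentioned above is the matrix-vector product. As a minimal software sketch (it does not model the proposed analog capacitor circuit; the function name `mat_vec` is hypothetical), each output element is an independent multiply-accumulate over one row, which is what makes the operation a natural target for parallel hardware:

```python
def mat_vec(matrix, vector):
    """Matrix-vector product written as explicit multiply-accumulate loops."""
    result = []
    for row in matrix:
        acc = 0.0
        # One multiply and one add per matrix element.
        for weight, x in zip(row, vector):
            acc += weight * x
        result.append(acc)
    return result
```

Because no row depends on another, the outer loop could be distributed across parallel units without changing the result.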
ISBN: 0769514413 (print)
In this paper, we present a multi-objective hardware-software co-synthesis system for multi-rate, real-time, low-power distributed embedded systems consisting of dynamically reconfigurable FPGAs, processors, and other system resources. We use an evolutionary-algorithm-based framework for automatically determining the quantity and type of different system resources, and then assigning tasks to different processing elements (PEs) and task communications to communication links. For FPGAs, we propose a two-dimensional, multi-rate cyclic scheduling algorithm, which determines task priorities based on real-time constraints and reconfiguration overhead information, and then schedules tasks based on the resource utilization and reconfiguration condition in both space and time. The FPGA scheduler is integrated in a list-based system scheduler. To the best of our knowledge, this is the first multi-objective co-synthesis system that uses dynamically reconfigurable devices to synthesize a distributed embedded system and targets simultaneous optimization of system price and power. Experimental results indicate that our method can reduce schedule length by an average of 41.0% and reconfiguration power by an average of 46.0% compared to the previous method. It also yields multiple system architectures that trade off system price and power under real-time constraints.
ISBN: 3540424954 (print)
The proceedings contain 130 papers. The special focus in this conference is on parallel processing. The topics include: software component technology for high performance parallel and grid computing; connecting computational requirements with computing resources; a tool for binding threads to processors; a distributed object infrastructure for interaction and steering; optimal polling for latency-throughput tradeoffs in queue-based network interfaces for clusters; performance prediction of data-dependent task parallel programs; the hardware performance monitor toolkit; VIA communication performance on a gigabit Ethernet cluster; group-based performance analysis for multithreaded SMP cluster applications; exploiting unused time slots in list scheduling considering communication contention; an evaluation of partitioners for parallel SAMR applications; load balancing on networks with dynamically changing topology; approximation algorithms for scheduling independent malleable tasks; load redundancy elimination on executable code; using a swap instruction to coalesce loads and stores; data-parallel compiler support for multipartitioning; parallel and distributed databases, data mining and knowledge discovery; an experimental performance evaluation of join algorithms for parallel object databases; a classification of skew effects in parallel database systems; experiments in parallel clustering with DBSCAN; analysis of the cycle structure of permutations; scanning biosequence databases on a hybrid parallel architecture; experiences in using MPI-IO on top of GPFS for the IFS weather forecast code; improving conditional branch prediction on speculative multithreading architectures; performances of a dynamic threads scheduler; and self-stabilizing neighborhood unique naming under unfair scheduler.
The present paper deals with the parallelization of an explicit time stepping algorithm in a general finite element environment. Particular attention has been paid to nonlocal constitutive models. A central difference method has been used to discretize the governing equations in time. Modifications of both node-cut and element-cut strategies have been developed to provide efficient support for nonlocal constitutive models. Efficiency of the proposed approach is demonstrated on different hardware platforms. (C) 2001 Civil-Comp Ltd. and Elsevier Science Ltd. All rights reserved.
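For readers unfamiliar with explicit central difference time stepping, the scheme can be sketched on a single degree of freedom. This is only an illustration under stated assumptions (one unknown, no damping, no external load, no domain partitioning or nonlocal model); the names `central_difference`, `stable_time_step` and `internal_force` are hypothetical and do not come from the paper.

```python
import math


def stable_time_step(mass, stiffness):
    """Stability limit of the undamped central difference scheme: dt < 2 / omega."""
    return 2.0 / math.sqrt(stiffness / mass)


def central_difference(mass, internal_force, u0, v0, dt, steps):
    """Integrate m * u'' + f_int(u) = 0 with the explicit central difference method."""
    # Fictitious displacement at t = -dt, derived from the initial conditions.
    a0 = -internal_force(u0) / mass
    u_prev = u0 - dt * v0 + 0.5 * dt * dt * a0
    u = u0
    history = [u0]
    for _ in range(steps):
        # Acceleration from the current state only, so no system of equations is solved.
        a = -internal_force(u) / mass
        u_next = 2.0 * u - u_prev + dt * dt * a
        u_prev, u = u, u_next
        history.append(u)
    return history
```

For a linear spring with unit mass and stiffness, `central_difference(1.0, lambda u: u, 1.0, 0.0, 0.01, 628)` tracks the cosine solution closely. In a finite element setting with a lumped mass matrix the same update is applied node by node, which is why explicit schemes of this kind lend themselves to partitioning across processors.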
We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that...
ISBN: 0769512607 (print)
In this paper we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster the scheduler can take advantage of this network's unique capabilities, including a network interface card-based processor and memory and efficient user-level communication libraries. We developed a micro-benchmark to test the scheduler's performance under various aspects of parallel job workloads: memory usage, bandwidth- and latency-bound communication, number of processes, timeslice quantum, and multiprogramming levels. Our experiments show that the gang scheduler performs relatively well under most workload conditions, is largely insensitive to the number of concurrent jobs in the system, and scales almost linearly with the number of nodes. On the other hand, the scheduler is very sensitive to the timeslice quantum, and values under 30 seconds can incur large overheads and fairness problems.
ISBN: 0769512577; 0769512585 (print)
The index-permutation graph (IPG) model is a natural extension of the Cayley graph model, and super-IPGs form an efficient class of IPGs that contain a wide variety of networks as subclasses. In this paper, we derive a number of efficient algorithms and embeddings for super-IPGs, proving their versatility. We show that a multitude of important networks can also be emulated in super-IPGs with optimal slowdown. Also, the intercluster diameter, average intercluster distance, and bisection bandwidth of suitably constructed super-IPGs are optimal within small constant factors. Finally, we show that when parallel computers, built as multiple chip-multiprocessors (MCMP), are based on super-IPGs, they can significantly outperform those based on hypercubes, k-ary n-cubes, and other networks in carrying out communication-intensive tasks.