As opposed to Prime Factor type algorithms, the only requirement made by parallel cyclic convolution techniques based on block pseudocirculant matrices is that the convolution length be composite. Highly composite lengths, in particular, give a larger variety of implementation choices. In this paper we introduce two new mathematical constructs, the Super Block Pseudocirculant Matrix and the Block Pseudocyclic Shift Operator, as a basis for deriving further structures for this important class of parallel, one-dimensional cyclic convolution algorithms based on block pseudocirculant matrices. Their modular composition makes them suitable for implementation in VLSI, FPGA, or multiprocessor computers in either a pipelined or a parallel fashion. Block pseudocirculants appear in fields such as precoding systems, transmultiplexers, polyphase networks, block filtering, QMF banks, and others; the new mathematical constructs introduced in this paper may therefore have an impact that transcends their application to parallel cyclic convolution and its related applications.
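To make the role of a composite length concrete, the following is a minimal sketch of the textbook block (polyphase) decomposition that underlies block pseudocirculant formulations. It is not the Super Block Pseudocirculant Matrix or the Block Pseudocyclic Shift Operator of the paper; the function names and the block count `r` are assumptions chosen for illustration. A length-n cyclic convolution with n = r * m is split into r * r cyclic convolutions of length m, and the sub-results that wrap around the block index receive a one-sample cyclic shift.

```python
def cyclic_conv(h, x):
    """Direct cyclic convolution: y[a] = sum_b h[b] * x[(a - b) mod n]."""
    n = len(x)
    return [sum(h[b] * x[(a - b) % n] for b in range(n)) for a in range(n)]


def block_cyclic_conv(h, x, r):
    """Cyclic convolution of length n = r * m built from length-m sub-convolutions.

    Hypothetical helper for illustration: h and x are split into r polyphase
    components (every r-th sample), and output component k is assembled from
    the sub-convolutions of h component i with x component (k - i) mod r.
    """
    n = len(x)
    if len(h) != n or n % r != 0:
        raise ValueError("h and x must share a length divisible by r")
    m = n // r
    hp = [h[k::r] for k in range(r)]  # polyphase components of h
    xp = [x[k::r] for k in range(r)]  # polyphase components of x
    y = [0] * n
    for k in range(r):
        acc = [0] * m
        for i in range(r):
            j = (k - i) % r
            c = cyclic_conv(hp[i], xp[j])
            if i > k:
                # Wrapped block index: apply a cyclic shift by one sample.
                c = c[-1:] + c[:-1]
            acc = [s + t for s, t in zip(acc, c)]
        y[k::r] = acc  # interleave component k back into the output
    return y
```

The r * r sub-convolutions are independent of one another, which is what makes this family of decompositions attractive for pipelined or parallel hardware. For a length-6 input, `r` can be 2 or 3, which illustrates why highly composite lengths offer more implementation choices.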
It is becoming increasingly difficult to implement effective systems for preventing network attacks, due to the combination of (1) the rising sophistication of attacks, which require more complex analysis to detect, (2) the relentless growth in the volume of network traffic that we must analyze, and, critically, (3) the failure in recent years of uniprocessor performance to sustain the exponential gains that CPUs enjoyed for so many years ("Moore's Law"). For commodity hardware, tomorrow's performance gains will instead come from multi-core architectures in which a whole set of CPUs executes concurrently. Taking advantage of the full power of multi-core processors for network intrusion prevention requires an in-depth approach. In this work we frame an architecture customized for parallel execution of network attack analysis. At the lowest layer of the architecture is an "Active Network Interface" (ANI), a custom device based on an inexpensive FPGA platform. The ANI provides the inline interface to the network, reading in packets and forwarding them after they are approved. It also serves as the front end for dispatching copies of the packets to a set of analysis threads. The analysis itself is structured as an event-based system, which allows us to find many opportunities for concurrent execution, since events introduce a natural, decoupled asynchrony into the flow of analysis while still maintaining good cache locality. Finally, by associating events with the packets that ultimately stimulated them, we can determine when all analysis for a given packet has completed, and thus that it is safe to forward the pending packet, provided none of the analysis elements previously signaled that the packet should instead be discarded.
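The bookkeeping described in the last sentence can be sketched as a per-packet count of outstanding events. The code below is a single-threaded illustration under stated assumptions, not the architecture's actual implementation: `PacketState`, `analyze`, the handler signature, and the event names are all hypothetical. Each handler returns the follow-up events it raises plus a flag saying whether the packet should be discarded.

```python
from collections import deque


class PacketState:
    """Hypothetical per-packet record: outstanding events and a drop flag."""

    def __init__(self):
        self.pending = 0
        self.drop = False


def analyze(packets, handlers):
    """Return a verdict per packet once all events it stimulated are done.

    packets:  iterable of (packet_id, payload)
    handlers: dict mapping event name -> function(payload) returning
              (list of (event_name, payload) follow-up events, drop_flag)
    """
    queue = deque()
    states = {}
    verdicts = {}
    for pid, payload in packets:
        states[pid] = PacketState()
        states[pid].pending += 1
        queue.append((pid, "packet", payload))
    while queue:
        pid, name, payload = queue.popleft()
        state = states[pid]
        children, drop = handlers[name](payload)
        state.drop = state.drop or drop
        # Count follow-up events before retiring the current one, so the
        # counter cannot reach zero while work is still outstanding.
        for child_name, child_payload in children:
            state.pending += 1
            queue.append((pid, child_name, child_payload))
        state.pending -= 1
        if state.pending == 0:
            verdicts[pid] = "drop" if state.drop else "forward"
    return verdicts


def on_packet(payload):
    # Raise a content-inspection event only for non-empty payloads.
    return ([("content", payload)] if payload else []), False


def on_content(payload):
    # Toy signature check, standing in for real analysis.
    return [], b"attack" in payload


example_handlers = {"packet": on_packet, "content": on_content}
```

Calling `analyze([(1, b"hello"), (2, b"an attack string")], example_handlers)` forwards the first packet and drops the second. In a concurrent setting the counter and flag would need atomic updates, which this sketch leaves out.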
ISBN: 1595932925 (print)
The aim of this paper is to propose a real-time reconfigurable (RTR) micro-FPGA using a new non-volatile memory. Magnetic tunneling junctions (MTJs), used in magnetic random access memories (MRAMs), are compatible with classical CMOS processes. Moreover, the remanent property of such a memory could limit the configuration time and power consumption required at each power-up of the die. Nevertheless, each configuration memory point has to be readable independently of the others, which is why the approach differs from that of a classical memory array. Copyright 2006 ACM.
ISBN: 1595932925 (print)
Due to their generic and highly programmable nature, FPGAs provide the ability to implement a wide range of applications. However, it is this nonspecific nature that has limited the use of FPGAs in scientific applications that require floating-point arithmetic. Even simple floating-point operations consume a large amount of computational resources. In this paper, we introduce the embedding of floating-point multiply-add units in an island-style FPGA. This is shown to yield an average area savings of 55.0% and an average increase of 40.7% in clock rate over existing architectures. Copyright 2006 ACM.
ISBN: 1595932925 (print)
Division is one of the most complicated and expensive arithmetic operations. Both clock frequency and operation delay are limited by the memory wall, even in LUT-based FPGA devices. To overcome the memory limitation, we propose a hybrid division algorithm which employs Prescaling, Series expansion and Taylor expansion (PST) algorithms. The proposed algorithm efficiently boosts very-high-radix division. The algorithm is multiplicative, and is feasible for modern FPGA devices with built-in multipliers. The algorithm is implemented in Altera Stratix II FPGA devices and compared with the division IP core generated by Mega Wizard. The results show that the PST algorithm has a higher clock frequency, lower execution time, and lower power consumption. Copyright 2006 ACM.
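The abstract does not spell out the PST algorithm, so the following is only a generic sketch of multiplicative division in the same spirit, not the authors' method. It assumes a divisor normalized to [1, 2), a small lookup table of reciprocal estimates (the prescaling step), and a truncated geometric series 1/(1 - e) ≈ 1 + e + e² as the correction. The names `reciprocal_table` and `divide`, and the parameters `bits` and `terms`, are hypothetical.

```python
def reciprocal_table(bits):
    """Reciprocal estimates for d in [1, 2), indexed by the top `bits`
    fraction bits of d and evaluated at the midpoint of each interval."""
    size = 1 << bits
    return [1.0 / (1.0 + (i + 0.5) / size) for i in range(size)]


def divide(n, d, table, bits, terms=3):
    """Approximate n / d using only table lookup, multiplies, and adds.

    Assumes 1.0 <= d < 2.0 (a normalized significand).
    """
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    index = int((d - 1.0) * (1 << bits))
    r0 = table[index]      # prescaling factor, roughly 1/d
    e = 1.0 - d * r0       # small residual: d * r0 = 1 - e
    series = 0.0
    power = 1.0
    for _ in range(terms):  # 1 + e + e^2 + ... approximates 1 / (1 - e)
        series += power
        power *= e
    return n * r0 * series
```

With `bits = 6` the residual `e` is bounded by about 2^-7 in magnitude, so three series terms leave a relative error on the order of 2^-21. A larger table allows fewer multiplications and vice versa, which is the kind of memory versus multiplier trade-off the abstract alludes to.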
ISBN: 1595932925 (print)
The paper presents several improvements to the state of the art in FPGA technology mapping, exemplified by a recent advanced technology mapper, DAOmap [Chen and Cong, ICCAD '04]. Improved cut enumeration computes all K-feasible cuts without pruning for up to 7 inputs for the largest MCNC benchmarks. A new technique for on-the-fly cut dropping reduces by orders of magnitude the memory needed to represent cuts for large designs. Improved area recovery leads to mappings with area on average 7% smaller than DAOmap, while preserving delay optimality when starting from the same optimized netlists. Applying mapping with structural choices derived by a synthesis flow reduces delay by 7% and area by 14% on average, compared to DAOmap. Copyright 2006 ACM.
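For readers unfamiliar with the term, a K-feasible cut of a node is a set of at most K signals that together determine the node's value, so the logic between the cut and the node fits in one K-input LUT. The sketch below shows the standard bottom-up enumeration (merge the fanins' cuts and keep unions of size at most K). It is a basic illustration, not the paper's improved enumeration or its cut-dropping technique; the function name and netlist encoding are assumptions.

```python
def enumerate_cuts(fanins, order, k):
    """Enumerate all k-feasible cuts of every node in a combinational netlist.

    fanins: dict mapping node -> tuple of fanin nodes (primary inputs absent
            or mapped to an empty tuple)
    order:  nodes in topological order (fanins before fanouts)
    Returns a dict mapping node -> set of frozensets of cut leaves.
    Dominated cuts are not filtered out in this sketch.
    """
    cuts = {}
    for node in order:
        node_cuts = {frozenset([node])}  # the trivial cut
        ins = fanins.get(node, ())
        if ins:
            merged = {frozenset()}
            for fanin in ins:
                # Combine one cut from each fanin; drop unions larger than k.
                merged = {
                    u | v
                    for u in merged
                    for v in cuts[fanin]
                    if len(u | v) <= k
                }
            node_cuts |= merged
        cuts[node] = node_cuts
    return cuts


# Tiny hypothetical netlist: x = f(a, b), y = g(x, c)
example_fanins = {"x": ("a", "b"), "y": ("x", "c")}
example_order = ["a", "b", "c", "x", "y"]
```

Every node stores the cuts of its whole fanin cone, so memory grows quickly with K and design size. That growth is the cost that a cut-dropping technique such as the one mentioned in the abstract is meant to reduce.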
ISBN: 1595932925 (print)
This paper examines the tradeoffs between flexibility, area, and power dissipation of programmable clock networks for field-programmable gate arrays (FPGAs). The paper begins by describing a parameterized clock network model that covers a broad range of programmable clock network architectures. Specifically, the model supports architectures with multiple local and global clock domains and varying amounts of flexibility at various levels of the clock network. Using the model, the architectural parameters that control the flexibility of the clock network are varied to determine the cost of this flexibility in terms of area and power dissipation. From these experiments, the study finds that area and power costs are highest for networks with flexibility close to the logic blocks. Furthermore, it finds that clock networks with local clock domains have little overhead and are significantly more efficient than clock networks without local clock domains for applications with multiple clocks. Copyright 2006 ACM.
ISBN: 1595932925 (print)
The high unit cost of FPGA devices often deters their use beyond the prototyping stage. Efforts have been made to reduce the part cost of FPGA devices, resulting in the development of Design-Specific FPGAs. These parts offer cost reductions by limiting manufacturing tests and improving the number of working devices in a wafer. This paper addresses the issue of yield enhancement in Design-Specific FPGAs. An analytical model is developed that predicts the probability of mapping a specific design onto potentially defective FPGAs. When combined with existing yield modelling techniques, the model provides a quantitative measure of the potential yield improvements of the Design-Specific FPGA approach for current and future technology nodes. It is found that this approach, while beneficial with current manufacturing technology, may not be suitable for 22 nm technology or beyond. Copyright 2006 ACM.
ISBN: 1595932925 (print)
Programmable logic devices such as FPGAs are useful for a wide range of applications. However, FPGAs are not commonly used in battery-powered applications because they consume more power than ASICs and lack power management features. In this paper, we describe the design and implementation of Pika, a low-power FPGA core targeting battery-powered applications such as those in consumer and automotive markets. Our design uses the Xilinx Spartan-3 low-cost FPGA as a baseline and achieves substantial power savings through a series of power optimizations. The resulting architecture is compatible with existing commercial design tools. The implementation is done in a 90 nm triple-oxide CMOS process. Compared to the baseline design, Pika consumes 46% less active power and 99% less standby power. Furthermore, it retains circuit and configuration state during standby mode, and wakes up from standby mode in approximately 100 ns. Copyright 2006 ACM.