The memory intensive nature of object-oriented languages such as C + + and Java has created the need of a high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity...
详细信息
The memory intensive nature of object-oriented languages such as C + + and Java has created the need of a high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region. Thus, high-performance memory manager is needed to cope with such applications. As today's VLSI technology advances, it becomes more and more attractive to map basic software algorithms such as malloc(),free(), and realloc() into hardware. This paper presents a hardware design of realloc function that fully utilizes the advantage of combinational logic. There are two steps needed to complete a reallocation process: (a) try to reallocate on the original memory block and (b) if (a) failed, allocate another memory block and copy the contents of the original block to this new location. In our scheme, (a) can be done in constant time. For (b), the allocation of new memory block and the deallocation of original block are done in constant time. The hardware complexity of proposed scheme (i.e. X-unit, RS-unit, and ESG-unit) is O(n), where n represents the size of bit-map. (C) 1999 Published by Elsevier Science B.V. All rights reserved.
作者:
Dong, QianLi, BingSoutheast Univ
Sch Integrated Circuits Sipailou 2 Nanjing 210096 Jiangsu Peoples R China Southeast Univ
Chengxian Coll Dongda Rd 6 Nanjing 210088 Jiangsu Peoples R China
The hardware-based dictionary compression is widely adopted for high speed requirement of real-time data processing. Hash function helps to manage large dictionary to improve compression ratio but is prone to collisio...
详细信息
The hardware-based dictionary compression is widely adopted for high speed requirement of real-time data processing. Hash function helps to manage large dictionary to improve compression ratio but is prone to collisions, so some phrases in match search result are not true matches. This paper presents a novel match search approach called dual chaining hash refining, which can improve the efficiency of match search. From the experimental results, our method showed obvious advantage in compression speed compared with other approach that utilizes single hash function described in the previous publications.
The memory intensive nature of object-oriented languages such as C++ and Java has created the need of a high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity i...
详细信息
The memory intensive nature of object-oriented languages such as C++ and Java has created the need of a high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region. Thus, high-performance memory manager is needed to cope with such applications. As today's VLSI technology advances, it becomes more and more attractive to map software algorithms such as malloc(), free(), realloc(), and garbage collection into hardware. This paper presents hardware designs of realloc function and sweeping function (for mark and sweep garbage collection) that fully utilize the advantages of combinational logic. In our scheme, the reallocation on the original block can be done in constant time. If the reallocation on the original block is not possible, the previously proposed malloc() and free() can be used to move the contents to a new location. For the sweeping function, the bit-sweeper can detect and sweep the garbage in constant time. Since the sweeping phase often consumes more time than the marking phase, the garbage collection time can be substantially improved. The hardware complexity of proposed scheme (i.e. X-Unit, RS-unit, ESG-unit, and bit-sweeper) is O(n), where n represents the size of bit-map. (C) 2002 Elsevier Science B.V. All rights reserved.
An algorithm for net extraction is developed that is suitable for hardware implementation. The algorithm has three components: sort, pair generation, and net construction. The first two are systolic and may be impleme...
详细信息
An algorithm for net extraction is developed that is suitable for hardware implementation. The algorithm has three components: sort, pair generation, and net construction. The first two are systolic and may be implemented as a chain of systolic processors. The third can be implemented using a broadcast bus.
A parallel phrase matching (PM) engine for dictionary compression is presented. hardware based parallel chaining hash can eliminate erroneous PM results raised by hash collision;while newly-designed storage architectu...
详细信息
A parallel phrase matching (PM) engine for dictionary compression is presented. hardware based parallel chaining hash can eliminate erroneous PM results raised by hash collision;while newly-designed storage architecture holding PM results solved the data dependency issue;Thus, the average compression speed is increased by 53%.
There is an increasing demand for finding a mobile location in areas such as robotics applications, electronic warfare positioning, wireless communication systems and vehicle security. Angle of Arrival (AOA) is one of...
详细信息
ISBN:
(纸本)9781424412358
There is an increasing demand for finding a mobile location in areas such as robotics applications, electronic warfare positioning, wireless communication systems and vehicle security. Angle of Arrival (AOA) is one of the key techniques in wireless location estimation. Location estimation is done by using standard triangulation operations that are usually implemented in software. In this paper, we present a new hardware-oriented algorithm that uses only simple shift and add operations in the computation and therefore can be easily implemented in hardware. The comparison between this method and the conventional AOA based positioning technique is discussed in terms of computational cost (required number of operations). The results show that the proposed algorithm outperforms the traditional one in terms of both software and hardware implementations points of views.
As mobile and handheld devices are gaining popularity, many applications have found their ways into these devices. In order to use these mobile systems effectively and efficiently, it is important to optimize their em...
详细信息
ISBN:
(纸本)9781479915019
As mobile and handheld devices are gaining popularity, many applications have found their ways into these devices. In order to use these mobile systems effectively and efficiently, it is important to optimize their embedded software and hardware. This work focuses on hardware support for mobile and embedded applications in single-chip format so as to reduce the hardware footprint as required in these handheld devices. Applications specific circuits, single-core microprocessor, and field programmable gate arrays are presented, and their strengths and weaknesses are discussed. Reconfigurable computing systems, specifically FPGA-based approaches, are introduced, and techniques for their reconfiguration are analyzed. It is concluded that reconfigurable FPGA-based system is currently the best option to deliver embedded applications that have stringent requirements.
This paper presents a FIFO-based parallel merge sorter optimized for the latest FPGA. More specifically, we show a sorter that sorts K keys in latency K+ log(2) K-1 using log(2) K comparators. It uses K M + log(2) K+ ...
详细信息
ISBN:
(纸本)9781467371483
This paper presents a FIFO-based parallel merge sorter optimized for the latest FPGA. More specifically, we show a sorter that sorts K keys in latency K+ log(2) K-1 using log(2) K comparators. It uses K M + log(2) K+ log(2) M-1 memory blocks with capacity M to implement FIFOs. It receives K keys one by one in every clock cycle and outputs the sorted sequence of them from K + log(2) K -1 clock cycles after. Since K clock cycles are necessary to input all K keys, our sorter is almost optimal in terms of the latency. Also, since the total FIFO capacity is only K+ M log(2) K+ M log(2) M-M and at least K keys must be stored in the sorter, our sorter is also almost optimal in terms of the total FIFO capacity if M is small. This paper also presents topK-sorter, which outputs top K keys in N input keys for any large N. Our topK-sorter runs in latency N + log(2) K using log(2) K + 1 comparators. It uses memory blocks of size M and the total FIFO capacity is only 2K + M log(2) K+ M log(2) M-2M. Quite surprisingly, the total FIFO capacity is independent of N. Also, since the latency must be at least N, that of our topK-sorter is almost optimal in terms of the latency. Finally, we have implemented our K-sorter and topK-sorter in a Xilinx Virtex-7 FPGA using built-in Distributed RAMs and Block RAMs. The implementation results show that our K-sorter reduces the used memory resources by half, and both K-sorter and topK-sorter are practical and efficient.
The memory intensive nature of object-oriented languages such as C++ and Java has created the need of a high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity i...
详细信息
ISBN:
(纸本)0769501524
The memory intensive nature of object-oriented languages such as C++ and Java has created the need of a high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region. Thus, high-performance memory manager is needed to cope with such applications. As today's VLSI technology advances, it becomes more and more attractive to map basic software algorithms such as malloc(), free(), and realloc() into hardware. This paper presents a hardware design of realloc function that fully utilizes the advantage of combinational logic. There are two steps needed to complete a reallocation process: (a) try to reallocate on the original memory block and (b) if (a)failed, allocate another memory block and copy the contents of the original block to this new location. In our scheme, (a) can be done in constant time. For (b), the allocation of new memory block and the deallocation of original block are done in constant time. The hardware complexity of proposed scheme (i.e. X-unit, RS-unit, and ESG-unit) is O(n), where n represents the size of bit-map.
Suppose that a sequence of sensing data with timestamps are transferred asynchronously. Some of sensing data may be delayed by some period of time and the sequence is not in proper increasing order of timestamps. A se...
详细信息
ISBN:
(纸本)9781509026555
Suppose that a sequence of sensing data with timestamps are transferred asynchronously. Some of sensing data may be delayed by some period of time and the sequence is not in proper increasing order of timestamps. A sequence of timestamps t(0), t(1), . . . , tn-1 is d-sorted if t(i) < t(j) holds for all pairs of timestamps ti and tj such that j-i >= d. Intuitively, a parameter d corresponds to the maximum period of delay and we can say that a d-sorted sequence is almost sorted if d is small. The main contribution of this paper is to show a hardware d-sorter that sorts a d-sorted sequence of timestamps, and to implement it in the FPGA. Our idea for a hardware d-sorter is to use a merge-based algorithm using FIFOs. This hardware algorithm takes n + 2d + log(2) d clock cycles using log2 d comparators for a d-sorted sequence of n timestamps. Since Omega(n log d) comparisons are necessary, it is cost optimal and attains optimal speedup. We also present the cyclic comparison method to reduce the number of bits in timestamps to be sorted. The experimental results show that the FPGA implementation is 5-10 time faster than a sequential d-sort program using a single CPU. Also, when d = 2048, our merge-based d-sorter is 2.59 times faster than a previously published hardware heap-based d-sorter, which takes 2n + 2d clock cycles.
暂无评论