ISBN (Print): 9781581134636
As computers are increasingly used in contexts where the amount of available memory is limited, it becomes important to devise techniques that reduce the memory footprint of application programs while leaving them in an executable form. This paper describes an approach to applying data compression techniques to reduce the size of infrequently executed portions of a program. The compressed code is decompressed dynamically (via software) if needed, prior to execution. The use of data compression techniques increases the amount of code size reduction that can be achieved; their application to infrequently executed code limits the runtime overhead due to dynamic decompression; and the use of software decompression renders the approach generally applicable, without requiring specialized hardware. The code size reductions obtained depend on the threshold used to determine what code is "infrequently executed" and hence should be compressed: for low thresholds, we see size reductions of 13.7% to 18.8%, on average, for a set of embedded applications, without excessive runtime overhead.
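The selective scheme this abstract describes can be sketched in a few lines: profile counts drive a threshold, cold code is stored compressed, and a software routine decompresses it on demand before execution. The profile data, threshold, and zlib back-end below are illustrative assumptions, not the paper's actual compressor or code representation.

```python
import zlib

# Hypothetical profile: function name -> execution count (illustrative data).
profile = {"init_tables": 1, "parse_config": 2, "main_loop": 9_000_000}
# Stand-in for each function's machine code (repetitive, so it compresses well).
code = {name: (name * 50).encode() for name in profile}

THRESHOLD = 10  # functions executed fewer times than this count as "cold"

# Compress only infrequently executed ("cold") functions; hot code stays native.
stored = {}
for name, count in profile.items():
    if count < THRESHOLD:
        stored[name] = ("compressed", zlib.compress(code[name]))
    else:
        stored[name] = ("native", code[name])

def fetch(name):
    """Decompress cold code on demand, as a software decompressor would."""
    kind, blob = stored[name]
    return zlib.decompress(blob) if kind == "compressed" else blob

assert fetch("init_tables") == code["init_tables"]
assert len(stored["init_tables"][1]) < len(code["init_tables"])
```

Raising the threshold compresses more of the program (larger size reduction) at the cost of more runtime decompression, which is the trade-off the reported 13.7% to 18.8% figures explore.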
Memory is one of the most restricted resources in embedded systems. Code compression techniques address this issue by reducing the code size of applications. Huffman coding is the most commonly used coding method, but during the process of generating symbols from instructions, an experience-based partition is usually used, which may cause information loss. This paper presents an Optimal Partition Based Code Compression (OPCC) method. A tree model is used to extract the correlation between bits in an instruction, and a clustering algorithm is proposed to cluster bits with higher correlation into symbols. Results show that this method could improve the average compression ratio by 4.%. The decoder is validated on an Altera Cyclone II FPGA.
ISBN (Print): 9781581130942
This paper describes experiments that apply machine learning to compress computer programs, formalizing and automating decisions about instruction encoding that have traditionally been made by humans in a more ad hoc manner. A program accepts a large training set of program material in a conventional compiler intermediate representation (IR) and automatically infers a decision tree that separates IR code into streams that compress much better than the undifferentiated whole. Driving a conventional arithmetic compressor with this model yields code 30% smaller than the previous record for IR code compression, and 24% smaller than an ambitious optimizing compiler feeding an ambitious general-purpose data compressor.
ISBN (Print): 9781479911301
This paper proposes a new method of code compression for embedded systems, which we call CC-MLD (Compressed Code using Huffman-Based Multi-Level Dictionary). This method applies two compression techniques and uses the Huffman code compression algorithm. A single dictionary is divided into two levels and shared by both techniques. We performed simulations using applications from MiBench on four embedded processors (ARM, MIPS, PowerPC and SPARC). Our method reduces code size by up to 30.6% (including all extra costs for these four platforms). We implemented the decompressor in VHDL on an FPGA; the decompression process takes only one clock cycle.
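As a rough illustration of the Huffman stage that dictionary schemes like this build on, the sketch below assigns variable-length codes to opcodes by frequency, so common opcodes get short codes. The toy opcode stream is an assumption for illustration; this is plain Huffman coding, not the CC-MLD multi-level dictionary itself.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table (symbol -> bit string) from frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: [frequency, tiebreaker, partial code table].
    heap = [[f, i, {s: ""}] for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, [f1 + f2, tick, merged])
        tick += 1
    return heap[0][2]

# Toy instruction stream: frequent opcodes receive shorter codes.
stream = ["MOV", "MOV", "MOV", "MOV", "ADD", "ADD", "LDR", "B"]
table = huffman_codes(stream)
bits = "".join(table[op] for op in stream)
assert len(table["MOV"]) < len(table["B"])  # common opcode -> shorter code
assert len(bits) < len(stream) * 2          # beats a fixed 2-bit encoding here
```

A hardware decompressor stores this table (the dictionary) and walks the bit stream back to opcodes; splitting the dictionary into levels, as the paper does, trades lookup structure for size.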
Modern microprocessors have used microcode as a way to implement legacy (rarely used) instructions, add new ISA features, and enable patches to an existing design. As more features are added to processors (e.g. protection and virtualization), the area and power costs associated with the microcode memory have increased significantly. A recent Intel internal design targeted at low power and small footprint estimated the cost of the microcode ROM to approach 20% of the total die area (and associated power consumption). Moreover, with the adoption of multicore architectures, the impact of microcode memory size on chip area has become significant, forcing industry to revisit the microcode size problem. A solution is to store the microcode in a compressed form and decompress it at runtime. This paper describes techniques for microcode compression that achieve significant area and power savings, and proposes a streamlined architecture that enables high throughput within the constraints of a high-performance CPU. The paper presents results for microcode compression on several commercial CPU designs which demonstrate compression ratios ranging from 50% to 62%. In addition, it proposes techniques that enable the reuse of (pre-validated) hardware building blocks, which can considerably reduce the cost and design time of the microcode decompression engine in real-world designs.
This paper describes a new method for code space optimization for interpreted languages called LZW-CC. The method is based on a well-known and widely used compression algorithm, LZW, which has been adapted to compress executable program code represented as bytecode. Frequently occurring sequences of bytecode instructions are replaced by shorter encodings for newly generated bytecode instructions. The interpreter for the compressed code is modified to recognize and execute those new instructions. When applied to systems where a copy of the interpreter is supplied with each user program, space is saved not only by compressing the program code but also by automatically removing unused implementation code from the interpreter. The method's implementation within two compiler systems, for the programming languages Haskell and Java, is described, and implementation issues of interest are presented, notably the recalculation of jump targets and the automated tailoring of the interpreter to the program code. Applying LZW-CC to nhc98 Haskell reduces bytecode size by up to 15.23% and executable size by up to 11.9%. Java bytecode is reduced by up to 52%. The impact of compression on execution speed is also discussed; the typical speed penalty for Java programs is between 1.8 and 6.6%, while most compressed Haskell executables run faster than the original. Copyright (C) 2008 John Wiley & Sons, Ltd.
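The underlying LZW idea, replacing recurring byte sequences with indices into a dictionary that grows as the input is scanned, can be sketched as follows. The toy "bytecode" string is illustrative, and this is textbook LZW over raw bytes rather than the paper's adapted LZW-CC variant, which emits new interpreter instructions instead of plain indices.

```python
def lzw_compress(data: bytes):
    """Classic LZW: grow a dictionary of byte sequences, emit their indices."""
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)  # a new code for this longer sequence
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

def lzw_expand(codes):
    """Inverse transform: rebuild the same dictionary while decoding."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for k in codes[1:]:
        entry = table[k] if k in table else w + w[:1]
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

bytecode = b"ILOAD ISTORE ILOAD ISTORE ILOAD ISTORE GOTO"
codes = lzw_compress(bytecode)
assert lzw_expand(codes) == bytecode
assert len(codes) < len(bytecode)  # repeated sequences collapse to single codes
```

In LZW-CC the decoder side of this loop effectively lives inside the interpreter: each new dictionary entry becomes a synthetic instruction whose handler replays the original sequence.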
A program executing on a low-end embedded system, such as a smart-card, faces scarce memory resources and fixed execution time constraints. We demonstrate that factorization of common instruction sequences in Java bytecode allows the memory footprint to be reduced, on average, to 85% of its original size, with a minimal execution time penalty. While preserving Java compatibility, our solution requires only a few modifications which are straightforward to implement in any JVM used in a low-end embedded system.
Embedded systems currently account for all but 2% of the microprocessor market [1]. Yet embedded processor cores are often simply streamlined versions of microprocessors that were designed for desktop computers. In computer architecture research, the Standard Performance Evaluation Corporation (SPEC) benchmark programs [2] are often used to evaluate the performance of computer systems. The characteristics of the SPEC programs do not match typical embedded applications [3], however, and therefore it is not clear to what extent performance studies conducted using the SPEC benchmarks are applicable to embedded systems. In this paper, we focus on a specific segment of the embedded market: automotive engine controllers. Using data gathered by tracing a state-of-the-art automotive controller, we present simulation results for a memory system that consists of a main memory containing compressed programs, a bus used to fetch compressed instruction blocks, a decompression subsystem, and an uncompressed instruction cache. We examine a number of system parameters, including the bus load, the average memory access time, and the performance ratio of the compressed versus uncompressed system. We also present an analysis of the static and dynamic (run-time) system behavior.
This paper presents an efficient technique for code compression. In our work, a sequence of instructions that occurs repeatedly in an application is compressed to reduce code size. During compression, each instruction is first divided into an operation part and a register part, and only the operation part is compressed. The compression information is stored in the instruction table, the register bank, and the index table. To reduce the run-time overhead, we propose an instruction prefetching mechanism to speed up decompression. Our work is evaluated with the SPEC 2000, DSPstone, Mediabench, and MPEG4 benchmarks on the basis of the ARM instruction set, and proves quite effective for media and other applications. The experimental results show that our approach achieves a code size reduction of 33% on average with low run-time decompression overhead for these benchmarks.
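The opcode/register split described above can be mimicked in a small sketch: only the operation part goes through a dictionary, while the register bits stay uncompressed. The field widths and instruction words below are illustrative assumptions, not the real ARM encoding or the paper's actual table layout.

```python
# Split each 32-bit word into an operation part and a register part
# (the 12-bit register field is illustrative, not the real ARM encoding).
def split(word):
    op = word >> 12       # upper bits: opcode/condition part (compressed)
    reg = word & 0xFFF    # lower 12 bits: register part (kept uncompressed)
    return op, reg

instrs = [0xE1A01002, 0xE1A03004, 0xE1A01005, 0xE1A03002]
ops = [split(w)[0] for w in instrs]
regs = [split(w)[1] for w in instrs]

# Dictionary over the operation parts only: repeated ops become small indices.
table = {}
indices = []
for op in ops:
    if op not in table:
        table[op] = len(table)
    indices.append(table[op])

# Decompression: look up the op part by index and splice the register bits back.
rebuilt = [(list(table)[i] << 12) | r for i, r in zip(indices, regs)]
assert rebuilt == instrs
assert len(table) < len(ops)  # only two distinct op parts in this toy stream
```

Compressing only the op field pays off because register fields vary between otherwise identical instructions; separating them makes the repeated part repeat exactly.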
We introduce a new PLA-based decoder architecture for random-access runtime decompression of compressed instruction memory in embedded systems. The compression method employs class-based coding. The symbol codebook used for decompression is fully programmable; thus, good compression may be achieved by adapting the codebook to the symbol frequency statistics of the target binary program. We show that this new class-based decoder architecture can be extended to provide high throughput decompression.