The embedded block coding with optimized truncation (EBCOT) algorithm is the heart of the JPEG 2000 image compression system. The MQ coder used in this algorithm restricts the throughput of EBCOT because the procedures it performs are very highly correlated. To overcome this obstacle, a high-throughput MQ coder architecture is presented in this paper. To accomplish this, we studied the number of rotations performed and the rate of byte emission in an image. This study reveals that, on average, one shift occurs 75.03% of the time and two shifts occur 22.72% of the time. Similarly, about 5.5% of the time two bytes are emitted concurrently. Based on these facts, a new MQ coder architecture is proposed which is capable of consuming one symbol per clock cycle. The throughput of this coder is improved by operating the renormalization and byte-out stages concurrently. To reduce the hardware cost, synchronous shifters are used instead of hard shifters. The proposed architecture is implemented on a Stratix FPGA and is capable of operating at 145.9 MHz. The memory requirement of the proposed architecture is reduced by a minimum of 66% compared to those of the other existing architectures. A relative figure of merit is computed to compare the overall efficiency of all architectures, which shows that the proposed architecture provides a good balance between throughput and hardware cost. (C) 2011 Elsevier B.V. All rights reserved.
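The shift and byte-emission statistics quoted above come from the renormalization step of the MQ encoder. The following is a minimal Python sketch of that step, not the architecture proposed in the paper. It assumes a 16-bit interval register `a` and a bit counter `ct`, both hypothetical names, and it simplifies the byte-out rule by always reloading the counter to 8 (the 7-bit reload after an 0xFF byte and the code register itself are not modeled).

```python
def renormalize(a, ct):
    """Shift the interval register left until bit 15 is set.

    a  : 16-bit interval register, 0 < a <= 0xFFFF (hypothetical name)
    ct : number of shifts left before the next byte is emitted
    Returns (a, ct, shifts, bytes_out).
    """
    if not 0 < a <= 0xFFFF:
        raise ValueError("a must be a non-zero 16-bit value")
    shifts = 0
    bytes_out = 0
    while (a & 0x8000) == 0:
        a = (a << 1) & 0xFFFF
        ct -= 1
        shifts += 1
        if ct == 0:
            bytes_out += 1
            ct = 8  # simplification: bit stuffing after 0xFF is ignored
    return a, ct, shifts, bytes_out


# One shift, two shifts, and a worst-case run that emits two bytes.
print(renormalize(0x4000, 5))
print(renormalize(0x2000, 5))
print(renormalize(0x0001, 3))
```

Because up to 15 shifts can occur in one renormalization while the counter holds at most 8 in this model, a single call can emit two bytes, which is the case the abstract reports as occurring about 5.5% of the time.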
This paper presents a secure MQ coder (SMQ) for efficient selective encryption of JPEG 2000 images. Unlike existing schemes, in which the encryption overhead is proportional to the size of the plain image, SMQ selectively encrypts only a tiny, constant volume of data in JPEG 2000 coding regardless of image size. It is extremely fast and suitable for protecting JPEG 2000 images in wireless multimedia sensor networks (WMSNs). Theoretical analysis and experimental results show that SMQ can achieve a balance between security and efficiency, while keeping compression performance and energy consumption comparable to those of standard JPEG 2000 coding. (C) 2014 Elsevier B.V. All rights reserved.
ISBN: 9781467350662 (print)
The embedded block coding with optimized truncation (EBCOT) is a key algorithm in the JPEG2000 image compression standard. The matrix quantizer (MQ) coder used in this algorithm restricts the throughput of EBCOT because its feedback loop operations are iterative. To overcome this, we use phase-shifted clocks to update the registers used in the MQ coder. The proposed architecture is implemented on a Xilinx Virtex-5 FPGA.
ISBN: 9781424458981 (print)
Embedded block coding with optimized truncation (EBCOT) is a key algorithm in the JPEG 2000 image compression system. In this algorithm, the output generated by the bit plane coder (BPC) is supplied to an MQ coder. Though several high-speed BPC architectures are available, the overall performance of the EBCOT algorithm is restricted by the speed of the MQ coder. Therefore, we propose a high-speed, area-efficient VLSI architecture for an MQ coder. This pipelined design is implemented on a Stratix series FPGA and operates at 153 MHz. The synthesis report demonstrates that, compared to existing designs, the requirement of logic and memory elements is reduced by about 71.53% and 59.3%, respectively. The throughput of the proposed MQ coder is 137.7 MS/s, which is 1.85 times higher than that of the designs reported. The renormalization module is capable of operating at 326 MHz, so coding efficiency can be improved further by using multiple clock domains.
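As a quick check on how the two figures above relate, the short Python sketch below divides the reported throughput by the reported clock frequency. The function name is hypothetical, and the calculation assumes both figures refer to the same operating point.

```python
def symbols_per_cycle(throughput_msps, clock_mhz):
    """Average number of symbols consumed per clock cycle."""
    return throughput_msps / clock_mhz


# 137.7 MS/s at 153 MHz, as reported in the abstract.
print(round(symbols_per_cycle(137.7, 153.0), 2))
```

The ratio works out to roughly 0.9 symbols per cycle, slightly below the one symbol per cycle that a fully stall-free pipeline would reach.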
ISBN: 9781479957521 (print)
The MQ coder is an efficient entropy coder that performs the actual compression in JPEG2000. However, it usually acts as the bottleneck of the hardware architecture due to the feedback loops caused by iterative operations. The current single-ejection (SE) architecture achieves a higher frequency by adopting more pipeline stages, but its speed is limited by the throughput per cycle. On the other hand, the multiple-ejection (ME) architecture usually handles more than one context-decision pair (CxD) per cycle, but its frequency deteriorates due to the longer critical path caused by the context dependences. Hence, to enable the MQ arithmetic coder to process more than one sample while running at a higher clock frequency, this paper proposes a two-CxD architecture based on the equality of two adjacent CxDs, in which two adjacent CxDs with different contexts are processed in one clock cycle. Experimental results show that the architecture improves speed by more than 30%.
ISBN: 9780819469519 (print)
The MQ arithmetic coder has been adopted for entropy coding in the latest image compression standard, JPEG2000. It is a bit-level operation with intensive branching and feedback, and thus becomes a serious bottleneck for high-speed JPEG2000. In this paper, an efficient implementation scheme for the MQ coder is proposed, in which the renormalization process with BYTEOUT is performed in batch fashion instead of by gradual iteration as introduced in JPEG2000. Experimental results demonstrate the validity of this method in decreasing computational complexity.
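The idea of batch renormalization can be illustrated with a small Python sketch that computes the shift count and the number of emitted bytes in one step instead of looping bit by bit. This is an illustration under stated assumptions, not the scheme from the paper: the names `a` and `ct` are hypothetical, the counter is assumed always to reload to 8 after a byte is emitted (bit stuffing after 0xFF is ignored), and the code register is not modeled.

```python
def batch_renormalize(a, ct):
    """Compute the renormalization result without a per-bit loop.

    a  : 16-bit interval register, 0 < a <= 0xFFFF (hypothetical name)
    ct : number of shifts left before the next byte is emitted (1..8)
    Returns (a, ct, shifts, bytes_out).
    """
    if not 0 < a <= 0xFFFF:
        raise ValueError("a must be a non-zero 16-bit value")
    # Number of leading zeros in the 16-bit register.
    shifts = 16 - a.bit_length()
    a = (a << shifts) & 0xFFFF
    if shifts < ct:
        return a, ct - shifts, shifts, 0
    # The counter reaches zero at least once; each further 8 shifts
    # emit one more byte under the simplified reload-to-8 rule.
    extra = shifts - ct
    bytes_out = 1 + extra // 8
    ct = 8 - extra % 8
    return a, ct, shifts, bytes_out


print(batch_renormalize(0x0001, 3))
```

In hardware, the leading-zero count would typically map to a priority encoder feeding a barrel shifter, which is what removes the iterative feedback loop.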
Embedded block coding, i.e., embedded block coder with optimal truncation (EBCOT) tier-1, is the most computationally intensive part of the JPEG2000 image coding standard. Past research on fast EBCOT tier-1 hardware implementations has concentrated on cycle-efficient formation. These pass-parallel architectures require that JPEG2000's three mode switches be turned on; thus, coding efficiency is sacrificed for improved throughput. In this paper, a new fast EBCOT tier-1 design is presented, called the split arithmetic encoder (SAE) process. The proposed process exploits concurrency to obtain improved throughput while preserving coding efficiency. The SAE process is evaluated using the following three methods: clock cycle estimation, multithreaded software implementation, and FPGA hardware implementation. All three methods achieve throughput improvement; the hardware implementation exhibits the largest speedup, as expected. The benefits of evaluating a proposed process (algorithm) from different perspectives are illustrated.
Much work has been performed on optimizing the throughput of the block coding system within JPEG2000. However, the question remains as to whether providing parallel simple block coders is a cheaper method of increasing throughput than complicated optimized block coders. We present the analysis and results for a system-on-a-chip (SoC) software/hardware codesign platform for parallel coding in the JPEG2000 compression standard. We design both a simple and a high-performance, optimized peripheral encoder as a hardware accelerator for the JPEG2000 SoC encoding system. The system is implemented on an Altera NIOS II processor with flexible integrated peripherals. We show that there are optimum numbers of parallel block coders and scheduling granularity per row of codeblocks, and that parallel optimized encoders outperform parallel simple encoders. We also demonstrate that the block coding system becomes work starved rather than memory blocked when many parallel coders are present, indicating a discrete wavelet transform bottleneck.
The embedded block coding with optimized truncation (EBCOT) is a key algorithm in the JPEG 2000 image compression system. Various applications, such as medical imaging, satellite imagery, digital cinema, and others, require a high-speed, high-performance EBCOT architecture. Though efficient EBCOT architectures have been proposed, the hardware requirement of these existing architectures is very high and their throughput is low. To solve this problem, we investigated the rate of concurrent context generation. Our study revealed that, in an image, the rate at which four or more context pairs are generated is about 68.9%. Therefore, to encode all samples in a stripe column concurrently, a new technique named compact context coding is devised. As a consequence, high throughput is attained and the hardware requirement is also cut down. The performance of the matrix quantizer (MQ) coder is improved by operating the renormalization and byte-out stages concurrently. The entire design of the EBCOT encoder is tested on a field programmable gate array platform. The implementation results show that the throughput of the proposed architecture is 163.59 MSamples/s, which is equivalent to encoding a 1920p (1920 x 1080, 4:2:2) high-definition TV picture sequence at 39 f/s. Taken alone, however, the bit plane coder (BPC) architecture operates at 315.06 MHz, which implies that it is 2.86 times faster than the fastest BPC design available so far. Moreover, it is capable of encoding digital cinema size (2048 x 1080) at 42 f/s. Thus, it satisfies the requirement of applications like cartography, medical imaging, satellite imagery, and others, which demand a high-speed real-time image compression system.
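The stated equivalence between 163.59 MSamples/s and 39 f/s for a 1920 x 1080, 4:2:2 sequence can be reproduced with a short calculation. The Python sketch below uses a hypothetical helper name and assumes that 4:2:2 sampling means two samples per pixel on average (one luma sample plus two chroma components at half horizontal resolution).

```python
def frames_per_second(samples_per_second, width, height, samples_per_pixel):
    """Frame rate sustainable at a given sample throughput."""
    samples_per_frame = width * height * samples_per_pixel
    return samples_per_second / samples_per_frame


# 4:2:2 sampling: Y at full resolution, Cb and Cr at half width each.
fps = frames_per_second(163.59e6, 1920, 1080, 2.0)
print(int(fps))  # prints 39
```

Each frame carries 1920 x 1080 x 2 = 4,147,200 samples, so the reported throughput supports about 39.4 frames per second, consistent with the 39 f/s figure in the abstract.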
ISBN: 9780769547688 (print)
The JPEG2000 image coding standard provides many superior features compared to JPEG and other compression standards. However, the relatively slow performance of JPEG2000, especially in software implementations, is a critical drawback of the standard. Moreover, as image sizes rapidly grow, higher demands are placed on the performance of image coding and processing, making the slow performance of JPEG2000 even more pronounced. While much effort over the past decade has been devoted to accelerating the JPEG2000 encoder, there have been very few studies focusing on improving the performance of the JPEG2000 decoder, despite the fact that the performance of the decoder is just as critical as that of the encoder. This paper proposes a high-performance JPEG2000 decoder that efficiently exploits recent improvements in modern parallel programming models and hardware architectures. Specifically, a parallel streaming decoder running on a GPGPU-CPU heterogeneous system is developed to fully exploit both the flexibility of high-performance multi-core CPUs and the massively parallel capability of GPGPUs. In addition, a new task scheduling strategy is developed that exploits the soft-heterogeneity in OpenCL and C/C++ at runtime in order to gain a significant performance boost. Running on a heterogeneous configuration of one Nvidia GTX 480 GPU and one Intel Core i7 CPU, the parallel streaming decoder achieves more than an 8x speedup in runtime compared to the JasPer JPEG2000 software implementation.