Nowadays, the massive use of multimedia data gives data compression a fundamental role in reducing storage requirements and communication bandwidth. Variable-length encoding (VLE) is a relevant data compression method that reduces input data size by assigning shorter codewords to frequently used symbols and longer codewords to rarely used symbols. As it is a common strategy in many compression algorithms, such as the popular Huffman coding, speeding VLE up is essential to accelerate them. For this reason, during the last decade and a half, efficient VLE implementations have been presented in the area of General-Purpose Graphics Processing Units (GPGPU). The main performance issues of the state-of-the-art GPU-based implementations of VLE are the following. First, the way in which the codeword look-up table is stored in shared memory is not optimized to reduce bank conflicts. Second, input/output data are read/written through inefficient strided global memory accesses. Third, the way in which the thread-codes are built is not optimized to reduce the number of executed instructions. Our goal in this work is to significantly speed up the state-of-the-art implementations of VLE by solving their performance issues. To this end, we propose GVLE, a highly optimized implementation of VLE on GPU, which uses the following optimization strategies. First, the codeword look-up table is cached in a way that minimizes bank conflicts. Second, input data are read using vectorized loads to fully exploit the available global memory bandwidth. Third, each thread's encoding is performed efficiently in the register space with high instruction-level parallelism and a lower number of executed instructions. Fourth, a novel inter-block scan method, which outperforms those of state-of-the-art solutions, is used to calculate the bit-positions of the thread-block encodings in the output bit-stream. Our proposed mechanism is based on a regular segmented scan...
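To make the VLE idea concrete, here is a minimal CPU-side sketch (not GVLE itself, and with a purely illustrative codeword table): each symbol is replaced by a `(codeword, length)` pair from a look-up table and the codewords are packed into one bit-stream.

```python
# Minimal sketch of variable-length encoding (VLE). The table contents are
# illustrative assumptions, not codewords from the paper: the frequent symbol
# 'a' gets the shortest codeword, the rare symbol 'c' a longer one.
def vle_encode(symbols, table):
    """table maps symbol -> (codeword_value, codeword_bit_length)."""
    bitstream = 0
    total_bits = 0
    for s in symbols:
        code, length = table[s]
        bitstream = (bitstream << length) | code  # append codeword at the tail
        total_bits += length
    return bitstream, total_bits

table = {'a': (0b0, 1), 'b': (0b10, 2), 'c': (0b11, 2)}
bits, n = vle_encode("aab", table)  # "aab" -> 0|0|10 = 0b0010, 4 bits
```

The GPU problem the abstract addresses is exactly the part this sketch hides: because codeword lengths differ, each thread-block must learn the bit-position of its output via an inter-block scan before it can write its piece of the stream.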
A method has been devised that encodes the quantized output of a discrete-time memoryless Gaussian source nearly as efficiently as Huffman's optimum variable-length encoding procedure. With respect to the mean code word length, the performance of the two methods typically differs only by 0.1-0.2 bit. The basic idea of the new method is that each code word is made up of two components: the prefix and the kernel. The prefix specifies the length of the kernel and is encoded by means of Huffman's method. Although the kernel length may vary from one code word to the next, the code used for the kernel is basically a fixed-length code, because once the prefix has been decoded, the beginning of the next code word is known without decoding the kernel. The advantage of the new method over Huffman's method is that only a few variable-length codes have to be distinguished, so both encoding and decoding can be accomplished by quite simple algorithms requiring only a few compare and branch operations and a single addition or subtraction per code word. Areas of application, especially when the method is combined with predictive coding, are time series (e.g., the electroencephalogram) and other kinds of quantized data.
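The prefix/kernel split can be sketched as follows. This is a hedged toy version: the paper Huffman-codes the prefix, whereas here a simple unary code stands in for it, and the grouping of levels by bit length is an assumption made for illustration.

```python
# Toy prefix/kernel code: the prefix (unary here, Huffman-coded in the paper)
# tells the decoder how many kernel bits follow; the kernel is a plain
# fixed-length binary number, so it needs no bit-by-bit code-tree walk.
def encode_prefix_kernel(level):
    """Encode a non-negative quantizer level as a bit string."""
    k = level.bit_length()        # kernel length for this level
    prefix = "1" * k + "0"        # unary stand-in for the Huffman-coded prefix
    kernel = format(level, "b") if k else ""
    return prefix + kernel

def decode_prefix_kernel(bits):
    k = bits.index("0")           # decode prefix -> kernel length
    start = k + 1
    level = int(bits[start:start + k], 2) if k else 0
    return level, bits[start + k:]  # decoded level and the remaining stream
```

Note how the decoder locates the start of the next code word right after reading the prefix, which is the property the abstract highlights: only the short prefix alphabet needs a variable-length decode.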
Service composition and optimal selection (SCOS) plays a crucial role in cloud manufacturing (CMfg). However, existing service composition methods struggle to address the changes and uncertainties of the dynamic CMfg environment. Therefore, a variable-length encoding genetic algorithm for structure-varying incremental service composition (ISC-GA) is proposed in this paper. Specifically, a novel variable-length encoding scheme containing structural information is proposed to describe the uncertain and changing process model, and an improved crossover and mutation algorithm suitable for individuals with nonlinearly varying structure and incremental service composition is designed. The approach optimizes both the process structure and the service instance combinations, and overcomes the drawbacks resulting from a single preset process structure. Because uncertain process structures make fitness computation difficult, novelty is introduced as a new evolutionary pressure, and a novel framework for ISC-GA is presented, which helps to find both novel and high-performance solutions. Experimental results indicate the effectiveness of the proposed approach. (c) 2022 Elsevier B.V. All rights reserved.
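The key mechanical difficulty with variable-length chromosomes is that standard crossover assumes equal lengths. A minimal sketch, in the spirit of (but not identical to) the paper's operators: cut points are chosen independently in each parent, so offspring lengths can differ from both parents'.

```python
import random

# Illustrative one-point crossover for variable-length chromosomes. The
# chromosomes here are plain gene lists; the paper's encoding additionally
# carries process-structure information, which this sketch omits.
def variable_length_crossover(parent_a, parent_b, rng=random.Random(0)):
    cut_a = rng.randint(1, len(parent_a) - 1)   # cut point in parent A
    cut_b = rng.randint(1, len(parent_b) - 1)   # independent cut in parent B
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2
```

Whatever the cut points, the two children jointly contain exactly the genes of the two parents, while individual child lengths vary, which is what lets the search explore process structures of different sizes.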
Compared with the traditional assembly line, seru production can reduce worker(s) and decrease makespan. However, when the two objectives are considered simultaneously, Pareto-optimal solutions may save manpower but increase makespan. Therefore, we formulate line-seru conversion towards reducing worker(s) without increasing makespan and develop exact and meta-heuristic algorithms for instances of different scales. Firstly, we analyse the distinct features of the model. Then, according to the features of the solution space, we propose two exact algorithms to solve small- to medium-scale instances. The first exact algorithm searches the solution space from more workers to fewer workers; the second searches from fewer workers to more workers. The two exact algorithms search only part of the solution space to obtain the optimal solution that reduces worker(s) without increasing makespan. Exploiting the variable length of the feasible solutions, we propose a variable-length encoding heuristic algorithm for large-scale instances. Finally, we use extensive experiments to evaluate the performance of the proposed algorithms and to investigate some managerial insights on when and how to reduce worker(s) without increasing makespan by line-seru conversion.
Node placement problems, such as the deployment of radio-frequency identification systems or wireless sensor networks, are important problems encountered in various engineering fields. Although evolutionary algorithms have been successfully applied to node placement problems, their fixed-length encoding scheme limits their ability to adjust the number of deployed nodes optimally. To solve this problem, we develop a flexible genetic algorithm in this paper. With variable-length encoding, subarea-swap crossover, and Gaussian mutation, the flexible genetic algorithm is able to adjust the number of nodes and their corresponding properties automatically. Offspring (candidate layouts) are created through a simple crossover that swaps selected subareas of the parental layouts and through a simple mutation that tunes the properties of nodes. The flexible genetic algorithm is generic and suitable for various kinds of node placement problems. Two typical real-world node placement problems, i.e., the wind farm layout optimization and radio-frequency identification network planning problems, are used to investigate the performance of the proposed algorithm. Experimental results show that the flexible genetic algorithm offers higher performance than existing tools for solving node placement problems. (C) 2016 Elsevier B.V. All rights reserved.
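A hedged sketch of the subarea-swap idea: all nodes that fall inside a chosen rectangle are exchanged between the two parent layouts. The rectangle here is fixed for clarity (the algorithm would select it stochastically), and nodes are bare `(x, y)` points rather than nodes with tunable properties.

```python
# Subarea-swap crossover sketch for node placement layouts. Because the two
# parents may place different numbers of nodes inside the rectangle, the
# offspring node counts can differ from the parents' counts, matching the
# variable-length encoding described in the abstract.
def subarea_swap(layout_a, layout_b, rect):
    x0, y0, x1, y1 = rect
    inside = lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1
    child_a = [p for p in layout_a if not inside(p)] + \
              [p for p in layout_b if inside(p)]
    child_b = [p for p in layout_b if not inside(p)] + \
              [p for p in layout_a if inside(p)]
    return child_a, child_b
```

Swapping a spatial region rather than a slice of the gene string keeps each child geometrically coherent: every node keeps its real coordinates, and only the ownership of one subarea changes hands.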
CAVLC (Context-Adaptive Variable-Length Coding) is a high-performance entropy method for video and image compression. It is the most commonly used entropy method in the video standard H.264. In recent years, several hardware accelerators for CAVLC have been designed. In contrast, high-performance software implementations of CAVLC (e.g., GPU-based) are scarce. A high-performance GPU-based implementation of CAVLC is desirable in several scenarios. On the one hand, it can be exploited as the entropy component in GPU-based H.264 encoders, which are a very suitable solution when GPU built-in H.264 hardware encoders lack certain necessary functionality, such as data encryption and information hiding. On the other hand, a GPU-based implementation of CAVLC can be reused in a wide variety of GPU-based compression systems for encoding images and videos in formats other than H.264, such as medical images. This is not possible with hardware implementations of CAVLC, as they are non-separable components of hardware H.264 encoders. In this paper, we present CAVLCU, an efficient implementation of CAVLC on GPU, which is based on four key ideas. First, we use only one kernel to avoid the long-latency global memory accesses required to transmit intermediate results among different kernels, and the costly launches and terminations of additional kernels. Second, we apply an efficient synchronization mechanism for thread-blocks (in this paper, to prevent confusion, a block of pixels of a frame will be referred to simply as a block, and a GPU thread block as a thread-block) that process adjacent frame regions (in the horizontal and vertical dimensions) to share results in global memory space. Third, we fully exploit the available global memory bandwidth by using vectorized loads to move the quantized transform coefficients directly to registers. Fourth, we use register tiling to implement the zigzag sorting, thus obtaining high instruction-level parallelism. An exhaustive experimental evaluation...
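The zigzag sorting mentioned in the fourth idea reorders a 4x4 block of quantized coefficients along anti-diagonals so low-frequency coefficients come first. A plain Python reference of that ordering (CAVLCU performs the same reordering in GPU registers; this sketch only derives the scan order):

```python
# Derive the zigzag scan order of an n x n coefficient block: walk the
# anti-diagonals, alternating direction, and emit flat indices (row*n + col).
# For n=4 this reproduces the standard H.264 4x4 zigzag order.
def zigzag_order(n=4):
    order = []
    for d in range(2 * n - 1):
        # cells on anti-diagonal d, listed bottom-left to top-right
        cells = [(d - j, j) for j in range(d + 1)
                 if 0 <= d - j < n and j < n]
        # even diagonals run top-right to bottom-left, odd ones the reverse
        order.extend(cells if d % 2 == 0 else cells[::-1])
    return [r * n + c for r, c in order]
```

In the GPU implementation the point of register tiling is that each of these 16 moves is an independent register-to-register copy, so the reordering exposes high instruction-level parallelism instead of serializing on memory.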
ISBN:
(Print) 9783031096778; 9783031096761
Ensemble learning (EL) is a paradigm in which several base learners work together to solve complex problems. The performance of EL relies heavily on the number and accuracy of the weak learners, which are often hand-crafted using domain knowledge. Unfortunately, such knowledge is not always available to interested end-users. This paper proposes a novel approach, called Multi-Objective Evolutionary Ensemble Learning (MOE-EL), to automatically select the optimal type and number of base learners for disease classification. In the proposed MOE-EL algorithm, a variable-length gene encoding strategy for the multi-objective algorithm is first designed to search for optimal weak learner configurations. Moreover, a dynamic population strategy is proposed to speed up the evolutionary search and balance the diversity and convergence of the populations. The proposed algorithm is examined and compared with 5 existing algorithms, including state-of-the-art methods, on disease classification tasks. The experimental results show the significant superiority of the proposed approach over state-of-the-art designs in terms of classification accuracy rate and base learner diversity.
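One way such a variable-length genotype can work, sketched under stated assumptions (the learner names and the decoding rule below are hypothetical, not the paper's exact scheme): each gene selects one base-learner type, so chromosome length determines ensemble size and gene values determine its composition.

```python
# Hypothetical variable-length genotype for ensemble configuration: the
# chromosome is a list of integer genes, each mapped to a base-learner type.
# Evolving the list's length evolves the ensemble size; evolving the gene
# values evolves its composition.
LEARNER_TYPES = ["decision_tree", "knn", "svm", "naive_bayes"]

def decode(chromosome):
    """Map a variable-length list of gene indices to an ensemble description."""
    return [LEARNER_TYPES[g % len(LEARNER_TYPES)] for g in chromosome]
```

A multi-objective fitness would then score each decoded ensemble on, e.g., classification accuracy and ensemble size or diversity, which is the trade-off the abstract's Pareto search explores.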