A physics-based algorithm for accelerating the computation of low-rank approximations of method of moments matrix blocks is presented. The algorithm relies on efficient sampling of phase- and amplitude-compensated interactions using nonuniform grids. Rank-revealing analysis is applied, in a multilevel fashion, to matrices of reduced column and row dimensions that describe subdomains' interactions with these coarse grids, rather than to the original matrix blocks. As a result, significant savings are achieved, especially for the inherently more compressible dynamic quasi-planar and quasi-static cases. The algorithm's reduced storage and computation time requirements are estimated analytically and verified numerically for representative examples.
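A minimal sketch of the general idea, not the paper's algorithm: instead of running a rank-revealing factorization on the full observer-source interaction block, the subdomains are coupled through a small nonuniform intermediate grid, and the rank analysis is applied only to the two reduced-dimension factors. The kernel, geometry, grid size, and truncation tolerance below are hypothetical placeholders, and the phase/amplitude compensation step of the paper is omitted.

```python
import numpy as np

def kernel(x, y, k=2 * np.pi):
    """Toy scalar Green's-function-like kernel between two point sets."""
    r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

rng = np.random.default_rng(0)
src = rng.uniform(0.0, 1.0, (400, 3))                 # source subdomain points
obs = rng.uniform(0.0, 1.0, (400, 3)) + [10, 0, 0]    # well-separated observers
grid = rng.uniform(0.0, 1.0, (30, 3)) + [5, 0, 0]     # coarse intermediate grid

# Reduced-dimension matrices: sources -> coarse grid and coarse grid -> observers.
A_grid_src = kernel(grid, src)      # 30 x 400
A_obs_grid = kernel(obs, grid)      # 400 x 30

def truncate(M, tol=1e-6):
    """Rank-revealing step (plain SVD truncation here) on a small factor."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r] * s[:r], Vh[:r]

U1, V1 = truncate(A_obs_grid)
U2, V2 = truncate(A_grid_src)
# The expensive analysis touches only 400x30 and 30x400 matrices,
# never the full 400x400 block.
print("compressed ranks of the reduced factors:", U1.shape[1], U2.shape[1])
```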
A nuclear electronics system designed to perform high-precision energy measurement over a large dynamic range through high-speed sampling of the output might be impossible to match to an adequate ADC. A solution consists in compressing the signal before digitization and linearizing it afterwards with a look-up table encoding the inverse of the compression function. This look-up table can be constructed using test pulses, the smallest of which lies in the linear part and the largest of which spans the whole dynamic range. Reconstructing these pulse shapes and requiring them to be homothetic generates the look-up table providing minimal distortion in the RMS sense. (C) 2002 Elsevier Science B.V. All rights reserved.
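A minimal sketch of the linearization step, under simplified assumptions: here a known analytic soft-saturating curve stands in for the analog compression stage, and the look-up table is filled by inverting it directly. The paper instead derives the table from reconstructed test pulses constrained to be homothetic; the curve, ADC resolution, and scales below are hypothetical.

```python
import numpy as np

N_BITS = 12
FULL_SCALE = 1.0                     # compressed-signal full scale (arbitrary units)
A = 5.0                              # hypothetical compression strength

def compress(v, a=A):
    """Hypothetical soft-saturating compression applied before the ADC."""
    return np.log1p(a * v) / np.log1p(a)

# Look-up table: for each ADC code, store the input amplitude whose compressed
# value maps to that code, i.e. the inverse of the compression function.
codes = np.arange(2 ** N_BITS)
compressed_levels = (codes + 0.5) / 2 ** N_BITS * FULL_SCALE
lut = np.expm1(compressed_levels * np.log1p(A)) / A

def linearize(adc_samples):
    return lut[np.asarray(adc_samples, dtype=int)]

# Digitize a ramp, linearize it, and check the residual distortion (RMS sense).
true_amplitude = np.linspace(0.0, 1.0, 1000)
adc = np.clip((compress(true_amplitude) / FULL_SCALE * 2 ** N_BITS).astype(int),
              0, 2 ** N_BITS - 1)
rms_error = np.sqrt(np.mean((linearize(adc) - true_amplitude) ** 2))
print(f"RMS reconstruction error: {rms_error:.2e}")
```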
Equations describing facies proportions and amalgamation ratios are derived for randomly placed objects belonging to two or three foreground facies embedded in a background facies, as a function of the volume fractions and object thicknesses of independent facies models combined in a stratigraphically meaningful order. The equations are validated using one-dimensional continuum models. Evaluation of the equations reveals a simple relationship between an effective facies proportion and an effective amalgamation ratio, both measured as a function only of the facies in question and the background facies. This relationship provides a firm analytical basis for applying the compression algorithm to multi-facies object-based models. A set of two-dimensional cross-sectional models illustrates the approach, which allows models to be generated with realistic object stacking characteristics defined independently for each facies in a multi-facies object-based model.
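A minimal sketch of the kind of one-dimensional continuum experiment the abstract uses for validation: objects of two foreground facies are placed as random intervals along a vertical line and combined in a fixed stratigraphic order (facies 2 overprints facies 1), then the resulting proportions are measured on a fine grid. The object thicknesses, counts, and overprint rule are hypothetical choices, and the derived equations themselves are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
L, dz = 1000.0, 0.01                   # length of the 1-D section and grid spacing
z = np.arange(0.0, L, dz)
column = np.zeros(z.size, dtype=int)   # 0 = background facies

def place(column, n_objects, thickness, code):
    """Stamp n_objects intervals of given thickness at random tops."""
    for top in rng.uniform(0.0, L, n_objects):
        column[(z >= top) & (z < top + thickness)] = code
    return column

column = place(column, n_objects=300, thickness=1.0, code=1)   # facies 1 first
column = place(column, n_objects=150, thickness=2.0, code=2)   # facies 2 erodes facies 1

for code in (1, 2):
    print(f"facies {code} proportion: {np.mean(column == code):.3f}")
```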
Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overhead and the substantial computational cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architectures have demonstrated great potential to compute large-scale matrix-vector multiplication effectively. However, the intensive multiply-and-accumulate (MAC) operations executed on CIM macros remain a bottleneck for further improvement of energy efficiency and throughput. To reduce computational cost, model compression is a widely studied method for shrinking the model size. For implementation on a static random access memory (SRAM) CIM-based accelerator, the model compression algorithm must take the hardware limitations of CIM macros into account. In this study, a software and hardware co-design approach is proposed comprising MARS, an SRAM-based CIM (SRAM CIM) CNN accelerator that can utilize multiple SRAM CIM macros as processing units and supports sparse CNNs, and an SRAM CIM-aware model compression algorithm that takes the CIM architecture into account to reduce the number of network parameters. With the proposed hardware-software co-design method, MARS reaches over 700 and 400 FPS on CIFAR-10 and CIFAR-100, respectively. In addition, MARS achieves 52.3 and 88.2 TOPS/W on VGG16 and ResNet18, respectively.
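A minimal sketch of what "CIM-aware" compression can mean in practice, not the MARS algorithm itself: weights are pruned in whole groups whose size matches a hypothetical macro word length, so that a pruned group maps to a CIM row that can be skipped entirely at inference time. The group size, sparsity target, and saliency criterion are illustrative assumptions.

```python
import numpy as np

MACRO_GROUP = 16     # hypothetical number of weights mapped to one CIM word line

def cim_aware_prune(weights, sparsity=0.5, group=MACRO_GROUP):
    """Zero the `sparsity` fraction of weight groups with the smallest L2 norm."""
    flat = weights.reshape(-1)
    pad = (-flat.size) % group
    padded = np.concatenate([flat, np.zeros(pad)])
    groups = padded.reshape(-1, group)          # view: each row = one macro group
    norms = np.linalg.norm(groups, axis=1)
    cutoff = np.quantile(norms, sparsity)
    groups[norms <= cutoff] = 0.0               # whole groups removed, not single weights
    return padded[:flat.size].reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))                   # toy layer weights
w_pruned = cim_aware_prune(w, sparsity=0.6)
print("fraction of zeroed weights:", np.mean(w_pruned == 0.0))
```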
To examine the integrity and authenticity of an IP address efficiently and economically, this paper proposes a new non-iterative hash function called JUNA that is based on a multivariate permutation problem and an anomalous subset product problem, for which no subexponential-time solutions have been found so far. JUNA includes an initialization algorithm and a compression algorithm, and converts a short message of n bits, regarded as a single block, into a digest of m bits, where 80 <= m <= 232 and 80 <= m <= n <= 4096. The analysis and proof show that the new hash is one-way, weakly collision-free, and strongly collision-free, and that its security against existing attacks such as the birthday attack and the meet-in-the-middle attack is O(2^m). Moreover, a detailed proof that the new hash function is resistant to the birthday attack is given. Compared with the Chaum-Heijst-Pfitzmann hash based on the discrete logarithm problem, the new hash is lightweight, and thus it opens a door to the convenient use of lightweight digital signing schemes. (C) 2016 Elsevier B.V. All rights reserved.
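A toy sketch only, to make the single-block "initialize then compress" shape concrete: a generic subset-product compression in the spirit of the discrete-log-style constructions the abstract compares against. This is not the JUNA construction, carries none of its security arguments, and the modulus, block length, and coefficients are arbitrary hypothetical parameters.

```python
import random

P = (1 << 127) - 1        # toy prime modulus (2^127 - 1); not a JUNA parameter
N_BITS = 256              # single-block message length for this toy

# "Initialization": fix one public coefficient per message bit.
_rng = random.Random(2024)
COEFFS = [_rng.randrange(2, P) for _ in range(N_BITS)]

def toy_subset_product_hash(message: bytes) -> int:
    """Compress one short message block into a digest modulo P."""
    bits = int.from_bytes(message, "big")
    if bits.bit_length() > N_BITS:
        raise ValueError("toy single-block hash: message longer than one block")
    digest = 1
    for i in range(N_BITS):
        if (bits >> i) & 1:
            digest = (digest * COEFFS[i]) % P
    return digest

print(hex(toy_subset_product_hash(b"192.168.0.1")))
```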
As the number of components in complex multistate systems increases, one major challenge in analyzing the reliability of such systems with a Bayesian network (BN) is that the memory storage requirements (MSRs) of the conditional probability table (CPT) grow exponentially. When the components reach a certain number, the MSRs of the CPT will exceed the computer's random access memory (RAM). To solve this problem, this two-part paper proposes a novel multistate compression algorithm to compress the CPT so that its MSRs can be reduced significantly. In this Part I, an independent multistate inference algorithm is proposed to perform BN inference based on the compressed CPT for complex multistate independent systems. Given the evidence of the system, a backward inference algorithm is proposed to update the probability distributions of the components. The proposed algorithms can be applied to any complex multistate independent system without constraints on system structure or state configurations. In addition, Part II studies the application of the compression idea to complex multistate dependent systems. Finally, two case studies are used to validate the performance of the proposed algorithms.
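A minimal sketch of one generic way a CPT can be shrunk: run-length encoding of a CPT column, exploiting the fact that many parent-state combinations map to identical system-state probabilities. This illustrates the compression idea only; it is not the paper's multistate compression or inference algorithm, and the toy probability vector below is made up.

```python
import numpy as np
from itertools import groupby

def rle_compress(column):
    """Run-length encode a 1-D sequence of probabilities."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_expand(runs):
    return np.concatenate([np.full(length, value) for value, length in runs])

# Toy CPT column: a long probability vector with long constant runs, standing in
# for one system-state column flattened over many parent-state combinations.
cpt_column = np.repeat([0.0, 0.2, 0.9, 1.0], [4096, 1024, 2048, 1024])
runs = rle_compress(cpt_column)
print(len(cpt_column), "entries ->", len(runs), "runs")
assert np.allclose(rle_expand(runs), cpt_column)   # lossless reconstruction
```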
On-line data compression is a new alternative technique for improving memory system performance that can increase both the effective memory space and the bandwidth of memory systems. However, the decompression time incurred when accessing compressed data may offset the benefits of compression. In this paper, a selectively compressed memory system (SCMS) based on a combination of selective compression and hiding of the decompression overhead is proposed and analyzed. The architecture of an efficient compressed cache and its management policies are presented. Analytical modeling shows that the performance of the SCMS is influenced by the compression efficiency, the percentage of references to compressed data blocks, and the percentage of references found in the decompression buffer. The decompression buffer plays the most important role in improving the performance of the SCMS. If the decompression buffer can filter more than 70% of the references to compressed blocks, the SCMS can significantly improve performance over conventional memory systems. (C) 2002 Elsevier Science B.V. All rights reserved.
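A minimal sketch of the flavor of analytical model the abstract describes: expected access time as a function of the fraction of references that touch compressed blocks, the decompression-buffer hit ratio, and the decompression latency. All latency numbers and ratios are hypothetical placeholders (chosen so the crossover happens near a 70% hit ratio), and the capacity and bandwidth gains of compression that the paper's full model captures are ignored here.

```python
def avg_access_time(t_mem, t_decomp, t_buf, frac_compressed, buf_hit_ratio):
    """Expected latency per reference under selective compression (sketch)."""
    uncompressed = (1.0 - frac_compressed) * t_mem
    buffered = frac_compressed * buf_hit_ratio * t_buf
    decompressed = frac_compressed * (1.0 - buf_hit_ratio) * (t_mem + t_decomp)
    return uncompressed + buffered + decompressed

BASELINE = 100.0          # cycles for a conventional (uncompressed) access
for hit in (0.5, 0.7, 0.9):
    t = avg_access_time(t_mem=100.0, t_decomp=210.0, t_buf=10.0,
                        frac_compressed=0.4, buf_hit_ratio=hit)
    print(f"buffer hit ratio {hit:.0%}: {t:.1f} cycles (baseline {BASELINE:.0f})")
```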
Live migration of virtual machines has become a powerful tool to facilitate system maintenance, load balancing, fault tolerance, and power saving, especially in clusters and data centers. Although pre-copy is extensively used to migrate the memory data of virtual machines, it cannot provide quick migration with low network overhead; the large amount of data transferred during migration leads to significant performance degradation of virtual machine services. To solve this problem, this paper presents the design and implementation of a novel memory-compression-based VM migration approach (MECOM for short) that uses memory compression to provide fast, stable virtual machine migration while guaranteeing that virtual machine services are only slightly affected. Based on memory page characteristics, we design an adaptive zero-aware compression algorithm to balance the performance and cost of virtual machine migration. Using the proposed scheme, pages are rapidly compressed in batches on the source and exactly recovered on the target. Experimental results demonstrate that, compared with Xen, our system can significantly reduce downtime, total migration time, and total transferred data by 27.1%, 32%, and 68.8%, respectively. (C) 2013 Elsevier B.V. All rights reserved.
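A minimal sketch of a zero-aware page compressor in the spirit of the approach described above: pages that are entirely zero are encoded as a one-byte tag with no payload, and other pages go through a general-purpose compressor. The use of zlib, the page size, and the framing are hypothetical choices; the paper's adaptive algorithm selects compression strategies based on page characteristics rather than this fixed rule.

```python
import zlib

PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)

def compress_batch(pages):
    """Compress a batch of memory pages, short-circuiting all-zero pages."""
    out = []
    for page in pages:
        if page == ZERO_PAGE:
            out.append(b"Z")                          # 1-byte marker, no payload
        else:
            out.append(b"C" + zlib.compress(page, 1)) # fast general-purpose path
    return out

def decompress_batch(blobs):
    """Exactly recover the original pages on the target."""
    return [ZERO_PAGE if b[:1] == b"Z" else zlib.decompress(b[1:]) for b in blobs]

pages = [bytes(PAGE_SIZE),
         b"\x00" * 4000 + b"live data".ljust(96, b"\x00")]
blobs = compress_batch(pages)
assert decompress_batch(blobs) == pages
print("compressed sizes per page:", [len(b) for b in blobs])
```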
Currently, in addition to performance, the energy consumption (hereinafter EC) of jobs running in a big data processing system is also of interest to academia and industry because it grows rapidly as an increasing amount of data is processed. Many studies focus on the EC optimization of jobs from the perspective of computation, which is specific to the algorithms in each job. However, the part of the EC involved in I/O operations, which is general and universal, is mostly ignored in optimization. In this paper, we concentrate on the EC optimization of jobs from the perspective of I/O operations. To save energy, we argue that data compression can be exploited. On one hand, energy is saved by processing compressed data with lower I/O cost. On the other hand, extra EC is incurred by the necessary compression/decompression process, which may offset the saved energy. Therefore, there are tradeoffs to consider when determining whether to compress data for these jobs. In this paper, such tradeoffs and their boundary conditions are studied. We first abstract a paradigm for the runtime environment of big data processing jobs. Then, we establish the power, job, compression, and I/O models in detail. Based on these models, we discuss the compression tradeoffs and derive the boundary conditions for EC optimization. Finally, we design and conduct experiments to validate our proposition. The experimental results confirm that the tradeoffs and boundary conditions exist for typical jobs in MapReduce and Spark. First, the EC of a job can be reduced using data compression. Second, whether such optimization occurs depends on the specifications of both the compression algorithm and the job and is determined by the corresponding boundary conditions. Third, for a compression algorithm, the higher its compression/decompression speed and the better its compression ratio, the more likely it is to achieve EC optimization.
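A minimal sketch of the tradeoff just described, not the paper's models: the I/O energy saved by reading less data is compared against the extra CPU energy spent decompressing it, giving a simple boundary condition on the compression ratio. The power figures, bandwidths, and speeds are hypothetical placeholders, and the compression-side energy of the write path is omitted for brevity.

```python
def io_energy(bytes_moved, io_power_w, io_bandwidth_bps):
    """Energy spent moving data through the I/O subsystem."""
    return io_power_w * bytes_moved / io_bandwidth_bps

def decompress_energy(bytes_out, cpu_power_w, decompress_speed_bps):
    """Extra CPU energy spent expanding the compressed data."""
    return cpu_power_w * bytes_out / decompress_speed_bps

def worth_compressing(data_bytes, ratio, io_power_w=10.0, io_bw=200e6,
                      cpu_power_w=30.0, decomp_speed=1.5e9):
    """Boundary condition: EC with compression < EC without compression."""
    e_plain = io_energy(data_bytes, io_power_w, io_bw)
    e_comp = (io_energy(data_bytes / ratio, io_power_w, io_bw)
              + decompress_energy(data_bytes, cpu_power_w, decomp_speed))
    return e_comp < e_plain, e_plain, e_comp

for ratio in (1.2, 2.0, 4.0):
    ok, e_plain, e_comp = worth_compressing(10e9, ratio)
    print(f"ratio {ratio}: plain {e_plain:.0f} J, compressed {e_comp:.0f} J, "
          f"{'compress' if ok else 'do not compress'}")
```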
Recent mobile devices that adopt the Eureka-147 terrestrial digital multimedia broadcasting (T-DMB) standard are implemented as integrated circuits. As a result, memory space on mobile handhelds is difficult to expand, so most handhelds must run many applications within limited memory. To address this problem, most mobile devices use compression algorithms to overcome the memory shortage; among such algorithms, the Huffman algorithm is the most widely used. In this study, the authors present a novel binary-tree representation for the Huffman decoding algorithm that reduces memory use by approximately 50% and increases decoding speed by up to 30%. The authors evaluate the decoding speed on an evaluation kit (SMDK 6400), a T-DMB mobile handheld with an Advanced RISC Machine (ARM) processor. To further enhance the decoding speed, the authors also present an optimized Huffman decoder based on a hardware implementation.
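A minimal sketch of an array-packed Huffman decoding tree: the tree is stored as a flat list of (left, right) index pairs, with negative references marking leaf symbols, so decoding walks integer indices instead of pointer-linked node objects. This is a generic illustration, not the authors' specific memory-halving representation or their hardware decoder.

```python
import heapq
from collections import Counter

def build_flat_tree(text):
    """Return (tree, syms, root); tree[i] = (left, right), ref < 0 means leaf syms[-ref-1]."""
    freqs = Counter(text)
    syms = list(freqs)
    heap = [(freqs[s], i, -(i + 1)) for i, s in enumerate(syms)]
    heapq.heapify(heap)
    tree, tick = [], len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        tree.append((a, b))                           # new internal node
        heapq.heappush(heap, (f1 + f2, tick, len(tree) - 1))
        tick += 1
    return tree, syms, heap[0][2]

def assign_codes(tree, syms, root):
    """Derive the bit code of each symbol by walking the flat tree."""
    codes = {}
    def walk(ref, prefix):
        if ref < 0:
            codes[syms[-ref - 1]] = prefix
        else:
            walk(tree[ref][0], prefix + [0])
            walk(tree[ref][1], prefix + [1])
    walk(root, [])
    return codes

def decode(bits, tree, syms, root):
    """Decode a bit sequence by following integer indices through the flat tree."""
    out, node = [], root
    for bit in bits:
        node = tree[node][bit]
        if node < 0:                                  # reached a leaf
            out.append(syms[-node - 1])
            node = root
    return "".join(out)

text = "terrestrial digital multimedia broadcasting"
tree, syms, root = build_flat_tree(text)
codes = assign_codes(tree, syms, root)
bits = [b for ch in text for b in codes[ch]]
assert decode(bits, tree, syms, root) == text
print(f"{len(tree)} internal nodes, {len(bits)} bits for {len(text)} characters")
```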