The Synthetic Aperture Radar (SAR) system is a modern high-resolution microwave imaging radar that operates in all weather, day and night, to provide remote sensing and generate high-resolution images of the lan...
ISBN: (print) 9781509049318
GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). As a result, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in an architecture-independent manner, without involving domain experts. The key insight is to leverage the wide range of architecture-specific abstractions already available in LLVM: Poca first generates a vectorized micro-kernel in the architecture-independent LLVM IR and then improves its performance by applying a series of domain-specific yet architecture-independent optimizations. The optimized micro-kernel drops easily into existing GEMM frameworks such as BLIS and OpenBLAS. Validation focuses on optimizing GEMM in double precision on two architectures. On Intel Sandy Bridge and AArch64 Cortex-A57, Poca's micro-kernels outperform expert-crafted assembly code by 2.35% and 7.54%, respectively, and both BLIS and OpenBLAS achieve competitive or better performance once their micro-kernels are replaced by Poca's.
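For readers unfamiliar with the term, the following is a minimal scalar sketch of what a GEMM micro-kernel computes: a small mr x nr block of C updated by kc rank-1 products of packed A and B panels. It is not Poca's generated code, which is vectorized LLVM IR; the function name `micro_kernel`, the packing layout, and the parameter names `mr`, `nr`, and `kc` are assumptions chosen for illustration, loosely following the conventions of frameworks such as BLIS.

```python
def micro_kernel(a_panel, b_panel, c_block, mr, nr, kc):
    """Update an mr x nr block: C += A * B, as kc rank-1 updates.

    Assumed (hypothetical) packing layout:
      a_panel: kc columns of A, each stored as mr contiguous values
      b_panel: kc rows of B, each stored as nr contiguous values
      c_block: list of mr rows, each a list of nr values
    """
    for p in range(kc):
        a_col = a_panel[p * mr:(p + 1) * mr]  # p-th column of the A panel
        b_row = b_panel[p * nr:(p + 1) * nr]  # p-th row of the B panel
        # Rank-1 update: every element of the block gets one multiply-add.
        for i in range(mr):
            for j in range(nr):
                c_block[i][j] += a_col[i] * b_row[j]
    return c_block


# A = [[1, 2], [3, 4]] packed by columns, B = [[5, 6], [7, 8]] packed by rows.
c = micro_kernel([1, 3, 2, 4], [5, 6, 7, 8], [[0, 0], [0, 0]], mr=2, nr=2, kc=2)
```

In a production library the two inner loops are what gets unrolled and mapped to vector registers, which is the part that is usually hand-written in assembly.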
Global open source software resources have become an Internet-scale repository that provides abundant material for software reuse. However, how to locate the desired resource efficiently and accurately from such...
Software projects are not developed in isolation but often build upon other open source resources. These projects form a kind of reference ecosystem regarded as a software world. Most of social computing works focus o...
Bloom filters are frequently used to check the membership of an item in a set. However, Bloom filters face a dilemma: the transmission bandwidth and the accuracy cannot be optimized simultaneously. This dilemma is ...
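As background for the trade-off mentioned above, here is a minimal sketch of a standard Bloom filter, not the scheme proposed in the paper. The class name `BloomFilter`, its parameters, and the use of seeded SHA-256 hashes are assumptions made for illustration. A larger `num_bits` lowers the false-positive rate but makes the filter more expensive to transmit.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array packed into bytes

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for name in ["alice", "bob", "carol"]:
    bf.add(name)
```

Here the filter occupies 128 bytes on the wire regardless of how many items are added; accuracy degrades as it fills.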
In this paper, an improved algorithm is proposed for the reconstruction of singularity connectivity from the available pairwise connections during the preprocessing phase. To evaluate the performance of our algorithm, an ...
With the rapid development of open source software, various elements such as OSS, developers, users and online posts, across different communities and their interactions constitute a novel software ecosystem. Most of ...
The global data center market has grown explosively in recent years. As the market grows, so does the demand for fast provisioning of virtual resources to support elastic, manageable, and economical computing over the cloud. Fast provisioning of large-scale virtual machines (VMs), in particular, is critical to guaranteeing quality of service (QoS). In this paper, we systematically review the existing VM provisioning schemes and classify them into three main categories. We discuss the features and research status of each category, and introduce two recent solutions, VMThunder and VMThunder+, both of which can provision hundreds of VMs in seconds.
Image diffusion plays a fundamental role in image denoising. The recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. Howeve...
Fingerprints have been widely used in a variety of biometric identification systems over the past several years due to their uniqueness and immutability. With the rapid development of fingerprint identification techniques, many fingerprint identification systems urgently need to handle large-scale fingerprint storage and highly concurrent recognition queries, which pose huge challenges to the system. To address this, we design and implement a distributed, load-balancing fingerprint identification system named Pegasus, which includes a distributed feature extraction subsystem and a distributed feature storage subsystem. The feature extraction procedure uses the Hadoop Image Processing Interface (HIPI) library to enhance its overall processing speed; the feature storage subsystem optimizes MongoDB's default load balance strategy to improve the efficiency and robustness of *** and simulations are carried out, and results show that Pegasus can reduce the time cost by 70% during the feature extraction procedure. Pegasus also keeps the difference in access load among front-end mongos nodes below 5%. Additionally, Pegasus reduces data migration among back-end data shards by over 40%, obtaining a more reasonable data distribution based on the operation load (insertion, deletion, update, and query) of each shard.