ISBN (digital): 9781450351140
ISBN (print): 9781450351140
Near-term quantum computers will soon reach sizes that are challenging to simulate directly, even on the most powerful supercomputers. Yet the ability to simulate these early devices using classical computers is crucial for calibration, validation, and benchmarking. To exploit the full potential of systems featuring multi- and many-core processors, we use automatic code generation and optimization of compute kernels, which also enables performance portability. We apply a scheduling algorithm to quantum supremacy circuits in order to reduce the required communication, and simulate a 45-qubit circuit on the Cori II supercomputer using 8,192 nodes and 0.5 petabytes of memory. To our knowledge, this constitutes the largest quantum circuit simulation to date. Our highly tuned kernels, combined with the reduced communication requirements, improve time-to-solution over state-of-the-art simulations by more than an order of magnitude at every scale.
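As a rough illustration of the core operation such simulators spend their time in, the following minimal Python/NumPy sketch applies a single-qubit gate to a state vector. The function and variable names are our own, and the paper's actual kernels are auto-generated, vectorized native code; the sketch only shows the strided access pattern whose communication cost the scheduling algorithm reduces.

import numpy as np

def apply_single_qubit_gate(state, gate, target, n_qubits):
    # Apply a 2x2 unitary `gate` to qubit `target` of an n-qubit state.
    # Illustrative only; real simulators use tuned, vectorized kernels.
    stride = 1 << target                      # distance between paired amplitudes
    for base in range(0, 1 << n_qubits, stride << 1):
        for offset in range(stride):
            i0 = base + offset                # amplitude with target bit = 0
            i1 = i0 + stride                  # amplitude with target bit = 1
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0, 0] * a0 + gate[0, 1] * a1
            state[i1] = gate[1, 0] * a0 + gate[1, 1] * a1

# Example: Hadamard on qubit 0 of a 3-qubit |000> state.
h = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
psi = np.zeros(8, dtype=complex)
psi[0] = 1.0
apply_single_qubit_gate(psi, h, target=0, n_qubits=3)
print(psi)                                    # 1/sqrt(2) at indices 0 and 1

For qubits whose stride exceeds the locally stored slice of the state vector, the paired amplitudes live on different nodes; that is the communication the paper's gate scheduling minimizes.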
ISBN (print): 9781450350303
Just-in-time (JIT) compilation during program execution and ahead-of-time (AOT) compilation during software installation are alternative techniques used by managed language virtual machines (VMs) to generate optimized native code while simultaneously achieving binary code portability and high execution performance. Profile data collected by JIT compilers at run time can enable profile-guided optimizations (PGO) that customize the generated native code to different program inputs. AOT compilation removes the speed and energy overhead of online profile collection and dynamic compilation, but may not be able to achieve the quality and performance of customized native code. The goal of this work is to investigate and quantify the implications of the AOT compilation model for the quality of the generated native code in current VMs. First, we quantify the quality of native code generated by the two compilation models for a state-of-the-art Java VM (HotSpot). Second, we determine how the amount of profile data collected affects the quality of generated code. Third, we develop a mechanism to determine the accuracy or similarity of different profile data for a given program run, and investigate how the accuracy of profile data affects its ability to effectively guide PGOs. Finally, we categorize the profile data types in our VM and explore the contribution of each category to performance.
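To make the role of profile data concrete, here is a toy Python sketch of one common kind of profile, receiver types observed at a virtual call site, and the PGO decision it can drive. The class and threshold are illustrative assumptions, not HotSpot's actual data structures.

from collections import Counter

class CallSiteProfile:
    # Toy stand-in for the receiver-type profile a JIT collects at a
    # virtual call site; real VMs gather this in interpreted/baseline code.
    def __init__(self):
        self.receiver_types = Counter()

    def record(self, receiver):
        self.receiver_types[type(receiver).__name__] += 1

    def is_monomorphic(self, threshold=0.95):
        # PGO decision: devirtualize/inline if one type dominates.
        total = sum(self.receiver_types.values())
        if total == 0:
            return False
        dominant = self.receiver_types.most_common(1)[0][1]
        return dominant / total >= threshold

profile = CallSiteProfile()
for x in [1] * 99 + ["s"]:          # 99% ints, 1% strings
    profile.record(x)
print(profile.is_monomorphic())     # True -> the int path can be specialized

An AOT compiler must make the same decision without (or with stale) counters, which is exactly the code-quality gap the paper quantifies.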
ISBN (print): 9783319579726; 9783319579719
We present a study of matrix-vector product operations on the Maxwell GPU generation using the PyCUDA Python library. Through this lens, a broad analysis is performed over different memory management schemes. We identify the approaches that yield the highest performance on current GPU generations when using dense matrices. The resulting guidelines are then applied to the implementation of the sparse matrix-vector product, covering structured (DIA) and unstructured (CSR) sparse matrix formats. Our experimental study on different datasets reveals that there is little room for improvement given the current state of the memory hierarchy, and that the upcoming Pascal GPU generation should benefit substantially from our techniques.
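For concreteness, a minimal PyCUDA version of the unstructured (CSR) sparse matrix-vector product could look like the following sketch, with one thread per row; the kernel body and launch configuration are our own illustrative choices, not the paper's tuned implementation.

import numpy as np
import pycuda.autoinit            # noqa: F401  (creates a CUDA context)
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void csr_spmv(const int n_rows,
                         const int *row_ptr, const int *col_idx,
                         const double *values, const double *x,
                         double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double acc = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            acc += values[j] * x[col_idx[j]];
        y[row] = acc;
    }
}
""")
csr_spmv = mod.get_function("csr_spmv")

# Tiny example: [[1, 2], [0, 3]] @ [1, 1] = [3, 3]
row_ptr = gpuarray.to_gpu(np.array([0, 2, 3], dtype=np.int32))
col_idx = gpuarray.to_gpu(np.array([0, 1, 1], dtype=np.int32))
values  = gpuarray.to_gpu(np.array([1.0, 2.0, 3.0]))
x       = gpuarray.to_gpu(np.array([1.0, 1.0]))
y       = gpuarray.zeros(2, dtype=np.float64)
csr_spmv(np.int32(2), row_ptr.gpudata, col_idx.gpudata, values.gpudata,
         x.gpudata, y.gpudata, block=(32, 1, 1), grid=(1, 1))
print(y.get())                    # [3. 3.]

One thread per row is the simplest CSR scheme; its irregular memory-access behavior is what makes the choice of memory management scheme matter in studies like this one.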
Cloud storage services such as Dropbox have been widely used for file collaboration among multiple users. However, this desirable functionality is still restricted to the 'walled garden' of each service. At pres...
Snapper (Lutjanus sp.) is an economically important fish for local fishermen in the Banyuasin coastal waters of South Sumatra. However, the current and historical stock of this species is still unknown. This study aimed to estimate the stock status of Lutjanus sp. in the Banyuasin coastal waters. Annual catch and effort data from 2008 to 2016 were analyzed. Different surplus production models were tested to obtain the best-fitted model based on the sign suitability test, model performance test, and multiple-criteria analysis. The results indicated that the best-fitted model for Lutjanus sp. was the Fox model. The model had the best values for the coefficient of determination (R2 = 97.2%), Nash-Sutcliffe Efficiency (-0.277), Mean Absolute Deviation (29.198), Mean Square Error (1,190.522), Root Mean Square Error (34.504), and RMSE-observations Standard Deviation Ratio (1.13), whereas the Mean Absolute Percentage Error (0.05) was the second best. The optimum effort (Eopt), maximum sustainable catch (CMSY), and total allowable catch were 22.236 trips/year, 623 tons, and 498 tons/year, respectively. Plotting the effort and exploitation levels (141%; 102%) in 2016 indicated a depleting stock under high fishing pressure, which could lead to overfishing in the future.
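To illustrate the Fox-model estimation behind such numbers, the sketch below (Python/NumPy) performs the standard linearized fit ln(CPUE) = a + b*E and derives Eopt = -1/b and CMSY = (-1/b)*exp(a - 1); the catch and effort values are invented for illustration and are not the paper's 2008-2016 data.

import numpy as np

# Invented example data (NOT the paper's records).
effort = np.array([8000., 10000., 12000., 15000., 18000.])  # trips/year
catch_ = np.array([400., 470., 520., 560., 570.])            # tons/year
cpue = catch_ / effort                                       # catch per unit effort

# Fox model: ln(U) = a + b*E, fitted by ordinary least squares.
b, a = np.polyfit(effort, np.log(cpue), 1)
e_opt = -1.0 / b                       # effort that maximizes sustainable yield
c_msy = e_opt * np.exp(a - 1.0)        # maximum sustainable yield
print(f"E_opt ~ {e_opt:,.0f} trips/year, C_MSY ~ {c_msy:,.0f} tons/year")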
ISBN (print): 9783319586670
This study compares the performance of high-order discontinuous Galerkin finite elements on modern hardware. The main computational kernel is the matrix-free evaluation of differential operators by sum factorization, exemplified on the symmetric interior penalty discretization of the Laplacian as a proxy for a complex application code in fluid dynamics. State-of-the-art implementations of these kernels stress both arithmetic and memory transfer. The implementations of SIMD vectorization and shared-memory parallelization are detailed. Computational results are presented for dual-socket Intel Haswell CPUs with 28 cores, a 64-core Intel Knights Landing, and a 16-core IBM Power8 processor. Up to polynomial degree six, Knights Landing is approximately twice as fast as Haswell. Power8 performs similarly to Haswell, compensating for narrower SIMD units with a higher clock frequency. The performance comparison shows that simple ways of expressing parallelism through for loops perform better at medium and high core counts than a more elaborate task-based parallelization with dynamic scheduling according to dependency graphs, despite the latter algorithm's lower memory transfer.
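A minimal sketch of the sum-factorization idea, for a 2D tensor-product element: the operator is applied as two small 1D matrix products instead of one assembled matrix, cutting the work from O(n^4) to O(n^3) per element for n = k + 1 points per direction. The NumPy code below is illustrative; the paper's kernels do this in C++ with explicit SIMD over several elements at once.

import numpy as np

k = 6                               # polynomial degree
n = k + 1
S = np.random.rand(n, n)            # 1D shape-value (or derivative) matrix
u = np.random.rand(n, n)            # coefficients of one 2D element

# Sum factorization: two n^3 matrix products ...
v_fast = S @ u @ S.T                # apply S along dim 0, then dim 1

# ... versus one n^4 product with the assembled operator kron(S, S).
v_full = (np.kron(S, S) @ u.ravel()).reshape(n, n)
assert np.allclose(v_fast, v_full)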
ISBN (digital): 9783319723082
ISBN (print): 9783319723082; 9783319723075
The GNU Multi-Precision library is a widely used, safety-critical library for arbitrary-precision arithmetic. Its source code is written in C and assembly, and includes intricate state-of-the-art algorithms for the sake of high performance. Formally verifying the functional behavior of such highly optimized code, not designed with verification in mind, is challenging. We present a fully verified library designed using the Why3 program verifier. The use of a dedicated memory model makes it possible for the Why3 code to be very similar to the original GMP code. This library is extracted to C and is compatible and performance-competitive with GMP.
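As an illustration, in Python rather than WhyML, of the kind of specification such a verification establishes, the sketch below implements GMP-style limb addition and checks the natural postcondition relating result, carry, and operands; the function is modeled on GMP's mpn_add_n but is our own toy code.

B = 1 << 64                          # limb base: 64-bit little-endian limbs

def mpn_add_n(x, y):
    # Toy model of GMP's n-limb addition; returns (result limbs, carry).
    assert len(x) == len(y)
    r, carry = [], 0
    for xi, yi in zip(x, y):
        s = xi + yi + carry
        r.append(s % B)
        carry = s // B
    return r, carry

def value(limbs):
    # The integer a limb array denotes.
    return sum(l * B**i for i, l in enumerate(limbs))

x = [B - 1, B - 1]                   # 2^128 - 1
y = [1, 0]
r, c = mpn_add_n(x, y)
# Postcondition of the form proved (for the real code) in Why3:
assert value(r) + c * B**len(x) == value(x) + value(y)
print(r, c)                          # [0, 0] 1

Why3 proves such a postcondition once and for all over a memory model close to C; the Python assert merely spot-checks one input.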
ISBN (print): 9781538625880
Emerging non-volatile main memory (NVMM) technologies can provide both data persistence and high performance at the memory level. File systems designed for NVMM have to handle data durability between the CPU cache and NVMM. However, most NVMM-aware file systems cannot meet the strong data consistency requirements of applications built on data structures such as B-Trees. Traditional techniques for delivering data consistency, such as copy-on-write and journaling, suffer from write amplification and data copying, respectively. In this paper, we explore SNFS, a log-structured file system for non-volatile main memory with optimized data consistency, providing high performance for applications with small writes. Specifically, SNFS adopts a small data-log mechanism to journal fine-grained data writes. It also uses in-place writes to minimize the memory footprint of small data updates, and accelerates data block lookup with a hashing strategy. Finally, we evaluate SNFS's performance with several write-intensive workloads; experimental results show that SNFS improves system throughput by up to 23 times compared to state-of-the-art file systems and reduces execution time by up to 65.5%.
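To make the small data-log idea concrete, here is a toy Python sketch, entirely our own and not SNFS's actual on-NVMM layout: fine-grained writes are appended as self-describing records and replayed in order, avoiding whole-block copy-on-write for small updates.

import struct

# Hypothetical record header: file offset (u64) + payload length (u32).
RECORD_HDR = struct.Struct("<QI")

def append_small_write(log, offset, payload):
    # Journal a fine-grained write instead of copying a whole block.
    return log + RECORD_HDR.pack(offset, len(payload)) + payload

def replay(log, data):
    # Apply journaled writes in log order to reconstruct consistent data.
    pos = 0
    while pos < len(log):
        offset, length = RECORD_HDR.unpack_from(log, pos)
        pos += RECORD_HDR.size
        data[offset:offset + length] = log[pos:pos + length]
        pos += length
    return data

log = b""
log = append_small_write(log, 0, b"hello")
log = append_small_write(log, 5, b" nvmm")
print(replay(log, bytearray(10)))     # bytearray(b'hello nvmm')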
ISBN (print): 9781509038435
Wireless communications is one of the fastest growing technology fields, driving numerous other innovations in electronics. One challenging research area within the wireless field is achieving higher transmission speeds. Today it is an open question how we can realize a wireless system at a speed of 100 Gb/s or beyond. If we intend to use such systems in a mobile environment, we can only afford to spend approximately 1-10 pJ/bit for the end-to-end communication, including all processing and protocol steps. A special priority project within the German research community was set up to investigate new paradigms for achieving the 100 Gb/s wireless transmission goal. Within 11 coordinated projects, researchers from all over Germany are looking at several relevant issues, ranging from antennas and the RF front-end, through baseband processing and error correction, to protocol processing. One of the big challenges is to find the correct balance between analog and digital signal processing to achieve extremely high performance at very low energy consumption. Another challenge is to find a good balance between bandwidth and bandwidth efficiency to achieve the 100 Gb/s goal. Finally, protocol processing will need new approaches to decouple the central processor of a computer from high-end input/output operations. Here we report on work in progress and initial results of selected projects. One interesting finding is that FEC at speeds of up to 120 Gb/s can be realized in a very energy-efficient way with small area and power consumption.
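Reading the stated budget as picojoules per bit (energy per bit is what an end-to-end power budget divides into), a quick arithmetic check shows what it implies at the target line rate:

# At 100 Gb/s, an end-to-end budget of 1-10 pJ/bit caps total power
# at 0.1-1 W for all processing and protocol steps combined.
rate = 100e9                              # bits per second
for energy_per_bit in (1e-12, 10e-12):    # joules per bit
    print(f"{energy_per_bit * 1e12:.0f} pJ/bit -> {rate * energy_per_bit:.1f} W")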
There are numerous domains of science that have been using high-performance computing (HPC) systems for decades. Historically, when new HPC resources are introduced, specific variations may require researchers to make...