GPUs have become a ubiquitous choice as coprocessors because of their excellent concurrent-processing capability. In GPU architectures, shared memory plays an important role in system performance, as it can greatly improve bandwidth utilization and accelerate memory operations. However, even for affine GPU applications with regular access patterns, optimizing for shared memory is not easy: it often requires programmer expertise and nontrivial parameter selection, and improper shared memory usage may even underutilize GPU resources. Even with state-of-the-art high-level programming models (e.g., OpenACC and OpenHMPP), shared memory remains hard to exploit, since these models lack inherent support for describing shared memory optimizations and selecting suitable parameters, let alone maintaining high resource utilization. Targeting higher productivity for affine applications, we propose a data-centric approach to shared memory optimization on GPUs. We design a pragma extension to OpenACC that conveys programmers' data-management hints to the compiler. Meanwhile, we devise a compiler framework that automatically selects optimal parameters for shared arrays using the polyhedral model. We further propose optimization techniques to expose higher memory- and instruction-level parallelism. Experimental results show that our shared-memory-centric approach effectively improves the performance of five typical GPU applications across four widely used platforms by 3.7x on average, without burdening programmers with many pragmas.
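The abstract does not give the compiler's actual selection algorithm; as a hypothetical illustration of the kind of parameter choice involved, the sketch below picks the largest square tile for a shared array that fits a per-block shared-memory budget. The function name, the 48 KB budget, and the power-of-two search are all illustrative assumptions, not details from the paper:

```python
def pick_tile_size(bytes_per_elem, smem_budget=48 * 1024, max_tile=64):
    """Largest power-of-two square tile (tile x tile elements) whose
    footprint fits a per-block shared-memory budget (assumed 48 KB)."""
    tile = max_tile
    while tile > 1 and tile * tile * bytes_per_elem > smem_budget:
        tile //= 2
    return tile

# 4-byte floats: 64*64*4 = 16 KB fits the 48 KB budget
print(pick_tile_size(4))   # -> 64
# 64-byte elements: only a 16x16 tile (16 KB) fits
print(pick_tile_size(64))  # -> 16
```

A real compiler framework would additionally weigh occupancy and reuse distance, which is what makes the selection nontrivial in the first place.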
Mobile edge computing has shown its potential in serving emerging latency-sensitive mobile applications in ultra-dense 5G networks by offloading computation workloads from the remote cloud data center to the nearby network edge. However, current computation offloading studies in the heterogeneous edge environment face multifaceted challenges: dependencies among computational tasks, resource competition among multiple users, and diverse long-term objectives. Mobile applications typically consist of several functionalities, and one large category of applications can be viewed as a series of sequential tasks. In this study, we first proposed a novel multiuser computation offloading framework for long-term sequential tasks. Then, we presented a comprehensive analysis of the task offloading process in the framework and formally defined the multiuser sequential task offloading problem. Next, we decoupled the long-term offloading problem into multiple single-time-slot offloading problems and proposed a novel adaptive method to solve them. We further showed the substantial performance advantage of our proposed method on the basis of extensive experiments.
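The adaptive method itself is not specified in the abstract; as a hypothetical sketch of the decoupling idea, the toy below makes an independent local-versus-edge decision in each time slot by comparing estimated latencies. The latency model (transmit time plus execution time) and all parameter names are illustrative assumptions:

```python
def offload_decision(cycles, data_bits, f_local, f_edge, rate):
    """Per-slot choice: run locally or offload, minimizing latency."""
    t_local = cycles / f_local                    # local execution time
    t_edge = data_bits / rate + cycles / f_edge   # upload + edge execution
    return ("edge", t_edge) if t_edge < t_local else ("local", t_local)

def schedule(slots, **params):
    """Decouple a long-term task sequence into independent per-slot decisions."""
    return [offload_decision(c, b, **params)[0] for c, b in slots]

# Two sequential tasks: (required CPU cycles, input data in bits).
# The first is compute-heavy (offload pays off); the second is data-heavy.
slots = [(4e9, 1e6), (1e8, 1e9)]
print(schedule(slots, f_local=1e9, f_edge=10e9, rate=50e6))  # ['edge', 'local']
```

A full solution would also model task dependencies and contention among users for the edge server, which is exactly what makes the multiuser problem hard.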
Improving the performance of sparse matrix-vector multiplication (SpMV) is important but difficult because of its irregular memory access. General-purpose GPUs (GPGPUs) provide high computing ability and substantial bandwidth that cannot be fully exploited by SpMV due to this irregularity. In this paper, we propose two novel methods to optimize memory bandwidth for SpMV on GPGPUs. First, a new storage format is proposed to exploit the memory bandwidth of the GPU architecture more efficiently; it ensures that the format holds as many non-zeros as possible, which suits the memory bandwidth of the GPU. Second, we propose a cache blocking method to improve the performance of SpMV on the GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With the blocking method, the corresponding part of the vector x can be reused in the GPU cache, so the time spent accessing global memory for x is greatly reduced. Experiments are carried out on three GPU platforms: GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. Experimental results show that both methods efficiently improve the utilization of GPU memory bandwidth and the performance of the GPU.
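For reference, here is a minimal CPU-side sketch of CSR SpMV and of the column-blocking idea (process one column range at a time so that only a block-sized slice of x needs to stay cache-resident). This illustrates the general technique, not the paper's GPU kernels or its new storage format:

```python
import numpy as np

def spmv_csr(vals, col_idx, row_ptr, x):
    """y = A @ x with A stored in CSR form."""
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmv_csr_colblocked(vals, col_idx, row_ptr, x, block=2):
    """Same product, computed one column block at a time; each pass
    touches only x[c0:c1], which is what enables cache reuse of x."""
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for c0 in range(0, len(x), block):
        c1 = c0 + block
        for i in range(n):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                if c0 <= col_idx[k] < c1:
                    y[i] += vals[k] * x[col_idx[k]]
    return y

# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 1.0, 1.0])
print(spmv_csr(vals, col_idx, row_ptr, x))  # [3. 3. 9.]
```

In a practical blocked implementation each column block would be stored as its own CSR sub-matrix rather than filtered on the fly, as the abstract describes.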
We investigate the dependence of the switching process on the perpendicular magnetic anisotropy (PMA) constant in perpendicular spin transfer torque magnetic tunnel junctions (P-MTJs) using micromagnetic simulations. It is found that the final stable states of the magnetization distribution of the free layer after switching can be divided into three different states based on different PMA constants: vortex, uniform, and steady. Different magnetic states can be attributed to a trade-off among demagnetization, exchange, and PMA energies. The generation of the vortex state is also related to the non-uniform stray field from the polarizer, and the final stable magnetization is sensitive to the PMA constant. The vortex and uniform states have different switching processes, and the switching time of the vortex state is longer than that of the uniform state due to hindrance by the vortex.
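The trade-off mentioned above follows the standard micromagnetic energy balance; for orientation (standard physics, not values specific to this work), the competing terms and the uniaxial PMA contribution can be written as

```latex
E_{\mathrm{tot}} = E_{\mathrm{demag}} + E_{\mathrm{exch}} + E_{\mathrm{PMA}},
\qquad
E_{\mathrm{PMA}} = \int_V K_u \left[\,1 - (\mathbf{m}\cdot\hat{\mathbf{z}})^2\right] dV,
```

where K_u is the PMA constant and m the unit magnetization. A larger K_u penalizes in-plane components and favors a uniform out-of-plane state, while demagnetization energy favors flux-closure configurations such as the vortex, consistent with the state transitions reported above.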
As the gap between processing capability and bandwidth requirement of microprocessor increases, optical interconnects are used more and more widely in chip-to-chip data links. Trade-offs are made among latency, area, ...
Increasingly there is a need to process graphs that are larger than the available memory on today's machines. Systems have been developed with graph representations that are efficient and compact for out-of-core processing. A necessary task in these systems is memory management. This paper presents a system called Cacheap which automatically and efficiently manages the available memory to maximize the speed of graph processing, minimize the amount of disk access, and maximize the utilization of memory for graph data. Cacheap has a simple interface that can be easily adopted by existing graph engines. The paper describes the new system, uses it in recent graph engines, and demonstrates its integer-factor improvements in the speed of large-scale graph processing.
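Cacheap's actual interface and policy are not given in the abstract; as a hypothetical sketch of what an automatic memory manager for out-of-core graph data might look like, the toy below caches graph partitions under a byte budget with LRU eviction. All names and the policy are invented for illustration:

```python
from collections import OrderedDict

class PartitionCache:
    """Toy LRU cache for graph partitions under a fixed memory budget."""
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.cache = OrderedDict()          # partition id -> (data, size)
        self.disk_reads = 0

    def get(self, pid, load_from_disk, size):
        if pid in self.cache:               # hit: refresh recency
            self.cache.move_to_end(pid)
            return self.cache[pid][0]
        self.disk_reads += 1                # miss: load, evicting LRU entries
        data = load_from_disk(pid)
        while self.used + size > self.budget and self.cache:
            _, (_, old_size) = self.cache.popitem(last=False)
            self.used -= old_size
        self.cache[pid] = (data, size)
        self.used += size
        return data

cache = PartitionCache(budget_bytes=2)
load = lambda pid: f"edges-of-{pid}"
for pid in [0, 1, 0, 2, 0]:                # partition 1 gets evicted by 2
    cache.get(pid, load, size=1)
print(cache.disk_reads)  # 3: partitions 0, 1, 2 each loaded from disk once
```

A system like the one described would go further, e.g. sizing the budget automatically against other memory consumers, but a narrow get-style interface is what lets such a manager drop into existing graph engines easily.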
The wide application of General Purpose Graphic Processing Units (GPGPUs) results in large manual efforts on porting and optimizing algorithms on them. However, most existing automatic ways of generating GPGPU code fa...
Recently, Wang et al. presented a new construction of attribute-based signature with a policy-and-endorsement mechanism. The existential unforgeability of their scheme was claimed to be based on the strong Diffie-Hellman assumption in the random oracle model. Unfortunately, by carefully revisiting the design and security proof of Wang et al.'s scheme, we show that their scheme cannot provide unforgeability; namely, a forger whose attributes do not satisfy a given signing predicate can also generate valid signatures. We also point out the flaws in Wang et al.'s proof.
Much research has been done on the dependability evaluation of computer systems. However, much of it has gone no further than studying the fault coverage of such systems, with little focus on the relationship between fault coverage and overall system dependability. In this paper, a Markovian dependability model for triple-modular-redundancy (TMR) systems is presented. Fully considering the effects of fault coverage, working time, and the constant failure rate of a single module on the dependability of the target TMR system, the model is built on a stepwise degradation strategy. Through the model, the relationship between fault coverage and system dependability is determined. Moreover, the dependability of the system can be dynamically and precisely predicted at any given time once the fault coverage is set. This benefits dependability evaluation and improvement, and is helpful for system design and maintenance.
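The paper's exact Markov model is not reproduced here; as a hedged sketch of the idea, a minimal three-state chain (all three modules up; degraded after a covered fault; failed) with per-module failure rate lam and coverage c has the closed-form reliability below, which reduces to the classic TMR formula 3e^(-2*lam*t) - 2e^(-3*lam*t) when c = 1. The state definitions and transition rates are our illustrative assumptions:

```python
import math

def tmr_reliability(lam, c, t):
    """Reliability of a toy 3-state Markov chain:
    S3 -> S2 at rate 3*lam*c (covered fault, stepwise degradation),
    S3 -> F at rate 3*lam*(1-c), S2 -> F at rate 2*lam.
    Solving the chain gives
    R(t) = P(S3) + P(S2)
         = e^{-3 lam t} + 3c (e^{-2 lam t} - e^{-3 lam t})."""
    e2, e3 = math.exp(-2 * lam * t), math.exp(-3 * lam * t)
    return e3 + 3 * c * (e2 - e3)

lam, t = 1e-4, 1000.0
classic = 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)
# Perfect coverage recovers the classic TMR reliability
print(abs(tmr_reliability(lam, 1.0, t) - classic) < 1e-12)  # True
# Imperfect coverage strictly lowers reliability
print(tmr_reliability(lam, 0.9, t) < classic)  # True
```

This makes the abstract's point concrete: once c is fixed, R(t) can be evaluated at any working time t, and the coverage term directly quantifies how much dependability is lost to uncovered faults.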
Since the invention of the first industrial robot in 1959, the missions of robots have evolved from basic mechanical transfer or assistance to a diverse range of tasks through close interactions with environment, their human counterparts and robot peers. Through adaptation to uncertain and dynamic environments, legged robots can achieve coordinated locomotion in rough terrain, even