Search Results - Inner Mongolia University Library

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Journal of Computer Science & Technology, 2012, Vol. 27, No. 1, pp. 57-74

Authors: Yang Yang, Hui-Min Cui, Xiao-Bing Feng, Jing-Ling Xue. Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100190, China; Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia

In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods that rely on circular queues predominantly implemented using indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations so that circular queues can be implemented effectively by both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93x over methods that use circular queues implemented with shared memory only.

Keywords: stencil computation; circular queue; GPU; occupancy; register
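As a rough illustration of the circular-queue idea mentioned in the abstract above, the sketch below performs one step of a 2D 5-point averaging stencil while keeping only three input rows resident in a small ring buffer. It is plain Python on the CPU, not the authors' GPU method: the split between registers and shared memory is not modeled, and the names `stencil_reference`, `stencil_sweep`, and `queue` are assumptions made for this example.

```python
def stencil_reference(grid):
    """One Jacobi step of a 5-point average; boundary cells are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return out


def stencil_sweep(grid):
    """Same step, but only three input rows are held in a circular queue."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    queue = [None, None, None]  # slot k holds the input row r with r % 3 == k
    for r in range(min(2, rows)):
        queue[r % 3] = list(grid[r])  # preload rows 0 and 1
    for i in range(1, rows - 1):
        # Loading row i + 1 overwrites the slot of row i - 2, which is no longer needed.
        queue[(i + 1) % 3] = list(grid[i + 1])
        up, mid, down = queue[(i - 1) % 3], queue[i % 3], queue[(i + 1) % 3]
        for j in range(1, cols - 1):
            out[i][j] = (mid[j] + up[j] + down[j] + mid[j - 1] + mid[j + 1]) / 5.0
    return out


# Hypothetical input: a small 5 x 6 grid of floats.
grid = [[float((3 * i + 5 * j) % 7) for j in range(6)] for i in range(5)]
result = stencil_sweep(grid)
```

Each input row is copied into the queue exactly once and reused for three consecutive output rows, which is the kind of reuse a circular queue is meant to capture.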

Multicollisions and graph-based hash functions

3rd International Conference on Trusted Systems, INTRUST 2011

Author: Halunen, Kimmo. Affiliation: Oulu University Secure Programming Group, Department of Computer Science and Engineering, University of Oulu, P.O. Box 4500, 90014 Oulu, Finland

ISBN: (print) 9783642322976

In this paper, we present some generalisations of previous multicollision finding methods and apply these against a new type of tree-based hash functions. We also show that the very general class of hash functions first presented by Nandi and Stinson can be understood as graph-based hash functions and a graph theoretical approach can be utilised in studying their properties. Previously, an efficient multicollision attack has been found against the basic iterated hash function construction. This method has been applied to the generalised iterated hash functions and binary tree-based hash functions. We show that similar methods can be utilised also against t-ary tree-based hash functions, simplify some definitions and conjecture a similar result for multicollisions against graph-based hash functions. © 2012 Springer-Verlag.

Keywords: Hash functions
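The abstract above refers to the earlier multicollision attack on the basic iterated hash construction (commonly attributed to Joux). A minimal sketch of that earlier attack follows, assuming a toy 16-bit compression function built by truncating SHA-256 so that collisions are cheap to find. It does not cover the tree-based or graph-based generalisations the paper studies, and all function names here are hypothetical.

```python
import hashlib
from itertools import product


def compress(state, block):
    """Toy compression function with a 16-bit state (truncated SHA-256)."""
    return hashlib.sha256(state + block).digest()[:2]


def find_collision(state):
    """Birthday search for two distinct 4-byte blocks with the same output from `state`."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(4, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block, out
        seen[out] = block
        counter += 1


def multicollision(iv, t):
    """Chain t one-block collisions; every choice per position yields the same final state."""
    pairs, state = [], iv
    for _ in range(t):
        a, b, state = find_collision(state)
        pairs.append((a, b))
    messages = [b"".join(choice) for choice in product(*pairs)]
    return messages, state


def iterated_hash(iv, message, block_size=4):
    """Plain iterated hash without padding (all messages here have equal length)."""
    state = iv
    for i in range(0, len(message), block_size):
        state = compress(state, message[i:i + block_size])
    return state


iv = b"\x00\x00"
messages, digest = multicollision(iv, 3)  # 2**3 = 8 colliding messages
```

Only `t` collision searches are needed to obtain `2**t` messages with the same hash value, which is why the iterated construction is considered weak against multicollisions.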

Annotation support for generic patches

International Workshop on Recommendation Systems for Software Engineering (RSSE)

Authors: Georg Dotzler, Ronald Veldema, Michael Philippsen. Affiliation: Computer Science Department, Programming Systems Group, University of Erlangen-Nuremberg, Erlangen, Germany

In large projects, parallelization of existing programs or refactoring of source code is time-consuming as well as error-prone and would benefit from tool support. However, existing automatic transformation systems are not extensively used because they either require tedious definitions of source code transformations or lack general adaptability. In our approach, a programmer changes code inside a project, resulting in before and after source code versions. The difference (the generated transformation) is stored in a database. When presented with some arbitrary code, our tool mines the database to determine which of the generalized transformations possibly apply. Our system differs from a pure compiler-based (semantics-preserving) approach, as we only suggest code modifications. Our contribution is a set of generalizing annotations that we have found by analyzing recurring patterns in open source projects. We show the usability of our system and the annotations by finding matches and applying generated transformations in real-world applications.

Keywords: Benchmark testing; Pattern matching; Databases; Generators; Java; Semantics

Leakage-Aware Modulo Scheduling for Embedded VLIW Processors

Journal of Computer Science & Technology, 2011, Vol. 26, No. 3, pp. 405-417

Authors: Yong Guan (关永), Jingling Xue (薛京灵). Affiliations: College of Information Engineering, Capital Normal University; Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales

As semiconductor technologies move down to the nanometer scale, leakage power has become a significant component of the total power consumption. In this paper, we present a leakage-aware modulo scheduling algorithm to achieve leakage energy savings for applications with loops on Very Long Instruction Word (VLIW) architectures. The proposed algorithm is designed to maximize the idleness of function units integrated with dual-threshold domino logic and to reduce the number of transitions between the active and sleep modes. We have implemented our technique in the Trimaran compiler and conducted experiments using a set of embedded benchmarks from DSPstone and MiBench on the cycle-accurate VLIW simulator of Trimaran. The results show that our technique achieves significant leakage energy savings compared with a previously published DAG-based (Directed Acyclic Graph) leakage-aware scheduling algorithm.

Keywords: leakage power; very long instruction word (VLIW); software pipelining; modulo scheduling

Editorial: Special issue dedicated to ICFP 2010

Journal of Functional Programming, 2012, Vol. 22, No. 4-5, pp. 379-381

Authors: Umut A. Acar, James Cheney, Stephanie Weirich. Affiliations: Programming Languages and Systems Group, Max Planck Institute for Software Systems, Germany (e-mail: umut@***); Laboratory for Foundations of Computer Science, University of Edinburgh, Edinburgh, UK (e-mail: jcheney@inf.ed.ac.uk); School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA (e-mail: sweirich@cis.upenn.edu)

The 15th ACM SIGPLAN International Conference on Functional Programming (ICFP) took place on September 27–29, 2010 in Baltimore, Maryland. After the conference, the programme committee, chaired by Stephanie Weirich, selected several outstanding papers and invited their authors to submit to this special issue of the Journal of Functional Programming. Umut A. Acar and James Cheney acted as editors for these submissions. This issue includes the seven accepted papers, each of which provides substantial new material beyond the original conference version. The selected papers reflect a consensus by the programme committee that ICFP 2010 had a number of strong papers that link core functional programming ideas with other areas, such as multicore, embedded systems, and data compression.

Structural Equivalence Partition and Boundary Testing

2011 Fachtagung des GI-Fachbereichs Softwaretechnik, Software Engineering 2011 - 2011 Conference of the GI Division on Software Engineering, Software Engineering 2011

Authors: Oster, Norbert; Philippsen, Michael. Affiliation: Computer Science Department, Programming Systems Group, University of Erlangen, Germany

ISBN: (print) 9783885792772

Structural (manual or automated) testing today often overlooks typical programming faults because of inherent flaws in the simple criteria applied (e.g. branch or all-uses). Dedicated testing strategies that address such faults (e.g. mutation testing) are not specifically designed for smart automatic test case generation. In this paper we present a new coverage criterion and its implementation that accomplishes both: it detects more faults and integrates easily into automated test case generation. The criterion is targeted towards unveiling faults that originate from shifts in the equivalence classes that are caused by small coding errors (inspired by mutation testing). On benchmark codes from the Java-API and from an open-source project we improve the fault detection capability by up to 41% compared to branch and all-use coverage. © Gesellschaft für Informatik, Bonn 2011.

Keywords: Fault detection
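To make concrete the kind of fault the abstract above describes, a shift in an equivalence class caused by a small coding error, the sketch below compares a correct predicate with an off-by-one variant. It is an illustration only, not the authors' coverage criterion; the functions and test values are hypothetical.

```python
def is_adult(age):
    """Intended behavior: ages 18 and above belong to the 'adult' class."""
    return age >= 18


def is_adult_faulty(age):
    """Small coding error: the class boundary is shifted by one."""
    return age > 18


# Inputs far from the boundary exercise both outcomes (full branch coverage)
# but cannot distinguish the two versions.
branch_tests = [10, 30]
# Inputs on either side of the boundary expose the shift.
boundary_tests = [17, 18]

missed = all(is_adult(a) == is_adult_faulty(a) for a in branch_tests)
caught = any(is_adult(a) != is_adult_faulty(a) for a in boundary_tests)
```

Here `missed` and `caught` are both true: the branch-covering inputs agree on both versions, while the boundary input 18 reveals the fault.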

Enabling multiple accelerator acceleration for Java/OpenMP

3rd USENIX Workshop on Hot Topics in Parallelism, HotPar 2011

Authors: Veldema, Ronald; Blass, Thorsten; Philippsen, Michael. Affiliation: University of Erlangen-Nuremberg, Computer Science Department, Programming Systems Group, Erlangen, Germany

While using a single GPU is fairly easy, using multiple CPUs and GPUs potentially distributed over multiple machines is hard because data needs to be kept consistent using message exchange and the load needs to be balanced. We propose (1) an array package that provides partitioned and replicated arrays and (2) a compute-device library to abstract from GPUs and CPUs and their location. Our system automatically distributes a parallel-for loop in data-parallel fashion over all the devices. There are three contributions in this paper. First, we provide transparent use of multiple distributed GPUs and CPUs from within Java/OpenMP. Second, we partition arrays according to the compute-devices' relative performance that is computed from the execution time of a small micro benchmark and a series of small bandwidth tests run at program start. Third, we repartition the arrays dynamically at run-time by increasing or decreasing the number of machines used and by switching from CPUs-only to GPUs-only, to combinations of CPUs and GPUs, and back. With our dynamic device switching we minimize communication while maximizing device use. Our system automatically finds the optimal device sets and achieves a speedup of 5 - 200 on a cluster of 8 machines with 2 GPUs each. © HotPar 2011.

Keywords: Program processors
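The second contribution in the abstract above, partitioning arrays according to each device's relative performance, can be sketched as a proportional split of an index range. The abstract does not give the actual scheme, so this is an assumption: the function name and the throughput scores are hypothetical stand-ins for the micro-benchmark results.

```python
def partition_by_performance(n, throughputs):
    """Split indices 0..n-1 into contiguous ranges sized in proportion to throughput."""
    total = sum(throughputs)
    bounds = [0]
    cumulative = 0.0
    for t in throughputs:
        cumulative += t
        bounds.append(round(n * cumulative / total))
    bounds[-1] = n  # guard against rounding drift on the last boundary
    return [(bounds[k], bounds[k + 1]) for k in range(len(throughputs))]


# Hypothetical scores: one CPU and two GPUs of different speed.
parts = partition_by_performance(1000, [1.0, 4.0, 5.0])
# parts == [(0, 100), (100, 500), (500, 1000)]
```

Rounding the cumulative boundaries, rather than each share separately, keeps the ranges contiguous and guarantees that together they cover all `n` elements.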

Security goals assurance based on software active monitoring

International Conference on Secure Software Integration and Reliability Improvement

Authors: Zhao, Changzhi; Dong, Wei; Leucker, Martin; Qi, Zhichang. Affiliations: Department of Computer Science, National University of Defense Technology, Changsha 410073, China; Institute of Software Technology and Programming Languages, University of Lübeck, Germany

ISBN: (print) 9780769544533

Access control is a vital security mechanism in today's operating systems, and the security policies dictating security-relevant behaviors are lengthy and complex, for example in Security-Enhanced Linux (SELinux). It is extremely difficult to verify the consistency between the security policies and the security goals desired by applications. In this paper, we show how to predict at runtime whether an information flow security goal is violated, how to generate the corresponding control actions online when a divergence is detected, and how to apply these actions in time, based on a software active monitoring technique. The symbolic security information flow model of SELinux is generated from a formalization of the access control mechanism and can be used to generate the N-step-ahead projection of future behavior. Information flow security goals are expressed in linear temporal logic (LTL), which provides a clear description of the objectives desired by applications. An anticipatory monitor is generated automatically from the LTL formula. We consider an online scheme where, after the occurrence of an event, the next control action is determined on the basis of the N-step-ahead projection of future behavior. This procedure is repeated after the occurrence of the next security-relevant event. Thus a closed-loop system is generated in which all behavior sequences satisfy the security goals. © 2011 IEEE.

Keywords: Access control

Resource-aware programming and simulation of MPSoC architectures through extension of X10

Proceedings of the 14th International Workshop on Software and Compilers for Embedded Systems

Authors: Hannig, Frank; Roloff, Sascha; Snelting, Gregor; Teich, Jürgen; Zwinkau, Andreas. Affiliations: Hardware/Software Co-Design, Department of Computer Science, University of Erlangen-Nuremberg, Germany; Programming Paradigms Group, Karlsruhe Institute of Technology (KIT), Germany

ISBN: (print) 9781450307635

The efficient use of future MPSoCs with 1000 or more processor cores requires new means of resource-aware programming to deal with increasing imperfections such as process variation, fault rates, aging effects, and power as well as thermal problems. In this paper, we apply a new approach called invasive computing that enables an application programmer to spread computations to processors deliberately and on purpose at certain points of the program. Such decisions can be made depending on the degree of application parallelism and the state of the underlying resources such as utilization, load, and temperature. The introduced programming constructs for resource-aware programming are embedded into the parallel computing language X10 as developed by IBM using a library-based approach. Moreover, we show how individual heterogeneous MPSoC architectures may be modeled for subsequent functional simulation by defining compute resources such as processors themselves by lightweight threads that are executed in parallel together with the application threads by the X10 run-time system. Thus, the state changes of each hardware resource may be simulated including temperature, aging, and other useful monitor functionality to provide a first high-level programming test-bed for invasive computing. Copyright © 2011 ACM.

Keywords: Parallel processing systems

Acculock: Accurate and efficient detection of data races

International Symposium on Code Generation and Optimization (CGO)

Authors: Xinwei Xie, Jingling Xue. Affiliation: Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales, NSW, Australia

ISBN: (print) 9781612843568

Happens-before detectors are precise but can be too conservative to detect certain data races in repeated test runs, as they are sensitive to thread interleaving. By making the opposite tradeoffs, lockset detectors can detect more races but are not precise (they report false positives). Of the two types of detectors, happens-before detectors run more slowly, as they use expensive vector clocks. Existing hybrid race detectors (combining lockset and happens-before) alleviate some of the limitations of both analysis techniques at the cost of additional analysis overhead. Recently, due to FastTrack, epoch-based happens-before and lockset detectors exhibit comparable performance. It is time to rethink how to design a hybrid race detector that balances precision and coverage by leveraging the lightweight nature of epoch clocks. Acculock is the first such solution. Acculock analyzes a program by reasoning about the subset of the happens-before relation observed with lock acquires and releases excluded, thereby reducing its sensitivity to thread interleaving. When such a weaker happens-before relation is violated, Acculock applies a new efficient lockset algorithm to enforce a lock-based synchronization discipline by distinguishing the locks protecting reads and writes. The key motivation is to ensure that Acculock can improve on happens-before detectors by also discovering data races in alternate thread interleavings when analyzing one program execution, while limiting in a controlled manner the false warnings thus incurred. In addition, Acculock achieves these objectives while maintaining performance comparable to FastTrack, the fastest happens-before detector. All these properties of Acculock are validated by comparing it against six other detectors, all implemented in Jikes RVM, using 11 benchmark programs.

Keywords: Detectors; Clocks; Synchronization; Instruction sets; Copper; Algorithm design and analysis
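For readers unfamiliar with the lockset side of the tradeoff described in the abstract above, here is a minimal sketch of a classic lockset check: each shared variable keeps the set of locks held on every access so far, and a warning is raised once that set becomes empty after accesses by more than one thread. This is a simplified textbook-style discipline, not Acculock's algorithm (it ignores happens-before entirely and does not distinguish reads from writes); the class and method names are hypothetical.

```python
class LocksetDetector:
    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.threads = {}     # variable -> threads that have accessed it
        self.warnings = []    # variables flagged as potentially racy

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = self.held.get(thread, set())
        if var not in self.candidates:
            self.candidates[var] = set(locks)
        else:
            self.candidates[var] &= locks  # keep only locks common to all accesses
        self.threads.setdefault(var, set()).add(thread)
        shared = len(self.threads[var]) > 1
        if shared and not self.candidates[var] and var not in self.warnings:
            self.warnings.append(var)


d = LocksetDetector()
# x is protected by different locks in the two threads: flagged.
d.acquire("T1", "m"); d.access("T1", "x"); d.release("T1", "m")
d.acquire("T2", "n"); d.access("T2", "x"); d.release("T2", "n")
# y is consistently protected by lock m: not flagged.
d.acquire("T1", "m"); d.access("T1", "y"); d.release("T1", "m")
d.acquire("T2", "m"); d.access("T2", "y"); d.release("T2", "m")
```

The warning on `x` is raised regardless of how the two threads were actually ordered, which illustrates both why lockset detectors are insensitive to thread interleaving and why they can report false positives.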
