Search results - Inner Mongolia University Library

A MESSAGE-DRIVEN PROGRAMMING SYSTEM FOR FINE-GRAIN MULTICOMPUTERS

SOFTWARE-PRACTICE & EXPERIENCE, 1994, Vol. 24, No. 10, pp. 953-980

Authors: MASKIT, D; TAYLOR, S. Scalable Concurrent Programming Laboratory, California Institute of Technology, Computer Science, Pasadena, CA 91125, U.S.A.

This paper describes an experimental message-driven programming system for fine-grain multicomputers. The initial target architecture is the J-machine designed at MIT. This machine combines a unique collection of architectural features that include fine-grain processes, on-chip associative memory, and hardware support for process synchronization. The programming system uses these mechanisms via a simple message-driven process model that blurs the distinction between processes and messages: messages correspond to processes that are executed elsewhere in the network. This model allows code and data to be distributed across the computers in the machine, and is supported at every stage of the program development cycle. The prototype system we have developed includes a basic set of programming tools to support the model; these include a compiler, linker, archiver, loader and microkernel. Although the concepts are language independent, our prototype system is based on GNU-C.

Keywords: FINE-GRAIN COMPUTING COMPILERS MESSAGE-PASSING CONCURRENCY MULTICOMPUTERS

PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs

Journal of Computer Science & Technology, 2012, Vol. 27, No. 2, pp. 240-255

Authors: 徐新海; 杨学军; 薛京灵; 林宇斐; 林一松. National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology; Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales

GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer the reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based compiler-directed partial recomputing method, for achieving efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method that recovers from errors detected in a code region by partially re-computing the region, describe a checkpoint-based fault-tolerance framework developed on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC significantly reduces the fault recovery overheads incurred by FullRC, on average by 73.5% when errors occur earlier during execution and by 74.6% when errors occur later. In addition, PartialRC also reduces error detection overheads incurred by FullRC during fault recovery while incurring negligible performance overheads when no fault happens.

Keywords: GPGPU partial recomputing fault tolerance CUDA checkpointing

Obtaining exact value by approximate computations

Science China Mathematics, 2007, Vol. 50, No. 9, pp. 1361-1368

Authors: Jing-zhong ZHANG; Yong FENG. Laboratory for Automated Reasoning and Programming, Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China

Numerical approximate computations can solve large and complex problems *** have the advantage of high *** they only give approximate results, whereas we need exact results in some *** is a gap between approximate computations and exact results. In this paper, we build a bridge by which exact results can be obtained by numerical approximate computations.

Keywords: numerical approximate computation symbolic-numerical computation continued fraction 33F10

A complete discrimination system for polynomials

Science China (Technological Sciences), 1996, Vol. 39, No. 6, pp. 628-646

Authors: 杨路; 侯晓荣; 曾振柄. Laboratory for Automated Reasoning & Programming, Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China

Given a polynomial with symbolic/literal coefficients, a complete discrimination system is a set of explicit expressions in terms of the coefficients, which is sufficient for determining the numbers and multiplicities of the real and imaginary *** it is of great significance, such a criterion for root-classification has never been given for polynomials with degrees greater than *** lack of efficient tools in this aspect severely hinders computer implementations of Tarski’s and other methods in automated theorem *** remedy this defect, a generic algorithm is proposed to produce a complete discrimination system for a polynomial with any *** result has extensive applications in various fields, and its efficiency was demonstrated by computer implementations.

Keywords: discriminant sequence revised sign list root-classification complete discrimination system.

Optimizations and Deoptimizations for Escape Analysis in Open World

电子学报(英文版) (Chinese Journal of Electronics), 2010, Vol. 19, No. 2, pp. 211-216

Authors: SHI Xiaohua; WU Gansha; JIN Maozhong; LUEH Guei-Yuan. School of Computer Science, Beihang University, Beijing, China; Programming System Laboratory, Microprocessor Technology Labs, Intel Corporation, Beijing, China

This paper introduces optimization and deoptimization technologies for escape analysis in an open world. These technologies are used in a novel escape analysis framework that has been implemented in Open Runtime Platform, Intel's open-source Java virtual machine. We introduce the optimization technologies for synchronization removal and object stack allocation, as well as the runtime deoptimization and compensation work. The deoptimization and compensation technologies are crucial for practical escape analysis in an open world. We evaluated the runtime efficiency of the deoptimization and compensation work on benchmarks such as SPECjbb2000 and SPECjvm98.

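Keywords: optimization technology; world; escape; Java virtual machine; runtime platform; compensation work; open source; compensation technology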

A general symbolic PDE solver generator: Beyond explicit schemes

Scientific Programming, 2003, Vol. 11, No. 3, pp. 225-235

Authors: Sheshadri, K.; Fritzson, Peter. Programming Environment Laboratory, Department of Computer Science, Linköping University, S-581 83 Linköping, Sweden

This paper presents an extension of our Mathematica- and MathCode-based symbolic-numeric framework for solving a variety of partial differential equation (PDE) problems. The main features of our earlier work, which implemented explicit finite-difference schemes, include the ability to handle (1) arbitrary number of dependent variables, (2) arbitrary dimensionality, and (3) arbitrary geometry, as well as (4) developing finite-difference schemes to any desired order of approximation. In the present paper, extensions of this framework to implicit schemes and the method of lines are discussed. While C++ code is generated, using the MathCode system for the implicit method, Modelica code is generated for the method of lines. The latter provides a preliminary PDE support for the Modelica language. Examples illustrating the various aspects of the solver generator are presented.

Keywords: computer programming

Power management of extreme-scale networks with on/off links in runtime systems

ACM Transactions on Parallel Computing, 2015, Vol. 1, No. 2, pp. 1–21

Authors: Totoni, Ehsan; Jain, Nikhil; Kale, Laxmikant V. Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States

Networks are among major power consumers in large-scale parallel systems. During execution of common parallel applications, a sizeable fraction of the links in the high-radix interconnects are either never used or are underutilized. We propose a runtime system based adaptive approach to turn off unused links, which has various advantages over the previously proposed hardware and compiler based approaches. We discuss why the runtime system is the best system component to accomplish this task, and test the effectiveness of our approach using real applications (including NAMD, MILC), and application benchmarks (including NAS Parallel Benchmarks, Stencil). These codes are simulated on representative topologies such as 6-D Torus and multilevel directly connected network (similar to IBM PERCS in Power 775 and Dragonfly in Cray Aries). For common applications with near-neighbor communication pattern, our approach can save up to 20% of total machine's power and energy, without any performance penalty. © 2015 ACM.

Keywords: Topology

Spectral properties and geometric interpretation of R-filters

Applied Mathematics and Mechanics (English Edition), 2009, Vol. 30, No. 1, pp. 109-120

Author: 冷拓. Laboratory for Automated Reasoning and Programming, Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, P. R. China

By applying Fourier analysis, we study the spectral properties of R-filters. Further, we prove that R-filters are a generalization of least squares polynomial adjustment, and we give the geometric interpretation ...

Keywords: R-filter HP-filter spectral properties Fourier analysis Hess-matrix operator theory

COMPOSITIONAL PRIORITY SPECIFICATION IN REAL-TIME DISTRIBUTED SYSTEMS

SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1992, Vol. 17, No. 1, pp. 75-93

Authors: SHYAMASUNDAR, RK; LIU, LY. Computer Science Group, Tata Institute of Fundamental Research, Bombay, India; IBM Programming Systems, Cary Laboratory, Cary, USA

In this paper, we develop a compositional denotational semantics for prioritized real-time distributed programming languages. One of its interesting features is that it extends the existing compositional theory proposed by Koymans et al. (1988) to prioritized real-time languages while preserving the compositionality of the semantics. The language permits users to define situations in which an action has priority over another action without the requirement of preassigning priorities to actions for partially ordering the alphabet of actions. These features are part of languages such as Ada that were designed specifically with the needs of real-time embedded systems in view. Further, the approach does not have the restrictions of other approaches, such as the condition that prioritized internal moves can preempt unprioritized actions. Our notion of priority in the environment is based on the intuition that a low priority action can proceed only if the high priority action cannot proceed due to the lack of a handshaking partner at that point of execution. In other words, if some action is possible corresponding to that environment at some point of execution, then the action takes place without unnecessary waiting. The proposed semantic theory provides a clear distinction between the semantic model and the execution model; this has enabled us to fully ensure that there is no unnecessary waiting.

Keywords: COMPOSITIONAL SPECIFICATION REAL-TIME DISTRIBUTED SYSTEMS PRIORITY SPECIFICATION MESSAGE PASSING MODELS

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization

Journal of Computer Science & Technology, 2009, Vol. 24, No. 6, pp. 1086-1097

Authors: 米伟; 冯晓兵; 贾耀仓; 陈莉; 薛京灵. Key Laboratory of Computer System and Architecture, Institution of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences; Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales

DRAM row buffer conflicts can increase memory access latency significantly. This paper presents a new page-allocation-based optimization that works seamlessly together with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts. Validation in simulation using a set of selected scientific and engineering benchmarks against a few representative memory controller optimizations shows that our method can reduce row buffer miss rates by up to 76% (with an average of 37.4%). This reduction in row buffer miss rates translates into performance speedups of up to 15% (with an average of 5%).

Keywords: DRAM row buffer page allocation locality optimization
