Search Results - Inner Mongolia University Library

7th IFIP International Working Conference on Dependable Computing for Critical Applications, DCCA 1999

Authors: Sabnis, Chetan; Cukier, Michel; Ren, Jennifer; Rubel, Paul; Sanders, William H.; Bakken, David E.; Karr, David A. Affiliations: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States; BBN Technologies, Cambridge, MA 02138, United States

ISBN: (print) 0769502849

Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects, providing a high-level method for applications to specify their desired dependability, and providing a dependability manager that attempts to reconfigure a system at runtime so that dependability requests are satisfied. This paper describes how dependability is provided in AQuA. In particular, it describes Proteus, the part of AQuA that dynamically manages replicated distributed objects to make them dependable. Given a dependability request, Proteus chooses a fault tolerance approach and reconfigures the system to try to meet the request. The infrastructure of Proteus is described in this paper, along with its use in implementing active replication and a simple dependability policy. © 1999 IEEE.

Keywords: Fault tolerance

Simulative performance analysis of gossip failure detection for scalable distributed systems

Cluster Computing, 1999, vol. 2, no. 3, pp. 207-217

Authors: Burns, Mark W.; George, Alan D.; Wallace, Bradley A. Affiliations: High-performance Computing and Simulation (HCS) Research Laboratory, Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA

Three protocols for gossip-based failure detection services in large-scale heterogeneous clusters are analyzed and compared. The basic gossip protocol provides a means by which failures can be detected in large distributed systems in an asynchronous manner without the limits associated with reliable multicasting for group communications. The hierarchical protocol leverages the underlying network topology to achieve faster failure detection. In addition to studying the effectiveness and efficiency of these two agreement protocols, we propose a third protocol that extends the hierarchical approach by piggybacking gossip information on application-generated messages. The protocols are simulated and evaluated with a fault-injection model for scalable distributed systems comprised of clusters of workstations connected by high-performance networks, such as the CPlant system at Sandia National Laboratories. The model supports permanent and transient node and link failures, with rates specified at simulation time, for processors functioning in a fail-silent fashion. Through high-fidelity, CAD-based modeling and simulation, we demonstrate the strengths and weaknesses of each approach in terms of agreement time, number of gossips, and overall scalability.
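The abstract does not spell out how the basic gossip protocol works, so the following is only a minimal sketch under stated assumptions: it assumes a heartbeat-counter style of gossip, in which each node periodically sends its table of counters to one random peer and suspects any node whose counter has not risen for `t_fail` rounds. All names (`GossipNode`, `simulate`, `t_fail`) and parameter values are hypothetical and are not taken from the paper.

```python
import random


class GossipNode:
    """One node's view in a heartbeat-gossip failure detector (illustrative)."""

    def __init__(self, node_id, peers, t_fail):
        self.node_id = node_id
        self.t_fail = t_fail      # rounds of silence before a node is suspected
        self.clock = 0            # local round counter
        # node -> [highest heartbeat seen, local round when it last rose]
        self.table = {p: [0, 0] for p in peers}

    def tick(self):
        """Advance one gossip round and bump this node's own heartbeat."""
        self.clock += 1
        entry = self.table[self.node_id]
        entry[0] += 1
        entry[1] = self.clock

    def snapshot(self):
        """Heartbeat counters to send to a peer."""
        return {n: hb for n, (hb, _) in self.table.items()}

    def merge(self, remote):
        """Keep the larger heartbeat per node and record when it increased."""
        for n, hb in remote.items():
            if hb > self.table[n][0]:
                self.table[n] = [hb, self.clock]

    def suspects(self):
        """Nodes whose heartbeat has not risen for more than t_fail rounds."""
        return sorted(
            n for n, (_, seen) in self.table.items()
            if n != self.node_id and self.clock - seen > self.t_fail
        )


def simulate(num_nodes=8, rounds=80, crash_node=3, crash_round=10,
             t_fail=30, seed=1):
    """Run the sketch with one fail-silent crash; return survivors' suspects."""
    rng = random.Random(seed)
    ids = list(range(num_nodes))
    nodes = {i: GossipNode(i, ids, t_fail) for i in ids}
    for r in range(1, rounds + 1):
        alive = [i for i in ids if not (i == crash_node and r >= crash_round)]
        for i in alive:
            nodes[i].tick()
        for i in alive:
            target = rng.choice([j for j in ids if j != i])
            if target in alive:   # a crashed node neither sends nor receives
                nodes[target].merge(nodes[i].snapshot())
    return {i: nodes[i].suspects() for i in ids if i != crash_node}
```

Calling `simulate()` returns, for each surviving node, the list of nodes it suspects. The hierarchical and piggyback variants compared in the paper are not modelled here.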

Integrated frameworks for multi-level and multi-formalism modeling

International Workshop on Petri Nets and Performance Models (PNPM)

Authors: W.H. Sanders. Affiliations: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory and Department of Electrical and Computer Engineering, University of Illinois, Urbana, IL, USA

There have been significant advances in methods for specifying and solving models that aim to predict the performance and dependability of computer systems and networks. At the same time, however, there have been dramatic increases in the complexity of the systems whose performance and dependability must be evaluated, and considerable increases in the expectations of analysts that use performance/dependability evaluation tools. This paper briefly reviews the progress that has been made in the development of performance/dependability evaluation tools, and argues that the next important step is the creation of modeling frameworks and software environments that support multi-level, multi-formalism modeling and multiple solution methods within a single integrated framework. In addition, this paper presents an overview of the Mobius project, which aims to provide a modeling framework and software environment that support multiple modeling formalisms, methods for model composition and connection, and a way to integrate multiple analytical/numerical- and simulation-based model solution methods. Finally, it suggests research that must take place to make this aim a reality, and thus facilitate the performance and dependability evaluation of complex computer systems and networks.

Keywords: Hardware; Application software; Software tools; Stochastic processes; Petri nets; Contracts; Satellites; Costs; Computational modeling; Analytical models

Fault injection based on a partial view of the global state of a distributed system

Reliable Distributed Systems

Authors: M. Cukier, R. Chandra, D. Henke, J. Pistole, W.H. Sanders. Affiliations: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory and Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, USA

This paper describes the basis for and preliminary implementation of a new fault injector, called Loki, developed specifically for distributed systems. Loki addresses issues related to injecting correlated faults in distributed systems. In Loki, fault injection is performed based on a partial view of the global state of an application. In particular, facilities are provided to pass user-specified state information between nodes to provide a partial view of the global state in order to try to inject complex faults successfully. A post-runtime analysis, using an off-line clock synchronization and a bounding technique, is used to place events and injections on a single global time-line and determine whether the intended faults were properly injected. Finally, observations containing successful fault injections are used to estimate specified dependability measures. In addition to describing the details of our new approach, we present experimental results obtained from a preliminary implementation in order to illustrate Loki's ability to inject complex faults predictably.

Keywords: Control systems; Contracts; Distributed computing; Protocols; Visualization; Performance evaluation; Writing

Accurately measuring MPI broadcasts in a computational grid

International Symposium on High Performance Distributed Computing

Authors: B.R. de Supinski, N.T. Karonis. Affiliations: Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA, USA; High-Performance Computing Laboratory, Department of Computer Science, Northern Illinois University, DeKalb, IL, USA

An MPI library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance, even in a challenging grid environment. Measuring broadcast performance is not easy. Simply sending one broadcast after another allows them to proceed through the network concurrently, thus resulting in inaccurate per-broadcast timings. Existing methods either fail to eliminate this pipelining effect or eliminate it by introducing overheads that are as difficult to measure as the performance of the broadcast itself. This problem becomes even more challenging in grid environments. Latencies along different links can vary significantly. Thus, an algorithm's performance is difficult to predict from its communication pattern. Even when accurate prediction is possible, the pattern is often unknown. Our method introduces a measurable overhead to eliminate the pipelining effect, regardless of variations in link latencies.

Keywords: Broadcasting; Grid computing; Libraries; Timing; Delay; Laboratories; Scientific computing; Design methodology; Computer science; Application software

Proteus: a flexible infrastructure to implement adaptive fault tolerance in AQuA

Dependable Computing for Critical Applications 7

Authors: C. Sabnis, M. Cukier, J. Ren, P. Rubel, W.H. Sanders, D.E. Bakken, D. Karr. Affiliations: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory and Department of Electrical and Computer Engineering, University of Illinois, Urbana, IL, USA; BBN Technologies, GTE, Cambridge, MA, USA

Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects, providing a high-level method for applications to specify their desired dependability, and providing a dependability manager that attempts to reconfigure a system at runtime so that dependability requests are satisfied. This paper describes how dependability is provided in AQuA. In particular it describes Proteus, the part of AQuA that dynamically manages replicated distributed objects to make them dependable. Given a dependability request, Proteus chooses a fault tolerance approach and reconfigures the system to try to meet the request. The infrastructure of Proteus is described in this paper, along with its use in implementing active replication and a simple dependability policy.

Keywords: Fault tolerance; Quality of service; Buildings; Application software; Costs; Fault tolerant systems; Hardware; Contracts; Distributed computing; High performance computing

Building dependable distributed applications using AQUA

IEEE International Symposium on High Assurance Systems Engineering

Authors: J. Ren, M. Cukier, P. Rubel, W.H. Sanders, D.E. Bakken, D.A. Karr. Affiliations: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory and Department of Electrical and Computer Engineering, University of Illinois, Urbana, IL, USA; BBN Technologies, GTE, Cambridge, MA, USA

Building dependable distributed systems using ad hoc methods is a challenging task. Without proper support, an application programmer must face the daunting requirement of having to provide fault tolerance at the application level, in addition to dealing with the complexities of the distributed application itself. This approach requires a deep knowledge of fault tolerance on the part of the application designer, and has a high implementation cost. What is needed is a systematic approach to providing dependability to distributed applications. Proteus, part of the AQuA architecture, fills this need and provides facilities to make a standard distributed CORBA application dependable, with minimal changes to an application. Furthermore, it permits applications to specify, either directly or via the Quality Objects (QuO) infrastructure, the level of dependability they expect of a remote object, and will attempt to configure the system to achieve the requested dependability level. Our previous papers have focused on the architecture and implementation of Proteus. This paper describes how to construct dependable applications using the AQuA architecture, by describing the interface that a programmer is presented with and the graphical monitoring facilities that it provides.

Keywords: Electrical capacitance tomography; Fault tolerance; Buildings; Application software; Quality of service; Contracts; Hardware; Runtime; Object detection; Tellurium

COTS hardware and software in high-availability systems

International Symposium on Fault-Tolerant Computing (FTCS)

Authors: R.K. Iyer, A. Avizienis. Affiliations: Center for Reliable & High-Performance Computing, Coordinated Science Laboratory, University of Illinois, Urbana, IL, USA; Computer Science Department, University of California, Los Angeles, CA, USA

Multivariate geographic clustering in a metacomputing environment using globus

1999 ACM/IEEE Conference on Supercomputing, SC 1999

Authors: Mahinthakumar, G.; Hoffman, Forrest M.; Hargrove, William W.; Karonis, Nicholas T. Affiliations: Oak Ridge National Laboratory, Center for Computational Sciences, P. O. Box 2008, Oak Ridge, TN 37831-6203, United States; University of Tennessee, Energy Environment Resources Center, Systems Development Institute, 10521 Research Drive, Knoxville, TN 37932, United States; High Performance Computing Laboratory, Department of Computer Science, Northern Illinois University, DeKalb, IL 60115, United States

ISBN: (print) 1581130910

The authors present a metacomputing application of multivariate, nonhierarchical statistical clustering to geographic environmental data from the 48 conterminous United States in order to produce maps of regions of ecological similarity, called ecoregions. These maps represent finer scale regionalizations than do those generated by the traditional technique: an expert with a marker pen. Several variables (e.g., temperature, organic matter, rainfall, etc.) thought to affect the growth of vegetation are clustered at resolutions as fine as one square kilometer (1 km²). These data can represent over 7.8 million map cells in an n-dimensional (n = 9 to 25) data space. A parallel version of the iterative statistical clustering algorithm is developed by the authors using the MPI (Message Passing Interface) message passing routines. The parallel algorithm uses a classical, self-scheduling, single-program, multiple data (SPMD) organization; performs dynamic load balancing for reasonable performance in heterogeneous metacomputing environments; and provides fault tolerance by saving intermediate results for easy restarts in case of hardware failure. The parallel algorithm was tested on various geographically distributed heterogeneous metacomputing configurations involving an IBM SP3™, an IBM SP2™, and two SGI Origin 2000™ systems. The tests were performed with minimal code modification, and were made possible by Globus™ (a metacomputing software toolkit) and the Globus-enabled version of MPI (MPICH-G). Our performance tests indicate that while the algorithm works reasonably well under the metacomputing environment for a moderate number of processors, the communication overhead can become prohibitive for large processor configurations. © 1999 IEEE.
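The abstract does not name the iterative, nonhierarchical clustering algorithm, so the sketch below assumes a k-means-style loop purely for illustration: assign each map cell to its nearest centroid, recompute the centroids, and repeat until the assignment stops changing. It is serial and omits the MPI self-scheduling, load balancing, and checkpointing described above; the function names and the tiny two-variable data set in the usage note are hypothetical.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    def dist2(p, c):
        # Squared Euclidean distance in the n-dimensional data space.
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [
        min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members.

    A centroid with no members is left where it was.
    """
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centroids.append(old)
    return new_centroids


def cluster(points, centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids
```

As a usage example, `cluster([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` groups the first two cells under one centroid and the last two under the other. In a parallel version, the assignment step is the natural part to distribute, since each cell is handled independently.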

Keywords: Message passing

Utilization of matrix structure to generate optimized code from MATLAB programs

International Journal of Parallel Programming, 1999, vol. 27, no. 2, pp. 73-96

Authors: Marsolf, Bret A.; Gallivan, Kyle A.; Wijshoff, Harry A. G. Affiliations: DEMACO Inc., Champaign, IL, United States; Compl. Sci. and Engineering Program, Florida State University, Tallahassee, FL, United States; High Performance Computing Division, Department of Computer Science, Leiden University, Leiden, Netherlands

The FALCON development environment was designed around three basic data representations: scalars, vectors, and dense matrices. Utilizing the FALCON interactive restructuring system, the environment has been enhanced to allow the identification of structures within sparse matrices, such as diagonal matrices or symmetric matrices, and the use of this information for improving performance of the generated code. In addition, the environment supports the modification of the representation of the data. Such modifications have been shown to provide significant performance improvements.

Keywords: Computer programming
