ISBN: 0769523129 (print)
ASC (Associative Computing Model) and MASC (Multiple Associative Computing Model) have long been studied in the Department of Computer Science at Kent State University. While previous studies provide the background and the basic definition of the model, their description of the interactions between the instruction streams (ISs) is brief, high-level, and incomplete. One change here is that we specify the interaction between ISs and assume that all ISs operate on the same clock in order to support predictable worst-case computation times, whereas earlier work assumed that the ISs interact in a MIMD fashion. This paper explains in detail how these interactions can be supported when only a few ISs are supported.
Stabilized explicit implicit domain decomposition (SEIDD) is a class of globally non-iterative domain decomposition methods for the numerical simulation of unsteady diffusion processes on parallel computers. By adding a stabilization step that incurs no communication cost to the explicit-implicit domain decomposition (EIDD) methods, the SEIDD methods achieve high stability, but with the restriction that the interface boundaries must not cross each other inside the domain. In this paper, we present a parallelized SEIDD algorithm whose parallelism exceeds the number of subdomains, removing the restriction to non-crossing interface boundaries at a slight computational cost.
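To make the explicit-implicit splitting concrete, here is a minimal sketch of one basic (unstabilized) EIDD time step for the 1D heat equation u_t = u_xx with zero Dirichlet boundaries and a single interface node. It is an illustration under stated assumptions, not the SEIDD algorithm of the paper. The interface value is first advanced explicitly from the previous time level. Each subdomain is then solved implicitly (backward Euler) and independently. The names `thomas`, `implicit_subdomain`, and `eidd_step` are hypothetical, r = dt/dx^2 is the mesh ratio, and neither the SEIDD stabilization step nor crossing interfaces are modeled.

```python
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b, super-diagonal c)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def implicit_subdomain(v, r, left_bc, right_bc):
    """Backward-Euler step on one subdomain with Dirichlet values at both ends."""
    n = len(v)
    rhs = list(v)
    rhs[0] += r * left_bc     # known boundary value moved to the right-hand side
    rhs[-1] += r * right_bc
    return thomas([-r] * n, [1 + 2 * r] * n, [-r] * n, rhs)

def eidd_step(u, r, m):
    """One EIDD step; u includes both boundary points, m is the interface index."""
    new = list(u)
    # 1) Explicit (forward Euler) update of the interface node from time level n.
    new[m] = u[m] + r * (u[m - 1] - 2 * u[m] + u[m + 1])
    # 2) Independent implicit solves; each needs only the new interface value.
    new[1:m] = implicit_subdomain(u[1:m], r, u[0], new[m])
    new[m + 1:-1] = implicit_subdomain(u[m + 1:-1], r, new[m], u[-1])
    return new

# Example: u(x, 0) = sin(pi x) on [0, 1], 51 grid points, interface at the midpoint.
n_pts = 51
dx = 1.0 / (n_pts - 1)
r = 0.4
dt = r * dx * dx
u = [math.sin(math.pi * i * dx) for i in range(n_pts)]
steps = 500
for _ in range(steps):
    u = eidd_step(u, r, m=25)
t = steps * dt
exact = [math.exp(-math.pi ** 2 * t) * math.sin(math.pi * i * dx) for i in range(n_pts)]
max_err = max(abs(a - b) for a, b in zip(u, exact))
```

Once the interface value is known, the two `implicit_subdomain` calls do not depend on each other. This is what makes the scheme globally non-iterative. The sketch keeps r = 0.4 so that the explicit interface update, taken on its own, satisfies the usual forward-Euler bound r <= 1/2.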
As high-performance computing grows in popularity and performance, the demand for similarly capable input and output systems rises. Parallel I/O takes advantage of many data server machines to provide linearly scaling performance to parallel applications that access storage over the system area network. The demands that a parallel storage system places on the network differ considerably from those imposed by message-passing algorithms or data-center operations, and many different networks are in popular use in modern parallel machines. These considerations led us to develop a network abstraction layer for parallel I/O that is efficient and thread-safe, provides the operations specifically required for I/O processing, and supports multiple networks. The Buffered Message Interface (BMI) has low processor overhead and minimal impact on latency. It can also improve throughput for parallel file system workloads by as much as 40% compared to other, more generic network abstractions.
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among them. When one or more computers break down, the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down; that is, we want to optimize the worst-case behavior. We have previously defined recovery schemes that are optimal for some limited cases. In this paper we present new recovery schemes based on so-called Golomb rulers. They are optimal for a much larger number of cases than the previous results.
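The defining property of a Golomb ruler is that all pairwise distances between its marks are distinct. The minimal sketch below only checks that property. The abstract does not describe how ruler marks are turned into recovery lists, so that mapping is not reproduced here. The function name `is_golomb_ruler` is hypothetical.

```python
from itertools import combinations

def is_golomb_ruler(marks):
    """Return True if the marks are distinct and every pairwise distance is unique."""
    if len(set(marks)) != len(marks):
        return False  # repeated marks cannot form a ruler
    seen = set()
    for a, b in combinations(sorted(marks), 2):
        d = b - a
        if d in seen:
            return False  # this distance is measured twice
        seen.add(d)
    return True

print(is_golomb_ruler([0, 1, 4, 9, 11]))  # True
print(is_golomb_ruler([0, 1, 2, 4]))      # False: distance 1 occurs twice
```

[0, 1, 4, 9, 11] is a known optimal 5-mark Golomb ruler of length 11.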
In this paper, we propose and implement a Reconfigurable Sequential Consistency Algorithm (RSCA) and analyze its performance using simulation. Extending the concepts of reconfigurable devices to the algorithmic level, we model RSCA as a reconfigurable sequential consistency algorithm for asynchronous distributed systems that manage the state of concurrent objects. Our main result is that, on average, RSCA performed 36% better than traditional sequential consistency algorithms. The main contributions of this paper are the definition, proposal, implementation, and performance analysis of RSCA.
When distributing data across several nodes, two different approaches exist. The first distributes the data object itself, e.g., by striping. The second aggregates local storage, assigning each data object to a home storage node. From the viewpoint of fault-tolerant data layouts, these schemes seem similar. In both cases, adding parity (e.g., RAID level 3 or level 5, or Reed-Solomon codes) provides tolerance against node failures. A closer look, however, reveals differences in achievable access rates, the number of messages needed, and recovery cost. In this paper we compare both approaches and provide a method for self-reconfiguration. The transformation from a parity-grouping layout to a striping layout is shown to be feasible as a stepwise operation that can run concurrently with data access.
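Single-parity XOR, as used in RAID level 5, illustrates the parity idea mentioned above. The parity unit is the bytewise XOR of the data units, so any one lost unit equals the XOR of the surviving units and the parity. This minimal sketch assumes a striping layout with zero padding. `stripe` and `xor_blocks` are hypothetical helpers. Reed-Solomon codes, which can tolerate more than one failure, are not shown.

```python
from functools import reduce

def stripe(obj: bytes, k: int) -> list:
    """Split obj into k equal-size stripe units, zero-padding the last one."""
    unit = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(unit * k, b"\0")
    return [padded[i * unit:(i + 1) * unit] for i in range(k)]

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

units = stripe(b"HELLOWORLD!", 3)   # one unit per data node
parity = xor_blocks(units)          # stored on a parity node
lost = 1                            # suppose the node holding units[1] fails
survivors = [u for i, u in enumerate(units) if i != lost] + [parity]
recovered = xor_blocks(survivors)
assert recovered == units[lost]
```

In a home-node layout, the same XOR arithmetic would apply across blocks belonging to different objects. The sketch does not model message counts or access rates, which are where the two layouts differ.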
An efficient and scalable Discontinuous Galerkin shallow water model on the cubed sphere is developed by extending the transport scheme of Nair et al. [16]. The continuous flux-form nonlinear shallow water equations in curvilinear coordinates are formulated. Spatial discretization uses a nodal basis set of Legendre polynomials. Fluxes along internal element interfaces are approximated by a Lax-Friedrichs scheme. A third-order total variation diminishing (TVD) Runge-Kutta scheme is applied for time integration, without any filter or limiter. The standard shallow water test suite of Williamson et al. [23] is used to validate the model. The numerical solutions are accurate, the model conserves mass to machine precision, and no spurious oscillations appear in a test case where zonal flow impinges on a mountain. Development time was substantially reduced by building the model in the High Order Method Modeling Environment (HOMME) developed at the National Center for Atmospheric Research (NCAR). Performance and scaling data for the steady-state geostrophic flow problem [23] are presented. Sustained performance in excess of 10% of peak is observed out to 64 processors on a Linux cluster.
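The two numerical ingredients named above are a Lax-Friedrichs interface flux and third-order TVD Runge-Kutta time stepping. The sketch below shows both on a much simpler problem. It assumes 1D linear advection u_t + (c u)_x = 0 on a periodic grid with piecewise-constant elements, which is a degree-0 DG (finite-volume) discretization. It is not the cubed-sphere shallow water model. The names `lf_flux`, `rhs`, `tvd_rk3_step`, and `advection_op` are hypothetical. Each interface flux is added to one cell and subtracted from its neighbor, so the discrete mass sum(u)*dx is conserved up to round-off.

```python
def lf_flux(ul, ur, f, alpha):
    """Lax-Friedrichs (Rusanov) flux; alpha bounds |f'(u)| near the interface."""
    return 0.5 * (f(ul) + f(ur)) - 0.5 * alpha * (ur - ul)

def rhs(u, dx, f, alpha):
    """Semi-discrete operator L(u) = -(F[i+1/2] - F[i-1/2]) / dx on a periodic grid."""
    n = len(u)
    flux = [lf_flux(u[i], u[(i + 1) % n], f, alpha) for i in range(n)]  # F[i+1/2]
    return [-(flux[i] - flux[i - 1]) / dx for i in range(n)]  # flux[-1] wraps around

def tvd_rk3_step(u, dt, op):
    """Third-order TVD Runge-Kutta step (Shu-Osher form) for du/dt = op(u)."""
    k = op(u)
    u1 = [a + dt * b for a, b in zip(u, k)]
    k = op(u1)
    u2 = [0.75 * a + 0.25 * (b + dt * g) for a, b, g in zip(u, u1, k)]
    k = op(u2)
    return [a / 3.0 + 2.0 / 3.0 * (b + dt * g) for a, b, g in zip(u, u2, k)]

c = 1.0            # advection speed
n = 100
dx = 1.0 / n
u0 = [1.0 if 0.25 <= (i + 0.5) * dx < 0.75 else 0.0 for i in range(n)]  # square wave

def advection_op(v):
    return rhs(v, dx, lambda q: c * q, abs(c))

u = u0
for _ in range(100):
    u = tvd_rk3_step(u, 0.5 * dx, advection_op)  # CFL number 0.5

mass_drift = abs(sum(u) - sum(u0)) * dx
```

With f(u) = c u and alpha = |c|, the Lax-Friedrichs flux reduces to upwinding. At CFL number 0.5, every Runge-Kutta stage is a convex combination of earlier values, so the square wave stays within [0, 1] without a limiter in this linear case.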
A novel bitstream generation algorithm and its software implementation are introduced. Although this tool was developed to configure the AMDREL FPGA reconfigurable platform [13], it could be used to program any other compatible device. It is the only known academic implementation for FPGA configuration with such features. These features include run-time, partial, and dynamic reconfiguration; memory management; bitstream compression and encryption; the read-back technique; bitstream reallocation; low-power techniques; and a graphical user interface.
Dynamic reconfiguration is a promising approach for the resource-efficient utilization of microelectronic systems. However, work on general approaches to modeling reconfigurable hardware is quite rare. We have therefore developed a new analytical model that can be used to analyze the various approaches to dynamic reconfiguration. Based on this model, we define metrics for partial reconfiguration to evaluate different system approaches. Our main objective is to analyze placement algorithms for partially reconfigurable architectures. The model can account for miscellaneous constraints and cost parameters for online placement, so placement can be adapted to dynamically changing system environments.
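The abstract does not spell out the model or its metrics. What follows is therefore only a hypothetical illustration of the kind of online placement problem being analyzed. Tasks occupying contiguous columns of a 1D partially reconfigurable fabric are placed first-fit as they arrive and removed when they finish. The `Fabric` class and the fragmentation measure are assumptions chosen for illustration, not the paper's model or metrics. The measure used here is one minus the largest free run divided by the total free area.

```python
class Fabric:
    """Hypothetical 1D column-based reconfigurable area with first-fit placement."""

    def __init__(self, num_cols):
        self.num_cols = num_cols
        self.owner = [None] * num_cols  # task occupying each column, or None

    def free_runs(self):
        """List of (start, length) for each maximal run of free columns."""
        runs, start = [], None
        for col in range(self.num_cols + 1):
            free = col < self.num_cols and self.owner[col] is None
            if free and start is None:
                start = col
            elif not free and start is not None:
                runs.append((start, col - start))
                start = None
        return runs

    def place(self, task, width):
        """Place task in the leftmost free run that fits; return start column or None."""
        for start, length in self.free_runs():
            if length >= width:
                for col in range(start, start + width):
                    self.owner[col] = task
                return start
        return None  # rejected: no contiguous region is large enough

    def remove(self, task):
        self.owner = [None if o == task else o for o in self.owner]

    def fragmentation(self):
        """1 - largest free run / total free columns (0.0 when nothing is free)."""
        runs = self.free_runs()
        total = sum(length for _, length in runs)
        if total == 0:
            return 0.0
        return 1.0 - max(length for _, length in runs) / total

fabric = Fabric(10)
fabric.place("A", 3)   # columns 0-2
fabric.place("B", 2)   # columns 3-4
fabric.place("C", 4)   # columns 5-8
fabric.remove("B")     # frees columns 3-4; column 9 is also free
print(fabric.place("D", 3))              # None: 3 free columns, but not contiguous
print(round(fabric.fragmentation(), 3))  # 0.333
```

The example shows why online placement needs metrics beyond free area: task D is rejected even though enough columns are free in total.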
Most current peer-to-peer designs build their own system overlays independently of the physical network. Nodes in unstructured systems form a random overlay, whereas structured designs normally organize peers into an elegant identifier ring. However, all of these overlays are far removed from the physical network. Noting that the system overlay is crucial for building a distributed system, this paper proposes building system overlays based on the physical overlay. By making full use of physical network characteristics and taking advantage of both structured and unstructured protocols, we build a network-based peer-to-peer system. Not only is the system highly efficient (its stretch is equal to one), but it can also adapt to extreme system churn. Most importantly, its maintenance overhead is very low, even in highly dynamic environments.
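Stretch, as commonly defined, compares two costs: the physical cost of the route a message takes through the overlay, and the cost of the direct physical shortest path. It is the ratio of the first to the second. A stretch of one therefore means overlay routing adds no detour. The minimal sketch below computes stretch on a small weighted topology using Dijkstra's algorithm. The graph, latencies, and function names are hypothetical assumptions, not taken from the paper.

```python
import heapq

def dijkstra(graph, src):
    """Shortest physical distances from src; graph maps node -> [(neighbor, weight)]."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return dist

def stretch(graph, overlay_route):
    """Physical cost of an overlay route divided by the direct physical cost."""
    cost = sum(dijkstra(graph, a)[b] for a, b in zip(overlay_route, overlay_route[1:]))
    direct = dijkstra(graph, overlay_route[0])[overlay_route[-1]]
    if direct == 0:
        raise ValueError("source and destination coincide")
    return cost / direct

# Hypothetical undirected topology with link latencies.
graph = {
    "A": [("B", 1), ("D", 5)],
    "B": [("A", 1), ("C", 1)],
    "C": [("B", 1), ("D", 1)],
    "D": [("C", 1), ("A", 5)],
}
print(stretch(graph, ["A", "B", "C", "D"]))            # 1.0: overlay follows physical links
print(round(stretch(graph, ["A", "C", "B", "D"]), 3))  # 1.667: overlay detour
```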