The problem of real-time leader election in a shared memory environment requires a single processor to be distinguished as the leader and requires an upper bound on the duration for which no leader is present. This processor can be used to provide services that must be continuously available. We propose an improved protocol, which requires O(log N) time and O(N/log N) variables.
This paper presents the process, strategy, and results associated with porting a typical combustion physics flow solver to current state-of-the-art and future massively parallel computer architectures. Major focus is placed on the distinct algorithmic structure of these types of codes and how it can be integrated with modern programming paradigms for heterogeneous platforms (i.e., distributed many-core systems with accelerators). An end-to-end case study is presented that exemplifies the process in a generic manner, which then serves as a clear guide with respect to the strategy and best practices leading to a robust and adaptable framework that performs well, is durable over time, is portable, and requires minimal human effort. This end is accomplished beginning with the use of a mature, validated, structured, multiblock code framework optimized for application of both Large Eddy Simulation (LES) and Direct Numerical Simulation (DNS). This code has been ported to a variety of platforms over the past decade, including most recently the Oak Ridge Leadership Computing Facility's "Summit" platform. The experience gained on these multiple platforms provides general insights, and thus the results presented are not specific to any one code or platform other than the overarching trend toward distributed many-core systems with accelerators in order to move toward exascale performance. The resultant performance and scalability of the ported code is demonstrated on a real-world application: a state-of-the-art rotating detonation rocket engine simulation that matches the complex geometry and boundary conditions imposed as part of a companion experimental campaign.
This paper investigates the problem of finding an optimal static pessimistic replica control scheme. It has been widely accepted that coteries (proposed by Garcia-Molina and Barbara) provide the most general framework for such schemes. We demonstrate that voting schemes, a very small subset of static pessimistic schemes, are optimal for fully connected networks with negligible link failure rates, as well as for Ethernet systems. We also show that voting is not optimal for somewhat more general systems. We propose a modification of the algorithm of Tong and Kain for computing optimal voting in the operation-independent case, so that it runs in linear (rather than exponential) time. Finally, we propose the first efficient algorithm for computing the optimal vote assignment and appropriate thresholds for fully connected networks when the relative frequencies of read and write operations are known. We also extend this result to Ethernet systems.
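To make the notions of vote assignment and thresholds concrete, the following is a minimal sketch of classical weighted voting, not the optimization algorithm of the paper. It assumes the standard quorum-intersection conditions (read threshold plus write threshold exceeds the total votes, and twice the write threshold exceeds the total votes), a fully connected network with no link failures, and independent site failures. The function names `is_valid_thresholds` and `availability` are hypothetical.

```python
from itertools import product


def is_valid_thresholds(votes, r, w):
    """Check the classical quorum-intersection conditions.

    r + w > total guarantees every read quorum meets every write quorum;
    2 * w > total guarantees any two write quorums intersect.
    """
    total = sum(votes)
    return r + w > total and 2 * w > total


def availability(votes, up_prob, threshold):
    """Probability that the operational sites hold at least `threshold` votes.

    Assumes sites fail independently and links never fail (illustrative
    assumption). Enumerates all up/down patterns, so it is exponential in
    the number of sites and only meant for small examples.
    """
    prob = 0.0
    for states in product([0, 1], repeat=len(votes)):
        pattern_prob = 1.0
        collected = 0
        for vote, site_p, up in zip(votes, up_prob, states):
            pattern_prob *= site_p if up else 1.0 - site_p
            collected += vote * up
        if collected >= threshold:
            prob += pattern_prob
    return prob


# Three replicas with one vote each, majority quorums for reads and writes.
votes = [1, 1, 1]
print(is_valid_thresholds(votes, r=2, w=2))
print(availability(votes, [0.9, 0.9, 0.9], threshold=2))
```

Weighting read and write availability by the relative operation frequencies gives one natural objective for comparing vote assignments; the brute-force enumeration here only evaluates a given assignment and does not search for the best one.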
Scheduling precedence-constrained task graphs, with or without duplication, is one of the most challenging NP-complete problems in parallel and distributed computing systems. Duplication heuristics are more effective, in general, for fine-grained task graphs and for networks with high communication latencies. However, most of the available duplication algorithms are designed under the assumption of unbounded availability of fully connected processors, and have high complexity. Low-complexity optimal duplication algorithms work only under restricted cost and/or shape parameters for the task graphs. Further, the required number of processors grows significantly, in proportion to the task-graph size. An improved duplication strategy is proposed that works for arbitrary task graphs, with a limited number of interconnection-constrained processors. Unlike most other algorithms, which replicate all possible parents/ancestors of a given task, the proposed algorithm tends to avoid redundant duplications and duplicates nodes selectively, only if doing so helps improve performance. This results in fewer duplications and also lower time and space complexity. Simulation results are presented for a clique and an interconnection-constrained network topology with random and regular benchmark task graph suites, representing a variety of parallel numerical applications. Performance, in terms of normalized schedule length and efficiency, is compared with some of the well-known and recently proposed algorithms. The suggested algorithm turns out to be the most efficient, as it generates better or comparable schedules with remarkably less processor consumption.
An adaptive program is one that changes its behavior based on the current state of its environment. This notion of adaptivity is formalized and a logic for reasoning about adaptive programs is presented. The logic includes several composition operators that can be used to define an adaptive program in terms of given constituent programs; programs resulting from these compositions retain the adaptive properties of their constituent programs.
A novel time-frequency technique for linear frequency modulated (LFM) signal detection is proposed. The design of the proposed detectors is based on the Radon transform of the modulus square or the envelope amplitude of the ambiguity function (AF) of the signal. A practical assumption is made that the chirp rate is the only parameter of interest. Since the AF of LFM signals will pass through the origin of the ambiguity plane, the line integral of the Radon transform is performed over all lines passing through the origin of the ambiguity plane. The proposed detectors yield maxima over chirp rates of the LFM signals. This reduces the two-dimensional (2-D) problem of the conventional Wigner-Ville distribution (WVD) based detection or the Radon-Wigner transform (RWT) based detector to a one-dimensional (1-D) problem and consequently reduces the computation load and keeps the feature of "built-in" filtering. Related issues such as the finite-length effect, the resolution, and the effect of noise are studied. The result is a tool for LFM detection, as well as the time-varying filtering and adaptive kernel design for multicomponent LFM signals.
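The claim that the AF of an LFM signal passes through the origin can be checked with a short derivation. This is a sketch under assumptions not stated in the abstract: an idealized infinite-duration unit-amplitude chirp with start frequency $f_0$ and chirp rate $k$, and the symmetric AF convention (other sign conventions change only the orientation of the line). The symbols $f_0$, $k$, $\kappa$, and $D(\kappa)$ are introduced here for illustration.

```latex
\begin{align}
s(t) &= e^{\,j\left(2\pi f_0 t + \pi k t^2\right)}, \\
A(\tau,\nu) &= \int_{-\infty}^{\infty}
   s\!\left(t+\tfrac{\tau}{2}\right)\, s^{*}\!\left(t-\tfrac{\tau}{2}\right)
   e^{-j 2\pi \nu t}\, dt \\
 &= e^{\,j 2\pi f_0 \tau} \int_{-\infty}^{\infty}
   e^{\,j 2\pi k \tau t}\, e^{-j 2\pi \nu t}\, dt
  = e^{\,j 2\pi f_0 \tau}\, \delta(\nu - k\tau), \\
D(\kappa) &= \int \left| A(\tau, \kappa\tau) \right|^{2} d\tau .
\end{align}
```

The AF energy is concentrated on the line $\nu = k\tau$, which passes through the origin with slope equal to the chirp rate, so a detector statistic of the form $D(\kappa)$, integrating along lines through the origin, peaks at $\kappa = k$ and needs only a one-dimensional search. For finite-length signals the delta function is smeared into a ridge of finite width.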
Model reduction of second-order form linear systems is considered where a second-order form reduced model is desired. The focus is on reduction methods that employ or mimic Moore's balance and truncate. First, we examine second-order form model reduction by conversion to first-order form and obtain a complete solution for this problem. Then, new Gramians and input/output (I/O) invariants for second-order systems are motivated and defined. Based on these, direct second-order balancing methods are developed. This leads naturally to direct second-order form analogs for the well-known first-order form balance and truncate model reduction method. Explicit algorithms are given throughout the paper.
We investigate the dynamics of a gossip-like process for information dissemination in complex computer networks. We perform large-scale Monte Carlo simulations of this process on top of a scale-free network topology, as a prototype model of networks with strongly heterogeneous degree distributions, and compare the results with simulations performed for random graphs, which have a homogeneous degree distribution. In addition to these static networks, we also investigate the spreading process on time-dependent networks created by mobile wireless nodes (mobile ad hoc networks). Our study provides new insights into how the dissemination dynamics are affected by the complex interplay between network structure, mobility, and the spreading process. Our results are also relevant to other complex networks where gossip-like information dissemination takes place.
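As an illustration of the kind of Monte Carlo experiment described, the sketch below runs a simple push gossip on a small preferential-attachment (scale-free) graph. It is a toy under stated assumptions, not the process studied in the paper: the graph generator is a basic Barabási-Albert style construction, the gossip rule (each informed node tells one uniformly chosen neighbor per round) is an assumption, and the names `barabasi_albert` and `push_gossip` are hypothetical.

```python
import random


def barabasi_albert(n, m, rng):
    """Grow a graph by preferential attachment.

    Starts from a clique of m + 1 nodes; each new node links to m distinct
    existing nodes chosen with probability proportional to their degree.
    """
    adj = {i: set() for i in range(n)}
    pool = []  # each node appears once per incident edge (degree-weighted)
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            pool += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for target in chosen:
            adj[new].add(target)
            adj[target].add(new)
            pool += [new, target]
    return adj


def push_gossip(adj, source, rounds, rng):
    """Return the number of informed nodes after each round of push gossip."""
    informed = {source}
    history = [len(informed)]
    for _ in range(rounds):
        newly = set()
        for node in sorted(informed):
            # Assumed rule: contact one uniformly random neighbor per round.
            newly.add(rng.choice(sorted(adj[node])))
        informed |= newly
        history.append(len(informed))
    return history


rng = random.Random(0)
graph = barabasi_albert(50, 2, rng)
print(push_gossip(graph, source=0, rounds=20, rng=rng))
```

Repeating the run over many seeds and over a random graph with the same number of nodes and edges would give the kind of comparison between heterogeneous and homogeneous degree distributions that the abstract describes.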
Two iterated algorithms for evaluating the performance of a class of sequential tests are proposed. The goal is equivalent to computing the distribution function of the first passage time for a random walk to cross a one-sided barrier. Limitations on both algorithms are studied, and associated methods for eliminating those limitations when possible are derived. These algorithms are applied to a pseudonoise code acquisition system and a range-sampled radar searching problem. Related computational problems are discussed, and numerical results are given.
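The first-passage formulation can be illustrated with a small iteration over the probability mass of paths that have not yet crossed the barrier. This is a minimal sketch for an integer-valued random walk, assumed here for simplicity; it is not either of the two algorithms proposed in the paper, and the function name `first_passage_cdf` is hypothetical.

```python
def first_passage_cdf(step_pmf, barrier, n_steps):
    """Distribution function of the first time a walk reaches `barrier`.

    step_pmf maps integer step sizes to probabilities. The walk starts at 0
    and is considered to have crossed once its position is >= barrier.
    Returns a list whose entry n - 1 is P(first passage time <= n).
    """
    alive = {0: 1.0}  # position -> probability of being there, not yet crossed
    crossed = 0.0
    cdf = []
    for _ in range(n_steps):
        nxt = {}
        for pos, mass in alive.items():
            for step, p in step_pmf.items():
                new_pos = pos + step
                if new_pos >= barrier:
                    crossed += mass * p  # absorbed at the barrier
                else:
                    nxt[new_pos] = nxt.get(new_pos, 0.0) + mass * p
        alive = nxt
        cdf.append(crossed)
    return cdf


# Symmetric +/-1 walk, barrier at 1: crossing is only possible at odd times.
print(first_passage_cdf({1: 0.5, -1: 0.5}, barrier=1, n_steps=3))
# -> [0.5, 0.5, 0.625]
```

In a sequential test, the walk plays the role of the accumulated test statistic and the barrier is the decision threshold, so this distribution function gives the probability that the test has terminated by each step.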
The earth is modeled as an inhomogeneous conducting medium with multiple horizontal layers. For known layer conductivity and thickness, the voltage distribution and grounding resistance have been computed efficiently using an equivalent image method. In this paper, the reverse sequence is proposed. Without explicitly knowing the layer conductivities, thicknesses, and number of layers, the equivalent images are determined from voltage measurements along the earth surface. These images are then linearly translated to compute the underground voltage profiles and grounding resistance of buried electrodes. Such surface measurements are simple and inexpensive, and should give very accurate results based on the experience of applying these images in the microwave area.