This paper presents the process, strategy, and results associated with porting a typical combustion physics flow solver to current state-of-the-art and future massively-parallel computer architectures. Major focus is placed on the distinct algorithmic structure of these types of codes and how it can be integrated with modern programming paradigms for heterogeneous platforms (i.e., distributed many-core systems with accelerators). An end-to-end case study is presented that exemplifies the process in a generic manner, which then serves as a clear guide with respect to the strategy and best practices leading to a robust and adaptable framework that performs well, is durable over time, is portable, and requires minimal human effort. This end is accomplished beginning with the use of a mature, validated, structured, multiblock code framework optimized for application of both Large Eddy Simulation (LES) and Direct Numerical Simulation (DNS). This code has been ported to a variety of platforms over the past decade, including most recently the Oak Ridge Leadership Computing Facility's "Summit" platform. The experience gained on these multiple platforms provides general insights, and thus the results presented are not specific to any one code or platform other than the overarching trend toward distributed many-core systems with accelerators in order to move toward exascale performance. The resultant performance and scalability of the ported code are demonstrated on a real-world application: a state-of-the-art rotating detonation rocket engine simulation that matches the complex geometry and boundary conditions imposed as part of a companion experimental campaign.
Scheduling precedence constrained task graphs, with or without duplication, is one of the most challenging NP-complete problems in parallel and distributed computing systems. Duplication heuristics are more effective, in general, for fine-grain task graphs and for networks with high communication latencies. However, most of the available duplication algorithms are designed under the assumption of unbounded availability of fully connected processors, and lie in the high-complexity range. Low-complexity optimal duplication algorithms work only under restricted cost and/or shape parameters for the task graphs. Further, the required number of processors grows significantly with task-graph size. An improved duplication strategy is proposed that works for arbitrary task graphs, with a limited number of interconnection-constrained processors. Unlike most other algorithms that replicate all possible parents/ancestors of a given task, the proposed algorithm tends to avoid redundant duplications and duplicates nodes selectively, only if doing so improves performance. This results in fewer duplications and also lower time and space complexity. Simulation results are presented for a clique and an interconnection-constrained network topology with random and regular benchmark task-graph suites, representing a variety of parallel numerical applications. Performance, in terms of normalized schedule length and efficiency, is compared with some of the well-known and recently proposed algorithms. The suggested algorithm turns out to be the most efficient, as it generates better or comparable schedules with remarkably lower processor consumption.
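The core idea — duplicate a parent task on a processor only when the local copy beats fetching the result over the network — can be sketched with a toy list scheduler. This is not the paper's algorithm; it is a minimal illustration under simplifying assumptions (a hypothetical four-task DAG, two processors, and duplication restricted to entry tasks so the greedy choice stays simple):

```python
from collections import defaultdict

# Toy DAG: task weights and inter-task communication costs (assumed data).
weights = {"A": 2, "B": 3, "C": 4, "D": 2}
comm = {("A", "B"): 5, ("A", "C"): 5, ("B", "D"): 1, ("C", "D"): 1}
preds = defaultdict(list)
for (u, v) in comm:
    preds[v].append(u)
order = ["A", "B", "C", "D"]   # a topological order of the DAG
P = 2                          # limited number of processors

avail = [0.0] * P              # earliest free time of each processor
finish = defaultdict(dict)     # finish[task][proc] -> finish time of a copy

for t in order:
    best = None
    for p in range(P):
        dup_end = avail[p]     # running end time of duplicated copies on p
        ready, dups = 0.0, {}
        for u in preds[t]:
            if p in finish[u]:                        # a copy is already local
                r = finish[u][p]
            else:
                remote = min(finish[u].values()) + comm[(u, t)]
                # Duplicate u locally only if it beats the remote fetch
                # (for simplicity, only entry tasks are candidates).
                if not preds[u] and dup_end + weights[u] < remote:
                    dup_end += weights[u]
                    dups[u] = dup_end
                    r = dup_end
                else:
                    r = remote
            ready = max(ready, r)
        fin = max(dup_end, ready) + weights[t]
        if best is None or fin < best[0]:
            best = (fin, p, dups)
    fin, p, dups = best
    for u, f in dups.items():                         # commit the duplications
        finish[u][p] = f
    finish[t][p] = fin
    avail[p] = fin

makespan = max(max(f.values()) for f in finish.values())
print("makespan:", makespan)
```

Here the entry task A gets duplicated onto the second processor because the copy (2 time units) is cheaper than the communication delay (5 units), shortening the schedule; replicating B or C as well would add cost without benefit, which is the selectivity the abstract argues for.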
With growing interest in distributed computing come demands for techniques to aid in the development of correct and reliable distributed software. Controlling, or at least recognizing, the complexity of such software is an important part of the development and maintenance process. While a number of metrics have been proposed for quantitatively measuring the complexity of sequential, centralized programs, corresponding metrics for distributed software are noticeable by their absence. Using Ada as a representative distributed programming language, this paper discusses some ideas on complexity metrics that focus on Ada tasking and rendezvous. Concurrently active rendezvous are claimed to be an important aspect of communication complexity. A Petri net graph model of Ada rendezvous is used to introduce a "rendezvous graph," an abstraction that can be useful in viewing and computing effective communication complexity.
The earth is modeled as an inhomogeneous conducting medium with multiple horizontal layers. For known layer conductivity and thickness, the voltage distribution and grounding resistance have been computed efficiently using an equivalent image method. In this paper, the reverse sequence is proposed: without explicitly knowing the layer conductivities, thicknesses, and number of layers, the equivalent images are determined from voltage measurements along the earth's surface. These images are then linearly translated to compute the underground voltage profiles and grounding resistance of buried electrodes. Such surface measurements are simple and inexpensive, and should give very accurate results based on the experience of applying these images in the microwave area.
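The forward direction the abstract takes as given — surface voltage from known layer parameters via equivalent images — can be sketched for the classical two-layer case. This is a textbook image-series formula, not the paper's inverse procedure; the parameter values are illustrative:

```python
import numpy as np

def surface_potential(r, I, rho1, rho2, h, n_images=200):
    """Surface potential at radial distance r from a point current source I
    injected at the surface of a two-layer earth: top layer of resistivity
    rho1 and thickness h over a half-space of resistivity rho2.
    Classical method-of-images series with reflection coefficient k."""
    k = (rho2 - rho1) / (rho2 + rho1)          # image (reflection) coefficient
    n = np.arange(1, n_images + 1)
    # Each image of order n sits at depth 2*n*h with strength k**n.
    series = np.sum(k**n / np.sqrt(r**2 + (2.0 * n * h)**2))
    return rho1 * I / (2.0 * np.pi) * (1.0 / r + 2.0 * series)

# Homogeneous check: rho2 == rho1 collapses to rho1*I/(2*pi*r).
v_homog = surface_potential(10.0, 1.0, 100.0, 100.0, 5.0)
```

The paper's inverse step runs this machinery backwards: fit image strengths and depths to measured surface voltages, then reuse the fitted images to predict underground potentials and grounding resistance.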
Two iterated algorithms for evaluating the performance of a class of sequential tests are proposed. The goal is equivalent to computing the distribution function of the first passage time for a random walk to cross a one-sided barrier. Limitations on both algorithms are studied, and associated methods for eliminating those limitations, when possible, are derived. These algorithms are applied to a pseudonoise code acquisition system and a range-sampled radar searching problem. Related computational problems are discussed, and numerical results are given.
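The first-passage computation the abstract refers to can be illustrated with a simple iterated scheme (not the paper's algorithms): propagate the probability mass of surviving paths step by step, absorbing whatever crosses the one-sided barrier. Assumes a discrete increment distribution:

```python
from collections import defaultdict

def first_passage(step_pmf, barrier, n_steps):
    """Return [P(T=1), ..., P(T=n_steps)] where T is the first step at which
    a random walk with i.i.d. increments (step_pmf: value -> prob), started
    at 0, reaches level >= barrier."""
    surviving = {0: 1.0}            # mass of paths not yet past the barrier
    f = []
    for _ in range(n_steps):
        nxt, crossed = defaultdict(float), 0.0
        for x, px in surviving.items():
            for s, ps in step_pmf.items():
                y = x + s
                if y >= barrier:
                    crossed += px * ps     # absorbed at this step
                else:
                    nxt[y] += px * ps
        f.append(crossed)
        surviving = dict(nxt)
    return f

# Symmetric +/-1 walk, barrier at 1: P(T=2k-1) = Catalan(k-1) / 2**(2k-1).
f = first_passage({+1: 0.5, -1: 0.5}, barrier=1, n_steps=5)
```

Each iteration is one convolution of the surviving mass with the increment distribution, truncated at the barrier; the limitations the paper studies (e.g., grid growth and truncation of the support) are visible even in this sketch, since the `surviving` dictionary spreads with every step.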
A novel time-frequency technique for linear frequency modulated (LFM) signal detection is proposed. The design of the proposed detectors is based on the Radon transform of the modulus square or the envelope amplitude of the ambiguity function (AF) of the signal. A practical assumption is made that the chirp rate is the only parameter of interest. Since the AF of LFM signals will pass through the origin of the ambiguity plane, the line integral of the Radon transform is performed over all lines passing through the origin of the ambiguity plane. The proposed detectors yield maxima over the chirp rates of the LFM signals. This reduces the two-dimensional (2-D) problem of the conventional Wigner-Ville distribution (WVD) based detector or the Radon-Wigner transform (RWT) based detector to a one-dimensional (1-D) problem, and consequently reduces the computation load and keeps the feature of "built-in" filtering. Related issues such as the finite-length effect, the resolution, and the effect of noise are studied. The result is a tool for LFM detection, as well as for time-varying filtering and adaptive kernel design for multicomponent LFM signals.
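The 1-D search over chirp rate can be illustrated with a dechirp-and-FFT statistic, which is a simplified stand-in for the paper's Radon-AF line integral (both collapse the 2-D detection problem to a scan over one slope parameter); the signal parameters here are illustrative:

```python
import numpy as np

def chirp_rate_detector(s, rates):
    """For each candidate chirp rate, remove that chirp from s and measure
    spectral concentration. An LFM signal yields a sharp maximum at its
    true rate. Stand-in illustration of a 1-D chirp-rate search, not the
    paper's exact Radon-of-ambiguity-function detector."""
    n = np.arange(len(s))
    stats = []
    for a in rates:
        y = s * np.exp(-1j * np.pi * a * n**2)   # dechirp by candidate rate
        stats.append(np.max(np.abs(np.fft.fft(y))**2))
    return np.array(stats)

# Noiseless LFM test signal with known chirp rate.
N = 256
n = np.arange(N)
true_rate = 0.01
s = np.exp(1j * np.pi * true_rate * n**2)
rates = np.linspace(0.0, 0.02, 41)
stats = chirp_rate_detector(s, rates)
est = rates[np.argmax(stats)]
```

At the true rate the dechirped signal is a pure tone, so all its energy lands in one FFT bin; every mismatched rate leaves a residual chirp whose energy smears across the spectrum, which is the "built-in" filtering the abstract mentions.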
Model reduction of second-order form linear systems is considered where a second-order form reduced model is desired. The focus is on reduction methods that employ or mimic Moore's balance-and-truncate approach. First, we examine second-order form model reduction by conversion to first-order form and obtain a complete solution for this problem. Then, new Gramians and input/output (I/O) invariants for second-order systems are motivated and defined. Based on these, direct second-order balancing methods are developed. This leads naturally to direct second-order form analogs of the well-known first-order form balance-and-truncate model reduction method. Explicit algorithms are given throughout the paper.
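The first route the abstract mentions — convert the second-order system to first-order form, then balance — can be sketched for a small example. The matrices below are illustrative toy data, and the Lyapunov solver uses a dense Kronecker formulation suitable only for tiny systems:

```python
import numpy as np

def lyap(A, Q):
    """Solve A P + P A^T + Q = 0 via Kronecker products (small systems only)."""
    n = A.shape[0]
    I = np.eye(n)
    P = np.linalg.solve(np.kron(I, A) + np.kron(A, I), -Q.flatten("F"))
    P = P.reshape((n, n), order="F")
    return 0.5 * (P + P.T)                    # symmetrize against round-off

# Toy second-order system  M q'' + D q' + K q = B2 u,  y = C2 q
M = np.eye(2)
K = np.diag([4.0, 9.0])
D = 0.4 * np.eye(2)
B2 = np.array([[1.0], [1.0]])
C2 = np.array([[1.0, 0.5]])

# Conversion to first-order form  x' = A x + B u,  y = C x,  x = [q; q']
Minv = np.linalg.inv(M)
A = np.block([[np.zeros((2, 2)), np.eye(2)], [-Minv @ K, -Minv @ D]])
B = np.vstack([np.zeros((2, 1)), Minv @ B2])
C = np.hstack([C2, np.zeros((1, 2))])

# Gramians and the square-root balancing transformation
P = lyap(A, B @ B.T)                          # controllability Gramian
Q = lyap(A.T, C.T @ C)                        # observability Gramian
Lp, Lq = np.linalg.cholesky(P), np.linalg.cholesky(Q)
U, sig, Vt = np.linalg.svd(Lq.T @ Lp)         # sig = Hankel singular values
T = Lp @ Vt.T @ np.diag(sig**-0.5)            # balancing transformation
Tinv = np.linalg.inv(T)
Pb = Tinv @ P @ Tinv.T                        # both Gramians become diag(sig)
Qb = T.T @ Q @ T
```

Truncating the balanced state to the leading Hankel singular values gives Moore's reduced model, but, as the abstract points out, the truncated first-order state generally loses the position/velocity partitioning; recovering a second-order reduced form is what motivates the paper's direct second-order Gramians.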
An adaptive program is one that changes its behavior based on the current state of its environment. This notion of adaptivity is formalized and a logic for reasoning about adaptive programs is presented. The logic includes several composition operators that can be used to define an adaptive program in terms of given constituent programs; programs resulting from these compositions retain the adaptive properties of their constituent programs.
An important requirement in surveying for residual radioactivity is the detection of localized areas of elevated contamination, sometimes referred to as hot spots. In the present work we have developed a computer code that searches for distributions of surface activity (possibly many) that are consistent with a series of in situ measurements on a grid indicating the possible presence of hot spots. The algorithm makes use of a maximum entropy deconvolution of the data, followed by further analysis. The algorithm is quite general and could be modified for use in other types of measurements. Properties of the algorithm are demonstrated using data from actual field measurements.
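The deconvolution step can be illustrated with a multiplicative Richardson-Lucy-style iteration — a well-known stand-in for the paper's maximum-entropy deconvolution, sharing its key property of producing nonnegative activity maps. The detector response and survey data below are synthetic:

```python
import numpy as np

def deconvolve(data, psf, n_iter=100):
    """Richardson-Lucy-style multiplicative deconvolution: recover a
    nonnegative surface-activity map from grid measurements blurred by
    the detector response psf. Stand-in for a maximum-entropy method."""
    x = np.full_like(data, data.mean())          # flat nonnegative start
    for _ in range(n_iter):
        blurred = np.convolve(x, psf, mode="same")
        ratio = data / np.maximum(blurred, 1e-12)
        x *= np.convolve(ratio, psf[::-1], mode="same")
    return x

# Synthetic 1-D survey line: one hot spot on a uniform background.
psf = np.array([0.05, 0.25, 0.4, 0.25, 0.05])    # detector response, sums to 1
truth = np.full(41, 0.1)
truth[25] = 5.0                                   # hot-spot location
data = np.convolve(truth, psf, mode="same")       # in situ measurements
est = deconvolve(data, psf)
```

Because the update is multiplicative from a positive start, the estimate stays nonnegative, and the iteration concentrates the blurred measurement back toward the hot-spot grid cell; the paper's further analysis would then enumerate the activity distributions consistent with the data rather than report a single estimate.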
Flammia offers a new perspective on peer-to-peer computing. He looks at connected computers from the network perspective instead of that of the individual computers. A network of connected computers provides an environment for distributed applications that cannot be foreseen using the analogy of desktop applications.