ISBN: 9781467391160 (print)
Improving performance through memory computing is a hot topic in Big Data computing. In memory computing, data deployment directly affects load balance and task efficiency. For memory computing on electric power data, two problems remain unsolved: (1) only memory space, and not CPU frequency or the number of cores, can be considered when balancing load and improving performance; (2) data deployment involves so many manual operations that it is difficult to automate. This paper presents an electric power data deployment solution for distributed memory computing that addresses both problems. In the solution, a data deployment strategy is established according to the business logic and the hardware configuration of the cluster nodes. The deployment scheme is then implemented through interface operations. Finally, the cluster nodes load data according to the deployment scheme. The solution has been applied to Objectification Parallel Computing (OPC). The results show that OPC achieves the best performance, meeting the system's efficiency requirements, and that data deployment is simple to operate.
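As a rough illustration of a deployment strategy that weighs CPU frequency and core count alongside memory space, the following Python sketch places data partitions greedily. It is not the paper's algorithm: the `Node` type, the capacity score (cores times clock frequency), and the largest-first heuristic are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical description of one cluster node."""
    name: str
    memory_mb: int   # memory available for holding data
    cpu_ghz: float   # clock frequency
    cores: int       # number of cores
    used_mb: int = 0
    partitions: list = field(default_factory=list)

    @property
    def compute_capacity(self) -> float:
        # Assumed capacity score: cores x clock frequency.
        return self.cores * self.cpu_ghz


def plan_deployment(nodes, partitions):
    """Greedy placement sketch.

    Partitions are taken largest first. Each one goes to the node whose
    data volume per unit of compute capacity would be lowest after the
    placement, skipping nodes that lack the memory to hold it.
    """
    ordered = sorted(partitions.items(), key=lambda kv: (-kv[1], kv[0]))
    for pname, size_mb in ordered:
        candidates = [n for n in nodes if n.used_mb + size_mb <= n.memory_mb]
        if not candidates:
            raise ValueError(f"no node has room for partition {pname!r}")
        best = min(
            candidates,
            key=lambda n: ((n.used_mb + size_mb) / n.compute_capacity, n.name),
        )
        best.used_mb += size_mb
        best.partitions.append(pname)
    return {n.name: list(n.partitions) for n in nodes}
```

With two nodes that have equal memory but different core counts, the node with more cores receives more data, which a memory-only rule could not express.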
ISBN: 9781509018932 (print)
Big Data computing is a hot topic in the Internet of Things and cloud computing. Computing efficiently on Big Data is the key to improving performance. Many companies and institutions offer technologies and products based on distributed computing or memory computing, but these are ineffective in scenarios with real-time demands on low-configuration clusters. To address this problem, this paper presents an effective solution based on distributed computing and memory computing, Objectification Parallel Computing (OPC). In the solution, the data are formatted into objects. The objects are then stored in a distributed manner across the computers' memories and processed in parallel to complete tasks. OPC is applied to the Electric Asset Quality Supervision Manage System (EAQSMS) of the State Grid of China; the results show that, running on PCs, the system is efficient, available, reliable, and flexibly expandable.
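The three steps described above (format data into objects, distribute the objects across memories, compute in parallel) can be sketched in Python as follows. This is only an illustration, not OPC itself: the `AssetRecord` type, the modulo sharding rule, and the defect-count task are hypothetical, and threads in one process stand in for the memories of separate cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    """Hypothetical object form of one raw data row."""
    asset_id: int
    defects: int


def to_objects(rows):
    # Step 1: format raw (asset_id, defects) rows into objects.
    return [AssetRecord(asset_id=i, defects=d) for i, d in rows]


def shard(objects, n_shards):
    # Step 2: distribute objects across in-memory shards.
    # Each shard stands in for the memory of one node (assumption).
    shards = [[] for _ in range(n_shards)]
    for obj in objects:
        shards[obj.asset_id % n_shards].append(obj)
    return shards


def count_defects(shard_objects):
    # Per-shard task: a partial result computed on local objects only.
    return sum(o.defects for o in shard_objects)


def parallel_total(rows, n_shards=4):
    # Step 3: run the task on every shard in parallel, then merge.
    shards = shard(to_objects(rows), n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(count_defects, shards))
    return sum(partials)
```

Because each shard's task touches only its own objects, the merge step is the only point where results from different shards meet.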
Parallel concepts for spectral wind-wave models are discussed, with a focus on the WAVEWATCH III model, which runs in a routine operational mode at NOAA/NCEP. After a brief description of relevant aspects of wave models, basic parallelization concepts are discussed. It is argued that a method including data transposes is more suitable for this model than conventional domain decomposition techniques. Details of the implementation, including specific buffering techniques for the data to be communicated between processors, are discussed. Extensive timing results are presented for up to 450 processors on an IBM RS6000 SP. The resulting model is shown to exhibit excellent parallel behavior for a large range of numbers of processors. (C) 2002 Elsevier Science B.V. All rights reserved.
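To make the idea of a data transpose concrete, the Python sketch below simulates one in a single process. The abstract does not specify the layouts, so the two used here are assumptions: before the transpose each processor holds the full spectrum for the grid points it owns, and afterwards each processor holds all grid points for the spectral components it owns. The round-robin `owner` rule is likewise hypothetical, and plain dictionaries stand in for processor memory and message passing.

```python
def owner(index, n_procs):
    # Hypothetical round-robin ownership rule.
    return index % n_procs


def scatter_by_point(field, n_procs):
    """Layout A: field[point][component] split by grid point.

    Returns one dict per processor: {point: [values for all components]}.
    """
    local = [dict() for _ in range(n_procs)]
    for p, spectrum in enumerate(field):
        local[owner(p, n_procs)][p] = list(spectrum)
    return local


def transpose_to_components(local_by_point, n_points, n_procs):
    """Layout B: simulated all-to-all exchange.

    Afterwards each processor holds, for every spectral component it
    owns, the values at all grid points: {component: [values per point]}.
    """
    local = [dict() for _ in range(n_procs)]
    for src in range(n_procs):
        for p, spectrum in local_by_point[src].items():
            for c, value in enumerate(spectrum):
                dst = owner(c, n_procs)
                local[dst].setdefault(c, [None] * n_points)[p] = value
    return local
```

In a real distributed code the inner loops would become buffered sends and receives between processors; the sketch only shows which data ends up where.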
The practicality of large-eddy simulation (LES) of turbulent combustion, as found in gas turbine engines, on clusters of commodity PC-based symmetric multi-processor (SMP) systems in 2-, 4-, and 8-way configurations has been investigated. Bandwidth demands from both memory and networking in the benchmark LES algorithm are shown to be the primary performance inhibitors. Contention in the various SMP architectures tested is shown to compound these two hardware limitations. To investigate the ability of the parallel clustered systems, low-level hardware studies are conducted in conjunction with benchmarking of the LES application. The hardware tests focus on memory and communication contention under loads found in the LES algorithm. For comparison, the benchmarks are also applied to two industry-leading high-performance supercomputing architectures. It is found that contention in the 4- and 8-way SMP architectures studied here limits their applicability, while the 2-way systems show competitive performance and speed-up compared to their industry counterparts. It is concluded that, for design-level combustion LES, clusters of commodity hardware, when equipped with sufficient memory and communication bandwidth, are a viable substitute for more expensive supercomputing platforms. (C) 2004 Elsevier Inc. All rights reserved.
In the Aurora distributed shared data system, the programmer instantiates shared-data objects and uses scoped behavior to incrementally tune applications on a per-object and per-context basis. A class library implements shared-data objects as abstract data types and scoped behavior implements the optimizations within standard C++. Using a network of workstations connected by an ATM switch, the author demonstrates that Aurora performs comparably to message passing
MFiX, a general-purpose Fortran-based suite, simulates the complex flow in fluidized bed applications via BiCGStab and GMRES methods along with plane relaxation preconditioners. Trilinos, an object-oriented framework, contains various first- and second-generation Krylov subspace solvers and preconditioners. Because MFiX does not possess advanced linear methods, we developed a framework to integrate MFiX with Trilinos. The framework allows MFiX to access the advanced linear solvers and preconditioners in Trilinos. The integrated solver is hereafter called MFiX-Trilinos. In the present work, we study the performance of variants of the GMRES and CGS methods in MFiX-Trilinos and of the BiCGStab and GMRES solvers in MFiX for a 3D gas-solid fluidized bed problem. Two right preconditioners, Jacobi and smoothed aggregation, are employed with the various solvers in MFiX-Trilinos. The flow computed by MFiX-Trilinos is validated against that computed by MFiX for the BiCGStab and GMRES methods. The effect of preconditioning on the iterative solvers in MFiX-Trilinos is also analyzed. In addition, the effect of left and right smoothed aggregation preconditioning on the solvers is studied. The performance of the first- and second-generation solver stacks in MFiX-Trilinos is studied as well for two different problem sizes.
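For readers unfamiliar with the distinction between left and right preconditioning, the standard textbook forms for a linear system $Ax = b$ with preconditioner $M$ are given below. The symbols are generic notation chosen for this note, not taken from the abstract; for the Jacobi preconditioner, $M$ is the diagonal of $A$.

```latex
\begin{align*}
\text{left:}  \quad & M^{-1} A x = M^{-1} b, \\
\text{right:} \quad & A M^{-1} y = b, \qquad x = M^{-1} y .
\end{align*}
```

With right preconditioning the residual of the transformed system equals the residual $b - Ax$ of the original system, whereas left preconditioning changes the residual that the solver monitors.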
Design by Transformation (DxT) is a top-down approach to mechanically derive high-performance algorithms for dense linear algebra. We use DxT to derive the implementation of a representative matrix operation, two-sided Trmm. We start with a knowledge base of transformations that were encoded for a simpler set of operations, the level-3 BLAS, and add only a few transformations to accommodate the more complex two-sided Trmm. These additions explode the search space of our prototype system, DxTer, requiring the novel techniques defined in this paper to eliminate large segments of the search space that contain suboptimal algorithms. Performance results for the mechanically optimized implementations on 8192 cores of a BlueGene/P architecture are given.