In recent years, the research community has made great strides in alias annotations that support parallel programming [1]. Using these techniques, programmers no longer have to guess where aliased mutable state may ca...
In this article, we propose a parallel computing method for 3-D finite-element analysis coupled with circuit equations, intended for calculating the characteristics of rotating machines. In the proposed method, the preconditioning part of the matrix solver is parallelized along with the remaining parts, in order to obtain a stable solution within a short computation time. The proposed method is applied to the loss calculation of an interior permanent magnet synchronous motor fed by an inverter to clarify its advantages.
We present updates to the Cray Graph Engine (CGE), a high-performance in-memory semantic graph database, which enable performant execution across multiple architectures as well as deployment in a container to support cloud and as-a-service graph analytics. This paper discusses the changes required to port and optimize CGE to target multiple architectures, including Cray Shasta systems, large shared-memory machines such as SuperDome Flex (SDF), and cluster environments such as Apollo systems. The porting effort focused primarily on removing dependences on XPMEM and Cray PGAS and replacing these with a simplified PGAS library based upon POSIX shared memory and one-sided MPI, while preserving the existing Coarray-C++ CGE code base. We also discuss the containerization of CGE using Singularity and the techniques required to enable container performance matching native execution. We present early benchmarking results for running CGE on the SDF, InfiniBand clusters, and Slingshot interconnect-based Shasta systems.
ISBN: (print) 9789897585883
With the growing amount of data, computational power has become a key requirement in all fields. To satisfy this requirement, GPUs appear to be an appropriate solution. One of their major drawbacks, however, is that their varying architectures make writing efficient parallel code very challenging, because the programmer must master the GPU's low-level design. CUDA offers more flexibility for the programmer to exploit the GPU's power with ease. However, tuning the launch parameters of its kernels, such as the block size, remains a daunting task. This parameter requires a deep understanding of the architecture and the execution model to be well tuned. In particular, in the Viola-Jones algorithm, the block size is an important factor that improves the execution time, but this optimization aspect is not well explored. This paper aims to offer the first steps toward automatically tuning the block size for any input without requiring deep knowledge of the hardware architecture, which ensures automatic performance portability across different GPU architectures. The main idea is to define techniques for obtaining the optimum block size to achieve the best performance. We pointed out the impact of using a static block size for all input sizes on the overall performance. In light of these findings, we presented two dynamic approaches to select the block size best suited to the input size. The first is based on an empirical search; this approach provides the optimal performance, but it is tough for the programmer and its deployment is time-consuming. To overcome this issue, we proposed a second approach, a model that automatically selects a block size. Experimental results show that this model can improve the execution time by up to 2.5x over the static approach.
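The empirical-search approach mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the candidate list, and the `measure` callable (assumed to return a cost such as kernel run time for a given block size) are all hypothetical, and the stand-in cost function is synthetic.

```python
from typing import Callable, Iterable


def grid_size(n_items: int, block_size: int) -> int:
    """Blocks needed to cover n_items with one thread per item (ceiling division)."""
    return (n_items + block_size - 1) // block_size


def empirical_best_block_size(
    candidates: Iterable[int],
    measure: Callable[[int], float],
) -> int:
    """Try every candidate block size and return the one with the lowest cost.

    `measure` is a hypothetical callable: in a real setting it would launch
    the kernel with the given block size and return the elapsed time.
    """
    best_size = None
    best_cost = float("inf")
    for size in candidates:
        cost = measure(size)
        if cost < best_cost:
            best_size, best_cost = size, cost
    if best_size is None:
        raise ValueError("no candidate block sizes were given")
    return best_size


# Synthetic stand-in for a timing run (assumption): cheapest at 128 threads.
def fake_cost(block_size: int) -> float:
    return abs(block_size - 128) + 1.0


candidates = [32, 64, 128, 256, 512, 1024]
chosen = empirical_best_block_size(candidates, fake_cost)
blocks = grid_size(1000, chosen)
```

Because every candidate has to be measured for every input size, the cost of this exhaustive search grows with the number of candidates and inputs, which is consistent with the abstract's remark that the approach is time-consuming to deploy.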
Fully distributed intelligent building systems can be used to effectively reduce the complexity of building automation systems and improve the efficiency of operation and maintenance management because of their self-organization, flexibility, and robustness. However, the parallel computing mode, dynamic network topology, and complex node interaction logic make application development complex, time-consuming, and challenging. To address the development difficulties of fully distributed intelligent building system applications, this paper proposes a user-friendly programming language called SwarmL. Concretely, SwarmL (1) establishes a language model, an overall framework, and an abstract syntax that intuitively describe the static physical objects and dynamic execution mechanisms of a fully distributed intelligent building system, (2) proposes a physical field-oriented variable that adapts the programming model to distributed architectures, using a serial programming style that matches human thinking to program parallel applications of fully distributed intelligent building systems and thereby reduce programming difficulty, (3) designs a computational scope-based communication mechanism that separates the computational logic from the node interaction logic, thus adapting to dynamically changing network topologies and supporting the generalized development of fully distributed intelligent building system applications, and (4) implements an integrated development tool that supports program editing and object code generation. To validate SwarmL, an example application of a real scenario and a subject-based experiment are explored. The results demonstrate that SwarmL can effectively reduce the programming difficulty and improve the development efficiency of fully distributed intelligent building system applications. SwarmL enables building users to quickly understand and master the development methods of application tasks in fully distributed intelligent
Effective and safe parallel programming is among the biggest challenges of today's software technology. The C++ 17 standard introduced parallel STL: a set of overloaded functions taking an additional 'executio...
Heterogeneous architectures proved successful in achieving unprecedented performance and energy-efficiency. However, taking advantage of these diverse processing elements is still hard. Programmers need to code throug...
Arithmetic coding (AC) is widely used for lossless data compression, and parallelization of arithmetic coding is relatively simple because all symbols can be encoded independently. On the other hand, parallel adaptive...
Modern Internet of Things (IoT) end nodes must support computationally intensive workloads within a limited power budget. Parallel ultra-low-power (PULP) architectures are a promising target for this scenario, and the availability of highly optimized software libraries is crucial to exploit parallelism and reduce software development costs. This letter proposes an efficient parallel design of the widely used short-time Fourier transform (STFT) and discrete wavelet transform (DWT), targeting ultra-low-power IoT devices. We address key performance challenges related to fine-grained synchronization and banking conflicts in shared memory. We achieve high throughput (50.95 samples/µs, on average), good parallel speedup (up to 6.79x), and high energy efficiency (up to 172.55 GOp/s/W) on a cluster of eight RISC-V cores optimized for PULP operation.
The ebb-and-flow irrigation system is an efficient closed-loop subirrigation system. In this study, a numerical model (EBMAN-HP) is presented for simulating all components (variations of water depth in the supply tank and the concrete floor/tank) and all phases of flood-floor/bench ebb-and-flow subirrigation systems. The model benefits from a fine-tuned computational algorithm for its hysteresis module. The model can simulate both time-specified and sensor-based irrigation scheduling. Since an ebb-and-flow irrigation system incorporates numerous pots, Richards' equation must be solved for several pots to obtain a sufficient understanding of the whole system. Therefore, the proposed model uses OpenMP parallel programming to reduce the execution time. In addition, a novel parallel TDMA solver is presented that accelerates the computation by breaking a large system of equations into several portions that are solved simultaneously. The model has been validated and verified against several analytical, numerical, and experimental test cases. The results showed that the hysteresis module can completely remove artificial pumping error in two critical test cases. The parallel TDMA solver was shown to reach a speedup of about 90 %. The model was shown to run faster than Hydrus-1D even in serial mode for coarser grids (about 52 % faster on average over 8 test cases) and similarly to Hydrus-1D for dense grids (about 6 % faster on average over 4 test cases), with perfect agreement (NSE between 0.999 and 1.000 and an average difference in MBE of less than 0.1 % for 12 cases). The parallel model could boost the model's performance to about 500 % using 6 processors. Finally, a comprehensive illustrative example is shown to present almost all capabilities of the model.
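For readers unfamiliar with TDMA (the tridiagonal matrix algorithm, also known as the Thomas algorithm), the sketch below shows the standard serial form that a parallel solver of this kind builds on. It is not the parallel variant described in the abstract, whose partitioning scheme is not given here; the function name and argument layout are assumptions chosen for illustration.

```python
from typing import List, Sequence


def thomas_solve(
    a: Sequence[float],  # sub-diagonal, a[0] is unused
    b: Sequence[float],  # main diagonal
    c: Sequence[float],  # super-diagonal, c[-1] is unused
    d: Sequence[float],  # right-hand side
) -> List[float]:
    """Solve a tridiagonal system with the serial Thomas algorithm.

    Row i of the system reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    No pivoting is done, so the matrix is assumed to be well conditioned
    (for example, diagonally dominant).
    """
    n = len(d)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward sweep: eliminate the sub-diagonal.
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Small check: the system below is built so that its solution is [1, 2, 3].
solution = thomas_solve(
    a=[0.0, 1.0, 1.0],
    b=[2.0, 2.0, 2.0],
    c=[1.0, 1.0, 0.0],
    d=[4.0, 8.0, 8.0],
)
```

Each step of the forward sweep depends on the previous one, so the serial algorithm cannot be split across threads as written. This dependency is why a parallel solver has to break the system into portions first, as the abstract describes.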