In this paper, we present our learning experience in designing and implementing a parallel image-dehazing code with OpenMP, developed from an existing fast sequential version. The aim of this work is to analyze a case study of developing parallel haze removal that makes practical and efficient use of shared-memory multi-core servers. We present the implementation technique and discuss the results in terms of the program improvements that may be needed to support parallel application developers with similar high-performance goals. Preliminary experiments with the haze-removal application were run on multi-core shared-memory platforms, and the results show that the performance of the proposed parallel code is promising.
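The abstract does not say which dehazing algorithm is parallelized. As a hypothetical illustration only, the Python sketch below assumes a dark-channel step (the minimum over the RGB channels and a square patch, as in the dark channel prior) and splits the image rows into contiguous, nearly equal chunks, the way an OpenMP `parallel for` with `schedule(static)` would. The names `static_chunks`, `dark_channel_rows`, and `dark_channel_parallel` are invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def static_chunks(n_rows, n_workers):
    """Split range(n_rows) into n_workers contiguous (start, stop) chunks,
    mimicking OpenMP schedule(static): sizes differ by at most one row."""
    base, extra = divmod(n_rows, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def dark_channel_rows(img, start, stop, radius):
    """Dark channel for rows [start, stop): min over RGB and a (2r+1)^2 patch.
    img is a list of rows; each pixel is an (r, g, b) tuple."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(start, stop):
        row = []
        for x in range(w):
            best = float("inf")
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    best = min(best, min(img[yy][xx]))
            row.append(best)
        out.append(row)
    return out


def dark_channel_parallel(img, radius=1, n_workers=4):
    """Each worker computes a disjoint block of output rows (no locking needed)."""
    chunks = static_chunks(len(img), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: dark_channel_rows(img, c[0], c[1], radius), chunks))
    # concatenate the row blocks in their original order
    return [row for part in parts for row in part]
```

Each worker writes a disjoint block of output rows, so the only synchronization is the final join. In CPython the global interpreter lock prevents real speed-up for this pure-Python loop; the point is the decomposition, which in C with OpenMP would run on all cores.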
ISBN (digital): 9781510628380
ISBN (print): 9781510628380
We introduce a generic analytic simulation and image reconstruction software platform for multi-pinhole (MPH) SPECT systems. The platform can model common or sophisticated MPH designs as well as complex data acquisition schemes. Graphics processing unit (GPU) acceleration was used to make the software high-performance. Herein, we describe the software platform and provide verification studies of the simulation and image reconstruction software.
ISBN (digital): 9781728153049
ISBN (print): 9781728153056
A many-core implementation of the multilevel fast multipole algorithm (MLFMA), based on the Athread parallel programming model, is presented for computing electromagnetic scattering by a 3-D object on China's homegrown many-core SW26010 CPU. In the proposed implementation, data access efficiency is improved by using Structure-of-Arrays (SoA) data structures. Adaptive workload distribution strategies are adopted on different MLFMA tree levels to make full use of the computing capability and the scratchpad memory (SPM). A double-buffering scheme is specifically designed to overlap communication with computation. The resulting Athread-based many-core MLFMA can solve real-life problems with more than four hundred thousand unknowns with a remarkable speed-up. Numerical results show that the proposed parallel scheme achieves a total speed-up of more than 7 times compared with the CPU master core.
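To illustrate the double-buffering idea in general terms (this is not the authors' Athread code), the sketch below uses a one-worker thread pool as a stand-in for the DMA engine: while block `i` is being computed, block `i + 1` is already being copied into the other buffer. `fetch` and `compute` are hypothetical placeholders for the SPM transfer and the MLFMA kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(data, start, size):
    """Hypothetical stand-in for a DMA copy from main memory into scratchpad."""
    return data[start:start + size]


def compute(block):
    """Hypothetical stand-in for the per-block kernel (here: sum of squares)."""
    return sum(v * v for v in block)


def double_buffered_sum(data, block_size):
    """Process data block by block, prefetching the next block during compute."""
    starts = list(range(0, len(data), block_size))
    if not starts:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as dma:
        # fill the first buffer before the loop starts
        pending = dma.submit(fetch, data, starts[0], block_size)
        for i in range(len(starts)):
            current = pending.result()          # wait until buffer i has arrived
            if i + 1 < len(starts):             # start filling the other buffer
                pending = dma.submit(fetch, data, starts[i + 1], block_size)
            total += compute(current)           # overlaps with the transfer above
    return total
```

`current` and `pending` play the roles of the two alternating buffers; at most one transfer is in flight while one block is computed.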
ISBN (digital): 9781728144115
ISBN (print): 9781728144122
Efficient utilization of multi-core computers with shared memory depends on many factors. In this article, the efficiency of multi-core computers with shared memory is investigated. All cores execute either parallel threads of a single program developed in accordance with the OpenMP API, or independent programs. There are no interactions between threads or between independent programs. Both parallel and non-parallel (single-threaded) applications share memory, where conflicts may occur. We propose models for determining the acceleration coefficients of a multi-core computer; analytical expressions describing how these coefficients depend on the number of cores, the properties of the programs being executed, and the parameters of the cores and shared memory; and formulas for determining the recommended number of cores for parallel applications developed in accordance with the OpenMP API. The main reason for the decrease in performance of multi-core computers with shared memory when applications are parallelized with the OpenMP API is shared-memory overload.
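The abstract does not give the authors' formulas. As a hypothetical stand-in, the sketch below adds an assumed linear contention term `c * (n - 1)` to Amdahl's law to show how shared-memory overload can cap the useful number of cores; `p` is the parallel fraction of the single-core run time and `c` is an invented contention coefficient. Neither the term nor the function names come from the paper.

```python
def speedup(n, p, c):
    """Amdahl speed-up plus a hypothetical shared-memory contention penalty.

    n: number of cores, p: parallel fraction (0 <= p <= 1),
    c: assumed extra normalized time per additional core contending for memory.
    """
    return 1.0 / ((1.0 - p) + p / n + c * (n - 1))


def recommended_cores(p, c, max_cores):
    """Core count in 1..max_cores that maximizes the modeled speed-up."""
    return max(range(1, max_cores + 1), key=lambda n: speedup(n, p, c))
```

With `c = 0` the model reduces to Amdahl's law and the speed-up grows monotonically, so the largest core count wins. With `p = 0.95` and `c = 0.001`, the modeled optimum out of 64 cores is 31, because beyond that point the contention term grows faster than the parallel term shrinks.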
ISBN (print): 9781538661222
Volcanoes are very complex geophysical systems in which fluids of different natures interact with porous rock under different physical conditions and within a complex matrix of conduits. These complex interactions generate two types of seismicity. The first type is characterized by fracturing of the elastic medium and includes volcano-tectonic (VT) events, which produce two distinct phases: a compressional wave (P wave) and a shear wave (S wave), which travel at different velocities through solid media. The second type is characterized by low frequencies and includes a wide variety of long-period (LP) events and volcanic tremors. These signals are produced by fluid motion within restricted paths; they normally have emergent onsets and no distinct P or S phases. Classical earthquake source location procedures exploit the distinct phases and their different propagation velocities. However, these procedures cannot be used for LP events and tremors. Therefore, complex algorithms must be applied, which demand far more computing resources and time than classical location methods. In this work, we present the analysis and design of an LP and tremor location application based on amplitude decay. We demonstrate that the algorithm is highly parallelizable, which allowed us to develop a parallel implementation using the Python programming language and the MPI standard, the de facto standard for parallel computing. We show experimentally that it exhibits almost linear scalability with respect to the number of events and the number of cores.
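The abstract does not specify the amplitude-decay model. The sketch below assumes a commonly used form with 1/r geometric spreading for body waves, A_i = A0 exp(-b r_i) / r_i, and locates each event by grid search, fitting the unknown source amplitude A0 in log space at every node. Because events are independent, they can be processed in parallel; here a thread pool stands in for the MPI processes used in the paper, and `locate_event`, `locate_all`, and the attenuation coefficient `b` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def locate_event(amplitudes, stations, grid, b):
    """Return the grid node that best explains the station amplitudes.

    Assumed decay law (hypothetical for this sketch): A_i = A0 * exp(-b * r_i) / r_i.
    A0 is unknown, so it is fitted in log space at every node.
    """
    best_node, best_misfit = None, math.inf
    for node in grid:
        dists = [math.dist(node, s) for s in stations]
        if min(dists) == 0.0:
            continue  # the 1/r model is singular at a station
        # log A_i - log g_i, with g_i = exp(-b r_i) / r_i the predicted decay
        res = [math.log(a) + b * r + math.log(r) for a, r in zip(amplitudes, dists)]
        log_a0 = sum(res) / len(res)  # least-squares estimate of log A0
        misfit = sum((v - log_a0) ** 2 for v in res)
        if misfit < best_misfit:
            best_node, best_misfit = node, misfit
    return best_node


def locate_all(events, stations, grid, b, workers=4):
    """Events are independent, so each one can be located by a separate worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda amps: locate_event(amps, stations, grid, b), events))
```

Since no data is exchanged between events, an MPI version can simply split the event list across ranks instead of threads.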
ISBN (print): 9781538659281
Non-orthogonal multiple access (NOMA) is a promising method for fifth-generation (5G) cellular networks because it improves spectral efficiency by multiplexing users in the power domain. One key challenge for receivers in NOMA networks is to distinguish individual signals that use the same band at the same time. Currently, the two widely discussed decoding schemes are successive interference cancellation (SIC) and parallel interference cancellation (PIC). Both schemes suppress multi-user interference by subtracting decoded signals from the received signal, but they use different algorithms: SIC decodes iteratively, whereas PIC decodes collectively. This paper compares the computation time of the SIC and PIC schemes at the base station and demonstrates a multi-threaded implementation of PIC.
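A toy comparison of the two schemes, under assumptions not taken from the paper: real-valued BPSK symbols, one receive antenna, no noise, and known per-user amplitudes `amps`. SIC must decode users one after another, strongest first, while in each PIC stage every user is re-decoded independently from the previous stage's estimates, which is what makes PIC a natural fit for multiple threads. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sign(v):
    """Hard BPSK decision; ties map to +1."""
    return 1 if v >= 0 else -1


def sic_decode(y, amps):
    """SIC: decode users one at a time, strongest first, subtracting each estimate."""
    order = sorted(range(len(amps)), key=lambda k: -amps[k])
    est, residual = [0] * len(amps), y
    for k in order:
        est[k] = sign(residual)
        residual -= amps[k] * est[k]  # cancel this user before decoding the next
    return est


def pic_decode(y, amps, stages=3, workers=2):
    """PIC: each stage re-decodes all users at once from the previous estimates."""
    est = [sign(y)] * len(amps)  # stage 0: same hard decision for every user
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(stages):
            prev = est
            total = sum(a * x for a, x in zip(amps, prev))

            def redecode(k, prev=prev, total=total):
                # subtract the estimated signals of all *other* users
                return sign(y - (total - amps[k] * prev[k]))

            est = list(pool.map(redecode, range(len(amps))))
    return est
```

With power shares of 0.9 and 0.1, both decoders recover every symbol pair in this noise-free setting; the difference is that SIC's loop carries a dependency from one user to the next, while each PIC stage is an independent map over users.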
ISBN (print): 9781450358019
While single-machine MapReduce systems can squeeze maximum performance out of the available cores, they are often limited by the size of main memory and can thus process only small datasets. Our experience shows that Metis, the state-of-the-art single-machine in-memory MapReduce system, frequently suffers out-of-memory crashes. Even though today's computers are equipped with efficient secondary storage devices, these frameworks do not use them, mainly because disk access latencies are much higher than those of main memory. Meanwhile, the single-machine setup of the Hadoop system performs much more slowly when presented with datasets larger than main memory. Moreover, such frameworks require tuning many parameters, which places an added burden on the programmer. In this paper we present OMR, an Out-of-core MapReduce system that not only handles datasets far larger than main memory but also guarantees linear scaling with growing data size. OMR actively minimizes the amount of data read from and written to disk through on-the-fly aggregation, and whenever disk accesses become necessary to avoid running out of memory, it uses block-sequential disk reads and writes. We prove OMR's linear scalability theoretically and demonstrate it empirically by processing datasets up to 5x larger than main memory. Our experiments show that OMR delivers far higher performance than the standalone single-machine setup of the Hadoop system. Also, in contrast to Metis, OMR avoids out-of-memory crashes on large datasets and delivers higher performance when datasets are small enough to fit in main memory.
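The sketch below illustrates, for word counting, the two techniques the abstract names: on-the-fly aggregation (counts are combined as soon as a word is emitted) and sequential disk I/O (when a hypothetical budget of `max_keys` distinct keys is exceeded, the partial table is written out as one sorted run). The runs are then combined with a k-way merge and reduced. This is not OMR's implementation; the run file format, the key budget, and all function names are assumptions.

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby
from operator import itemgetter


def _spill(table, tmpdir, runs):
    """Write the partial counts as one sorted run, sequentially, then free memory."""
    path = os.path.join(tmpdir, f"run{len(runs)}.tsv")
    with open(path, "w", encoding="utf-8") as f:
        for key in sorted(table):
            f.write(f"{key}\t{table[key]}\n")
    runs.append(path)
    table.clear()


def _read_run(path):
    """Stream (key, count) pairs back from a sorted run file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, count = line.rstrip("\n").split("\t")
            yield key, int(count)


def out_of_core_word_count(lines, max_keys):
    """Word count that aggregates on the fly and spills when over budget."""
    table, runs = Counter(), []
    with tempfile.TemporaryDirectory() as tmpdir:
        for line in lines:
            for word in line.split():
                table[word] += 1            # combine immediately (on-the-fly aggregation)
                if len(table) > max_keys:   # hypothetical memory budget exceeded
                    _spill(table, tmpdir, runs)
        _spill(table, tmpdir, runs)         # flush whatever is left
        # k-way merge of the sorted runs keeps equal keys adjacent, so the
        # reduce step is a running sum per key group.
        merged = heapq.merge(*(_read_run(p) for p in runs))
        return {key: sum(c for _, c in group)
                for key, group in groupby(merged, key=itemgetter(0))}
```

The final dictionary is built in memory only to keep the example short; a real out-of-core system would stream the reduced pairs to disk as well.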
ISBN (print): 9781538624777
Most industrial induction motors in current use employ simple winding patterns, which are commonly designed to meet the fundamental magnetizing flux and torque requirements while disregarding the spatial harmonic content of the air-gap magnetomotive force (MMF). However, it is well known that the lower-order MMF spatial harmonics have a negative impact on motor efficiency, vibration, noise, and torque production. Using different numbers of turns per coil in the winding design is one possible way to mitigate the problem. In this paper, a novel winding optimization algorithm is described in full. The air gap is modelled as a linear function of the current sheet created by the conductors in the slots. Several winding patterns with different numbers of poles, for stators with different numbers of slots, are optimized, and the resulting turns-per-coil patterns are presented in tables for single- and double-layer windings with optimal coil-pitch shortening. These tables can serve as a reference in winding design projects. An application example of winding optimization is also presented.
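The paper's optimizer is not reproduced here. As a generic sketch of the quantity it targets, the function below computes the spatial MMF harmonics of a slotted winding from the net ampere-turns in each slot, idealizing the current sheet as concentrated conductors at evenly spaced slot centres. Since the MMF is the running integral of the current sheet, the peak of harmonic nu is (1 / (pi nu)) |sum_k N_k exp(-j nu theta_k)|. This is linear in the slot ampere-turns, and hence in the turns per coil, which is what lets an optimizer treat turns per coil as decision variables. The function name and input layout are assumptions.

```python
import cmath
import math


def mmf_harmonics(slot_ampere_turns, max_order):
    """Peak amplitude of each spatial MMF harmonic of a slotted winding.

    slot_ampere_turns[k]: net signed ampere-turns in slot k (should sum to 0),
    slots evenly spaced over 2*pi mechanical radians. Order nu counts spatial
    periods per revolution, so the fundamental of a p-pole-pair winding is nu = p.
    """
    s = len(slot_ampere_turns)
    amps = {}
    for nu in range(1, max_order + 1):
        # Fourier phasor of the idealized (impulse) current sheet at order nu
        phasor = sum(n * cmath.exp(-1j * nu * 2 * math.pi * k / s)
                     for k, n in enumerate(slot_ampere_turns))
        # integrating the sheet divides by nu; 1/pi converts to a peak amplitude
        amps[nu] = abs(phasor) / (math.pi * nu)
    return amps
```

For a single coil spanning half the circumference (`[1, -1]`), the MMF is a square wave: even harmonics vanish and the odd ones fall off as 1/nu, so the 3rd harmonic is one third of the fundamental.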
ISBN (print): 9781538656839
To use modern multiprocessor and distributed computer architectures effectively, program code must be parallelized so as to achieve the minimum execution time on the given architecture. If the executing devices in a distributed system have different performance characteristics, parallelization must take this difference into account and optimize the execution time of the algorithm as a whole. This article proposes a method for solving such an optimization problem.
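The abstract does not state the method. One simple way to account for devices of different speeds, sketched below under the assumption that the work splits into identical, independent units, is to hand each unit to the device that would finish it earliest. For such units this greedy rule yields the minimum makespan; communication costs, which a real distributed system must also model, are ignored. `partition_work` and `makespan` are hypothetical names.

```python
import heapq


def partition_work(n_items, speeds):
    """Assign identical work units to devices to minimize the finishing time.

    speeds[i]: work units per second of device i (assumed constant and > 0).
    Each unit goes to the device whose completion time would be earliest.
    """
    counts = [0] * len(speeds)
    # heap of (finish time if this device took one more unit, device index)
    heap = [(1.0 / s, i) for i, s in enumerate(speeds)]
    heapq.heapify(heap)
    for _ in range(n_items):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, ((counts[i] + 1) / speeds[i], i))
    return counts


def makespan(counts, speeds):
    """Time at which the slowest-finishing device completes its share."""
    return max(c / s for c, s in zip(counts, speeds))
```

For example, with speeds `[1, 2, 1]` and 8 units, the split is `[2, 4, 2]`, so every device finishes at the same time (a makespan of 2 time units).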
This paper describes the design, implementation and testing of "Danse-doigts", an edutainment therapeutic application for hemiplegic children. The objective of this program is twofold. Firstly, to allow them...