ISBN (print): 9781728103396
The article discusses a method for making image search in a database more efficient. The method is based on perceptual hashing of images, three levels of data parallelization, and image search procedures. Parallel data processing is implemented using the principle of symmetric horizontal data distribution and the capabilities of modern processors (SIMD registers and the corresponding instructions). The results of a computational experiment confirming the effectiveness of the proposed method are presented.
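As a rough illustration of hash-based image search, the sketch below uses a simple average hash and a Hamming-distance comparison. It is not the method from the article: the hash function, the tiny 2x2 "images", and the names `average_hash`, `hamming`, and `search` are all assumptions made for this example, and plain integer XOR stands in for the SIMD instructions the abstract mentions.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def search(query_hash: int, db_hashes: List[int], max_distance: int) -> List[int]:
    """Return indices of database hashes within max_distance of the query."""
    return [i for i, h in enumerate(db_hashes) if hamming(query_hash, h) <= max_distance]


# Hypothetical grayscale images (illustrative data only).
img_a = [[10, 200], [220, 30]]
img_b = [[12, 190], [210, 40]]  # slightly altered copy of img_a
img_c = [[200, 10], [30, 220]]  # inverted pattern

db = [average_hash(img_b), average_hash(img_c)]
matches = search(average_hash(img_a), db, max_distance=1)
```

Because each comparison is independent of the others, the loop inside `search` is the part that could be split across database partitions and cores.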
ISBN (print): 9781728103396
Data parallelism is inherent to the algebra of multidimensional matrices. Therefore, the operations of this algebra can be implemented in parallel using generalizations of parallel multiplication algorithms for ordinary matrices. The article discusses a recursive approach to multiplying multidimensional matrices, applied to the (2, 1)-contracted multiplication of four-dimensional matrices.
ISBN (print): 9781728103396
A new approach to the use of nanotubes as an alloying tool for living biological objects is proposed. The doping mechanism is based on chemical detonation in the nanotube. A procedure for the accelerated calculation of nanotube parameters for the purpose of doping is proposed. The possibilities of parallel programming in calculating the parameters of a detonation gas mixture are shown.
ISBN (digital): 9781728153049
ISBN (print): 9781728153056
A many-core implementation of the multilevel fast multipole algorithm (MLFMA), based on the Athread parallel programming model, is presented for computing electromagnetic scattering by a 3-D object on China's homegrown many-core SW26010 CPU. In the proposed implementation, data access efficiency is improved by using Structure-of-Arrays (SoA) data structures. Adaptive workload distribution strategies are adopted on different MLFMA tree levels to ensure full utilization of the computing capability and the scratchpad memory (SPM). A double-buffering scheme is specially designed to overlap communication with computation. The resulting Athread-based implementation of the MLFMA is capable of solving real-life problems with over four hundred thousand unknowns. Numerical results show that the proposed parallel scheme achieves a total speed-up of more than 7 times compared with the CPU master core.
ISBN (digital): 9781728144115
ISBN (print): 9781728144122
Efficient utilization of multi-core computers with shared memory depends on many factors. This article investigates the efficiency of such computers. All cores execute either parallel streams of a single program developed in accordance with the OpenMP API, or independent programs. There are no interactions between streams and independent programs. Both parallel and non-parallel (single-stream) applications share memory, where conflicts may occur. The article proposes models for determining the acceleration coefficients of a multi-core computer; analytical expressions that reflect the dependence of these coefficients on the number of cores, the properties of the executed programs, and the parameters of the cores and shared memory; and formulas for determining the recommended number of cores for parallel applications written in accordance with the OpenMP API. The main reason for the decrease in performance of multi-core computers with shared memory when applications are parallelized in accordance with the OpenMP API is the overload of shared memory.
ISBN (print): 9781538643686
The sophisticated nature of parallel computing concepts makes parallel programming challenging. This has encouraged higher-level frameworks that conceal many of the complications behind abstraction layers. Paradigms in this category are mostly performance-centric and do not give the same attention to the robustness of asynchronous executions, yet current applications demand consistency in addition to fast performance. Therefore, programming environments that offer high-level support for asynchronous exception handling are more likely to gain adoption. This paper discusses our latest enhancements to @PT, a parallel programming environment that is based on Java annotations. The proposed concept promotes the robustness of parallelized programs by adhering to the familiar exception handling standards of sequential code and by reducing asynchronous execution concerns at the API level. This study suggests that the concept simplifies efficient management of asynchronous exceptions, which appears to be a challenge in parallel programming.
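The general idea of handling an asynchronous failure with ordinary sequential `try`/`except` code can be sketched as follows. This is not @PT (which is based on Java annotations); it is a minimal Python analogy using the standard `concurrent.futures` module, and `parse_record` and `parse_all` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_record(text: str) -> int:
    """Worker task; raises ValueError on malformed input."""
    return int(text)


def parse_all(records):
    """Run tasks asynchronously, but handle failures like sequential code."""
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(parse_record, r) for r in records]
        for record, future in zip(records, futures):
            try:
                # result() re-raises the worker's exception in the caller.
                results.append(future.result())
            except ValueError as exc:
                errors.append((record, str(exc)))
    return results, errors


results, errors = parse_all(["1", "2", "oops", "4"])
```

The exception raised inside the worker thread surfaces at `future.result()`, so the caller deals with it in the same place and style as it would in a sequential loop.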
ISBN (print): 9781538683842
This paper proposes OMP-WHIP, a profiler that measures inherent parallelism in the program for a given input and provides what-if analyses to estimate improvements in parallelism. We propose a novel OpenMP series-parallel graph representation (OSPG) that precisely captures series-parallel relations induced by various directives between different fragments of dynamic execution. OMP-WHIP constructs the OSPG and measures the computation performed by each dynamic fragment using hardware performance counters. This series-parallel representation along with measurement of computation is a performance model of the program for a given input, which enables computation of inherent parallelism. This novel performance model also enables what-if analyses where a programmer can estimate improvements in parallelism when bottlenecks are addressed. We have used OMP-WHIP to identify parallelism bottlenecks in more than forty applications and then designed strategies to improve the speedup in seven applications.
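To make the notion of inherent parallelism concrete, the sketch below computes work, span (critical path), and their ratio over a toy series-parallel tree, and then repeats the calculation under a what-if assumption. It is a simplified stand-in, not the OSPG or the OMP-WHIP implementation: the node types, the fragment names, and the work values are assumptions, and real measurements would come from hardware performance counters.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Leaf:
    name: str
    work: float  # measured computation of one fragment (hypothetical units)


@dataclass
class Series:
    children: List[object]  # children execute one after another


@dataclass
class Parallel:
    children: List[object]  # children may execute concurrently


def work(node) -> float:
    """Total computation: the sum over all fragments."""
    if isinstance(node, Leaf):
        return node.work
    return sum(work(c) for c in node.children)


def span(node, whatif: Optional[Dict[str, float]] = None) -> float:
    """Critical path length; whatif maps a fragment name to an assumed
    parallelization factor for that fragment."""
    whatif = whatif or {}
    if isinstance(node, Leaf):
        return node.work / whatif.get(node.name, 1.0)
    child_spans = [span(c, whatif) for c in node.children]
    return sum(child_spans) if isinstance(node, Series) else max(child_spans)


def parallelism(node, whatif: Optional[Dict[str, float]] = None) -> float:
    return work(node) / span(node, whatif)


program = Series([
    Leaf("serial_setup", 40.0),
    Parallel([Leaf("task_a", 40.0), Leaf("task_b", 40.0)]),
])

baseline = parallelism(program)
improved = parallelism(program, whatif={"serial_setup": 4.0})
```

With these illustrative numbers the serial fragment dominates the critical path, so the what-if query shows how much parallelism would rise if that fragment alone were parallelized.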
ISBN (print): 9781728101903
Debugging parallel programs can be a challenging task, especially for beginners. While debuggers such as DDT and TotalView can be extremely useful in tracking down the program statements connected to bugs, the onus is often on programmers to reason about the logic of those statements in order to fix the bugs. These debuggers may neither precisely indicate the logical errors in parallel programs nor provide information on fixing them. Therefore, there is a need for tools and educational content that teach the pitfalls of parallel programming and how to write correct code. Such content can guide beginners in avoiding commonly observed logical errors and in verifying the correctness of their parallel programs. In this paper, we 1) enumerate some of the logical errors that we have seen in parallel programs (OpenMP, MPI, and CUDA) written by beginners working with us, and 2) discuss ways to fix those errors. The errors are mainly related to data distribution, exiting distributed for-loops, and workload imbalance. Documentation of these logical errors can contribute to enhancing the productivity of beginners and can potentially help them in their debugging efforts. We have added code samples containing the logical errors and their solutions to a GitHub repository so that others in the community can reproduce the errors on their systems and learn from them. The content presented in this paper may also be useful for those developing high-level tools for detecting and removing logical errors in parallel programs.
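One plausible example of a data-distribution error is splitting N items across workers with integer division and silently dropping the remainder. The sketch below is an assumption made for illustration and is not taken from the paper or its repository; `naive_range` and `block_range` are hypothetical helper names, and the logic is language-independent even though it is written in Python here.

```python
def naive_range(n_items: int, n_workers: int, rank: int):
    """Buggy split: when n_items is not divisible by n_workers,
    the trailing items are assigned to no worker at all."""
    chunk = n_items // n_workers
    return rank * chunk, (rank + 1) * chunk


def block_range(n_items: int, n_workers: int, rank: int):
    """Corrected split: the first `extra` workers take one additional item,
    so every item is covered and the imbalance is at most one item."""
    base, extra = divmod(n_items, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


naive = [naive_range(10, 4, r) for r in range(4)]
fixed = [block_range(10, 4, r) for r in range(4)]
```

With 10 items and 4 workers, the naive split stops at index 8, while the corrected split covers all 10 items with chunk sizes 3, 3, 2, and 2.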
ISBN (print): 9781538680346
Extracting minimal functional dependencies (MFDs) from relational databases is an important database analysis technique. With the advent of the big data era, it is challenging to discover MFDs from big data, especially large-scale distributed data stored at many different sites. The key to discovering MFDs as fast as possible is pruning useless candidate MFDs. Most existing algorithms prune candidate MFDs either from top to bottom or from bottom to top. We present a new algorithm, FastMFDs, for discovering all MFDs from large-scale distributed data both from top to bottom and from bottom to top in parallel. We evaluated our algorithm on real-life datasets, and it is more efficient and faster than existing discovery algorithms.
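To show what discovering and pruning candidate MFDs involves, here is a small single-machine sketch that searches bottom-up only (smallest left-hand sides first). It is not FastMFDs: there is no distribution, no top-down pass, and no parallelism. The sample relation and the names `holds` and `minimal_fds` are assumptions for this example, and dependencies with an empty left-hand side (constant columns) are not considered.

```python
from itertools import combinations


def holds(rows, lhs, rhs) -> bool:
    """Check whether the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attributes):
    """Enumerate candidate left-hand sides by increasing size and prune
    any candidate that contains an already-found minimal left-hand side."""
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        minimal_lhs = []
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                if any(set(m) <= set(lhs) for m in minimal_lhs):
                    continue  # pruned: a subset already determines rhs
                if holds(rows, lhs, rhs):
                    minimal_lhs.append(lhs)
                    found.append((lhs, rhs))
    return found


# Hypothetical relation (illustrative data only).
rows = [
    {"id": 1, "city": "Oslo", "country": "NO"},
    {"id": 2, "city": "Oslo", "country": "NO"},
    {"id": 3, "city": "Bergen", "country": "NO"},
    {"id": 4, "city": "Paris", "country": "FR"},
]
fds = minimal_fds(rows, ["id", "city", "country"])
```

In this sample, `id` determines both other columns and `city` determines `country`, so the two-attribute candidates for those right-hand sides are pruned without being checked.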
ISBN (print): 9781728111414
Directive-driven programming models, such as OpenMP, are one solution for exploiting the potential of multi-core architectures, and enable developers to accelerate software applications by adding annotations to for-type loops and other code regions. However, manual parallelization of applications is known to be a non-trivial and time-consuming process that requires parallel programming skills. Automatic parallelization approaches can reduce the burden on the application development side. This paper presents an OpenMP-based automatic parallelization compiler, named AutoPar-Clava, for automatic identification and annotation of loops in C code. Using static analysis, parallelizable regions are detected, and compilable OpenMP parallel code is produced from the sequential version. In order to reduce the accesses to shared memory by each thread, each variable is categorized into the proper OpenMP scoping. AutoPar-Clava also supports reductions on arrays, which have been available since OpenMP 4.5. The effectiveness of AutoPar-Clava is evaluated using the Polyhedral Benchmark suite, targeting an N-core x86-based computing platform. The achieved results are very promising and compare favorably with closely related auto-parallelization compilers such as the Intel C/C++ Compiler (i.e., icc), ROSE, TRACO, and Cetus.