Distributed computing technology has been widely used to solve complex problems arising in parallel processing systems. Job scheduling is very important in many distributed computing systems, such as grid systems and h...
Enumeration of chemical compounds greatly assists in designing and finding new drugs and in determining chemical structures from mass spectrometry. In our previous study, we developed efficient algorithms, BfsSimEnum and BfsMulEnum, for enumerating tree-like chemical compounds without and with multiple bonds, respectively. For many instances, these algorithms enumerated chemical structures faster than other existing methods. The latest processors consist of multiple processing cores and can execute many tasks at the same time. In this paper, we develop three parallelized algorithms, BfsEnumP1-3, by modifying BfsSimEnum in simple ways to further reduce execution time. BfsSimEnum constructs a family tree in which each vertex denotes a molecular tree. BfsEnumP1-3 divide the set of vertices at a given depth of the family tree into several subsets, each of which is assigned to a processor. For evaluation, we perform experiments on several instances, varying the division depth and the number of processors, and show that BfsEnumP1-3 reduce the execution time for enumerating tree-like chemical compounds. In addition, we show that BfsEnumP3 achieves more than 80% parallelization efficiency using up to 11 processors and, with 12 processors, reduces the execution time to about 1/10 of that of BfsSimEnum.
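The depth-based division described in this abstract can be illustrated with a minimal sketch. It is not the authors' BfsEnumP1-3: the family tree here is a hypothetical toy (bit strings with no two adjacent 1s stand in for molecular trees), and the names `children`, `count_leaves`, `frontier_at_depth`, and `parallel_count` are assumptions made for illustration. The sketch expands the tree breadth-first to a chosen depth, splits the vertices at that depth into subsets, and hands each subset to one worker.

```python
from concurrent.futures import ThreadPoolExecutor


def children(node, n):
    """Toy family tree (an assumption): a node is a tuple of bits with no two adjacent 1s."""
    if len(node) == n:
        return []
    kids = [node + (0,)]
    if not node or node[-1] == 0:
        kids.append(node + (1,))
    return kids


def count_leaves(node, n):
    """Sequentially count the complete objects (length-n nodes) below `node`."""
    if len(node) == n:
        return 1
    return sum(count_leaves(child, n) for child in children(node, n))


def frontier_at_depth(n, depth):
    """Expand the family tree breadth-first and return all vertices at `depth`."""
    level = [()]
    for _ in range(depth):
        level = [child for node in level for child in children(node, n)]
    return level


def parallel_count(n, depth, workers):
    """Split the depth-`depth` frontier into `workers` subsets, one per worker.

    Assumes depth <= n. Threads keep the sketch portable; CPU-bound Python
    code would need processes to obtain a real speedup.
    """
    frontier = frontier_at_depth(n, depth)
    subsets = [frontier[i::workers] for i in range(workers)]  # round-robin split

    def work(subset):
        return sum(count_leaves(node, n) for node in subset)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, subsets))
```

In this toy setting, `parallel_count(10, 3, 4)` returns the same total as the sequential `count_leaves((), 10)`. A shallow division depth yields few, uneven subtrees, while a deeper one yields more, smaller work units, which is the kind of trade-off the abstract's experiments on division depth explore.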
With the trend of ever-growing data centers and scaling core counts, simple programming models for efficient distributed and concurrent programming are required. One of the successful principles for scalable computing...
ADAS (Advanced Driver Assistance Systems) algorithms increasingly use heavy image processing operations. To embed this type of algorithm, semiconductor companies offer many heterogeneous architectures. These SoCs (Systems on Chip) are composed of different processing units with different capabilities, often including massively parallel computing units. Due to the complexity of these SoCs, predicting whether a given algorithm can be executed in real time on a given architecture is not trivial. Indeed, it is not a simple task for automotive industry actors to choose the most suitable heterogeneous SoC for a given application. Moreover, embedding complex algorithms on these systems remains difficult: because of the heterogeneity, it is not easy to decide how to allocate the parts of a given algorithm to the different computing units of a given SoC. To help the automotive industry embed algorithms on heterogeneous architectures, we propose a novel approach for predicting the performance of image processing algorithms that is applicable to different types of computing units. Our methodology predicts an interval of execution time, of varying width, with a degree of confidence, using only a high-level description of the algorithm and a few characteristics of the computing units.
ISBN: 9783319265209; 9783319265193 (print)
We present a task-based implementation of SpMVM with the PGAS communication library GPI-2. This computational kernel is essential for the overall performance of Krylov subspace solvers, but its proper hybrid parallel design remains a challenge on hierarchical architectures consisting of multi- and many-core sockets and nodes. The GPI-2 library allows, by default and in a natural way, a task-based parallelization. Thus, our implementation is fully asynchronous and differs considerably from the standard hybrid approaches combining MPI and threads/OpenMP. Here we briefly describe the GPI-2 library and our implementation of the SpMVM routine, and then compare the performance of our Jacobi-preconditioned Richardson solver against the PETSc Richardson solver, using the Poisson BVP in a unit cube as a benchmark test. The comparison employs two types of domain decomposition and demonstrates the preemptive performance and better scalability of our task-based implementation.
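For readers unfamiliar with the solver named in this abstract, the following is a minimal sequential sketch of a Jacobi-preconditioned Richardson iteration built on a sparse matrix-vector product in CSR format. It does not use GPI-2 or any task-based parallelism and is not the authors' implementation. The function names, the relaxation parameter `omega`, the tolerance, and the 1D Poisson test matrix (a small stand-in for the unit-cube benchmark) are all assumptions made for illustration.

```python
def spmv_csr(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x for A stored in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def jacobi_richardson(indptr, indices, data, b, omega=1.0, tol=1e-10, max_iter=10000):
    """Iterate x <- x + omega * D^{-1} (b - A x), where D is the diagonal of A."""
    n = len(b)
    diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
    x = [0.0] * n
    for _ in range(max_iter):
        ax = spmv_csr(indptr, indices, data, x)
        residual = [bi - yi for bi, yi in zip(b, ax)]
        if max(abs(r) for r in residual) < tol:
            break
        x = [xi + omega * ri / di for xi, ri, di in zip(x, residual, diag)]
    return x


def poisson_1d_csr(n):
    """Assumed toy problem: the tridiagonal [-1, 2, -1] matrix in CSR format."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1)
            data.append(-1.0)
        indices.append(i)
        data.append(2.0)
        if i < n - 1:
            indices.append(i + 1)
            data.append(-1.0)
        indptr.append(len(indices))
    return indptr, indices, data
```

Each iteration performs exactly one SpMVM, which is why that kernel dominates the run time of solvers of this kind. With `omega=1.0` the update coincides with the classical Jacobi iteration.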
The Nyström method and low-rank linearized Support Vector Machines (SVMs) are two widely used methods for scaling up kernel SVMs, both of which need to sample a subset of the columns of the kernel matrix to reduce its size. ...
In this paper, we parallelize the collision detection of five-axis machining as an example to show how to execute CNC applications on a Graphics Processing Unit (GPU). We first design and implement an efficient collisi...
ISBN: 9781467394741 (print)
This paper presents a novel approach to cryptology-specific instruction generation on a reconfigurable architecture named ASRA. ASRA tightly integrates a customized reconfigurable core with a very long instruction word (VLIW) basic core, and both cores can work in parallel. The methodology for cryptology-specific instruction generation can directly deploy algebraic operations as primitives for ASRA's custom function units (CFUs), and it eliminates a large portion of the design space exploration difficulty of conventional data-flow graph methods. Cryptology-specific instructions are exploited for block cipher and hash algorithms, which are kernel data processing tasks in security applications. An accelerator prototype of ASRA is then built on a Xilinx Kintex-7 FPGA chip. Experimental results show that our work achieves a high performance improvement and good flexibility.
Computation of optical flow is a fundamental step in computer vision applications. However, due to its high complexity, it is difficult to compute a high-accuracy optical flow field in real time. This paper proposes a...
Bloom filters are widely used in databases and networking. These filters facilitate efficient membership checking with a low false positive ratio. It is a way to improve the throughput of Bloom filters by parallel p...
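As background for the membership checking mentioned in this abstract, here is a minimal sequential Bloom filter sketch. It is not the parallel design the paper goes on to describe (the abstract is truncated before that point). The class name, the default sizes, and the use of double hashing over a SHA-256 digest are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hash positions per item; false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (an assumed scheme): derive k positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

After `bf = BloomFilter()` and `bf.add("alice")`, the check `"alice" in bf` is always true. A check for an item that was never added is usually false but may occasionally be true, which is the false positive ratio the abstract refers to.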