In this paper we present a parallel solution for finding the one-ring neighboring nodes and elements of each vertex in generic meshes. Finding nodal neighbors is computationally straightforward but expensive for large meshes. To improve efficiency, we exploit the parallelism of the modern Graphics Processing Unit (GPU). The presented solution depends heavily on parallel sorting, scan, and reduction. It is efficient and easy to implement, but requires a large allocation of device memory. It achieves speedups of approximately 55 and 90 over the serial solution when finding the neighboring nodes and elements, respectively. It is easy to implement because it does not require mesh coloring before finding neighbors. It uses no complex data structures; only integer arrays are needed, which makes the solution very effective. (C) 2020 The Author(s). Published by Elsevier B.V.
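The abstract names sorting, scan, and reduction over plain integer arrays as the building blocks. The following is a minimal serial sketch of that idea in NumPy, not the authors' GPU implementation: it assumes a triangular mesh given as an array of node indices, and the function names `one_ring_node_neighbors` and `one_ring_element_neighbors` are hypothetical. Each NumPy call stands in for the corresponding parallel primitive.

```python
import numpy as np

def one_ring_node_neighbors(triangles, num_nodes):
    """Return CSR-style (offsets, neighbors) for one-ring node neighbors."""
    tri = np.asarray(triangles, dtype=np.int64)
    # Every triangle (a, b, c) contributes six directed node pairs.
    src = tri[:, [0, 1, 1, 2, 2, 0]].ravel()
    dst = tri[:, [1, 0, 2, 1, 0, 2]].ravel()
    # Sort: pack each pair into one integer key and sort the keys.
    keys = np.sort(src * num_nodes + dst)
    # Compaction: drop duplicate pairs produced by triangles sharing an edge.
    keep = np.ones(keys.size, dtype=bool)
    keep[1:] = keys[1:] != keys[:-1]
    keys = keys[keep]
    src_u, dst_u = keys // num_nodes, keys % num_nodes
    # Count neighbors per node, then prefix-sum (scan) the counts into offsets.
    counts = np.bincount(src_u, minlength=num_nodes)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return offsets, dst_u

def one_ring_element_neighbors(triangles, num_nodes):
    """Return CSR-style (offsets, elements) for elements incident to each node."""
    tri = np.asarray(triangles, dtype=np.int64)
    node = tri.ravel()
    elem = np.repeat(np.arange(tri.shape[0]), tri.shape[1])
    # Sort element ids by the node they touch (stable keeps element order).
    order = np.argsort(node, kind="stable")
    counts = np.bincount(node, minlength=num_nodes)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return offsets, elem[order]

# Two triangles sharing the edge (1, 2).
mesh = [[0, 1, 2], [1, 3, 2]]
node_offsets, node_neighbors = one_ring_node_neighbors(mesh, 4)
elem_offsets, elem_neighbors = one_ring_element_neighbors(mesh, 4)
```

The neighbors of node `i` are `node_neighbors[node_offsets[i]:node_offsets[i + 1]]`. The intermediate pair arrays are several times larger than the mesh itself, which is consistent with the abstract's remark about large device memory.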
My parallel-programming education began in earnest when I joined Sequent Computer Systems in late 1990. This education was both brief and effective: within a few short years, my co-workers and I were breaking new grou...
Multi-core processors offer a growing potential of parallelism but pose a challenge of program development for achieving high performance in applications. This paper presents a comparison of the five parallel program...
The multicore era has led to a renaissance of shared memory parallel programming models. Moreover, the introduction of task-level parallelization raises the level of abstraction compared to thread-centric expression o...
Efficient parallel programming has always been very tricky, and only expert programmers are able to make the most of the computing power of modern computers. Such a situation is an obstacle to the development of the hi...
The first Spanish parallel programming Contest was organized in September 2011 within the Jornadas de Paralelismo, in La Laguna, Spain. The aim of the contest is to disseminate parallelism among the participants and C...
The memory performance of data mining applications has become crucial due to increasing dataset sizes and multi-level cache hierarchies. Recursive partitioning methods such as decision tree and random forest learning are some of the most important algorithms in this field, and numerous researchers have worked on improving the accuracy of model trees as well as enhancing the overall performance of the learning process. Most modern applications that employ decision tree learning favor creating multiple models for higher accuracy at the expense of performance. In this work, we exploit the flexibility inherent in recursive-partitioning-based applications regarding performance and accuracy tradeoffs, and propose a framework to improve performance with negligible accuracy losses. This framework employs a data access skipping module (DASM), which skips costly cache accesses according to the aggressiveness of the strategy specified by the user, and a heuristic that predicts the skipped data accesses to keep accuracy losses to a minimum. Our experimental evaluation shows that the proposed framework offers significant performance improvements (up to 25%) with comparatively smaller losses in accuracy (up to 8%) over the original case. We demonstrate that our framework is scalable under various accuracy requirements by exploring accuracy changes over time and replacement policies. In addition, we explore NoC/SNUCA systems for similar opportunities for memory performance improvement. (C) 2018 Elsevier Ltd. All rights reserved.
The purpose of a domain-specific language (DSL) is to enable the application programmer to specify a problem, or an abstract algorithm description, in his/her domain of expertise without being burdened by implementation details. The ideal scenario is that the implementation detail is added in an automatic process of program translation and code generation. The approach of domain-specific program generation has lately received increasing attention in the area of computational science and engineering. In this paper, we introduce the new code generation framework Athariac. Its goal is to support the quick implementation of a language processing and program optimization platform for a given DSL based on stepwise term rewriting. We demonstrate the framework's use on our DSL ExaSlang for the specification and optimization of multigrid solvers. Using this example, we provide evidence of Athariac's potential for making domain-specific software engineering more productive.
Nowadays, most computers that are commercially available off-the-shelf (COTS) include hardware features that increase the performance of parallel general-purpose threads (hyper threading, multicore, ccNUMA architectur...
This article focuses on heat radiation intensity optimization on the surface of a shell metal mould. Such moulds are used in the automotive industry for artificial leather production (the artificial leather is used, e.g., on car dashboards). The mould is heated by infrared heaters. After the required temperature is attained, the inner mould surface is sprinkled with special PVC powder. The powder melts, and after cooling down it forms the artificial leather. A homogeneous temperature field of the mould is a necessary prerequisite for obtaining a uniform colour shade and material structure of the artificial leather. The article includes a description of a mathematical model that allows us to calculate the heat radiation intensity on the outer mould surface for each fixed positioning of the infrared heaters. Next, we use this mathematical model to optimize the locations of the heaters to provide approximately the same heat radiation intensity on the whole outer mould surface during the heating process. The heat radiation intensity optimization is a complex task, because the cost function may have many local minima. Gradient methods are therefore not suitable for solving this problem. A differential evolution algorithm is applied during the optimization process. Asymptotic convergence of the algorithm is shown. The article contains a practical example including graphical outputs. The calculations were performed by means of Matlab code written by the authors.
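The abstract does not specify which variant of differential evolution the authors use, and their code is in Matlab. As a rough illustration of the technique only, the following Python sketch implements the common DE/rand/1/bin scheme. The function name `differential_evolution`, its parameters, and the `sphere` and `rastrigin` test functions are assumptions made for this example; they stand in for the heat radiation cost function, which is not given here.

```python
import numpy as np

def differential_evolution(cost, bounds, pop_size=30, f=0.7, cr=0.9,
                           generations=300, seed=0):
    """Minimize `cost` over box `bounds` with the DE/rand/1/bin scheme."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = lo.size
    # Random initial population inside the box.
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([cost(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), lo, hi)
            # Binomial crossover; force at least one gene from the mutant.
            cross = rng.random(dim) < cr
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial only if it is no worse.
            trial_cost = cost(trial)
            if trial_cost <= fitness[i]:
                pop[i], fitness[i] = trial, trial_cost
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])

def sphere(x):
    """Smooth test function with a single minimum at the origin."""
    return float(np.sum(x ** 2))

def rastrigin(x):
    """Test function with many local minima (stand-in for a hard cost)."""
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

sphere_x, sphere_cost = differential_evolution(sphere, [(-5, 5), (-5, 5)])
rastrigin_x, rastrigin_cost = differential_evolution(rastrigin, [(-5.12, 5.12)] * 2)
```

Because selection only compares cost values, no gradient is needed, which is why this family of methods suits cost functions with many local minima.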