ISBN (print): 9781450313384
There are several projects and missions dedicated exclusively to observing the Sun. These projects usually produce a large amount of information embedded in images. The analysis of such information is valuable for the study and monitoring of solar storms, which can affect telecommunications, for instance. The databases of Sun images are huge: several projects are producing images of the Sun, and a considerable amount of imagery is already stored. By combining image-processing algorithms with parallel programming techniques, we can process this information faster and in greater volume. This paper describes our parallel OpenMP-MPI hybrid solutions for processing Sun images and the results obtained on a hybrid system, i.e. a cluster with several multi-core nodes. Specifically, we present two methods to detect and categorize solar filaments on hybrid systems: Filament Diffusion-Detection, based on graphs, and Morph Detection, based on morphological operators. The results show that Filament Diffusion-Detection detects approximately 80% of the filaments with a 326-fold speed-up. In turn, Morph Detection detects 58% of the objects with a 54-fold speed-up. Overall, these results show that our OpenMP-MPI combination works well on hybrid architectures, but more optimizations are needed to improve accuracy. Copyright 2012 ACM.
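The morphological route mentioned in the abstract can be illustrated with a toy example. The code below is not the paper's Morph Detection pipeline (which runs as an OpenMP-MPI hybrid over full solar images); it is a minimal serial sketch, with invented helper names, of how a binary opening with an elongated structuring element keeps filament-like dark structures and discards isolated noise:

```python
# Hypothetical sketch, not the paper's code.  Dark pixels (1) against
# the solar disk (0); an opening (erosion then dilation) with a
# horizontal structuring element keeps elongated dark structures and
# removes isolated noise pixels.

def erode(img, se_half):
    """Binary erosion with a horizontal 1 x (2*se_half+1) element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(se_half, w - se_half):
            if all(img[y][x + dx] for dx in range(-se_half, se_half + 1)):
                out[y][x] = 1
    return out

def dilate(img, se_half):
    """Binary dilation with the same horizontal element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - se_half), min(w - 1, x + se_half)
            if any(img[y][xx] for xx in range(lo, hi + 1)):
                out[y][x] = 1
    return out

def opening(img, se_half=1):
    return dilate(erode(img, se_half), se_half)

# A 5x8 toy image: one elongated "filament" and one noise pixel.
img = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],   # elongated filament-like structure
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],   # isolated noise pixel
    [0, 0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(img)           # filament survives, noise is gone
```

In the hybrid setting, each MPI rank would process a different image (or image stripe) and OpenMP threads would parallelize the per-row loops above.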
In this paper we specifically present a parallel solution to finding the one-ring neighboring nodes and elements for each vertex in generic meshes. Finding nodal neighbors is computationally straightforward but expensive for large meshes. To improve efficiency, parallelism is adopted by utilizing the modern Graphics Processing Unit (GPU). The presented parallel solution relies heavily on parallel sorting, scan, and reduction. Our parallel solution is efficient and easy to implement, but requires the allocation of large device memory. It can generate speedups of approximately 55 and 90 over the serial solution when finding the neighboring nodes and elements, respectively. It is easy to implement because it does not need to perform mesh coloring before finding neighbors. There are no complex data structures; only integer arrays are needed, which makes our parallel solution very effective. (C) 2020 The Author(s). Published by Elsevier B.V.
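The sort-based phases the abstract names can be sketched serially. The function name and data layout below are assumptions for illustration, not the paper's GPU code; on the GPU, each phase maps to its parallel counterpart (e.g. a radix sort over (vertex, element) pairs followed by a segmented scan):

```python
# Serial sketch of sort-based one-ring neighbor finding (hypothetical
# names; the paper's implementation runs these phases on the GPU).
from itertools import groupby

def one_ring_elements(triangles):
    """For each vertex id, list the incident (one-ring) element ids."""
    # Phase 1: expand each triangle into (vertex, element) pairs.
    pairs = [(v, ei) for ei, tri in enumerate(triangles) for v in tri]
    # Phase 2: sort by vertex id (a parallel radix sort on the GPU).
    pairs.sort()
    # Phase 3: segment the sorted pairs per vertex (scan/reduction on GPU).
    return {v: [ei for _, ei in grp]
            for v, grp in groupby(pairs, key=lambda p: p[0])}

# Two triangles sharing the edge (1, 2).
tris = [(0, 1, 2), (1, 3, 2)]
ring = one_ring_elements(tris)
# vertices 1 and 2 touch both triangles; 0 and 3 touch one each
```

Note that only flat integer pair arrays are needed, matching the abstract's claim that no complex data structures are required.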
My parallel-programming education began in earnest when I joined Sequent Computer Systems in late 1990. This education was both brief and effective: within a few short years, my co-workers and I were breaking new grou...
The multicore era has led to a renaissance of shared memory parallel programming models. Moreover, the introduction of task-level parallelization raises the level of abstraction compared to thread-centric expression o...
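The task-centric idea this abstract alludes to can be shown in miniature: the programmer expresses units of work, and a runtime maps them onto threads. Python's `ThreadPoolExecutor` stands in here for OpenMP-style tasking; the workload is invented for illustration:

```python
# Illustrative task-parallel sketch (not from the abstract's paper).
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    """A small recursive workload; each top-level call is one task."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=4) as pool:
    # The programmer submits *tasks* (units of work), not threads;
    # the runtime schedules tasks onto a fixed pool of workers.
    futures = [pool.submit(fib, n) for n in (10, 12, 14)]
    results = [f.result() for f in futures]
# results == [55, 144, 377]
```

This is the abstraction gain: the number of tasks is decoupled from the number of threads, which the thread-centric style forces the programmer to manage by hand.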
Multi-core processors offer a growing potential of parallelism but pose a challenge of program development for achieving high performance in applications. This paper presents a comparison of the five parallel program...
Efficient parallel programming has always been very tricky, and only expert programmers are able to make the most of the computing power of modern computers. Such a situation is an obstacle to the development of the hi...
The first Spanish Parallel Programming Contest was organized in September 2011 within the Jornadas de Paralelismo, in La Laguna, Spain. The aim of the contest is to disseminate parallelism among the participants and C...
The memory performance of data mining applications has become crucial due to increasing dataset sizes and multi-level cache hierarchies. Recursive partitioning methods such as decision tree and random forest learning are some of the most important algorithms in this field, and numerous researchers have worked on improving the accuracy of model trees as well as enhancing the overall performance of the learning process. Most modern applications that employ decision tree learning favor creating multiple models for higher accuracy, sacrificing performance. In this work, we exploit the flexibility inherent in recursive partitioning based applications regarding performance and accuracy tradeoffs, and propose a framework to improve performance with negligible accuracy losses. This framework employs a data access skipping module (DASM), which skips costly cache accesses according to the aggressiveness of the user-specified strategy, together with a heuristic that predicts the skipped data accesses to keep accuracy losses at a minimum. Our experimental evaluation shows that the proposed framework offers significant performance improvements (up to 25%) with relatively much smaller losses in accuracy (up to 8%) over the original case. We demonstrate that our framework is scalable under various accuracy requirements by exploring accuracy changes over time and replacement policies. In addition, we explore NoC/SNUCA systems for similar opportunities of memory performance improvement. (C) 2018 Elsevier Ltd. All rights reserved.
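The performance/accuracy trade-off behind access skipping can be illustrated with a toy split-scoring routine. The names and the skipping rule below are assumptions, not the paper's DASM (which skips at the cache-access level): with aggressiveness a, only every (a+1)-th sample is actually read when scoring a candidate split:

```python
# Illustrative sketch of data access skipping during decision-tree
# split evaluation (hypothetical; the paper's DASM operates on cache
# accesses, not Python lists).

def split_score(values, labels, threshold, aggressiveness=0):
    """Error rate of the rule 'value < threshold => class 0'.

    aggressiveness=0 reads every sample; aggressiveness=a reads only
    every (a+1)-th sample, trading accuracy for fewer data accesses.
    """
    step = aggressiveness + 1
    errors = touched = 0
    for i in range(0, len(values), step):   # skipped indices never accessed
        touched += 1
        pred = 0 if values[i] < threshold else 1
        errors += (pred != labels[i])
    return errors / touched, touched

vals = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
labs = [0, 0, 0, 1, 1, 1]
exact, n_full = split_score(vals, labs, 0.5)                     # 6 reads
approx, n_skip = split_score(vals, labs, 0.5, aggressiveness=1)  # 3 reads
```

On well-separated data like this toy set, the skipped evaluation reaches the same error estimate with half the accesses, which is the kind of flexibility the framework exploits.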
The purpose of a domain-specific language (DSL) is to enable the application programmer to specify a problem, or an abstract algorithm description, in his/her domain of expertise without being burdened by implementation details. The ideal scenario is that the implementation details are added in an automatic process of program translation and code generation. The approach of domain-specific program generation has lately received increasing attention in the area of computational science and engineering. In this paper, we introduce the new code generation framework Athariac. Its goal is to support the quick implementation of a language processing and program optimization platform for a given DSL based on stepwise term rewriting. We demonstrate the framework's use on our DSL ExaSlang for the specification and optimization of multigrid solvers. With this example, we provide evidence of Athariac's potential for making domain-specific software engineering more productive.
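Stepwise term rewriting, the mechanism the abstract names, can be sketched in a few lines. The term encoding and the two rules below are invented for illustration; ExaSlang's actual transformations are far richer:

```python
# Minimal stepwise term-rewriting sketch (hypothetical encoding, not
# Athariac's).  Terms are nested tuples (op, arg1, arg2) or atoms;
# rules are functions returning a rewritten term or None.

def rewrite(term, rules):
    """Apply rules bottom-up until a fixed point is reached."""
    if isinstance(term, tuple):
        # First rewrite the subterms, then the term itself.
        term = (term[0],) + tuple(rewrite(t, rules) for t in term[1:])
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(term)
            if new is not None and new != term:
                term, changed = new, True
    return term

# Two algebraic simplification rules: x * 1 -> x and x + 0 -> x.
rules = [
    lambda t: t[1] if isinstance(t, tuple) and t[0] == "*" and t[2] == 1 else None,
    lambda t: t[1] if isinstance(t, tuple) and t[0] == "+" and t[2] == 0 else None,
]

expr = ("+", ("*", "u", 1), 0)       # (u * 1) + 0
simplified = rewrite(expr, rules)    # each rule fires once -> "u"
```

A DSL compiler built this way is a pipeline of such rule sets, applied stepwise from domain-level terms down to generated code.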
Nowadays, most computers that are commercially available off-the-shelf (COTS) include hardware features that increase the performance of parallel general-purpose threads (hyper threading, multicore, ccNUMA architectur...