Biomedical systems have been using ontology matching as a primary technique for heterogeneity resolution. However, the natural intricacy and vastness of biomedical data have compelled biomedical ontologies to become large-scale and complex; consequently, biomedical ontology matching has become a computationally intensive task. Our parallel heterogeneity resolution system, SPHeRe, is built to cater to the performance needs of ontology matching by exploiting the parallelism-enabled multicore nature of today's desktop PCs and cloud infrastructure. In this paper, we present the execution and evaluation results of SPHeRe over large-scale biomedical ontologies. We evaluate our system by integrating it with the interoperability engine of a clinical decision support system (CDSS), which generates matching requests for the large-scale NCI, FMA, and SNOMED-CT biomedical ontologies. Results demonstrate that our methodology provides performance speedups of 4.8 and 9.5 times over a quad-core desktop PC and a four virtual machine (VM) cloud platform, respectively.
The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices such as multicore CPUs, GPUs, or other accelerators.
The introduction of various cryptographic modes of operation was motivated by known imperfections of symmetric block algorithms. The design of some cryptographic modes of operation has already been exploited as a basis for parallelizing the execution of certain algorithms. To the best of our knowledge, there is no evidence in the available literature that output feedback (OFB) mode, which is used in satellite communications, has ever been parallelized. In this paper, we consider the performance of a convenient mode of operation that performs tweakable parallel encryption using xor-encrypt-xor (XEX) and xor-encrypt (XE) constructions in an OFB-like mode. We make use of an idea similar to XTS-AES in order to create two parallel tweakable block ciphers. The first is designed using the XEX construction, while the second is based on the XE construction. Each cipher uses two threads to produce the corresponding keystreams. The keystreams are first merged with each other and then used in a modified tweakable parallel OFB mode of operation. As a proof of concept, we have implemented a Java application in which these parallel solutions are applied to collect empirical data. The results show that, under certain conditions, tweakable parallel OFB modes using the XEX and XE constructions can achieve performance accelerations of up to 10% and 20%, respectively. Copyright (c) 2015 John Wiley & Sons, Ltd.
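For orientation, the building blocks named in this abstract can be written down using their standard textbook definitions. This is only a reference sketch: $E_K$ denotes the underlying block cipher, $IV$ the initialization vector, and $\Delta$ a tweak-derived mask. How the paper derives $\Delta$ and how it merges the two keystreams is not stated in the abstract, so neither is shown here.

```latex
\begin{align*}
\text{OFB:}\quad & O_0 = IV, \qquad O_i = E_K(O_{i-1}), \qquad C_i = P_i \oplus O_i \\
\text{XE:}\quad  & \widetilde{E}_K(\Delta, X) = E_K(X \oplus \Delta) \\
\text{XEX:}\quad & \widetilde{E}_K(\Delta, X) = E_K(X \oplus \Delta) \oplus \Delta
\end{align*}
```

In plain OFB each keystream block $O_i$ depends on $O_{i-1}$, which is why the mode is sequential by default and why producing keystreams in separate threads requires a modified construction.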
Dry eye syndrome is a public health problem, and one of the most common conditions seen by eye care specialists. Among the clinical tests for its diagnosis, the evaluation of the interference patterns observed in the tear film lipid layer is often employed. In this sense, tear film maps illustrate the spatial distribution of the patterns over the whole tear film and provide useful information to practitioners. However, the creation of a single map usually takes tens of minutes. Medical experts currently demand applications with lower response time in order to provide a faster diagnosis for their patients. In this work, we explore different parallel approaches to accelerate the definition of the tear film map by exploiting the power of today's ubiquitous multicore systems. They can be executed on any multicore system without special software or hardware requirements. The experimental evaluation determines the best approach (on-demand with dynamic seed distribution) and proves that it can significantly decrease the runtime. For instance, the average runtime of our experiments with 50 real-world images on a system with AMD Opteron processors is reduced from more than 20 minutes to one minute and 12 seconds.
Transactional memory in multicore processors has been a major research area over the past several years. Many transactional memory systems have been proposed to solve the synchronization problem of multicore processors. Hardware transactional memory is one of the critical methods for speeding up communication in a multicore environment. In this paper, we review the current hardware transactional memory systems for multicore processors. We take a top-down approach to characterizing and classifying various hardware transactional design issues and present a taxonomy of hardware transactional memory systems that consists of five fundamental design issues: version management, conflict detection, contention management, virtualization, and nesting. Finally, we discuss an active research challenge: the relationship between transactional memory and input/output operations and system calls. Crown Copyright (C) 2010 Published by Elsevier B.V. All rights reserved.
ISBN: (print) 9783319143132; 9783319143125
We propose a model for event-oriented programming under shared memory based on access permissions with explicit parallelism. In order to obtain safe parallelism, programmers need to specify the variable permissions of functions. Blocking operations are nonexistent; callback-based APIs are used instead, and callbacks can be called in parallel for different events as long as the access permissions are guaranteed. This model scales for both IO-bound and CPU-bound programs. We have implemented this model in the Eve language, which includes a compiler that generates parallel tasks with synchronization on top of variables, and a work-stealing runtime that uses the epoll interface to manage the event loop. We have also evaluated the model with micro-benchmarks on programs that are either CPU-intensive or IO-intensive, with and without shared data. In CPU-intensive programs, it achieved results very close to multithreaded approaches. In the share-nothing IO-intensive benchmark, it outperformed all other solutions. In the shared-memory IO-intensive benchmark, it outperformed the other solutions when the number of write operations was greater than or equal to the number of read operations.
Parallelizing industrial simulation codes such as the EUROPLEXUS software, which is dedicated to the analysis of fast transient phenomena, is challenging. In this paper we focus on efficient parallelization on a multi-core shared memory node. We propose to have each thread gather the data it needs for processing a given iteration range before actually advancing the computation by one time step on this range. This lazy, cache-aware layout construction makes it possible to keep the original data structure and leads to very localised code modifications. We show that this approach can improve the execution time by up to 40% when the task size is set so that the data fit in the L2 cache.
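The gather-then-compute idea described above can be sketched roughly as follows. This is a minimal illustration, not the EUROPLEXUS implementation: the names `advance_range` and `time_step`, the position/velocity update, and the use of Python threads are all assumptions made for the example. It shows only the structure (gather into a compact buffer, compute, scatter back); in CPython the GIL means it demonstrates the pattern rather than any speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def advance_range(positions, velocities, indices, dt):
    """Advance one task's iteration range by one time step (hypothetical update)."""
    # Gather: copy the scattered entries this task needs into compact local buffers.
    local_pos = [positions[i] for i in indices]
    local_vel = [velocities[i] for i in indices]
    # Compute: work only on the packed, contiguous copies.
    new_pos = [p + v * dt for p, v in zip(local_pos, local_vel)]
    # Scatter: write results back into the original, unchanged data structure.
    for i, p in zip(indices, new_pos):
        positions[i] = p


def time_step(positions, velocities, order, task_size, dt, workers=4):
    """Split the iteration order into tasks of task_size elements and run them."""
    tasks = [order[s:s + task_size] for s in range(0, len(order), task_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tasks touch disjoint indices, so no locking is needed here.
        list(pool.map(lambda idx: advance_range(positions, velocities, idx, dt), tasks))
```

The `task_size` parameter plays the role of the tunable mentioned in the abstract: it bounds how much data each task gathers, which is what would be sized to a cache level in a compiled implementation.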
The HPCmatlab framework has been developed for Distributed Memory programming in Matlab/Octave using the Message Passing Interface (MPI). The communication routines in the MPI library are implemented using MEX wrappers. Point-to-point, collective as well as one-sided communication is supported. Benchmarking results show better performance than the Mathworks Distributed Computing Server. HPCmatlab has been used to successfully parallelize and speed up Matlab applications developed for scientific computing. The application results show good scalability, while preserving the ease of programmability. HPCmatlab also enables shared memory programming using Pthreads and parallel I/O using the ADIOS package.
ISBN: (print) 9781479932795
In order to reduce the complexity of traditional multithreaded parallel programming, this paper explores a new task-based parallel programming approach using the Microsoft .NET Task Parallel Library (TPL). First, the paper proposes a custom data partitioning optimization method to achieve efficient data parallelism and applies it to matrix multiplication. The results of this application support the custom data partitioning optimization method. We then develop a task-parallel application, Image Blender, which illustrates the efficiency and the pitfalls associated with task parallelism. Finally, the paper analyzes the performance of our applications. Experimental results show that TPL can dramatically reduce the programmer's burden and boost program performance with its task-based parallel programming mechanism.
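As a rough illustration of what data partitioning for matrix multiplication can look like, the sketch below splits the rows of the left matrix into contiguous chunks and submits one task per chunk. It is written in Python rather than .NET, and it is not the paper's custom partitioner: the helper names `row_ranges`, `multiply_rows`, and `parallel_matmul` and the even row-chunk scheme are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def row_ranges(n_rows, n_chunks):
    """Split range(n_rows) into at most n_chunks contiguous, near-equal ranges."""
    base, extra = divmod(n_rows, n_chunks)
    ranges, start = [], 0
    for k in range(n_chunks):
        size = base + (1 if k < extra else 0)
        if size:
            ranges.append((start, start + size))
        start += size
    return ranges


def multiply_rows(a, b, lo, hi):
    """Compute rows lo..hi-1 of the product a @ b."""
    cols = list(zip(*b))  # columns of b, so each dot product reads sequentially
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a[lo:hi]]


def parallel_matmul(a, b, n_chunks=4):
    """Multiply a and b with one task per row chunk; results keep row order."""
    ranges = row_ranges(len(a), n_chunks)
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        parts = pool.map(lambda r: multiply_rows(a, b, *r), ranges)
        return [row for part in parts for row in part]
```

Chunking by row range instead of creating one task per row keeps the number of tasks small, which is the usual motivation for a custom partitioner when the per-element work is cheap.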
This paper aims to reduce execution time by implementing various time-consuming applications on NVIDIA Graphics Processing Units (GPUs) using NVIDIA's parallel programming model, Compute Unified Device Architecture (CUDA). CUDA provides a platform for developing high-end parallel processing applications on NVIDIA GPUs. The paper implements various image processing algorithms on both the Central Processing Unit (CPU) and the GPU. The implemented point-to-point image processing algorithms are a brightening filter, a darkening filter, a negative filter, and an RGB-to-grayscale filter. Various convolution algorithms, which take the values of neighboring pixels into account, are also implemented: a Sobel filter for edge detection, a low-pass filter, and a high-pass filter. The performance of the implemented image processing algorithms is analyzed on both the CPU and the GPU, using images of resolution 3000 x 3000. Color images are used for the point-to-point pixel processing algorithms, and grayscale images are used for all convolution algorithms. For the point-to-point algorithms, performance is analyzed while varying the number of threads per block. Recursive ray tracing is also implemented on the GPU and shows a performance gain compared with the serial algorithm run on the CPU.
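The two classes of filters named in this abstract can be sketched in a few lines of serial Python. This is a CPU reference sketch only, not the paper's CUDA code: the function names, the integer luma weights (299/587/114, the common BT.601 approximation), the `|gx| + |gy|` gradient magnitude, and leaving border pixels at zero are all assumptions. Point filters depend on a single pixel, while the Sobel filter reads a 3x3 neighborhood; in both cases each output pixel is independent of the others, which is what allows one GPU thread per pixel.

```python
def brighten(pixel, delta):
    """Point filter: add delta to each channel, clamped to 255."""
    return tuple(min(255, c + delta) for c in pixel)


def darken(pixel, delta):
    """Point filter: subtract delta from each channel, clamped to 0."""
    return tuple(max(0, c - delta) for c in pixel)


def negative(pixel):
    """Point filter: invert each channel."""
    return tuple(255 - c for c in pixel)


def to_gray(pixel):
    """Point filter: weighted RGB-to-grayscale (assumed BT.601-style weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(gray):
    """Neighborhood filter: Sobel edge magnitude on a 2D list of gray values."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]  # border pixels are left at 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernels are applied without flipping; the absolute values taken
            # below make the result identical to a true convolution here.
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```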