ISBN: 9781538606865 (print)
Raman spectrometry is a technique for detecting chemical products through a number of representative peaks found in a spectrum image or a numeric data series. Raman spectrometry is an essential technique in many fields, such as physics, chemistry and biology. The Raman spectrometry machine analyses a product and generates images in the form of a curve, and interpreting the curve's peaks reveals the chemical origin of the analyzed product. Scientists perform this operation manually, which makes it difficult and time-consuming. The aim of the present paper is to automate molecule detection using image-processing techniques. We propose a parallel solution that detects the peaks using OpenCL on a graphics processing unit (GPU). Thanks to its multicore architecture, the GPU makes the application faster and more efficient.
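A minimal sketch of a per-sample peak test, assuming the spectrum curve has already been extracted from the image into a 1-D list of intensities. The function name `detect_peaks` and the `min_height` threshold are hypothetical and not taken from the paper; the sketch uses a simple strict-local-maximum rule rather than the authors' image-processing pipeline. Because each index is tested independently, the test is the kind of work that could map onto one OpenCL work-item per sample.

```python
def detect_peaks(intensities, min_height):
    """Return indices of strict local maxima whose value is at least min_height.

    Hypothetical rule for illustration: a sample is a peak if it exceeds both
    neighbours. Each index is checked independently of the others, so on a GPU
    the check could run as one work-item per sample (not done here).
    """
    def is_peak(i):
        v = intensities[i]
        return v >= min_height and v > intensities[i - 1] and v > intensities[i + 1]

    # The first and last samples have only one neighbour and are skipped.
    return [i for i in range(1, len(intensities) - 1) if is_peak(i)]


spectrum = [0, 1, 5, 1, 0, 3, 8, 3, 0]
print(detect_peaks(spectrum, min_height=2))  # prints [2, 6]
```

Note that this strict rule does not report flat-topped peaks (plateaus); a real detector would also need smoothing and baseline handling.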
ISBN: 9781538634370 (print)
Recursive programs that typically implement divide-and-conquer algorithms are well-suited for multicore systems, as they offer a high degree of parallelization potential. So far, existing parallelizing compilers have mainly focused on extracting other parallel patterns, such as data-level or pipeline-level parallelism. In this paper, we propose a toolflow for extracting recursion-level parallelism for embedded multicore systems. To achieve this, the toolflow not only verifies the mutual independence of recursive call sites but also selects an appropriate task granularity to ensure a good trade-off between load balancing and parallelization overhead. Profitable parallelization opportunities are implemented using compiler directives from the OpenMP tasking model. Results show the effectiveness of our toolflow: it speeds up sequential recursive programs by 2.5x to 3.8x on a quad-core platform.
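The call-site independence and granularity trade-off can be illustrated with a small sketch. The toolflow itself emits OpenMP task directives; to keep the examples here in one language, this hypothetical Python version uses `threading.Thread` as a stand-in for a task and a length `cutoff` as the granularity threshold. The names `task_merge_sort` and `cutoff` are illustrative, and because of CPython's global interpreter lock the sketch shows the structure, not the speedup.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def task_merge_sort(xs, cutoff=1024):
    """Divide-and-conquer merge sort with a task-granularity cutoff.

    The two recursive call sites work on disjoint halves, so they are mutually
    independent. At or above `cutoff`, the left call runs as a separate task
    (a thread here, analogous to `#pragma omp task`) and is joined before
    merging (analogous to `#pragma omp taskwait`). Below it, recursion stays
    sequential because task overhead would outweigh the work.
    """
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    if len(xs) < cutoff:
        return merge(task_merge_sort(xs[:mid], cutoff), task_merge_sort(xs[mid:], cutoff))

    result = {}

    def run_left():
        result["left"] = task_merge_sort(xs[:mid], cutoff)

    worker = threading.Thread(target=run_left)
    worker.start()
    right = task_merge_sort(xs[mid:], cutoff)  # the current thread handles the right half
    worker.join()
    return merge(result["left"], right)


print(task_merge_sort([5, 3, 9, 1, 7, 2], cutoff=2))  # prints [1, 2, 3, 5, 7, 9]
```

Choosing `cutoff` is exactly the load-balancing versus overhead trade-off the abstract describes: too small spawns many tiny tasks, too large leaves cores idle.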
ISBN: 9781510840232 (print)
The problem of scheduling independent jobs on identical parallel machines to minimize makespan has been intensely studied in the literature. One of the most popular constructive algorithms for this problem is the LPT (Longest Processing Time first) rule, whose approximation ratio has been proved by contradiction. This paper presents a direct proof of its approximation ratio, which can be regarded as acquiring knowledge by deductive means.
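A sketch of the LPT rule itself, assuming processing times are given as a list and all machines are identical. Jobs are sorted in non-increasing order and each is assigned to the currently least-loaded machine, tracked with a min-heap. The function name `lpt_schedule` is hypothetical. For reference, Graham showed that LPT's makespan is at most (4/3 - 1/(3m)) times the optimum on m machines; the usage example below hits that bound exactly for m = 2.

```python
import heapq


def lpt_schedule(processing_times, m):
    """Longest Processing Time first on m identical machines.

    Returns (makespan, assignment), where assignment[k] lists the jobs
    placed on machine k in the order they were assigned.
    """
    loads = [(0, k) for k in range(m)]  # min-heap of (current load, machine id)
    heapq.heapify(loads)
    assignment = [[] for _ in range(m)]
    for p in sorted(processing_times, reverse=True):  # longest jobs first
        load, k = heapq.heappop(loads)  # least-loaded machine (ties: lowest id)
        assignment[k].append(p)
        heapq.heappush(loads, (load + p, k))
    makespan = max(load for load, _ in loads)
    return makespan, assignment


# Jobs 3, 3, 2, 2, 2 on two machines: the optimum is 6 ({3, 3} and {2, 2, 2}).
print(lpt_schedule([3, 3, 2, 2, 2], 2))  # prints (7, [[3, 2, 2], [3, 2]])
```

Here LPT yields 7 against an optimum of 6, a ratio of 7/6 = 4/3 - 1/6.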
ISBN: 9783030040215; 9783030040208 (print)
Measuring cognitive load is crucial for many applications, such as information personalization and adaptive intelligent tutoring systems. Cognitive load estimation using electroencephalogram (EEG) signals is widespread because EEG gives clear indications of cognitive activity by measuring changes in neural activation in the brain. However, existing cognitive load estimation techniques are based on machine learning algorithms that rely on signal denoising and hand-crafted feature extraction before classifying different load levels. A better alternative to this machine learning approach is needed. Recently, deep learning has been successfully applied in many areas, such as computer vision, pattern recognition and speech processing. However, deep learning has not been extensively studied for classifying cognitive load data captured by EEG. In this work, two deep learning models are studied for classifying cognitive load data: a stacked denoising autoencoder (SDAE) followed by a multilayer perceptron (MLP), and a long short-term memory (LSTM) network followed by an MLP. SDAE and LSTM are used for feature extraction, and the MLP for classification. The deep learning models are observed to perform significantly better than conventional machine learning classifiers such as support vector machine (SVM), k-nearest neighbors (KNN) and linear discriminant analysis (LDA).
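To make the two-stage structure concrete (an encoder used as a feature extractor, followed by an MLP classifier), here is a minimal forward-pass sketch in plain Python. It assumes a flattened EEG feature vector as input and three cognitive-load classes; the layer sizes, function names and random weights are hypothetical, and the layer-wise pretraining and fine-tuning that an SDAE actually requires are omitted. `corrupt` shows the masking noise applied to inputs during denoising pretraining.

```python
import math
import random


def corrupt(x, drop_rate, rng):
    """Masking noise for denoising pretraining: zero each input with probability drop_rate."""
    return [0.0 if rng.random() < drop_rate else v for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases, act):
    """Fully connected layer: act(W x + b), with W stored as a list of rows."""
    return [act(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sdae_mlp_forward(x, encoder_layers, mlp_layers):
    """Encoder layers extract features; the MLP head maps them to class probabilities."""
    h = x
    for weights, biases in encoder_layers:  # stacked (denoising) encoder
        h = dense(h, weights, biases, sigmoid)
    for weights, biases in mlp_layers[:-1]:  # MLP hidden layers
        h = dense(h, weights, biases, sigmoid)
    weights, biases = mlp_layers[-1]  # output layer: linear logits, then softmax
    return softmax(dense(h, weights, biases, lambda z: z))


rng = random.Random(0)
encoder = [random_layer(8, 6, rng), random_layer(6, 4, rng)]  # hypothetical sizes
mlp = [random_layer(4, 4, rng), random_layer(4, 3, rng)]       # 3 load classes (assumed)
features = [rng.random() for _ in range(8)]                    # stand-in EEG feature vector
print(sdae_mlp_forward(features, encoder, mlp))
```

An LSTM-based variant would replace the encoder loop with a recurrent pass over time steps while keeping the same MLP head.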
ISBN: 9781538610442 (print)
GPUs are commonly used as coprocessors to accelerate compute-intensive tasks, thanks to their massively parallel architecture. There has been research into various abstract parallel models, which allow researchers to design and analyse parallel algorithms. However, most work on analysing GPU algorithms has relied on software-based tools for profiling a GPU algorithm. Recently, some abstract GPU models have been proposed, yet they do not capture all elements of a GPU. In particular, they omit the data transfer between CPU and GPU, which in practice can cause a bottleneck and reduce performance dramatically. We propose a comprehensive model called Abstract Transferring GPU, which to our knowledge is the first abstract GPU model to capture data transfer between CPU and GPU. We show experimentally that existing abstract GPU models cannot always capture the actual running time of a GPU algorithm, because they do not account for data transfer. We show that by capturing data transfer, our model obtains more accurate predictions of a GPU algorithm's actual running time. We expect our model to help improve the design and analysis of heterogeneous systems consisting of CPUs and GPUs, and to allow researchers to make better-informed implementation decisions, since they will be aware of how data transfer affects their programs.
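The effect of ignoring transfers can be seen with a simple latency-bandwidth cost sketch. This is not the Abstract Transferring GPU model; it is a generic, hypothetical estimate in which each host-device copy costs a fixed latency plus bytes divided by bandwidth, added to the kernel time that a transfer-unaware model would predict on its own. `Link`, `predict` and all numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical CPU<->GPU link: fixed per-copy latency plus bandwidth cost."""
    latency_s: float
    bandwidth_bytes_per_s: float

    def copy_time(self, n_bytes):
        # No copy, no cost; otherwise latency + size / bandwidth.
        return self.latency_s + n_bytes / self.bandwidth_bytes_per_s if n_bytes else 0.0


def predict(kernel_s, bytes_to_gpu, bytes_from_gpu, link):
    """Return (total predicted time, fraction of it spent on transfers)."""
    t_in = link.copy_time(bytes_to_gpu)
    t_out = link.copy_time(bytes_from_gpu)
    total = t_in + kernel_s + t_out
    return total, (t_in + t_out) / total


# A 10 ms kernel that needs 1 GB copied in over a 10 GB/s link:
total, share = predict(0.010, 1e9, 0, Link(latency_s=0.0, bandwidth_bytes_per_s=1e10))
print(f"{total:.3f} s, {share:.0%} in transfers")  # prints 0.110 s, 91% in transfers
```

A kernel-only model would predict 0.010 s here, an underestimate by roughly a factor of eleven.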
ISBN: 9783319654829 (digital)
ISBN: 9783319654829; 9783319654812 (print)
The Intel Xeon Phi Knights Landing manycore processor comes with interesting new features: on-chip high-bandwidth memory and several user-selectable NUMA configurations. In this paper, we examine how these features affect applications that target the Open Community Runtime (OCR), an asynchronous task-based runtime system for future parallel architectures. We have extended our OCR runtime to make it NUMA-aware and to allow it to use the high-bandwidth memory. Using a memory-intensive seismic simulation, we have conducted a range of experiments comparing OpenMP, TBB, our OCR implementation and the reference OCR implementation on different machine configurations.
ISBN: 9789090304281 (print)
Stream join is a fundamental and computationally expensive data mining operation for relating information from different data streams. This paper presents two FPGA-based architectures that accelerate stream join processing. The proposed hardware-based systems were implemented on a multi-FPGA hybrid system with high memory bandwidth. The experimental evaluation shows that our proposed systems can outperform a software-based solution running on a high-end, 48-core multiprocessor platform by at least one order of magnitude. In addition, the proposed solutions outperform all other previously proposed hardware-based and software-based solutions for stream join processing. Finally, our proposed hardware-based architectures can be used as generic templates for mapping stream processing algorithms onto reconfigurable logic, taking real-world challenges and restrictions into consideration.
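A software reference for the operation being accelerated may help: a symmetric sliding-window equi-join over two streams R and S, sketched under the assumption of count-based windows and a join on the first tuple field. Each arriving tuple probes every tuple in the opposite window and then enters its own window, evicting the oldest when full. The function name `window_join` and the window policy are hypothetical, not the paper's FPGA design; the inner probe loop is the comparison-heavy part that a hardware design could spread across parallel comparison units.

```python
from collections import deque


def window_join(events, window_size, key=lambda t: t[0]):
    """Symmetric sliding-window equi-join over two interleaved streams "R" and "S".

    events: iterable of (stream_id, tuple). Windows are count-based (hypothetical
    policy): each keeps the last `window_size` tuples of its stream.
    Yields (r_tuple, s_tuple) pairs whose keys match.
    """
    windows = {"R": deque(maxlen=window_size), "S": deque(maxlen=window_size)}
    opposite = {"R": "S", "S": "R"}
    for sid, tup in events:
        # Probe: compare the new tuple with every tuple in the opposite window.
        for cand in windows[opposite[sid]]:
            if key(cand) == key(tup):
                yield (tup, cand) if sid == "R" else (cand, tup)
        # Insert: the tuple joins its own window; deque(maxlen=...) evicts the oldest.
        windows[sid].append(tup)


events = [("R", (1, "a")), ("S", (1, "x")), ("S", (2, "y")), ("R", (2, "b")), ("R", (1, "c"))]
print(list(window_join(events, window_size=2)))
# prints [((1, 'a'), (1, 'x')), ((2, 'b'), (2, 'y')), ((1, 'c'), (1, 'x'))]
```

With `window_size=1`, the last pair disappears because `(1, "x")` has already been evicted from the S window when `(1, "c")` arrives.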
Artificial neural networks are inspired by biological neural networks, which are formed by many real neurons with spiking activity. It is important to simulate this spiking activity under different conditions. It is well known that the Hodgkin-Huxley (HH) equations can be used for this simulation. However, the ion channel conductances in these equations, which the simulation requires, are usually unknown. In this paper, we develop a parallel genetic algorithm, together with a visual software tool, to estimate these conductances. Fitting experimental data shows that when the genetic algorithm's population exceeds 2000 individuals, the 5th generation can yield a near-optimal solution and achieve a good fit.
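A compact sketch of conductance estimation with a genetic algorithm, under heavy simplifying assumptions: instead of integrating the full HH equations, the hypothetical fitness function compares only the HH ionic current, I = g_Na m^3 h (V - E_Na) + g_K n^4 (V - E_K) + g_L (V - E_L), against observed currents at a few samples whose gating values (m, h, n) are assumed known. Fitness evaluations within a generation are independent, so they are dispatched through a thread pool to show where the parallelism lies (in CPython, a process pool or a cluster would be needed for real speedup). Population size, rates, bounds, reversal potentials and sample values are illustrative, not the paper's settings.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Reversal potentials in mV (common textbook values, assumed here).
E_NA, E_K, E_L = 50.0, -77.0, -54.4


def ionic_current(g, v, m, h, n):
    """HH ionic current for conductances g = (g_na, g_k, g_l) and fixed gating values."""
    g_na, g_k, g_l = g
    return g_na * m**3 * h * (v - E_NA) + g_k * n**4 * (v - E_K) + g_l * (v - E_L)


def fitness(g, samples, observed):
    """Sum of squared errors against observed currents (lower is better)."""
    return sum((ionic_current(g, *s) - obs) ** 2 for s, obs in zip(samples, observed))


def estimate_conductances(samples, observed, pop_size=200, generations=30,
                          bounds=(0.0, 150.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(3)] for _ in range(pop_size)]
    history = []  # best fitness of each evaluated generation
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):
            # Independent evaluations: the parallel part of the algorithm.
            scores = list(pool.map(lambda g: fitness(g, samples, observed), pop))
            ranked = [g for _, g in sorted(zip(scores, pop), key=lambda pair: pair[0])]
            history.append(min(scores))
            elite = ranked[: pop_size // 10]   # carried over unchanged (elitism)
            parents = ranked[: pop_size // 2]  # truncation selection
            children = []
            while len(children) < pop_size - len(elite):
                a, b = rng.sample(parents, 2)
                child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
                child = [min(hi, max(lo, x + rng.gauss(0.0, 0.05 * (hi - lo))))
                         if rng.random() < 0.3 else x for x in child]  # Gaussian mutation
                children.append(child)
            pop = elite + children
    best = min(pop, key=lambda g: fitness(g, samples, observed))
    return best, history


true_g = (120.0, 36.0, 0.3)  # classic HH squid-axon conductances (mS/cm^2)
samples = [(-65.0, 0.05, 0.6, 0.32), (-40.0, 0.2, 0.4, 0.45),
           (0.0, 0.6, 0.2, 0.7), (20.0, 0.8, 0.1, 0.8)]  # hypothetical (V, m, h, n)
observed = [ionic_current(true_g, *s) for s in samples]
best, history = estimate_conductances(samples, observed)
print(best, history[-1])
```

Elitism guarantees that the best fitness never gets worse from one generation to the next, which is why `history` is non-increasing.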
ISBN: 9781509021758 (print)
Rate-constrained motion estimation (RCME) is the most computationally intensive task of H.265/HEVC encoding. Massively parallel architectures, such as graphics processing units (GPUs), used in combination with a multi-core central processing unit (CPU), provide a promising computing platform for fast encoding. However, the dependencies in deriving motion vector predictors (MVPs) prevent prediction units (PUs) from being processed in parallel at the frame level. Moreover, the conditional execution structure of typical fast search algorithms is not suitable for GPUs, which are designed for data-intensive parallel problems. In this paper, we propose a novel, highly parallel RCME method based on multiple temporal motion vector (MV) predictors, together with a new fast nested diamond search (NDS) algorithm well suited to GPUs. The proposed framework provides fine-grained encoding parallelism. Experimental results show that our approach reduces GPU load and achieves better BD-Rate than prior parallel full-search methods based on a single MV predictor.
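The rate-constrained search can be sketched as follows. Each candidate motion vector is scored with the usual Lagrangian cost J = D + lambda * R, where D is the sum of absolute differences (SAD) of the block; here R is a hypothetical proxy, the L1 distance from the predictor, rather than actual HEVC bit counts. The search is a plain small-diamond refinement started from each of several predictors, keeping the cheapest result. It is not the paper's nested diamond search; the names, the rate proxy and the frame layout (lists of pixel rows) are assumptions. Searches from different predictors, and for different blocks, share no state, which is what makes this style of search amenable to GPU threads.

```python
SMALL_DIAMOND = [(0, -1), (-1, 0), (1, 0), (0, 1)]


def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block and the displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def rd_cost(cur, ref, bx, by, mv, mvp, size, lam):
    """Lagrangian cost J = D + lam * R; R is a hypothetical proxy (L1 distance to the predictor)."""
    dx, dy = mv
    h, w = len(ref), len(ref[0])
    if not (0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w):
        return float("inf")  # displaced block would leave the reference frame
    rate = abs(dx - mvp[0]) + abs(dy - mvp[1])
    return sad(cur, ref, bx, by, dx, dy, size) + lam * rate


def diamond_search(cur, ref, bx, by, size, mvp, lam, max_steps=64):
    """Small-diamond refinement starting at the predictor; stops when the centre is cheapest."""
    best = tuple(mvp)
    best_cost = rd_cost(cur, ref, bx, by, best, mvp, size, lam)
    for _ in range(max_steps):
        cands = [(best[0] + ox, best[1] + oy) for ox, oy in SMALL_DIAMOND]
        costs = [rd_cost(cur, ref, bx, by, c, mvp, size, lam) for c in cands]
        i = min(range(len(cands)), key=costs.__getitem__)
        if costs[i] >= best_cost:
            break
        best, best_cost = cands[i], costs[i]
    return best, best_cost


def multi_predictor_search(cur, ref, bx, by, size, predictors, lam=4):
    """Run one independent search per predictor and keep the cheapest result."""
    results = [diamond_search(cur, ref, bx, by, size, p, lam) for p in predictors]
    return min(results, key=lambda r: r[1])


# Toy frames: a ramp image, with the current frame displaced by (2, 1) relative to the reference.
ref = [[10 * x + y for x in range(16)] for y in range(16)]
cur = [[10 * (x + 2) + (y + 1) for x in range(16)] for y in range(16)]
print(multi_predictor_search(cur, ref, 4, 4, 4, [(0, 0), (3, 3)], lam=0))  # prints ((2, 1), 0)
```

With `lam=0` the search minimizes distortion alone; a positive `lam` biases it toward vectors close to the predictor, trading some distortion for fewer (proxy) bits.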
ISBN: 9783319654829 (digital)
ISBN: 9783319654829; 9783319654812 (print)
Analysis of urban traffic data has attracted considerable attention in recent years. In urban traffic data processing, batch computing on historical data and stream computing on real-time data are performed in isolation, and the two computing frameworks do not work together. This paper therefore proposes a method of urban traffic data processing based on collaborative batch and stream computing. Because batch computing offers high throughput, it is better suited to in-depth computation over historical urban traffic data and over the results of stream computing. Stream computing, with its advantage of low latency, can process traffic data in real time; combining its output with the results of batch computing makes the conclusions of urban traffic data processing more comprehensive and accurate.
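A toy sketch of such a collaboration, loosely in the style of the well-known lambda architecture: a batch step computes historical mean speeds per (road, hour), a stream step keeps a low-latency sliding-window average per road, and a merge step compares the two. The record layout, window length, 0.6 congestion ratio and all names are hypothetical, not taken from the paper.

```python
from collections import defaultdict, deque
from statistics import mean


def batch_baseline(history):
    """Batch step: high-throughput pass over historical (road, hour, speed) records."""
    buckets = defaultdict(list)
    for road, hour, speed in history:
        buckets[(road, hour)].append(speed)
    return {k: mean(v) for k, v in buckets.items()}


class StreamLayer:
    """Stream step: low-latency sliding-window average speed per road."""

    def __init__(self, window=3):
        self.windows = defaultdict(lambda: deque(maxlen=window))

    def update(self, road, speed):
        w = self.windows[road]
        w.append(speed)  # the oldest reading is evicted once the window is full
        return sum(w) / len(w)


def combine(baseline, road, hour, live_avg, ratio=0.6):
    """Merge step: compare the live average with the historical baseline."""
    hist = baseline.get((road, hour))
    if hist is None:
        return "no-baseline"
    return "congested" if live_avg < ratio * hist else "normal"


history = [("A", 8, 40.0), ("A", 8, 50.0), ("B", 8, 60.0)]
baseline = batch_baseline(history)      # {("A", 8): 45.0, ("B", 8): 60.0}
stream = StreamLayer(window=2)
stream.update("A", 30.0)
live = stream.update("A", 20.0)         # 25.0
print(combine(baseline, "A", 8, live))  # prints congested
```

In a real deployment the batch step would be recomputed periodically over the full history, while the stream step answers each new reading immediately.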