This paper presents the modelling and numerical simulation results of a bidirectional DC-DC converter that uses modular Marx power electronic switches for high-voltage converter applications. To achieve ample volt...
We present a simulated-annealing-based partitioning technique for mapping task graphs onto heterogeneous processing architectures. Partitioning tasks onto homogeneous architectures to minimize the makespan of a task graph is a known NP-hard problem. Heterogeneity greatly complicates this partitioning problem, making heuristic solutions essential. A number of heuristic approaches have been proposed, some using simulated annealing. We propose a simulated annealing method with a novel NEXT STATE function that explores different regions of the global search space when the annealing temperature is high and makes the search more local as the temperature drops. The novelty of our approach is twofold: (1) we go a step further than the existing scientific literature by considering heterogeneity at the levels of task parallelism, data parallelism, and communication; (2) we present a novel algorithm that uses simulated annealing to find better partitions in the presence of heterogeneous architectures, data-parallel execution units, and significant data communication costs. A statistical analysis of the performance of the proposed method shows that our approach clearly outperforms the existing simulated annealing method.
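The idea of a temperature-dependent move can be illustrated with a minimal sketch. It is not the paper's NEXT STATE function, which the abstract does not specify: the rule used here (reassign a number of tasks proportional to the current temperature) is an assumption, and the cost model is deliberately simplified to independent tasks with per-processor run times, ignoring task-graph dependencies and communication costs. All names (`makespan`, `next_state`, `anneal`) are hypothetical.

```python
import math
import random

def makespan(assign, cost):
    """Finish time of the busiest processor; cost[t][p] is the run time of task t on processor p."""
    load = [0.0] * len(cost[0])
    for task, proc in enumerate(assign):
        load[proc] += cost[task][proc]
    return max(load)

def next_state(assign, n_procs, temp, t_start, rng):
    """Assumed move rule: reassign many tasks when hot (global move), few when cold (local move)."""
    n_moves = max(1, round(len(assign) * temp / t_start))
    new = list(assign)
    for task in rng.sample(range(len(assign)), n_moves):
        new[task] = rng.randrange(n_procs)
    return new

def anneal(cost, t_start=10.0, t_end=0.01, alpha=0.95, steps=50, seed=0):
    """Simulated annealing over task-to-processor assignments; returns (best assignment, its makespan)."""
    rng = random.Random(seed)
    n_procs = len(cost[0])
    current = [rng.randrange(n_procs) for _ in cost]
    best = list(current)
    temp = t_start
    while temp > t_end:
        for _ in range(steps):
            cand = next_state(current, n_procs, temp, t_start, rng)
            delta = makespan(cand, cost) - makespan(current, cost)
            # Metropolis rule: always accept improvements, sometimes accept worse states.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current = cand
                if makespan(current, cost) < makespan(best, cost):
                    best = list(current)
        temp *= alpha  # geometric cooling schedule
    return best, makespan(best, cost)
```

For example, `anneal([[1, 5], [5, 1], [1, 5], [5, 1]])` searches assignments of four tasks to two processors, each of which is fast for half of the tasks. A fuller model would replace `makespan` with a schedule length that accounts for precedence and data transfer between processors.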
Blind Signal Separation is an algorithmic problem class that deals with the restoration of original signal data from a signal mixture. Implementations such as FastICA are optimized for parallelization on CPUs or first-generation GPU hardware. With the advent of modern, compute-centered GPU hardware with powerful features such as dynamic parallelism support, these solutions no longer make the best possible use of the available hardware performance. We present an optimized implementation of the FastICA algorithm that is specifically tailored for next-generation GPU architectures such as Nvidia Kepler. Our prototype implementation achieves a two-digit speedup factor compared to a multithreaded CPU implementation. Our custom matrix multiplication kernels, tailored specifically for this use case, contribute to the speedup by delivering better performance than the state-of-the-art CUBLAS library.
ISBN: (print) 9783662439845; 9783662439838
MapReduce, a popular platform for solving embarrassingly parallel problems, has been used extensively on large commodity clusters. However, constrained by the embarrassingly parallel assumption, some computation patterns are not easy to express in MapReduce, and in some cases, such as iteration and map-phase filtration from a holistic perspective, performance and efficiency cannot be achieved without communication between tasks. This paper presents HadoopM, a message-enhanced version of the Hadoop MapReduce architecture that breaks the key embarrassingly parallel assumption and can execute MR jobs in a more efficient and elegant way. HadoopM allows user-defined messages to be passed between mappers or reducers through two message-passing mechanisms, lightweight and heavyweight, and the system supports both asynchronous and synchronous message passing. HadoopM retains the scalability and fault tolerance of Hadoop and is binary compatible with Hadoop MapReduce. Our experimental results demonstrate the superiority of the modified version over the original Hadoop MapReduce on a range of algorithms. In some cases, such as PageRank and Skyline, HadoopM improves job performance by up to 50%.
ISBN: (print) 9781479932801
Based on an analysis of the problem of mood diffusion and drawing on agent theory, an agent-based mood diffusion model was established. The parts of the model that are suitable for parallel computing were designed and implemented with the CUDA programming tool, showing that GPU computing can improve the efficiency of the model calculation.
ISBN: (print) 9783319119885; 9783319119878
In the past few years, we have observed a trend of increasing cooperation between computer science and other empirical sciences such as physics, biology, and medicine. This e-science synergy opens new challenges for computer science and triggers important advances in other areas of research. In our particular case, we are facing an astroinformatics challenge: analysing stellar spectra in order to establish automated classification methods for recognizing different types of Be stars. We have chosen similarity search methods, which are used effectively in other domains such as multimedia content-based retrieval. This paper presents our analysis of the problem and proposes a solution based on the Signature Quadratic Form Distance and feature signatures. We have also conducted an intensive empirical evaluation, which allowed us to determine an appropriate configuration for our similarity model.
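As a rough illustration of the distance named above, the following sketch computes the Signature Quadratic Form Distance in its usual form, sqrt(w A w^T), where w is the concatenation of the first signature's weights with the negated weights of the second and A holds pairwise similarities between all centroids. The Gaussian similarity function and the `alpha` parameter are assumptions chosen for the example; the abstract does not say which similarity function or configuration the authors settled on, and the function name `sqfd` is hypothetical.

```python
import math

def sqfd(sig_q, sig_o, alpha=1.0):
    """Signature Quadratic Form Distance between two feature signatures.

    Each signature is a list of (centroid, weight) pairs; a centroid is a tuple of floats.
    """
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    # Concatenated weight vector (w_q | -w_o).
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]

    def similarity(a, b):
        # Assumed choice: Gaussian similarity on the squared Euclidean distance.
        return math.exp(-alpha * sum((x - y) ** 2 for x, y in zip(a, b)))

    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * similarity(ci, cj)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))
```

Unlike a fixed-length histogram distance, this form accepts signatures with different numbers of centroids, which is why it suits features extracted adaptively from each object (here, each spectrum).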
ISBN: (print) 9781634391313
Multi-agent systems represent a powerful tool to model several interesting real-world problems. Unfortunately, the limited scalability of many state-of-the-art algorithms hinders their applicability in practical situations: complex dynamics and interactions among a large number of agents often make the search for an optimal solution infeasible. Against this background, the study and design of new highly parallel computational models could greatly improve solution techniques in the above-mentioned fields. In particular, I will introduce two parallel approaches to the coalition formation problem in the context of multi-agent systems, detailing how their performance can benefit from the use of modern parallel architectures.
In this paper, we propose a method of enhancing Multi-Objective Genetic Algorithms (MOGAs) for document clustering with parallel programming. Document clustering using MOGAs shows better performance than other clustering algorithms. However, the overall computation time of the MOGAs becomes considerably long as the number of documents increases. To address this problem, we implement the MOGAs with General-Purpose computing on Graphics Processing Units (GPGPU) to compute the document similarities for the clustering. Furthermore, we introduce two thread architectures (Term-threads and Document-threads) in the CUDA (Compute Unified Device Architecture) language. The experimental results show that the parallel MOGAs with CUDA are substantially faster than the general MOGAs.
Feature detection and tracking is an important problem in computer vision. Corners in an image are a good indication of features to track. The original algorithms may be expensive even on multicore architectures because they require full convolutions to be performed. Although these can be performed in real time on modern GPUs and multicore CPUs, faster solutions are needed for embedded systems and complex algorithms, given that corner detection is just one step of the analysis process. In this paper we evaluate the performance and energy efficiency of the Harris corner detection algorithm, as well as an approximation of it, on both desktop and mobile platforms. The purpose of this paper is threefold: to evaluate the performance gains of GPUs vs. CPUs for several mobile and desktop systems; to evaluate whether the Harris approximation provides adequate performance gains to justify its use in mobile and desktop system configurations; and, finally, to determine which configurations provide real-time performance. According to our evaluation, (a) the best GPU solution is 16.3 times faster than the best CPU solution for the desktop case while being 2.6 times more energy efficient, and (b) the best GPU solution for the mobile case is 1.2 times faster while being 3.6 times more energy efficient than the respective CPU.
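For readers unfamiliar with the detector, the core of the standard Harris measure can be sketched for a single window as follows. This shows only the textbook response R = det(M) - k * trace(M)^2, not the approximation evaluated in the paper, which the abstract does not describe. The function name, the unweighted window, and the value k = 0.04 are assumptions made for the example; the gradient pairs are taken as already computed, which is where the convolutions mentioned above would be spent.

```python
def harris_response(gradients, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 for one image window.

    gradients is a list of (Ix, Iy) image-derivative pairs sampled inside the window.
    M is the 2x2 structure tensor [[sxx, sxy], [sxy, syy]] summed over the window.
    """
    sxx = sum(ix * ix for ix, _ in gradients)
    syy = sum(iy * iy for _, iy in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace
```

A window whose gradients point in two different directions gives a positive response (a corner), a window with gradients in one direction only gives a negative response (an edge), and a flat window gives zero.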
ISBN: (print) 9781479976164
The Barnes-Hut algorithm is a widely used approximation method for the N-body simulation problem. The irregular nature of this tree-walking code presents interesting challenges for its computation on parallel systems. Additional problems arise in effectively exploiting the processing capacity of GPU architectures. We propose and investigate the applicability of software Simulated Wide-Warps (SWW) in this context. To this end, we explicitly deal with dynamic irregular patterns in data accesses through data remapping and data transformation, and by controlling the execution-flow divergence of threads. We present a new compact data structure for the tree layout, GPU parallel algorithms for tree transformation, and parallel tree walking using SWW. The benefit of our techniques lies in transforming the tree algorithm so that it executes regular patterns that match the GPU model. Our experiments show significant performance improvement over the best known GPU solutions to this algorithm.
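To show where the irregularity comes from, here is a minimal sequential sketch of the plain Barnes-Hut tree walk in 2D. It does not implement Simulated Wide-Warps or the compact tree layout described above. The class and function names, the unit gravitational constant, and the opening test (cell width divided by distance compared with `theta`) are assumptions for illustration, and the sketch assumes all bodies have distinct positions inside the root square.

```python
import math

class Node:
    """Square quadtree cell that tracks its total mass and mass-weighted position sums."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # sums of m*x and m*y over bodies in this cell
        self.body = None          # (x, y, m) when the cell is a leaf holding one body
        self.children = None

    def insert(self, x, y, m):
        if self.body is None and self.children is None:
            self.body = (x, y, m)             # empty leaf: store the body here
        else:
            if self.children is None:         # occupied leaf: split and push the old body down
                h = self.half / 2
                self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                                 for dy in (-1, 1) for dx in (-1, 1)]
                old, self.body = self.body, None
                self._child(old[0], old[1]).insert(*old)
            self._child(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

    def _child(self, x, y):
        return self.children[(1 if x >= self.cx else 0) + (2 if y >= self.cy else 0)]

def accel(node, x, y, theta, eps=1e-9):
    """Acceleration at (x, y) with G = 1; a cell with width/distance < theta acts as one body."""
    if node.mass == 0.0:
        return 0.0, 0.0
    dx, dy = node.mx / node.mass - x, node.my / node.mass - y
    dist = math.hypot(dx, dy)
    if node.children is None or (2 * node.half) / (dist + eps) < theta:
        if dist < eps:                        # skip the query body itself
            return 0.0, 0.0
        f = node.mass / dist ** 3
        return f * dx, f * dy
    ax = ay = 0.0
    for child in node.children:               # data-dependent recursion: the irregular part
        cax, cay = accel(child, x, y, theta, eps)
        ax += cax
        ay += cay
    return ax, ay

def build(bodies, half=1.0):
    """Build a quadtree over bodies given as (x, y, m) inside [-half, half] squared."""
    root = Node(0.0, 0.0, half)
    for x, y, m in bodies:
        root.insert(x, y, m)
    return root
```

With `theta = 0` every cell is opened and the walk reduces to the exact pairwise sum; larger values trade accuracy for fewer visited nodes. Because each query point follows a different path through the tree, neighbouring GPU threads diverge, which is the problem the abstract's remapping and wide-warp techniques target.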