Gaussian random number generators (GRNGs) are an important component in parallel Monte Carlo simulations using FPGAs, where tens or hundreds of high-quality Gaussian samples must be generated per cycle using very few logic resources. This article describes the Table-Hadamard generator, which is a GRNG designed to generate multiple streams of random numbers in parallel. It uses discrete table distributions to generate pseudo-Gaussian base samples, then a parallel Hadamard transform to efficiently apply the central limit theorem. When generating 64 output samples, the Table-Hadamard requires just 130 slices per generated sample, which is a third of the resources needed by the next best technique, while still providing higher statistical quality.
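As a rough software model of the idea (not the paper's hardware design), the sketch below draws base samples from a small hypothetical table and mixes them with a fast Walsh-Hadamard transform, so each output is a ±1-weighted sum of all base samples; this mixing is how the central limit theorem is applied:

```python
import numpy as np

def fwht(x):
    """In-place, unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(0)
# Hypothetical zero-mean table; the paper's tables are carefully designed,
# this one only illustrates the discrete pseudo-Gaussian base distribution.
table = np.array([-2.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 2.0])
base = table[rng.integers(0, len(table), size=64)]
out = fwht(base.copy()) / np.sqrt(64)  # 64 mixed outputs per "cycle"
```

In hardware the Hadamard transform needs only adders and subtractors, which is consistent with the low per-sample resource count the abstract reports.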
ISBN (print): 9781509021949
Centrality is an important measure for identifying the most important actors in a network. This paper discusses the various centrality measures used in Social Network Analysis. These measures are tested on complex real-world social network data sets, such as video sharing networks, a social interaction network, and co-authorship networks, to examine how each measure behaves on them. We carry out a correlation analysis of these centralities and plot the results to recommend when each centrality measure should be used. Additionally, we introduce a new centrality measure, Cohesion Centrality, based on the cohesiveness of a graph, develop a sequential algorithm for it, and further devise a parallel algorithm to implement it.
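A minimal sketch of the kind of correlation analysis described, using networkx's standard centralities on a small built-in graph (the paper's datasets and its Cohesion Centrality measure are not reproduced here):

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}
names = list(measures)
# One row of per-node scores per centrality measure.
vecs = np.array([[measures[m][v] for v in G] for m in names])
corr = np.corrcoef(vecs)  # pairwise Pearson correlations between measures
for i, a in enumerate(names):
    for j in range(i + 1, len(names)):
        print(f"{a} vs {names[j]}: r = {corr[i, j]:.2f}")
```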
ISBN (print): 9781509028719
In this paper, a speculative computation method for IEC 61499 function block (FB) systems is proposed to increase the level of parallelism when executing an FB system, thereby improving the system's performance and reducing its response time to input events. Data and control dependencies in FB systems are identified and defined as a basis for organizing the speculative execution of FB algorithms. A simulation model of FB systems with speculative execution, based on timed stochastic Petri nets, is considered. In addition, the paper discusses the results of simulation experiments conducted in CPN Tools.
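A toy illustration of the speculation idea in plain Python, assuming a simple predict-then-commit policy (the function and event names are hypothetical; the paper's method operates on IEC 61499 FB dependencies, not Python threads):

```python
from concurrent.futures import ThreadPoolExecutor

def fb_algorithm(data):
    return data * 2  # stand-in for an FB algorithm body

def run_with_speculation(predicted_event, predicted_data, actual_event, actual_data):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the algorithm for the *predicted* input event before it arrives.
        future = pool.submit(fb_algorithm, predicted_data)
        if (actual_event, actual_data) == (predicted_event, predicted_data):
            return future.result()  # correct speculation: commit the early result
        future.cancel()             # mis-speculation: abandon it (best effort)
        return fb_algorithm(actual_data)
```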
ISBN (print): 9781467388153
Applications running on clusters of shared-memory computers are often implemented using OpenMP+MPI. Productivity can be vastly improved using task-based programming, a paradigm in which the user expresses the data and control-flow relations between tasks, offering the runtime maximal freedom to place and schedule tasks. While productivity is increased, high-performance execution remains challenging: the implementation of parallel algorithms typically requires specific task placement and communication strategies to reduce internode communication and exploit data locality. In this work, we present a new macro-dataflow programming environment for distributed-memory clusters, based on the Intel Concurrent Collections (CnC) runtime. Our language extensions let the user define virtual topologies, task mappings, task-centric data placement, task and communication scheduling, and more. We introduce a compiler that automatically generates code for the Intel CnC C++ runtime, with key automatic optimizations including task coarsening and coalescing. We experimentally validate our approach on a variety of scientific computations, demonstrating both productivity and performance.
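As an illustration of what a declared task mapping can look like, the sketch below places 2-D task tags onto a virtual processor grid block-cyclically; the function name and policy are hypothetical and do not reflect the actual extension syntax or the Intel CnC API:

```python
def owner(tag_i: int, tag_j: int, grid_rows: int, grid_cols: int, block: int = 4) -> int:
    """Block-cyclic mapping of a 2-D task tag onto a virtual grid of ranks."""
    row = (tag_i // block) % grid_rows
    col = (tag_j // block) % grid_cols
    return row * grid_cols + col

# e.g. a 2x4 virtual topology over 8 ranks:
rank = owner(tag_i=10, tag_j=7, grid_rows=2, grid_cols=4)  # -> rank 1
```

Declaring the mapping separately from the task bodies is what lets the runtime co-locate tasks with their data and schedule communication without changing the algorithm itself.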
ISBN (print): 9781509025367
The linkage disequilibrium (LD) method is applied in research on population genetics inference, LD mapping, haplotype diversity analysis, and related problems. Soybean genotypes are adopted as the data source, and a parallel linkage disequilibrium algorithm is implemented with OpenMP. In this algorithm, single nucleotide polymorphism (SNP) sites are divided into groups using sliding windows; the alleles of adjacent sites within a window on each chromosome are computed in parallel and the LD results are stored. Based on the experimental data, the serial and parallel algorithms are compared and analyzed. The results show that OpenMP parallelization can effectively improve the efficiency of linkage disequilibrium analysis, which is of practical significance for processing massive biological data.
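A compact sketch of the windowed pairwise computation, assuming 0/1 haplotype matrices and using Python's ProcessPoolExecutor in place of the paper's OpenMP loop (the function names and the standard r² formulation below are not taken from the paper):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def window_ld(genotypes):
    """r^2 between all SNP pairs in one window.
    genotypes: (n_snps, n_haplotypes) 0/1 matrix."""
    n_hap = genotypes.shape[1]
    p = genotypes.mean(axis=1)                    # allele-1 frequency per site
    pab = genotypes @ genotypes.T / n_hap         # joint allele-1 frequencies
    D = pab - np.outer(p, p)                      # LD coefficient
    denom = np.outer(p * (1 - p), p * (1 - p))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, D ** 2 / denom, np.nan)

def sliding_ld(genotypes, window=50, workers=4):
    windows = [genotypes[i:i + window] for i in range(0, len(genotypes), window)]
    # Each window is independent, so they map cleanly onto parallel workers,
    # mirroring the OpenMP loop over windows.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(window_ld, windows))
```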
How can we efficiently decompose a tensor into sparse factors when the data do not fit in memory? Tensor decompositions have gained a steadily increasing popularity in data-mining applications; however, the current state-of-the-art decomposition algorithms operate in main memory and do not scale to truly large datasets. In this work, we propose PARCUBE, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to producing sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to the current state of the art irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm's correctness and we experimentally validate our claims through extensive experiments, including four different real world datasets (ENRON, LBNL, FACEBOOK and NELL), demonstrating its effectiveness for data-mining practitioners. In particular, we are the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that PARCUBE enables us to handle effectively and efficiently very large datasets. Finally, we make our highly scalable parallel implementation publicly available, enabling reproducibility of our work.
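A hedged sketch of the biased sampling step behind this kind of approach: keep the heaviest slices along each mode and decompose the resulting much smaller sub-tensor (the full method also merges factors from several such samples; names here are illustrative, not PARCUBE's actual interface):

```python
import numpy as np

def sample_subtensor(X, fraction=0.25):
    """Keep the indices with the largest marginal mass along each mode."""
    keep = []
    for mode in range(X.ndim):
        other_axes = tuple(a for a in range(X.ndim) if a != mode)
        weights = np.abs(X).sum(axis=other_axes)  # marginal density of each slice
        k = max(1, int(fraction * X.shape[mode]))
        keep.append(np.sort(np.argsort(weights)[-k:]))  # biased toward heavy slices
    return X[np.ix_(*keep)], keep

# e.g. a 100x100x100 tensor shrinks to 25x25x25 before decomposition:
X = np.random.default_rng(0).random((100, 100, 100))
sub, kept_indices = sample_subtensor(X)
```

Because each sampled sub-tensor is decomposed independently, the samples can be processed in parallel, which is where the reported speedup comes from.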
Single-thread algorithms for global optimization differ in the way computational effort is allocated between exploitation and exploration. This allocation ultimately determines overall performance. For example, if too little emphasis is put on exploration, the globally optimal solution may not be identified. Increasing the allocation of computational effort to exploration increases the chances of identifying a globally optimal solution, but it also slows down convergence. Thus, in a single-thread implementation of model-based search, exploration and exploitation are substitutes. In this paper we propose a new algorithmic design for global optimization based upon multiple interacting threads. In this design, each thread implements a model-based search in which the allocation of effort between exploration and exploitation does not vary over time. Threads interact through a simple acceptance-rejection rule that prevents duplication of search efforts. We show that the proposed design provides a speedup effect that increases with the number of threads. Thus, in the proposed algorithmic design, exploration is a complement to exploitation rather than a substitute for it.
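A toy one-dimensional sketch of the design, assuming Gaussian sampling models with fixed per-thread sigmas and a distance-based rejection rule (both are illustrative stand-ins for the paper's model-based search and its acceptance-rejection rule):

```python
import numpy as np

def multi_thread_search(f, sigmas=(0.1, 0.5, 2.0), iters=500, min_dist=0.5, seed=0):
    """Each 'thread' t samples around its own mean with a fixed sigma,
    so its exploration level never changes over time."""
    rng = np.random.default_rng(seed)
    means = list(rng.uniform(-5, 5, size=len(sigmas)))
    for _ in range(iters):
        for t, sigma in enumerate(sigmas):
            x = rng.normal(means[t], sigma)
            # Acceptance-rejection: skip candidates inside a region another
            # thread is already exploiting, so threads do not duplicate work.
            if any(abs(x - m) < min_dist for u, m in enumerate(means) if u != t):
                continue
            if f(x) < f(means[t]):
                means[t] = x  # greedy model update; sigma stays fixed
    best = min(means, key=f)
    return best, f(best)

# e.g. minimizing a multimodal function:
x_best, f_best = multi_thread_search(lambda x: np.sin(3 * x) + 0.1 * x ** 2)
```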
The high computational requirements of current problems have driven most research towards efficient processing formulations that require the use of multiple interconnected processors; this is the foundation of the para...
In this work, we present a parallel implementation of the Singular Value Decomposition (SVD) method on Graphics Processing Units (GPUs) using the CUDA programming model. Our approach is based on an iterative parallel version of the QR factorization by means of Givens plane rotations using the Sameh and Kuck scheme. The parallel algorithm is driven by an outer loop executed on the CPU, and the thread and block configuration is organized to exploit shared memory and avoid repeated accesses to global memory. In addition, the main kernel performs coalesced accesses to global memory using contiguous indices. As a case study, we consider the application of the SVD in the Overcomplete Local Principal Component Analysis (OLPCA) algorithm for the denoising of Diffusion Weighted Imaging (DWI) data. Our results show significant performance improvements with respect to the CPU version, which encourage its use for this expensive application.
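For reference, a minimal sequential sketch of QR factorization via Givens rotations in NumPy; in the Sameh-Kuck scheme, rotations acting on disjoint row pairs are grouped so that each group can be applied concurrently (e.g., one GPU kernel launch per group):

```python
import numpy as np

def givens_qr(A):
    """QR via Givens plane rotations, zeroing subdiagonal entries column by column."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):  # eliminate R[i, j] from the bottom up
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])      # rotation in the (i-1, i) plane
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T  # keep A == Q @ R
    return Q, R

A = np.random.default_rng(1).normal(size=(6, 4))
Q, R = givens_qr(A)
assert np.allclose(Q @ R, A)
```

Rotations in the inner loop that touch non-overlapping row pairs are independent, which is exactly the property the Sameh-Kuck ordering exploits to run them in parallel.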
In recent years, probabilistic data management has received a lot of attention due to several applications that deal with uncertain data: RFID systems, sensor networks, data cleaning, scientific and biomedical data management, and approximate schema mappings. Query evaluation is a challenging problem in probabilistic databases, proved to be #P-hard. A general method for query evaluation is based on the lineage of the query and reduces the query evaluation problem to computing the probability of a propositional formula. The main approaches proposed in the literature to approximate the confidence computation of probabilistic queries are based on Monte Carlo simulation or on compiling the formula into decision diagrams (e.g., d-trees). The former runs in a polynomial, but in practice very large, number of iterations, while the latter is polynomial for easy queries but may be exponential in the worst case. We designed a new optimized Monte Carlo algorithm that drastically reduces the number of iterations and proposed an efficient parallel version that we implemented on GPU. Thanks to the high degree of parallelism provided by the GPU, combined with the linear speedup of our algorithm, we significantly reduced the long running time required by a sequential Monte Carlo algorithm. Experimental results show that our algorithm is efficient enough to be comparable with the formula-compilation approach, but with the significant advantage of avoiding exponential behavior.
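A minimal sketch of the baseline estimator such algorithms optimize: sample a possible world by flipping each lineage variable with its tuple probability, evaluate the lineage formula, and average (the variable names and example formula are illustrative):

```python
import random

def mc_confidence(formula, probs, n_samples=100_000, seed=42):
    """Monte Carlo estimate of P(formula) under independent tuple probabilities."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # One possible world: each tuple variable is true with its probability.
        world = {v: rng.random() < p for v, p in probs.items()}
        hits += formula(world)
    return hits / n_samples

# e.g. lineage (x1 AND x2) OR x3 with independent tuple probabilities:
probs = {"x1": 0.5, "x2": 0.4, "x3": 0.1}
est = mc_confidence(lambda w: (w["x1"] and w["x2"]) or w["x3"], probs)
# exact value: 1 - (1 - 0.5*0.4) * (1 - 0.1) = 0.28
```

Because each sampled world is independent, the loop parallelizes trivially across GPU threads, which is what makes the GPU version effective.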