Gaussian random number generators (GRNGs) are an important component in parallel Monte Carlo simulations using FPGAs, where tens or hundreds of high-quality Gaussian samples must be generated per cycle using very few logic resources. This article describes the Table-Hadamard generator, which is a GRNG designed to generate multiple streams of random numbers in parallel. It uses discrete table distributions to generate pseudo-Gaussian base samples, then a parallel Hadamard transform to efficiently apply the central limit theorem. When generating 64 output samples, the Table-Hadamard requires just 130 slices per generated sample, which is a third of the resources needed by the next best technique, while still providing higher statistical quality.
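As a rough software model of the idea (not the paper's hardware design), the sketch below draws base samples from a small hypothetical table and mixes them with a fast Walsh-Hadamard transform, so each output is a ±1-weighted sum of all base samples; this mixing is how the central limit theorem is applied:

```python
import numpy as np

def fwht(x):
    """In-place, unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(0)
# Hypothetical zero-mean table; the paper's tables are carefully designed,
# this one only illustrates the discrete pseudo-Gaussian base distribution.
table = np.array([-2.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 2.0])
base = table[rng.integers(0, len(table), size=64)]
out = fwht(base.copy()) / np.sqrt(64)  # 64 mixed outputs per "cycle"
```

In hardware the Hadamard transform needs only adders and subtractors, which is consistent with the low per-sample resource count the abstract reports.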
ISBN (print): 9781509021949
Centrality is an important measure for identifying the most important actors in a network. This paper discusses the various centrality measures used in Social Network Analysis. These measures are tested on complex real-world social network data sets, such as video sharing networks, a social interaction network, and co-authorship networks, to examine how each measure behaves on them. We carry out a correlation analysis of these centralities and plot the results to recommend when each centrality measure should be used. Additionally, we introduce a new centrality measure, Cohesion Centrality, based on the cohesiveness of a graph, develop a sequential algorithm for it, and further devise a parallel algorithm to implement it.
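A minimal sketch of the kind of correlation analysis described, using networkx's standard centralities on a small built-in graph (the paper's datasets and its Cohesion Centrality measure are not reproduced here):

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}
names = list(measures)
# One row of per-node scores per centrality measure.
vecs = np.array([[measures[m][v] for v in G] for m in names])
corr = np.corrcoef(vecs)  # pairwise Pearson correlations between measures
for i, a in enumerate(names):
    for j in range(i + 1, len(names)):
        print(f"{a} vs {names[j]}: r = {corr[i, j]:.2f}")
```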
ISBN (print): 9781509028719
In this paper, a speculative computation method for IEC 61499 function block (FB) systems is proposed to increase the level of parallelism when executing an FB system, thereby improving the system's performance and reducing its response time to input events. Data and control dependencies in FB systems are identified and defined as a basis for organizing the speculative execution of FB algorithms. A simulation model of FB systems with speculative execution, based on timed stochastic Petri nets, is considered. In addition, the paper discusses the results of simulation experiments conducted in CPN Tools.
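A toy illustration of the speculation idea in plain Python, assuming a simple predict-then-commit policy (the function and event names are hypothetical; the paper's method operates on IEC 61499 FB dependencies, not Python threads):

```python
from concurrent.futures import ThreadPoolExecutor

def fb_algorithm(data):
    return data * 2  # stand-in for an FB algorithm body

def run_with_speculation(predicted_event, predicted_data, actual_event, actual_data):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the algorithm for the *predicted* input event before it arrives.
        future = pool.submit(fb_algorithm, predicted_data)
        if (actual_event, actual_data) == (predicted_event, predicted_data):
            return future.result()  # correct speculation: commit the early result
        future.cancel()             # mis-speculation: abandon it (best effort)
        return fb_algorithm(actual_data)
```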
ISBN (print): 9781467388153
Applications running on clusters of shared-memory computers are often implemented using OpenMP+MPI. Productivity can be vastly improved using task-based programming, a paradigm in which the user expresses the data and control-flow relations between tasks, offering the runtime maximal freedom to place and schedule tasks. While productivity is increased, high-performance execution remains challenging: the implementation of parallel algorithms typically requires specific task placement and communication strategies to reduce internode communication and exploit data locality. In this work, we present a new macro-dataflow programming environment for distributed-memory clusters, based on the Intel Concurrent Collections (CnC) runtime. Our language extensions let the user define virtual topologies, task mappings, task-centric data placement, task and communication scheduling, and more. We introduce a compiler that automatically generates code for the Intel CnC C++ runtime, with key automatic optimizations including task coarsening and coalescing. We experimentally validate our approach on a variety of scientific computations, demonstrating both productivity and performance.
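As an illustration of what a declared task mapping can look like, the sketch below places 2-D task tags onto a virtual processor grid block-cyclically; the function name and policy are hypothetical and do not reflect the actual extension syntax or the Intel CnC API:

```python
def owner(tag_i: int, tag_j: int, grid_rows: int, grid_cols: int, block: int = 4) -> int:
    """Block-cyclic mapping of a 2-D task tag onto a virtual grid of ranks."""
    row = (tag_i // block) % grid_rows
    col = (tag_j // block) % grid_cols
    return row * grid_cols + col

# e.g. a 2x4 virtual topology over 8 ranks:
rank = owner(tag_i=10, tag_j=7, grid_rows=2, grid_cols=4)  # -> rank 1
```

Declaring the mapping separately from the task bodies is what lets the runtime co-locate tasks with their data and schedule communication without changing the algorithm itself.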
ISBN (print): 9781509025367
The linkage disequilibrium (LD) method is applied in research on population genetics inference, LD mapping, haplotype diversity analysis, and related problems. Soybean genotypes are adopted as the data source, and a parallel linkage disequilibrium algorithm is implemented with OpenMP. In this algorithm, single nucleotide polymorphism (SNP) sites are divided into groups using sliding windows; the alleles of adjacent sites within a window on each chromosome are computed in parallel and the LD results are stored. Based on the experimental data, the serial and parallel algorithms are compared and analyzed. The results show that OpenMP parallelization can effectively improve the efficiency of linkage disequilibrium analysis, which is of practical significance for processing massive biological data.
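A compact sketch of the windowed pairwise computation, assuming 0/1 haplotype matrices and using Python's ProcessPoolExecutor in place of the paper's OpenMP loop (the function names and the standard r² formulation below are not taken from the paper):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def window_ld(genotypes):
    """r^2 between all SNP pairs in one window.
    genotypes: (n_snps, n_haplotypes) 0/1 matrix."""
    n_hap = genotypes.shape[1]
    p = genotypes.mean(axis=1)                    # allele-1 frequency per site
    pab = genotypes @ genotypes.T / n_hap         # joint allele-1 frequencies
    D = pab - np.outer(p, p)                      # LD coefficient
    denom = np.outer(p * (1 - p), p * (1 - p))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, D ** 2 / denom, np.nan)

def sliding_ld(genotypes, window=50, workers=4):
    windows = [genotypes[i:i + window] for i in range(0, len(genotypes), window)]
    # Each window is independent, so they map cleanly onto parallel workers,
    # mirroring the OpenMP loop over windows.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(window_ld, windows))
```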
How can we efficiently decompose a tensor into sparse factors when the data do not fit in memory? Tensor decompositions have gained a steadily increasing popularity in data-mining applications; however, the current state-of-the-art decomposition algorithms operate in main memory and do not scale to truly large datasets. In this work, we propose PARCUBE, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to producing sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to the current state of the art irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm's correctness and we experimentally validate our claims through extensive experiments, including four different real world datasets (ENRON, LBNL, FACEBOOK and NELL), demonstrating its effectiveness for data-mining practitioners. In particular, we are the first to analyze the very large NELL dataset using a sparse tensor decomposition, demonstrating that PARCUBE enables us to handle effectively and efficiently very large datasets. Finally, we make our highly scalable parallel implementation publicly available, enabling reproducibility of our work.
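A hedged sketch of the biased sampling step behind this kind of approach: keep the heaviest slices along each mode and decompose the resulting much smaller sub-tensor (the full method also merges factors from several such samples; names here are illustrative, not PARCUBE's actual interface):

```python
import numpy as np

def sample_subtensor(X, fraction=0.25):
    """Keep the indices with the largest marginal mass along each mode."""
    keep = []
    for mode in range(X.ndim):
        other_axes = tuple(a for a in range(X.ndim) if a != mode)
        weights = np.abs(X).sum(axis=other_axes)  # marginal density of each slice
        k = max(1, int(fraction * X.shape[mode]))
        keep.append(np.sort(np.argsort(weights)[-k:]))  # biased toward heavy slices
    return X[np.ix_(*keep)], keep

# e.g. a 100x100x100 tensor shrinks to 25x25x25 before decomposition:
X = np.random.default_rng(0).random((100, 100, 100))
sub, kept_indices = sample_subtensor(X)
```

Because each sampled sub-tensor is decomposed independently, the samples can be processed in parallel, which is where the reported speedup comes from.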
Single-thread algorithms for global optimization differ in the way computational effort is allocated between exploitation and exploration. This allocation ultimately determines overall performance. For example, if too little emphasis is put on exploration, the globally optimal solution may not be identified. Increasing the allocation of computational effort to exploration increases the chances of identifying a globally optimal solution, but it also slows down convergence. Thus, in a single-thread implementation of model-based search, exploration and exploitation are substitutes. In this paper we propose a new algorithmic design for global optimization based upon multiple interacting threads. In this design, each thread implements a model-based search in which the allocation of effort between exploration and exploitation does not vary over time. Threads interact through a simple acceptance-rejection rule that prevents duplication of search efforts. We show that the proposed design provides a speedup effect that increases with the number of threads. Thus, in the proposed algorithmic design, exploration is a complement to exploitation rather than a substitute for it.
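A toy one-dimensional sketch of the design, assuming Gaussian sampling models with fixed per-thread sigmas and a distance-based rejection rule (both are illustrative stand-ins for the paper's model-based search and its acceptance-rejection rule):

```python
import numpy as np

def multi_thread_search(f, sigmas=(0.1, 0.5, 2.0), iters=500, min_dist=0.5, seed=0):
    """Each 'thread' t samples around its own mean with a fixed sigma,
    so its exploration level never changes over time."""
    rng = np.random.default_rng(seed)
    means = list(rng.uniform(-5, 5, size=len(sigmas)))
    for _ in range(iters):
        for t, sigma in enumerate(sigmas):
            x = rng.normal(means[t], sigma)
            # Acceptance-rejection: skip candidates inside a region another
            # thread is already exploiting, so threads do not duplicate work.
            if any(abs(x - m) < min_dist for u, m in enumerate(means) if u != t):
                continue
            if f(x) < f(means[t]):
                means[t] = x  # greedy model update; sigma stays fixed
    best = min(means, key=f)
    return best, f(best)

# e.g. minimizing a multimodal function:
x_best, f_best = multi_thread_search(lambda x: np.sin(3 * x) + 0.1 * x ** 2)
```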
The high computational requirements of current problems have driven most research towards efficient processing formulations that require the use of multiple interconnected processors; this is the foundation of the para...
In this work, we present a parallel implementation of the Singular Value Decomposition (SVD) method on Graphics Processing Units (GPUs) using the CUDA programming model. Our approach is based on an iterative parallel version of the QR factorization by means of Givens plane rotations using the Sameh and Kuck scheme. The parallel algorithm is driven by an outer loop executed on the CPU, and the thread and block configuration is organized to exploit shared memory and avoid repeated accesses to global memory. In addition, the main kernel performs coalesced accesses to global memory using contiguous indices. As a case study, we consider the application of the SVD in the Overcomplete Local Principal Component Analysis (OLPCA) algorithm for the denoising of Diffusion Weighted Imaging (DWI) data. Our results show significant performance improvements with respect to the CPU version, which encourage its use for this expensive application.
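For reference, a minimal sequential sketch of QR factorization via Givens rotations in NumPy; in the Sameh-Kuck scheme, rotations acting on disjoint row pairs are grouped so that each group can be applied concurrently (e.g., one GPU kernel launch per group):

```python
import numpy as np

def givens_qr(A):
    """QR via Givens plane rotations, zeroing subdiagonal entries column by column."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):  # eliminate R[i, j] from the bottom up
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])      # rotation in the (i-1, i) plane
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T  # keep A == Q @ R
    return Q, R

A = np.random.default_rng(1).normal(size=(6, 4))
Q, R = givens_qr(A)
assert np.allclose(Q @ R, A)
```

Rotations in the inner loop that touch non-overlapping row pairs are independent, which is exactly the property the Sameh-Kuck ordering exploits to run them in parallel.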
In recent years, probabilistic data management has received a lot of attention due to several applications that deal with uncertain data: RFID systems, sensor networks, data cleaning, scientific and biomedical data management, and approximate schema mappings. Query evaluation is a challenging problem in probabilistic databases, proved to be #P-hard. A general method for query evaluation is based on the lineage of the query and reduces the query evaluation problem to computing the probability of a propositional formula. The main approaches proposed in the literature to approximate the confidence computation of probabilistic queries are based on Monte Carlo simulation or on compiling the formula into decision diagrams (e.g., d-trees). The former runs in a polynomial, but in practice very large, number of iterations, while the latter is polynomial for easy queries but may be exponential in the worst case. We designed a new optimized Monte Carlo algorithm that drastically reduces the number of iterations and proposed an efficient parallel version that we implemented on GPU. Thanks to the high degree of parallelism provided by the GPU, combined with the linear speedup of our algorithm, we significantly reduced the long running time required by a sequential Monte Carlo algorithm. Experimental results show that our algorithm is efficient enough to be comparable with the formula-compilation approach, but with the significant advantage of avoiding exponential behavior.
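A minimal sketch of the baseline estimator such algorithms optimize: sample a possible world by flipping each lineage variable with its tuple probability, evaluate the lineage formula, and average (the variable names and example formula are illustrative):

```python
import random

def mc_confidence(formula, probs, n_samples=100_000, seed=42):
    """Monte Carlo estimate of P(formula) under independent tuple probabilities."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # One possible world: each tuple variable is true with its probability.
        world = {v: rng.random() < p for v, p in probs.items()}
        hits += formula(world)
    return hits / n_samples

# e.g. lineage (x1 AND x2) OR x3 with independent tuple probabilities:
probs = {"x1": 0.5, "x2": 0.4, "x3": 0.1}
est = mc_confidence(lambda w: (w["x1"] and w["x2"]) or w["x3"], probs)
# exact value: 1 - (1 - 0.5*0.4) * (1 - 0.1) = 0.28
```

Because each sampled world is independent, the loop parallelizes trivially across GPU threads, which is what makes the GPU version effective.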