ISBN (print): 9781538668733
A key component of most large-scale rendering systems is a parallel image compositing algorithm, and the most commonly used compositing algorithms are binary swap and its variants. Although binary swap has been shown to be very efficient, one of its classic limitations is that it only works when the number of processes is a perfect power of 2. Multiple variations of binary swap have been independently introduced to overcome this limitation and handle process counts with factors other than 2. To date, few of these approaches have been directly compared against each other, making it unclear which approach is best. This paper presents a fresh implementation of each of these methods using a common software framework to make them directly comparable, and compares these non-power-of-2 variants head to head. The results show that some simple compositing approaches work as well as or better than more complex algorithms that are more difficult to implement.
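One of the simplest ways to handle a non-power-of-2 process count is a preliminary "fold" step that composites the extra images into partners, after which standard binary swap runs on the remaining power-of-2 group. The abstract does not say which variants were compared, so the sketch below only illustrates that generic idea; the single-process simulation, the use of elementwise summation in place of a real depth-ordered "over" operator, and all function names are assumptions.

```python
import numpy as np

def fold_to_power_of_two(images):
    """Fold step: composite each 'extra' image into a partner so that a
    power-of-two number of images remains."""
    p = len(images)
    k = 1
    while k * 2 <= p:
        k *= 2                                    # largest power of two <= p
    folded = [img.copy() for img in images[:k]]
    for i in range(p - k):
        folded[i] += images[k + i]                # composite extra image i
    return folded

def binary_swap(images):
    """Simulated binary swap over a power-of-two 'process' count.  After
    log2(P) rounds each process owns a fully composited 1/P slice."""
    p, n = len(images), len(images[0])
    pieces = [(0, img.copy()) for img in images]  # (offset, owned slice)
    stride = 1
    while stride < p:
        nxt = [None] * p
        for rank in range(p):
            partner = rank ^ stride
            off, mine = pieces[rank]
            _, theirs = pieces[partner]           # partner owns the same segment
            half = len(mine) // 2
            if rank < partner:                    # keep the front half
                nxt[rank] = (off, mine[:half] + theirs[:half])
            else:                                 # keep the back half
                nxt[rank] = (off + half, mine[half:] + theirs[half:])
        pieces = nxt
        stride *= 2
    out = np.zeros(n)                             # final gather
    for off, data in pieces:
        out[off:off + len(data)] = data
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = [rng.random(16) for _ in range(6)]     # 6 processes: not a power of 2
    composite = binary_swap(fold_to_power_of_two(imgs))
    assert np.allclose(composite, sum(imgs))
```

The fold step is cheap but leaves the folded-away processes idle during the swap rounds, which is the kind of trade-off a head-to-head comparison of the variants would measure.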
ISBN (digital): 9781728108582
ISBN (print): 9781728108599
A network (graph) is a powerful abstraction for representing structures in large, complex socio-technological systems. Community detection reveals important patterns and structural organization in a network, with numerous applications in social networking, computer security, bioinformatics, business, marketing, and other fields. An information-theoretic approach to community detection, known as the Infomap method, is a sequential algorithm capable of providing high-quality solutions. However, the emergence of massive networks, often with millions of edges and beyond, makes the problem of discovering communities technically challenging, and the sequential algorithm takes a very long time to process such networks. Therefore, a scalable parallel execution model of the Infomap algorithm is needed. In this paper, we design a distributed-memory parallel algorithm for community detection based on the Infomap method. We carefully balance the workload among the processing units and keep communication among processes minimal. We empirically show that our distributed solution produces communities of excellent quality, similar to those of sequential Infomap. In terms of minimum description length (MDL), our distributed results are over 99.5% in alignment with those of the sequential version. Our algorithm also demonstrates good scalability to hundreds of processors while processing large-scale social and information networks.
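The MDL figure the abstract compares is the value of the map equation, the objective Infomap minimizes. As a point of reference, here is a minimal sketch of the expanded map equation for an undirected, unweighted graph (Rosvall and Bergstrom's formulation); it is not the paper's distributed algorithm, and the example graph and function names are illustrative.

```python
import math
from collections import defaultdict

def map_equation(edges, membership):
    """Expanded map equation (code length in bits) of a partition of an
    undirected, unweighted graph -- the MDL objective Infomap minimizes."""
    m = len(edges)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    two_m = 2.0 * m
    p = {a: d / two_m for a, d in deg.items()}   # node visit rates
    q = defaultdict(float)                       # module exit rates
    for u, v in edges:
        if membership[u] != membership[v]:       # each crossing edge is exited
            q[membership[u]] += 1.0 / two_m      # from both of its modules
            q[membership[v]] += 1.0 / two_m
    p_mod = defaultdict(float)                   # module visit rates
    for a, pa in p.items():
        p_mod[membership[a]] += pa
    plogp = lambda x: x * math.log2(x) if x > 0 else 0.0
    L = plogp(sum(q.values()))
    L -= 2 * sum(plogp(qi) for qi in q.values())
    L -= sum(plogp(pa) for pa in p.values())
    L += sum(plogp(q.get(i, 0.0) + pm) for i, pm in p_mod.items())
    return L

if __name__ == "__main__":
    # two triangles joined by a single bridge edge
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    one_module = {v: 0 for v in range(6)}
    two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    # splitting the triangles gives the shorter (better) description length
    print(map_equation(edges, one_module), map_equation(edges, two_modules))
```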
ISBN (print): 9783959770620
Consider a deterministic algorithm that tries to find a string in an unknown set S ⊆ {0,1}^n, under the promise that S has large density. The only information that the algorithm can obtain about S is estimates of the density of S in adaptively chosen subsets of {0,1}^n, up to an additive error of μ > 0. This problem is appealing as a derandomization problem, when S is the set of satisfying inputs for a circuit C: {0,1}^n → {0,1} that accepts many inputs: in this context, an algorithm as above constitutes a deterministic black-box reduction of the problem of hitting C (i.e., finding a satisfying input for C) to the problem of approximately counting the number of satisfying inputs for C on subsets of {0,1}^n. We prove tight lower bounds for this problem, demonstrating that naive approaches to solve the problem cannot be improved upon, in general. First, we show a tight trade-off between the estimation error μ and the required number of queries to solve the problem: when μ = O(log(n)/n) a polynomial number of queries suffices, and when μ ≥ 4·(log(n)/n) the required number of queries is 2^Θ(μ·n). Secondly, we show that the problem "resists" parallelization: any algorithm that works in iterations and can obtain p = p(n) density estimates "in parallel" in each iteration still requires Ω(n/(log(p) + log(1/μ))) iterations to solve the problem. This work extends the well-known work of Karp, Upfal, and Wigderson (1988), who studied the setting in which S is only guaranteed to be non-empty (rather than dense), and the algorithm can only probe subsets for the existence of a solution in them. In addition, our lower bound on parallel algorithms affirms a weak version of a conjecture of Motwani, Naor, and Naor (1994); we also make progress on a stronger version of their conjecture.
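To make the query model concrete, here is a small simulation of the most naive approach: fix the string one bit at a time, always extending the current prefix toward the subcube whose estimated density is larger. Under this greedy strategy each step can lose at most 2μ of true density, so an error of μ < d/(2n) (for initial density d) already suffices; the Θ(log(n)/n) threshold in the abstract refers to more refined algorithms. Everything below (the exhaustive density oracle, the parameters, the small n) is purely illustrative.

```python
import random

def density(S, prefix, n):
    """Exact density of S inside the subcube of strings starting with `prefix`."""
    return sum(x.startswith(prefix) for x in S) / 2 ** (n - len(prefix))

def noisy_density(S, prefix, n, mu):
    """The oracle of the problem: a density estimate with additive error <= mu."""
    return density(S, prefix, n) + random.uniform(-mu, mu)

def greedy_bit_fixing(S, n, mu):
    """Fix one bit per step toward the (estimated) denser half: 2n queries total."""
    prefix = ""
    for _ in range(n):
        d0 = noisy_density(S, prefix + "0", n, mu)
        d1 = noisy_density(S, prefix + "1", n, mu)
        prefix += "0" if d0 >= d1 else "1"
    return prefix

if __name__ == "__main__":
    random.seed(1)
    n = 12
    S = {format(x, "012b") for x in range(2 ** n) if random.random() < 0.5}
    found = greedy_bit_fixing(S, n, mu=0.01)     # 0.01 < density / (2n) here
    assert found in S
```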
Using MD simulation, the energy surface of a pit-patterned Si substrate is obtained. The physical model is based on the empirical Tersoff potential, which governs the dynamics of an atomistic crystal system including Si and Ge atoms. The pits at the surface of the Si substrate have the shape of an inverted truncated pyramid. The analysis of the energy surface mapped by the MD calculations shows that a dense, spatially arranged array of nanoislands, with up to 4 nanoislands per pit, may be grown by depositing Ge on the pit-patterned Si substrate. The calculations were carried out using a parallel algorithm, and a scheme for the parallel formation of the Verlet neighbor list is suggested. The results of the calculations using the parallel algorithm are presented as the speedup as a function of the number of cores.
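The abstract proposes a scheme for forming the Verlet neighbor list in parallel but does not spell it out, so the sketch below shows only the standard serial cell-list construction on which such schemes are usually built; the per-atom loop is independent work that can be divided among cores. The cubic periodic box, the r_cut and skin parameters, and the function name are assumptions.

```python
import numpy as np

def build_verlet_list(pos, box, r_cut, skin):
    """Verlet neighbor list via cell lists in a cubic periodic box of edge `box`.
    Atoms are binned into cells of edge >= r_cut + skin, so each atom's
    neighbors can only lie in the 27 surrounding cells."""
    r_list = r_cut + skin
    ncell = max(1, int(box / r_list))              # cells per box edge
    cell_idx = np.floor(pos / (box / ncell)).astype(int) % ncell
    cells = {}
    for i, c in enumerate(map(tuple, cell_idx)):
        cells.setdefault(c, []).append(i)
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1)
                            for dy in (-1, 0, 1)
                            for dz in (-1, 0, 1)]
    neighbors = [set() for _ in range(len(pos))]   # sets guard against double
    for i, (cx, cy, cz) in enumerate(cell_idx):    # counting when ncell < 3
        for dx, dy, dz in offsets:
            key = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
            for j in cells.get(key, []):
                if j <= i:
                    continue                       # store each pair once
                d = pos[j] - pos[i]
                d -= box * np.round(d / box)       # minimum-image convention
                if d @ d < r_list * r_list:
                    neighbors[i].add(j)
    return [sorted(s) for s in neighbors]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nlist = build_verlet_list(rng.random((200, 3)) * 10.0, box=10.0,
                              r_cut=2.5, skin=0.3)
```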
ISBN (print): 9781450349826
Interactive information retrieval services, such as enterprise search and document search, must provide relevant results with consistent, low response times in the face of rapidly growing data sets and query loads. These growing demands have led researchers to consider a wide range of optimizations to reduce response latency, including query processing parallelization and acceleration with co-processors such as GPUs. However, previous work runs queries either on the GPU or on the CPU, ignoring the fact that the best processor for a given query depends on the query's characteristics, which may change as processing proceeds. We present Griffin, an IR system that dynamically combines GPU- and CPU-based algorithms to process individual queries according to their characteristics. Griffin uses state-of-the-art CPU-based query processing techniques and incorporates a novel approach to GPU-based query evaluation. Our GPU-based approach, as far as we know, achieves the best available GPU search performance by leveraging a new compression scheme and exploiting an advanced merge-based intersection algorithm. We evaluate Griffin with real-world queries and datasets, and show that it improves query performance by 10x compared to a highly optimized CPU-only implementation, and by 1.5x compared to our GPU approach running alone. We also find that Griffin reduces the 95th-, 99th-, and 99.9th-percentile query response times by 10.4x, 16.1x, and 26.8x, respectively.
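The core operation behind conjunctive query evaluation here is intersecting sorted posting lists. The sketch below is a CPU-style galloping (exponential plus binary search) intersection, shown only to illustrate the kind of merge-based intersection the abstract refers to; it is not Griffin's GPU kernel, and the function name is an assumption.

```python
from bisect import bisect_left

def intersect_postings(a, b):
    """Intersect two sorted posting lists (ascending, unique doc ids): walk the
    shorter list and gallop through the longer one."""
    if len(a) > len(b):
        a, b = b, a
    out, lo = [], 0
    for doc in a:
        step, hi = 1, lo
        while hi < len(b) and b[hi] < doc:         # gallop ahead
            hi = lo + step
            step *= 2
        hi = min(hi, len(b))
        i = bisect_left(b, doc, lo, hi)            # binary search in the window
        if i < len(b) and b[i] == doc:
            out.append(doc)
        lo = i                                     # never look back
    return out

if __name__ == "__main__":
    print(intersect_postings([3, 7, 9, 40, 120],
                             [1, 3, 9, 40, 41, 80, 120, 200]))  # [3, 9, 40, 120]
```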
This paper presents an accurate density computation approach for large dark matter simulations, based on a recently introduced phase-space tessellation technique and designed for massively parallel, heterogeneous cluster architectures. We discuss a memory-efficient construction of an oct-tree structure to sample the mass densities with locally adaptive resolution, according to the features of the underlying tetrahedral tessellation. We propose an efficient GPU implementation for the computationally intensive operation of intersecting the tetrahedra with the cubical cells of the deposit grid, which achieves a speedup of almost an order of magnitude compared to an optimized CPU version. We discuss two dynamic load balancing schemes: the first exchanges particle data between cluster nodes and deposits the tetrahedra for each block of the grid structure on single nodes, whereas the second uses global reduction operations to obtain the total masses. We demonstrate the scalability of our algorithms with up to 256 GPUs and TB-sized simulation snapshots, resulting in tessellations with more than 400 billion tetrahedra.
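The expensive kernel here is depositing each tetrahedron's mass onto the cells of a cubical grid in proportion to the overlapped volume. Exact tetrahedron/cell clipping is involved, so the sketch below substitutes a Monte Carlo deposit (uniform sampling inside the tetrahedron) purely to illustrate what the operation has to produce; the sampling approach, grid layout, and names are assumptions and not the paper's GPU method.

```python
import numpy as np

def deposit_tetrahedron(verts, mass, grid, cell_size, n_samples=20000, rng=None):
    """Monte Carlo stand-in for the exact tetrahedron/cell intersection:
    scatter `mass` onto `grid` by sampling points uniformly inside the
    tetrahedron and binning them into cubical cells of edge `cell_size`."""
    if rng is None:
        rng = np.random.default_rng()
    v = np.asarray(verts, dtype=float)               # (4, 3) vertex coordinates
    u = np.sort(rng.random((n_samples, 3)), axis=1)  # sorted uniforms whose
    bary = np.column_stack([u[:, 0],                 # spacings are uniform
                            u[:, 1] - u[:, 0],       # barycentric coordinates
                            u[:, 2] - u[:, 1],       # on the 3-simplex
                            1.0 - u[:, 2]])
    pts = bary @ v                                   # sample points in the tet
    idx = np.floor(pts / cell_size).astype(int)
    w = mass / n_samples
    for i, j, k in idx:
        if (0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]
                and 0 <= k < grid.shape[2]):
            grid[i, j, k] += w
    return grid

if __name__ == "__main__":
    grid = np.zeros((8, 8, 8))
    tet = [(0.5, 0.5, 0.5), (3.0, 0.5, 0.5), (0.5, 3.0, 0.5), (0.5, 0.5, 3.0)]
    deposit_tetrahedron(tet, mass=1.0, grid=grid, cell_size=1.0)
    print(grid.sum())    # ~1.0: the tetrahedron lies inside the grid
```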
K-mer indices and de Bruijn graphs are important data structures in bioinformatics, with multiple applications ranging from foundational tasks such as error correction, alignment, and genome assembly, to knowledge discovery tasks including repeat detection and SNP identification. While advances in next-generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process data sets at the current generation rate of 1.8 terabases every 3 days. The volume and velocity with which sequencing data is generated necessitate efficient algorithms and implementations of k-mer indices and de Bruijn graphs, two central components in bioinformatics applications. Existing applications that utilize k-mer counting and de Bruijn graphs, however, tend to provide embedded, specialized implementations. The research presented here represents efforts toward the creation of the first reusable, flexible, and extensible distributed-memory parallel libraries for k-mer indexing and de Bruijn graphs. These libraries are intended to simplify the development of bioinformatics applications for distributed-memory environments. For each library, our goals are to create a set of APIs that is simple to use and to provide optimized implementations based on efficient parallel algorithms. We designed algorithms that minimize communication volume and latency, and developed implementations with better cache utilization and SIMD vectorization. We developed Kmerind, a k-mer counting and indexing library based on a distributed-memory hash table and distributed sorted arrays, which provides efficient insert, find, count, and erase operations. For de Bruijn graphs, we developed Bruno by leveraging Kmerind functionalities to support parallel de Bruijn graph construction, chain compaction, error removal, and graph traversal and element query. Our performance evaluations showed that Kmerind is scalable and delivers high performance.
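A distributed k-mer index rests on one invariant: every occurrence of a k-mer must be routed to the same owner, so that a local hash table per owner yields global counts. The sketch below shows that pattern serially, with lists standing in for MPI ranks and a deterministic CRC32 hash as the partitioner; canonical k-mers, the partitioning choice, and all names are illustrative assumptions, not Kmerind's API.

```python
import zlib
from collections import Counter

COMP = str.maketrans("ACGT", "TGCA")

def canonical_kmers(seq, k):
    """Yield canonical k-mers: the lexicographic minimum of each k-mer and its
    reverse complement, so both strands map to the same key."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, kmer.translate(COMP)[::-1])

def distributed_kmer_count(reads, k, n_parts):
    """Hash-partition k-mers into n_parts buckets (stand-ins for MPI ranks),
    then count locally.  The deterministic hash guarantees every occurrence
    of a k-mer lands in the same bucket."""
    buckets = [[] for _ in range(n_parts)]
    for read in reads:
        for km in canonical_kmers(read, k):
            buckets[zlib.crc32(km.encode()) % n_parts].append(km)  # "all-to-all"
    return [Counter(b) for b in buckets]

if __name__ == "__main__":
    counts = distributed_kmer_count(["ACGTACGTGG", "CCACGTACGT"], k=5, n_parts=4)
    print(sum(counts, Counter()).most_common(3))
```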
This paper focuses on parallel hash functions based on tree modes of operation for an inner variable-input-length (VIL) function. This inner function can be either a single-block-length (SBL), prefix-free MD hash function or a sponge-based hash function. We discuss the various forms of optimality that can be obtained when designing parallel hash functions based on trees in which all leaves have the same depth. The first result is a scheme which optimizes the tree topology in order to decrease the running time. Then, without affecting the optimal running time, we show that the corresponding tree topology can be slightly changed so as to also minimize the number of required processors. Consequently, the resulting scheme minimizes first the running time and then the number of required processors.
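For concreteness, here is a minimal sketch of a tree mode with all leaves at the same depth, using SHA-256 from hashlib as a stand-in for the inner VIL function; the leaf size, arity, and function name are arbitrary illustrative choices, and real tree modes also add domain separation between leaf and node hashes, which is omitted here.

```python
import hashlib

def tree_hash(message: bytes, leaf_size: int = 1024, arity: int = 4) -> bytes:
    """Tree mode with all leaves at the same depth: hash fixed-size leaves,
    then repeatedly hash groups of `arity` child digests until a single root
    digest remains.  Every level is embarrassingly parallel."""
    h = lambda data: hashlib.sha256(data).digest()   # inner VIL function
    level = [h(message[i:i + leaf_size])
             for i in range(0, max(len(message), 1), leaf_size)]
    while len(level) > 1:
        level = [h(b"".join(level[i:i + arity]))
                 for i in range(0, len(level), arity)]
    return level[0]

if __name__ == "__main__":
    print(tree_hash(b"x" * 10_000).hex())
```

Each level can be hashed in parallel; the topology (leaf size and arity) determines both the parallel running time and how many processors are needed, which is the trade-off the paper studies.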
A long-standing conjecture states that a k-connected graph G admits k independent spanning trees (ISTs for short) rooted at an arbitrary node of G. An n-dimensional twisted cube, denoted by TQ(n), is a variation of the hypercube with connectivity n and has many features superior to those of the hypercube. Yang (2010) first proposed an algorithm to construct n edge-disjoint spanning trees in TQ(n) for any odd integer n ≥ 3 and showed that half of them are ISTs. At a later stage, Wang et al. (2012) settled the above conjecture in the affirmative for TQ(n) by providing an O(N log N) time algorithm to construct n ISTs, where N = 2^n is the number of nodes in TQ(n). However, this algorithm is executed in a recursive fashion and is thus hard to parallelize. In this paper, we revisit the problem of constructing ISTs in twisted cubes and present a non-recursive algorithm. Our approach can be fully parallelized to make use of all nodes of TQ(n) as processors for computation, in such a way that each node can determine its parent in all spanning trees directly from its own address and the tree index in O(log N) time.
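The construction itself depends on the structure of TQ(n) and is not reproduced here; as a companion, the sketch below only checks the defining property of ISTs, namely that for every node the paths to the root in the different trees are internally node-disjoint. Trees are given as parent maps assumed to be valid spanning trees, and the small example graph is K4, not a twisted cube.

```python
def path_to_root(parent, v, root):
    """Node sequence from v up to the root, following one tree's parent pointers."""
    path = [v]
    while path[-1] != root:
        path.append(parent[path[-1]])
    return path

def are_independent(trees, root, nodes):
    """True iff for every node v the v-to-root paths of the given spanning
    trees pairwise share no internal node (the IST property)."""
    for v in nodes:
        if v == root:
            continue
        paths = [path_to_root(t, v, root) for t in trees]
        for a in range(len(paths)):
            for b in range(a + 1, len(paths)):
                if set(paths[a][1:-1]) & set(paths[b][1:-1]):
                    return False
    return True

if __name__ == "__main__":
    # two ISTs of K4 rooted at node 0, written as parent maps
    t1 = {0: 0, 1: 0, 2: 0, 3: 0}
    t2 = {0: 0, 1: 2, 2: 3, 3: 0}
    print(are_independent([t1, t2], root=0, nodes=range(4)))   # True
```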
As wireless networks have limited bandwidth and insecure shared media, data compression and encryption are very useful for the broadcast transmission of big data in the IoT (Internet of Things). However, traditional compression and encryption techniques are neither adequate nor efficient for this purpose. To solve this problem, this paper presents a combined parallel algorithm named the "CZ algorithm," which can compress and encrypt big data efficiently. The CZ algorithm uses a parallel pipeline, mixes the coding of compression and encryption, and supports a data window of up to 1 TB (or larger). Moreover, the CZ algorithm can encrypt the big data as a chaotic cryptosystem without decreasing the compression speed. Meanwhile, a shareware tool named "ComZip" has been developed based on the CZ algorithm. The experimental results show that ComZip on a 64-bit system can achieve a better compression ratio than WinRAR and 7-zip, and it can be faster than 7-zip for big data compression. In addition, ComZip encrypts the big data without extra consumption of computing resources.
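To illustrate why the two stages are combined in this order, here is a minimal compress-then-encrypt sketch: compression must run first because ciphertext is essentially incompressible. The zlib codec and the logistic-map keystream are stand-ins chosen for brevity; the keystream is a toy and not secure, and none of this reflects CZ's actual coding, window size, or cipher.

```python
import zlib

def logistic_keystream(seed: float, length: int, r: float = 3.99) -> bytes:
    """Toy chaotic keystream from the logistic map x <- r*x*(1-x).
    Illustrative only: NOT cryptographically secure."""
    x, out = seed, bytearray()
    for _ in range(length):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return bytes(out)

def compress_then_encrypt(data: bytes, seed: float = 0.618) -> bytes:
    """Compress first (ciphertext would not compress), then XOR-encrypt."""
    packed = zlib.compress(data, 6)
    ks = logistic_keystream(seed, len(packed))
    return bytes(a ^ b for a, b in zip(packed, ks))

def decrypt_then_decompress(blob: bytes, seed: float = 0.618) -> bytes:
    ks = logistic_keystream(seed, len(blob))
    return zlib.decompress(bytes(a ^ b for a, b in zip(blob, ks)))

if __name__ == "__main__":
    data = b"sensor reading 42; " * 1000
    blob = compress_then_encrypt(data)
    assert decrypt_then_decompress(blob) == data
    print(len(data), "->", len(blob), "bytes")
```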