Search Results - Inner Mongolia University Library

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2015)

Authors: Gao, Xiaoming; Ferrara, Emilio; Qiu, Judy (Indiana Univ, Sch Informat & Comp, Bloomington, IN 47405, USA)

ISBN: (print) 9781479980062

We introduce Cloud DIKW (Data, Information, Knowledge, Wisdom) as an analysis environment supporting scientific discovery through integrated parallel batch and streaming processing, and apply it to one representative domain application: social media data stream clustering. In this context, recent work demonstrated that high-quality clusters can be generated by representing the data points using high-dimensional vectors that reflect textual content and social network information. However, due to the high cost of similarity computation, sequential implementations of even single-pass algorithms cannot keep up with the speed of real-world streams. This paper presents our efforts in meeting the constraints of real-time social media stream clustering through parallelization in Cloud DIKW. Specifically, we focus on two system-level issues. Firstly, most stream processing engines such as Apache Storm organize distributed workers in the form of a directed acyclic graph (DAG), which makes it difficult to dynamically synchronize the state of parallel clustering workers. We tackle this challenge by creating a separate synchronization channel using a pub-sub messaging system (ActiveMQ in our case). Secondly, due to the sparsity of the high-dimensional vectors, the size of centroids grows quickly as new data points are assigned to the clusters. As a result, traditional synchronization that directly broadcasts cluster centroids becomes too expensive and limits the scalability of the parallel algorithm. We address this problem by communicating only dynamic changes of the clusters rather than the whole centroid vectors. Our algorithm under Cloud DIKW can process the Twitter 10% data stream ("gardenhose") in real-time with 96-way parallelism. By natural improvements to Cloud DIKW, including advanced collective communication techniques developed in our Harp project, we will be able to process the full Twitter data stream in real-time with 1000-way parallelism. Our use of powerful gene

Keywords: social media data stream clustering; parallel algorithms; stream processing engines; high-dimensional data; synchronization strategies
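The key synchronization idea in this abstract is to publish only the changes made to each cluster since the last sync instead of broadcasting whole sparse centroid vectors. A minimal sketch, assuming each cluster is stored as a sparse dict of summed feature weights plus a member count (a hypothetical representation, not necessarily Cloud DIKW's), and with the pub-sub channel (ActiveMQ in the paper) replaced by a direct method call:

```python
from collections import defaultdict


class SparseCluster:
    """Hypothetical sparse cluster: running sum of member vectors plus a count."""

    def __init__(self):
        self.sums = defaultdict(float)  # feature index -> summed weight
        self.count = 0

    def centroid(self):
        # Mean vector, materialized on demand from the running sums.
        if not self.count:
            return {}
        return {k: v / self.count for k, v in self.sums.items()}


class ClusteringWorker:
    def __init__(self, num_clusters):
        self.clusters = [SparseCluster() for _ in range(num_clusters)]
        # Changes since the last sync: cluster id -> (feature deltas, count delta).
        self.pending = {}

    def assign(self, cid, point):
        """Add a sparse point (dict) to cluster cid and record only the change."""
        c = self.clusters[cid]
        delta, dcount = self.pending.get(cid, ({}, 0))
        for k, w in point.items():
            c.sums[k] += w
            delta[k] = delta.get(k, 0.0) + w
        c.count += 1
        self.pending[cid] = (delta, dcount + 1)

    def drain_deltas(self):
        """Return and clear the changes to publish on the sync channel."""
        out, self.pending = self.pending, {}
        return out

    def apply_deltas(self, deltas):
        """Merge changes published by another worker into local cluster state."""
        for cid, (delta, dcount) in deltas.items():
            c = self.clusters[cid]
            for k, w in delta.items():
                c.sums[k] += w
            c.count += dcount
```

Usage: after `a.assign(0, {1: 1.0, 5: 2.0})` on one worker, `b.apply_deltas(a.drain_deltas())` brings another worker's cluster 0 into the same state. The message size grows with the number of features touched since the last sync rather than with the total number of nonzeros in the centroid, which is the scalability argument the abstract makes.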

A Fast Parallel Algorithm for Counting Triangles in Graphs using Dynamic Load Balancing

IEEE International Conference on Big Data

Authors: Arifuzzaman, Shaikh; Khan, Maleq; Marathe, Madhav (Virginia Tech, Virginia Bioinformat Inst, Network Dynam & Simulat Sci Lab, Blacksburg, VA 24061, USA; Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061, USA)

ISBN: (print) 9781479999255

Finding the number of triangles in a graph (network) is an important problem in graph analysis, and the triangle count also has important applications in graph mining. Big graphs emerging from numerous application areas pose a significant challenge for analysis and mining, since these graphs consist of millions, or even billions, of nodes and edges. Graphs of such scale necessitate the development of efficient parallel algorithms. Existing distributed-memory parallel algorithms for counting exact triangles are based either on MapReduce or on the Message Passing Interface (MPI). MapReduce-based algorithms generate prohibitively large intermediate data and do not demonstrate reasonably good runtime efficiency. The MPI-based algorithms offer fast computation of the number of triangles; however, the partitioning and load-balancing schemes they employ are static in nature: the partitions are precomputed based on some estimations. In this paper, we present an efficient MPI-based parallel algorithm for counting triangles in large graphs. We consider the case where the main memory of each compute node is large enough to contain the entire graph. We observe that in such a case the computation load can be balanced dynamically, and we present a dynamic load-balancing scheme that improves the performance of the algorithm significantly. Our algorithm demonstrates very good speedups and scales to a large number of processors. It computes the exact number of triangles in a network with 1 billion edges in 2 minutes using only 100 processors. Our results demonstrate that the algorithm is significantly faster than related algorithms with static partitioning; in fact, for the real-world networks we experimented on, it is at least 2 times faster than the fastest algorithm with static load balancing.

Keywords: triangle counting; parallel algorithms; large graphs; graph mining; social networks
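A shared-memory sketch of the dynamic load-balancing idea, assuming (as the abstract does) that every worker can see the whole graph: rather than precomputing static partitions, idle workers keep pulling small chunks of vertices from a shared queue until it is empty. The per-vertex work is a standard degree-ordered node-iterator count; this is not the authors' MPI implementation, and the `chunk` size is an arbitrary illustrative parameter.

```python
import queue
import threading


def orient_by_degree(adj):
    """Keep each undirected edge once, directed from lower to higher (degree, id) rank."""
    rank = {v: (len(adj[v]), v) for v in adj}
    return {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}


def count_triangles_dynamic(adj, num_workers=4, chunk=2):
    """Count triangles; workers pull vertex chunks from a shared queue on demand."""
    out = orient_by_degree(adj)
    vertices = list(out)
    tasks = queue.Queue()
    for i in range(0, len(vertices), chunk):
        tasks.put(vertices[i:i + chunk])
    partial = [0] * num_workers  # one slot per worker, so no locking is needed

    def worker(wid):
        while True:
            try:
                batch = tasks.get_nowait()  # grab the next chunk when idle
            except queue.Empty:
                return
            for v in batch:
                for u in out[v]:
                    # Triangle v < u < w (by rank) is counted only at (v, u).
                    partial[wid] += len(out[v] & out[u])

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

The input is a symmetric adjacency dict. Python threads only illustrate the scheduling pattern here: under CPython's global interpreter lock they will not produce real speedup, whereas the paper distributes the same kind of work across MPI processes.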

A Top-Down Parallel Semisort

27th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Gu, Yan; Shun, Julian; Sun, Yihan; Blelloch, Guy E. (Carnegie Mellon Univ, Pittsburgh, PA 15213, USA)

ISBN: (print) 9781450335881

Semisorting is the problem of reordering an input array of keys such that equal keys are contiguous but different keys are not necessarily in sorted order. Semisorting is important for collecting equal values and is widely used in practice. For example, it is the core of the MapReduce paradigm, is a key component of the database join operation, and has many other applications. We describe a (randomized) parallel algorithm for the problem that is theoretically efficient (linear work and logarithmic depth) but is designed to be more practically efficient than previous algorithms. We use ideas from the parallel integer sorting algorithm of Rajasekaran and Reif, but instead of processing the bits of integers in a reduced range in a bottom-up fashion, we process the hashed values of keys directly, top-down. We implement the algorithm and show experimentally, on a variety of input distributions, that it outperforms a similarly optimized radix sort on a modern 40-core machine with hyper-threading by a factor of about 1.7-1.9 and achieves a parallel speedup of up to 38x. We discuss the various optimizations used in our implementation and present an extensive experimental analysis of its performance.

Keywords: parallel algorithms; semisorting; integer sorting
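A sequential sketch of the top-down idea: hash every key, scatter records into buckets by the high-order bits of the hash, then gather equal keys within each (now small) bucket. The bucket count (`bits`) and the use of Python's built-in `hash` are illustrative assumptions; the paper's algorithm runs the scatter in parallel and includes further steps that are not shown here.

```python
def semisort(records, key=lambda r: r, bits=4):
    """Reorder records so equal keys are contiguous; groups are not in sorted order."""
    num_buckets = 1 << bits
    mask = (1 << 64) - 1
    buckets = [[] for _ in range(num_buckets)]
    for r in records:
        h = hash(key(r)) & mask               # 64-bit unsigned view of the hash
        buckets[h >> (64 - bits)].append(r)   # top bits choose the bucket
    out = []
    for b in buckets:
        # Equal keys always hash to the same bucket, so grouping per bucket suffices.
        groups = {}
        for r in b:
            groups.setdefault(key(r), []).append(r)
        for g in groups.values():             # dicts preserve insertion order
            out.extend(g)
    return out
```

Because equal keys always land in the same bucket, each bucket can be processed independently, which is what makes the top-down split attractive for parallel execution.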

A Simple Parallel Algorithm for Biconnected Components in Sparse Graphs

29th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Chaitanya, Meher; Kothapalli, Kishore (Int Inst Informat Technol, Hyderabad 500032, Andhra Pradesh, India)

ISBN: (print) 9781467376846

In this paper we design and implement an algorithm for finding the biconnected components of a given graph. Our algorithm is based on experimental evidence that finding the bridges of a graph is usually easier and faster in the parallel setting. We use this property to first decompose the graph into independent, maximal 2-edge-connected subgraphs. To identify the articulation points in these 2-edge-connected subgraphs, we again convert the problem into one of finding bridges, this time in an auxiliary graph. Note that the size of the graph may increase during this conversion; however, we show that this small increase in size and run time is offset by the fact that bridges are easier to find in the parallel setting. We implement our algorithm on an Intel i7 980X CPU running 12 threads and show that it is on average 2.45x faster than the best currently known algorithms implemented on the same platform.

Keywords: bridges; graph biconnectivity; least common ancestor; parallel algorithms
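The algorithm reduces both of its phases to bridge finding. As a point of reference, here is the classic sequential DFS low-link method for finding bridges; it is not the parallel bridge-finding routine the paper relies on, which the abstract does not specify. The sketch is iterative to avoid Python's recursion limit and assumes a simple undirected graph (no parallel edges) given as an adjacency dict.

```python
def find_bridges(adj):
    """Return the bridges of a simple undirected graph as a set of frozensets."""
    disc, low = {}, {}
    bridges = set()
    timer = 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        # Each frame holds (vertex, DFS parent, iterator over its neighbors).
        stack = [(root, None, iter(adj[root]))]
        while stack:
            v, parent, it = stack[-1]
            advanced = False
            for u in it:
                if u == parent:
                    continue
                if u in disc:
                    low[v] = min(low[v], disc[u])  # back edge to an ancestor
                else:
                    disc[u] = low[u] = timer
                    timer += 1
                    stack.append((u, v, iter(adj[u])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                if parent is not None:
                    low[parent] = min(low[parent], low[v])
                    # No back edge from v's subtree reaches above v: edge is a bridge.
                    if low[v] > disc[parent]:
                        bridges.add(frozenset((parent, v)))
    return bridges
```

Removing the returned bridges splits the graph into its maximal 2-edge-connected subgraphs, the decomposition the abstract describes as the first step.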

A Parallel GPU Implementation of the TimberWolf Placement Algorithm

12th International Conference on Information Technology: New Generations (ITNG)

Authors: Al-Kawam, Ahmad; Harmanani, Haidar M. (Lebanese Amer Univ, Dept Math & Comp Sci, Byblos 14012010, Lebanon)

ISBN: (print) 9781479988280

GPUs have been gaining acceptance in the electronic design automation (EDA) field as attractive platforms for implementing and accelerating computationally intensive applications. Researchers agree that it is critical for EDA algorithms to exploit future platforms and to explore the use of parallel algorithms as we move into the manycore era. This paper describes an implementation of the TimberWolf placement algorithm using CUDA and demonstrates the applicability of GPUs to accelerating EDA tools. The algorithm was implemented on a Xeon workstation using C and achieved a substantial acceleration on an Nvidia Tesla C2070 card.

Keywords: CUDA; VLSI design automation; VLSI placement; electronic design automation; placement algorithm; acceleration; computer-aided design; parallel algorithms; graphics processing unit
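TimberWolf is a simulated-annealing placer. A minimal sequential sketch of annealing-based placement, for readers who want to see the basic loop: cells occupy grid slots, random pairwise swaps are proposed, and worse placements are accepted with probability exp(-delta/T) while the temperature T cools. The cost function (total half-perimeter wirelength), the swap-only move set, and the cooling parameters are illustrative assumptions rather than TimberWolf's actual ones, and the abstract does not describe how the paper maps this loop onto CUDA.

```python
import math
import random


def hpwl(nets, pos):
    """Total half-perimeter wirelength of all nets under placement pos."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_placement(cells, nets, width, t0=10.0, alpha=0.95,
                     moves_per_t=200, t_min=0.01, seed=0):
    """Place cells on a grid `width` columns wide by annealing over pairwise swaps."""
    rng = random.Random(seed)
    pos = {c: (i % width, i // width) for i, c in enumerate(cells)}
    cost = hpwl(nets, pos)
    best_cost, best_pos = cost, dict(pos)
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]       # propose a swap
            new_cost = hpwl(nets, pos)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                    # accept (Metropolis rule)
                if cost < best_cost:
                    best_cost, best_pos = cost, dict(pos)
            else:
                pos[a], pos[b] = pos[b], pos[a]   # reject: undo the swap
        t *= alpha                                 # geometric cooling
    return best_pos, best_cost
```

Recomputing the full wirelength on every move keeps the sketch short; practical placers update only the nets touched by the moved cells.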

Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+

Data Compression Conference (DCC)

Authors: Shun, Julian; Dhulipala, Laxman; Blelloch, Guy E. (Carnegie Mellon Univ, Pittsburgh, PA 15213, USA)

ISBN: (print) 9781479984305

We study compression techniques for parallel in-memory graph algorithms and show that we can reduce space usage while obtaining competitive or improved performance compared to running the algorithms on uncompressed graphs. We integrate the compression techniques into Ligra, a recent shared-memory graph processing system. The resulting system, which we call Ligra+, represents graphs using about half the space of the uncompressed graphs on average. Furthermore, Ligra+ is on average slightly faster than Ligra on a 40-core machine with hyper-threading. Our experimental study shows that Ligra+ processes graphs using less memory while performing as well as or faster than Ligra.

Keywords: graph compression; parallel algorithms
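The abstract does not name the compression schemes; to our understanding, Ligra+'s schemes include difference encoding of sorted adjacency lists combined with variable-length byte codes. A minimal sketch of that general technique, under that assumption: the first neighbor is stored as a signed (zigzag-encoded) offset from the source vertex, each later neighbor as a non-negative gap, and every number as 7-bit groups with a continuation bit. The degree is assumed to be stored separately.

```python
def encode_varint(n, out):
    """Append a non-negative int as 7-bit groups; the high bit marks continuation."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def decode_varint(buf, i):
    """Decode one varint starting at buf[i]; return (value, next index)."""
    n = shift = 0
    while True:
        byte = buf[i]
        i += 1
        n |= (byte & 0x7F) << shift
        if byte < 0x80:
            return n, i
        shift += 7


def compress_neighbors(v, neighbors):
    """Difference-encode the sorted adjacency list of vertex v into bytes."""
    out = bytearray()
    prev = None
    for u in neighbors:
        if prev is None:
            d = u - v
            encode_varint(2 * d if d >= 0 else -2 * d - 1, out)  # zigzag: sign in low bit
        else:
            encode_varint(u - prev, out)  # gaps are non-negative for sorted lists
        prev = u
    return bytes(out)


def decompress_neighbors(v, buf, degree):
    """Inverse of compress_neighbors; degree must be known from elsewhere."""
    result, i, prev = [], 0, None
    for _ in range(degree):
        x, i = decode_varint(buf, i)
        if prev is None:
            u = v + (x >> 1 if x % 2 == 0 else -((x + 1) >> 1))
        else:
            u = prev + x
        result.append(u)
        prev = u
    return result
```

For vertex 1000 with neighbors `[3, 998, 1001, 1500, 200000]`, the encoding takes 10 bytes instead of the 20 needed for five 32-bit integers, since most gaps fit in one or two bytes.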

A Space-efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC)

ISBN: (print) 9781479989379

Finding the number of triangles in a network (graph) is an important problem in the mining and analysis of complex networks. Massive networks emerging from numerous application areas pose a significant challenge for network analytics, since these networks consist of millions, or even billions, of nodes and edges. Such massive networks necessitate the development of efficient parallel algorithms. There exist several MapReduce-based distributed-memory parallel algorithms for counting triangles, but only one based on MPI (Message Passing Interface). MapReduce-based algorithms generate prohibitively large intermediate data. The MPI-based algorithm can work on quite large networks; however, the overlapping partitions it employs limit its ability to deal with very massive networks. In this paper, we present a space-efficient MPI-based parallel algorithm for counting the exact number of triangles in massive networks. The algorithm divides the network into non-overlapping partitions. Our results demonstrate up to 25-fold space savings over the algorithm with overlapping partitions; this space efficiency allows the algorithm to deal with networks that are 25 times larger. We present a novel approach that reduces communication cost drastically (by up to 90%), leading to an algorithm that is both space- and runtime-efficient. Our adaptation of a parallel partitioning scheme, which computes a novel weight function, further improves the efficiency of the algorithm. Denoting the average degree of nodes by $\bar{d}$ and the number of partitions by $P$, our algorithm achieves up to an $O(P^2)$-factor space efficiency over existing MapReduce-based algorithms and up to an (approximately) $\bar{d}$-factor space efficiency over the algorithm with overlapping partitioning.

Keywords: counting triangles; parallel algorithms; massive networks; social networks; graph mining; space efficiency
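The abstract mentions dividing the network into non-overlapping partitions balanced by a weight function but does not give that function. A minimal sketch of the general step, assuming nodes are numbered 0..n-1 and each already carries a cost estimate: cut the node range into P contiguous, disjoint pieces of roughly equal total weight using prefix sums and binary search. The weights passed in are a hypothetical stand-in for the paper's weight function.

```python
import bisect
from itertools import accumulate


def balanced_partitions(weights, p):
    """Split nodes 0..n-1 into p contiguous, non-overlapping ranges of ~equal weight.

    Returns p + 1 boundaries b such that partition i owns nodes b[i] .. b[i+1] - 1.
    """
    prefix = list(accumulate(weights))  # prefix[k] = weight of nodes 0..k
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for i in range(1, p):
        target = total * i / p
        # First node whose prefix weight reaches the i-th equal share ends this cut.
        k = bisect.bisect_left(prefix, target)
        bounds.append(max(bounds[-1], min(k + 1, len(weights))))
    bounds.append(len(weights))
    return bounds
```

Because each node belongs to exactly one range, a partition stores only its own nodes' adjacency data, which is the source of the space savings the abstract contrasts with overlapping partitions.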

Parallel Construction of Succinct Representations of Suffix Tree Topologies

22nd International Symposium on String Processing and Information Retrieval (SPIRE)

Authors: Baier, Uwe; Beller, Timo; Ohlebusch, Enno (Univ Ulm, Inst Theoret Comp Sci, D-89069 Ulm, Germany)

ISBN: (print) 9783319238265; 9783319238258

A compressed suffix tree usually consists of three components: a compressed suffix array, a compressed LCP-array, and a succinct representation of the suffix tree topology. There are parallel algorithms that construct the suffix array and the LCP-array, but none for the third component. In this paper, we present parallel algorithms for shared-memory architectures that construct the enhanced balanced parentheses representation (BPR). The enhanced BPR is an implicit succinct representation of the suffix tree topology that supports all navigational operations on the suffix tree. It can also be used to efficiently construct the BPS (balanced parentheses sequence), an explicit succinct representation of the suffix tree topology.

Keywords: parallel algorithms
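For readers unfamiliar with the representation: a balanced parentheses sequence (BPS) encodes an ordered tree by writing `(` when a depth-first traversal enters a node and `)` when it leaves, so a tree with n nodes takes 2n bits. A small sketch of that encoding and of one navigation primitive, finding the parenthesis that closes a node, which yields its subtree size. This is the plain BPS for a generic tree, not the paper's enhanced BPR or its parallel construction, and the linear scan stands in for the o(n)-bit auxiliary structures that succinct libraries use to answer such queries quickly.

```python
def to_bps(tree, root):
    """Encode an ordered tree (dict: node -> list of children) as a '(' / ')' string."""
    out = []
    stack = [(root, False)]
    while stack:
        node, closing = stack.pop()
        if closing:
            out.append(")")
            continue
        out.append("(")
        stack.append((node, True))  # emit ')' only after all children
        for child in reversed(tree.get(node, [])):
            stack.append((child, False))
    return "".join(out)


def find_close(bps, i):
    """Index of the ')' matching the '(' at position i (linear scan)."""
    depth = 0
    for j in range(i, len(bps)):
        depth += 1 if bps[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def subtree_size(bps, i):
    """Number of nodes in the subtree whose opening parenthesis is at position i."""
    return (find_close(bps, i) - i + 1) // 2
```

For the tree `{0: [1, 2], 1: [3]}` rooted at 0, `to_bps` gives `"((())())"`: node 0's parentheses enclose the whole string, and node 1's subtree `(())` contains two nodes.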

Parallel Merging Method to Integrate Different Genome Assemblies

IEEE International Conference on Bioinformatics and Biomedicine

Author: Romanenkov, Kirill (Moscow MV Lomonosov State Univ, Fac Computat Math & Cybernet, Moscow, Russia)

ISBN: (print) 9781467367981

In this paper we investigate the use of multiprocessor systems for reconciling genome assemblies. Many algorithmic approaches have been proposed for de novo assembly from short reads; however, their results on the same raw data often differ substantially. We present a parallel algorithm for merging two or more assemblies without relying on a reference genome. Because of the large data volume, the computations must be carried out in the distributed-memory model on a computing cluster. The proposed method integrates a combination of draft assemblies, reducing the fragmentation of the resulting contigs. A sequential version of the algorithm is implemented in C/C++ and is available at https:***/kromanenkov/gar.

Keywords: bioinformatics; multiprocessor systems; parallel algorithms

τ-Lop: Modeling performance of shared memory MPI

Parallel Computing, 2015, Vol. 46, pp. 14-31

Authors: Rico-Gallego, Juan-Antonio; Diaz-Martin, Juan-Carlos (Univ Extremadura, Dept Comp Syst Engn & Telemat, Caceres 10003, Spain; Univ Extremadura, Dept Comp & Commun Technol, Caceres 10003, Spain)

Formal modeling of the cost of MPI primitives allows a machine-independent representation, comparison, and performance analysis of their underlying algorithms. The currently accepted methods are all offspring of LogP, which was conceived to model the cost of inter-node point-to-point messages in networks of single-processor machines. As new supercomputers are built from cheap commodity boards with a growing number of cores accessing hierarchical memories, intra-node communication becomes increasingly relevant. Shared-memory communication techniques, such as message segmentation and collectives not based on point-to-point operations, differ substantially from their inter-node counterparts. This paper explains why LogGP and the most recent models in this domain, $\log_n P$ and $m\log_n P$, fit poorly, and proposes a new model named τ-Lop, rooted in them but addressing the challenge of accurately modeling shared-memory MPI communications. The broadcast algorithms of the mainstream MPI implementations MPICH and Open MPI are modeled and analyzed. (C) 2015 Elsevier B.V. All rights reserved.

Keywords: formal models; performance analysis; parallel algorithms; MPI collectives
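As background for the models this abstract critiques, here is how the classic LogGP model prices communication; these are not τ-Lop's formulas, which the abstract does not give. In LogGP a k-byte point-to-point message costs a send overhead o, (k - 1) per-byte gaps G, the latency L, and a receive overhead o. The broadcast estimate below, ceil(log2 P) sequential rounds of one such message each, is a simplifying assumption for a binomial tree that ignores any overlap between rounds.

```python
import math


def loggp_p2p(L, o, G, k):
    """LogGP time for one k-byte message: o + (k - 1) * G + L + o."""
    return o + (k - 1) * G + L + o


def binomial_bcast(L, o, G, k, P):
    """Rough binomial-tree broadcast estimate: ceil(log2 P) rounds of one message."""
    if P <= 1:
        return 0.0
    return math.ceil(math.log2(P)) * loggp_p2p(L, o, G, k)
```

All parameters must share one time unit, with G measured per byte. For example, with L = 1, o = 2, G = 0.5, and k = 3, one message costs 6 time units and a broadcast to 8 processes is estimated at 3 rounds, or 18 units; the abstract's point is that such inter-node-oriented estimates fit shared-memory MPI communication poorly.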
