ISBN: (print) 9783319712543; 9783319712550
The paper describes implementation approaches to large-graph processing on two modern high-performance computational platforms: NVIDIA GPU and Intel KNL. The approach is based on a deep a priori analysis of algorithm properties, which helps to choose the right implementation method. To demonstrate the approach, the shortest paths and strongly connected components problems are solved for sparse graphs. The results include a detailed description of the whole development cycle of the algorithms: from research into the algorithms' information structure and selection of efficient implementation methods suitable for the particular platforms, to specific optimizations for each architecture. Based on a joint analysis of algorithm properties and architecture features, performance tuning is carried out, including graph storage format optimizations, efficient use of the memory hierarchy, and vectorization. The developed implementations demonstrate high performance and good scalability. In addition, considerable attention was paid to profiling the implemented algorithms with the NVIDIA Visual Profiler and Intel (R) VTune (TM) Amplifier utilities. This allows the paper to present a fair comparison that demonstrates the advantages and disadvantages of each platform for large-scale graph processing.
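To make the role of the graph storage format concrete, here is a minimal CPU-side sketch of a shortest-paths computation over a sparse graph. The abstract does not say which format or algorithm the authors use; compressed sparse row (CSR) storage and Dijkstra's algorithm are assumptions chosen for illustration, and the names `build_csr` and `dijkstra_csr` are hypothetical.

```python
import heapq


def build_csr(num_vertices, edges):
    """Pack a directed, weighted edge list into CSR arrays.

    Returns (row_ptr, col_idx, weights); the out-edges of vertex v occupy
    positions row_ptr[v] .. row_ptr[v + 1] - 1 of col_idx and weights.
    """
    row_ptr = [0] * (num_vertices + 1)
    for src, _, _ in edges:
        row_ptr[src + 1] += 1              # count out-edges per vertex
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]       # prefix sum gives row offsets
    col_idx = [0] * len(edges)
    weights = [0.0] * len(edges)
    next_slot = row_ptr[:-1].copy()        # next free position in each row
    for src, dst, w in edges:
        pos = next_slot[src]
        col_idx[pos] = dst
        weights[pos] = w
        next_slot[src] += 1
    return row_ptr, col_idx, weights


def dijkstra_csr(row_ptr, col_idx, weights, source):
    """Single-source shortest paths for non-negative weights on a CSR graph."""
    n = len(row_ptr) - 1
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale heap entry, already improved
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v, nd = col_idx[k], d + weights[k]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

CSR keeps each vertex's neighbours contiguous in memory, which is the kind of layout property that memory-hierarchy and vectorization tuning typically relies on. A GPU or KNL implementation would be structured quite differently from this sequential sketch.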
There is a growing interest in utilizing graph formulations and graph-based algorithms in different subproblems of genomic analysis. Since graphs provide a natural and efficient representation of sequences of data in which structural relationships are observed, we study some graph applications in the quantitative analysis of typical RNA-seq and Whole Genome Sequencing pipelines. Analysis of differential alternative splicing from RNA-seq data is complicated by the fact that many RNA-seq reads map to multiple transcripts; moreover, the annotated transcripts are often a small subset of the possible transcripts of a gene. This work describes Yanagi, a tool for segmenting transcriptomes to create a library of maximal L-disjoint segments from a complete transcriptome annotation. The segment library preserves transcriptome substrings and structural relationships between transcripts while eliminating unnecessary sequence duplications. First, we formalize the concept of transcriptome segmentation and propose an efficient algorithm for generating segment libraries. The resulting segment sequences can be used with pseudo-alignment tools to quantify gene expression and alternative splicing at the segment level, and they provide gene-level visualization of the segments for better interpretability. The notion of transcript segmentation as introduced here and implemented in Yanagi opens the door for the application of lightweight, ultra-fast pseudo-alignment algorithms in a wide variety of RNA-seq analyses. Furthermore, we show how transcriptome quantification can be performed from segment-level statistics. We present an EM algorithm that uses segment counts as features to estimate transcripts' relative abundances in a way that maximizes the likelihood of the observed sequenced data. Then we tackle the problem of quantification in an incomplete annotation setting. We propose an assembly-free correction procedure that reduces bias in the estimated abundances of the annotate
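The EM idea mentioned in this abstract can be illustrated with a deliberately simplified sketch. It is not Yanagi's algorithm: it assumes each segment read is equally likely to come from any transcript containing that segment (segment and transcript lengths are ignored), and the function name `em_abundances` and its inputs are hypothetical.

```python
def em_abundances(segment_counts, compat, num_transcripts, iterations=200):
    """Estimate relative transcript abundances from segment-level counts.

    segment_counts[s] is the read count of segment s; compat[s] lists the
    transcript indices that contain segment s. Simplifying assumption:
    lengths are ignored, so only the abundances weight the assignment.
    """
    theta = [1.0 / num_transcripts] * num_transcripts  # uniform start
    total = sum(segment_counts)
    for _ in range(iterations):
        expected = [0.0] * num_transcripts
        # E-step: split each segment's count among its compatible
        # transcripts in proportion to the current abundance estimates.
        for count, transcripts in zip(segment_counts, compat):
            denom = sum(theta[t] for t in transcripts)
            if denom == 0.0:
                continue
            for t in transcripts:
                expected[t] += count * theta[t] / denom
        # M-step: renormalize the expected counts into abundances.
        theta = [e / total for e in expected]
    return theta


# Two transcripts: segment 0 is unique to transcript 0, segment 1 is
# shared by both, segment 2 is unique to transcript 1.
estimates = em_abundances([30, 40, 10], [[0], [0, 1], [1]], 2)
```

In this toy case the counts of the shared segment are split according to the evidence from the unique segments, which is the intuition behind using segment counts as features.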
ISBN: (print) 9781665441735
A major challenge in processing real-world graphs stems from poor locality of memory accesses, and vertex reordering algorithms (RAs) have been proposed to improve locality by changing the order of memory accesses. While state-of-the-art RAs like SlashBurn, GOrder, and Rabbit-Order effectively speed up graph algorithms, their capabilities and disadvantages are not fully understood, mainly for three reasons: (1) the large size of datasets, (2) the lack of suitable measurement tools, and (3) disparate characteristics of graphs. The paucity of analysis has also inhibited the design of more efficient RAs. This paper unlocks this black box by introducing a number of tools, including: (1) a cache simulation technique for processing large graphs, (2) the Neighbour to Neighbour Average ID Distance (N2N AID) as a spatial locality metric, (3) the degree distributions of simulated cache miss rate and AID to investigate how the locality of different vertices is affected by RAs, and (4) "effective cache size" to measure how much of the cache capacity is used to support random accesses. We introduce (1) asymmetricity degree distribution, (2) degree range decomposition, and (3) push and pull locality to present a structural analysis of different types of real-world graphs by explaining their contrasting behaviours in confronting RAs. Finally, we propose a number of improvements for RAs using the analysis provided in this paper.
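The abstract names the N2N AID metric but does not define it. The sketch below assumes one plausible reading: for a given vertex, the mean gap between consecutive IDs in its sorted neighbour list. Both this definition and the helper names `n2n_aid` and `relabel` are assumptions for illustration, not the paper's formulation.

```python
def n2n_aid(neighbours):
    """Mean gap between consecutive neighbour IDs (assumed reading of N2N AID).

    A small value means the neighbours sit close together in ID space, so
    accessing their data touches nearby memory locations.
    """
    ordered = sorted(neighbours)
    if len(ordered) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def relabel(adjacency, new_id):
    """Apply a vertex reordering, given as an old-ID to new-ID mapping."""
    return {
        new_id[v]: [new_id[u] for u in nbrs]
        for v, nbrs in adjacency.items()
    }


# Vertex 0 has neighbours scattered across the ID space.
adjacency = {0: [10, 50, 90], 10: [], 50: [], 90: []}
before = n2n_aid(adjacency[0])

# A reordering that gives the neighbours consecutive IDs.
reordered = relabel(adjacency, {0: 0, 10: 1, 50: 2, 90: 3})
after = n2n_aid(reordered[0])
```

Under this reading, a reordering that places a vertex's neighbours next to each other lowers the metric, which matches its stated purpose as a spatial locality measure.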
ISBN: (print) 9780769550701
Media content recommendation is nowadays a common problem. Traditional algorithms based on collaborative filtering require an up-to-date dataset of users and their preferences, which is difficult to gather for a huge database of items. The content-based approach suffers from the complex computation of similarity among items. In this paper we propose an approach to recommendation with a focus on the natural change of a user's interests in movies. We make use of a graph representation and experiment with modified graph algorithms. We design a representation of the data about movies in a graph structure and a method that uses our data model for recommendation. We propose four recommendation algorithms that are capable of finding recommendations based on initial nodes, whose selection is based on the user's current interests. We implemented these algorithms and experimentally evaluated them with real users.
ISBN: (print) 0889865450
A multimedia programming approach for the generalized graph search (traversal) algorithms is considered. It is based on the concept of a cyberFilm, which is a set of multimedia frames defining algorithmic features. Through these frames, the user can represent computational steps and specify "activity" within the steps. Each set of multimedia frames, represented by a special icon, is supported by a set of template programs that generate the corresponding executable code. Such icons and the sets of template programs behind them provide powerful repetitive-type constructs for specifying computational algorithms. The main contributions of this paper are the introduction of a cyberFilm developed for the generalized graph search algorithms and a description of the template programs supporting this cyberFilm.
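As a rough illustration of what "generalized graph search" can mean in code, the sketch below uses one traversal skeleton whose behaviour is determined by a pluggable rule for choosing the next vertex, plus an optional per-vertex "activity" callback. It is not the cyberFilm template programs described in the paper; the names `graph_search`, `take_next`, and `activity` are hypothetical.

```python
from collections import deque


def graph_search(adjacency, start, take_next, activity=None):
    """Generic traversal skeleton.

    take_next(frontier) removes and returns the next vertex to process;
    activity(vertex), if given, is called once per processed vertex.
    Returns the vertices in the order they were processed.
    """
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        v = take_next(frontier)
        order.append(v)
        if activity is not None:
            activity(v)
        for u in adjacency.get(v, []):
            if u not in seen:          # each vertex enters the frontier once
                seen.add(u)
                frontier.append(u)
    return order


def queue_rule(frontier):
    """Oldest vertex first: breadth-first order."""
    return frontier.popleft()


def stack_rule(frontier):
    """Newest vertex first: a depth-first-like order."""
    return frontier.pop()


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
breadth_order = graph_search(graph, "a", queue_rule)
stack_order = graph_search(graph, "a", stack_rule)
```

Keeping the loop fixed and swapping only the selection rule and the activity mirrors, in a very reduced form, the idea of reusable templates behind a family of search algorithms.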
Summarization is a brief and accurate representation of input text such that the output covers the most important concepts of the source in a condensed manner. Text Summarization is an emerging technique for understan...
With the continuous development of society, the risk assessment of unexpected events in universities has become increasingly important. This study aims to explore the application of graph theory algorithms in complex ...
In the ever-evolving information age, automatic text summarization is an important solution to address complex text processing problems, especially in scientific articles. Scientific articles often have complex struct...
Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: e.g., spectral clustering requires the computa...
Graph alignment refers to the task of finding the vertex correspondence between two correlated graphs of n vertices. Extensive study has been done on polynomial-time algorithms for the graph alignment problem under th...