This paper presents a network-based template for analyzing large-scale dynamic data. Specifically, we propose a novel shared-memory parallel algorithm for updating tree-based structures or properties, such as connected components (CC) and minimum spanning trees (MST), on dynamic networks. The underlying idea is to update the information in a rooted-tree data structure that stores the edges of the network that are most relevant to the analysis. Extensive experiments on real-world and synthetic networks demonstrate that, with the exception of the inherently sequential component for creating the rooted tree, our proposed updating algorithm is scalable and, in most cases, also requires significantly less memory, energy, and time than a recompute-from-scratch algorithm. To the best of our knowledge, this is the first parallel algorithm for updating MSTs on weighted dynamic networks. The rooted-tree-based framework that we propose in this paper can be extended to update other weighted and unweighted tree-based properties, such as single-source shortest paths and betweenness and closeness centrality.
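The paper's parallel batch-update algorithm is not reproduced here, but the rooted-tree idea can be illustrated with a minimal sequential sketch: a union-find forest whose parent pointers form the rooted trees, updated as edges of the dynamic network arrive.

#include <cstdio>
#include <numeric>
#include <vector>

// Minimal sequential sketch: a rooted-tree (union-find) structure that
// maintains connected components while edges are inserted incrementally.
// The paper's parallel update logic is not reproduced here.
struct RootedForest {
    std::vector<int> parent;
    explicit RootedForest(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0); // every vertex is its own root
    }
    int root(int v) {                // follow parent pointers, compressing the path
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }
    // Insert edge (u, v); returns true if it merged two components.
    bool insert_edge(int u, int v) {
        int ru = root(u), rv = root(v);
        if (ru == rv) return false;  // already connected, tree unchanged
        parent[ru] = rv;             // re-root one tree under the other
        return true;
    }
};

int main() {
    RootedForest f(5);
    f.insert_edge(0, 1);
    f.insert_edge(3, 4);
    std::printf("0 and 1 connected: %d\n", f.root(0) == f.root(1)); // 1
    std::printf("0 and 3 connected: %d\n", f.root(0) == f.root(3)); // 0
}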
In the last two decades, great attention has been devoted to the design of non-blocking and linearizable data structures, which enable exploiting the scaled-up degree of parallelism in off-the-shelf shared-memory multi-core machines. In this context, priority queues are highly challenging. Indeed, concurrent attempts to extract the highest-priority item are prone to create detrimental thread conflicts that lead to abort/retry of the operations. In this article, we present the first priority queue that jointly provides: (i) lock-freedom and linearizability; (ii) conflict resiliency against concurrent extractions; (iii) adaptiveness to different contention profiles; and (iv) amortized constant-time access for both insertions and extractions. Beyond presenting our solution, we also provide a proof of its correctness based on an assertional approach. We further present an experimental study on a 64-CPU machine, showing that our proposal provides performance improvements over state-of-the-art non-blocking priority queues.
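To see why concurrent extractions are so conflict-prone, consider a minimal baseline sketch (not the article's algorithm): a sorted linked list whose extract-min is a CAS on the shared head, so every losing thread must retry on the same hot spot.

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Minimal sketch (not the article's design): a sorted singly linked list
// where extract-min is a CAS on the shared head. Every concurrent
// extraction targets the same node, so losers must retry; this is the
// conflict that the article's resilient design is built to avoid.
struct Node {
    int priority;
    Node* next;
};

std::atomic<Node*> head{nullptr};

// Lock-free extract-min: linearizes at the successful CAS on head.
bool extract_min(int& out) {
    Node* h = head.load();
    while (h != nullptr) {
        if (head.compare_exchange_weak(h, h->next)) { // contended hot spot
            out = h->priority;
            return true;           // node intentionally leaked in this sketch
        }                          // (real code needs safe memory reclamation)
    }
    return false;
}

int main() {
    for (int p = 9; p >= 0; --p)   // build 0..9 sorted; single-threaded setup
        head = new Node{p, head.load()};
    std::vector<std::thread> ts;
    for (int t = 0; t < 4; ++t)
        ts.emplace_back([] {
            int p;
            while (extract_min(p)) std::printf("got %d\n", p);
        });
    for (auto& t : ts) t.join();
}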
Large-scale capturing of real-world scenes as 3D point clouds (e.g., using LIDAR scanning) generates billions of points that are challenging to visualize. High storage requirements prevent the quick and easy inspection of captured datasets on user-grade hardware. The fastest real-time rendering methods are limited by the available GPU memory and render only around 1 billion points interactively. We show that we can achieve state-of-the-art results in both rendering speed and memory efficiency while simultaneously supporting datasets that surpass the capabilities of other methods. We present an on-the-fly point cloud decompression scheme that tightly integrates with software rasterization to reduce on-chip memory requirements by more than 4×. Our method compresses geometry losslessly and provides high visual quality at real-time frame rates. We use a GPU-friendly, clipped Huffman encoding for compression. Point clouds are divided into equal-sized batches, which are Huffman-encoded independently. Batches are further subdivided to form easy-to-consume streams of data for massively parallel execution. The compressed point clouds are stored in an access-aware manner to achieve coherent GPU memory access and a high L1 cache hit rate at render time. Our approach can decompress and rasterize up to 120 million Huffman-encoded points per millisecond on the fly. We evaluate the quality and performance of our approach on various large datasets against the fastest competing methods. Our approach renders massive 3D point clouds at competitive frame rates and visual quality while consuming significantly less memory, thus unlocking unprecedented performance for the visualization of challenging datasets on commodity GPUs.
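As a rough illustration of the per-batch decoding step (the paper's clipped, GPU-parallel variant is not shown), the following sketch walks a hand-built Huffman tree over a bit string for a hypothetical three-symbol alphabet: 'A' = 0, 'B' = 10, 'C' = 11.

#include <cstdio>
#include <string>
#include <vector>

// Minimal CPU sketch of Huffman decoding one batch's bit stream.
struct HuffNode {
    int child[2] = {-1, -1}; // indices into the node pool
    char symbol = 0;         // valid only at leaves
};

int main() {
    // Hand-built prefix tree for the hypothetical codes above.
    std::vector<HuffNode> pool(5);
    pool[0].child[0] = 1; pool[0].child[1] = 2; // root
    pool[2].child[0] = 3; pool[2].child[1] = 4;
    pool[1].symbol = 'A';                       // leaf "0"
    pool[3].symbol = 'B';                       // leaf "10"
    pool[4].symbol = 'C';                       // leaf "11"

    std::string bits = "0101101011";            // encodes ABCABC
    int node = 0;
    std::string out;
    for (char b : bits) {
        node = pool[node].child[b - '0'];       // descend one bit
        if (pool[node].child[0] == -1) {        // reached a leaf: emit, restart
            out += pool[node].symbol;
            node = 0;
        }
    }
    std::printf("%s\n", out.c_str());           // prints ABCABC
}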
ISBN (print): 9798400705977
Edge Computing (EC) has emerged as a solution to reduce energy demand and greenhouse gas emissions from digital technologies. EC supports low latency, mobility, and location awareness for delay-sensitive applications by bridging the gap between cloud computing services and end-users. Machine learning (ML) methods have been applied in EC for data classification and information processing. Ensemble learners have often proven to yield high predictive performance on data stream classification problems. Mini-batching is a technique proposed for improving cache reuse in multi-core architectures running bagging ensembles for the classification of online data streams, which speeds up applications and reduces energy consumption. However, the original mini-batching provides limited cache reuse and hinders the accuracy of the ensembles (i.e., their capacity to detect behavior changes in data streams). In this paper, we improve mini-batching by fusing the continuous training and test loops for the classification of data streams. We evaluate the new strategy by comparing its performance and energy efficiency with the original mini-batching for data stream classification, using six ensemble algorithms and four benchmark datasets. We also compare the mini-batching strategies with two hardware-based strategies supported by commodity multi-core processors commonly used in EC. Results show that mini-batching strategies can significantly reduce energy consumption in 95% of the experiments. The original mini-batching improved energy efficiency by 96% on average and 169% in the best case, while our new strategy improved energy efficiency by 136% on average and 456% in the best case. These strategies also support better control of the balance between performance, energy efficiency, and accuracy.
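The fused test-then-train pattern can be sketched as follows; the toy threshold "model" and the batch size are purely illustrative and stand in for the paper's ensemble learners.

#include <algorithm>
#include <cstdio>
#include <vector>

// Simplified sketch of test-then-train mini-batching for data streams
// (prequential evaluation): within each mini-batch, the test and train
// steps run in one fused pass, so model state stays warm in cache.
struct Example { double x; int label; };

struct ToyModel {
    double mean = 0.0; long n = 0;
    int predict(double x) const { return x > mean ? 1 : 0; }
    void train(double x) { ++n; mean += (x - mean) / n; }
};

int main() {
    std::vector<Example> stream = {{0.1,0},{0.9,1},{0.2,0},{0.8,1},{0.7,1},{0.3,0}};
    const size_t kBatch = 2;                      // assumed mini-batch size
    ToyModel model;
    long correct = 0, seen = 0;
    for (size_t i = 0; i < stream.size(); i += kBatch) {
        size_t end = std::min(i + kBatch, stream.size());
        for (size_t j = i; j < end; ++j) {        // fused loop: test...
            correct += (model.predict(stream[j].x) == stream[j].label);
            model.train(stream[j].x);             // ...then train, same pass
        }
        seen = end;
        std::printf("after %ld examples: accuracy %.2f\n",
                    seen, double(correct) / seen);
    }
}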
In this paper, a system is presented that implements transaction migration on an asymmetric multiprocessor in order to decrease the probability of conflicts and improve execution performance. Parallelizing applications makes programming and testing much more difficult, so the goal is to avoid putting an additional burden on the programmer; therefore, the proposed solution should be fully implemented in hardware. In the asymmetric multiprocessor that is analyzed, all cores have the same instruction set, but they are asymmetric in terms of microarchitectural properties: N - 1 "small" cores are identical, while the N-th "big" core is different, as it provides better performance and higher capacities of its units. The idea is to migrate a transaction from a "small" core to the "big" one based on the history of its execution. The experiments were performed using a significantly upgraded Gem5 simulator and eight parallel applications from the STAMP benchmark suite. The experimental results show the speedup and the rate of successfully executed transactions for five different multiprocessor configurations, including symmetric and asymmetric multiprocessors with or without transaction migration. For suitable applications, our algorithm improves turnaround time by up to 14% (10% on average) compared to solutions that do not exploit asymmetry when scheduling transactions.
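A hypothetical software rendering of the history-based policy (the real mechanism lives in hardware, and the paper's exact thresholds are not given here) might look like this: repeated aborts on a small core trigger migration of the transaction's next attempt to the big core.

#include <cstdio>

// Hypothetical sketch of a history-based migration heuristic; the
// threshold value is assumed, not taken from the paper.
struct TxHistory {
    int aborts = 0;
    static constexpr int kMigrateThreshold = 3; // assumed value
    bool should_migrate() const { return aborts >= kMigrateThreshold; }
    void on_abort()  { ++aborts; }
    void on_commit() { aborts = 0; }            // success resets the history
};

int main() {
    TxHistory h;
    for (int attempt = 1; attempt <= 4; ++attempt) {
        const char* core = h.should_migrate() ? "big" : "small";
        std::printf("attempt %d runs on %s core\n", attempt, core);
        bool committed = (attempt == 4);        // pretend the first three abort
        committed ? h.on_commit() : h.on_abort();
    }
}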
Triangular meshes of superior quality are important for geometric processing in practical applications. Existing approximative CVT-based remeshing methods use planar polygonal facets to fit the original surface, which simplifies the computation. However, they usually do not consider surface curvature. Topological errors and outliers can also occur in closed sheet surface remeshing, resulting in incorrect meshes. In this regard, we present a novel method named PowerRTF, an extension of the restricted tangent face (RTF) in conjunction with the power diagram, to better approximate the original surface with curvature adaptation. The idea is to introduce a weight property to each sample point and compute the power diagram on the tangent face to produce area-controlled polygonal facets. Based on this, we impose a variable-capacity constraint and a centroid constraint on the PowerRTF, providing a trade-off between mesh quality and computational efficiency. Moreover, we apply a normal-verification-based inverse side point culling method to address the topological errors and outliers in closed sheet surface remeshing. Our method independently computes and optimizes the PowerRTF per sample point, which is efficiently implemented in parallel on the GPU. Experimental results demonstrate the effectiveness, flexibility, and efficiency of our method.
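The power-diagram ingredient can be made concrete with a small sketch: each sample point carries a weight, and a query point is assigned to the site minimizing the power distance |x - p|^2 - w, so raising a weight enlarges that site's cell. The example below is 2D and brute-force purely for brevity; the paper's per-point GPU optimization is not shown.

#include <cstdio>
#include <vector>

// Power-diagram assignment: a query point belongs to the weighted site
// with the smallest power distance |x - p|^2 - w.
struct Site { double x, y, w; };

double power_dist(double qx, double qy, const Site& s) {
    double dx = qx - s.x, dy = qy - s.y;
    return dx * dx + dy * dy - s.w;
}

int main() {
    std::vector<Site> sites = {{0, 0, 0.0}, {1, 0, 0.5}}; // second site weighted up
    double qx = 0.45, qy = 0.0; // closer to site 0 in Euclidean distance
    int best = 0;
    for (int i = 1; i < (int)sites.size(); ++i)
        if (power_dist(qx, qy, sites[i]) < power_dist(qx, qy, sites[best]))
            best = i;
    std::printf("query falls in cell of site %d\n", best); // weight pulls it to 1
}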
ISBN (print): 9781450383431
This paper presents new parallel algorithms for generating Euclidean minimum spanning trees (EMST) and spatial clustering hierarchies (known as HDBSCAN*). Our approach is based on generating a well-separated pair decomposition followed by using Kruskal's minimum spanning tree algorithm and bichromatic closest pair computations. We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN*. We also give a new parallel divide-and-conquer algorithm for computing the dendrogram and reachability plots, which are used in visualizing clusters at different scales that arise for both EMST and HDBSCAN*. We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time). We implement our algorithms and propose a memory optimization that requires only a subset of well-separated pairs to be computed and materialized, leading to savings in both space (up to 10x) and time (up to 8x). Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13-55.89x, and existing parallel algorithms by at least an order of magnitude.
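The Kruskal stage of the pipeline, taken in isolation, reduces to the classic union-find loop below; the WSPD and bichromatic-closest-pair machinery that produces the candidate edges is assumed and not shown, and the candidate list here is made up for illustration.

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Sequential sketch of the Kruskal stage: sort candidate edges by weight
// and keep each edge that joins two components.
struct Edge { int u, v; double w; };

struct UnionFind {
    std::vector<int> p;
    explicit UnionFind(int n) : p(n) { std::iota(p.begin(), p.end(), 0); }
    int find(int v) { return p[v] == v ? v : p[v] = find(p[v]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        p[a] = b;
        return true;
    }
};

int main() {
    int n = 4;
    std::vector<Edge> cand = {{0,1,1.0},{1,2,2.0},{0,2,2.5},{2,3,0.5},{0,3,4.0}};
    std::sort(cand.begin(), cand.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    UnionFind uf(n);
    double total = 0;
    for (const Edge& e : cand)
        if (uf.unite(e.u, e.v)) {                 // edge joins two components
            std::printf("MST edge (%d,%d) w=%.1f\n", e.u, e.v, e.w);
            total += e.w;
        }
    std::printf("total weight %.1f\n", total);    // 0.5 + 1.0 + 2.0 = 3.5
}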
ISBN (print): 9781728180144
The Morse-Smale complex is a well-studied topological structure that represents the gradient flow behavior of a scalar function. It supports multi-scale topological analysis and visualization of large scientific data. Its computation poses significant algorithmic challenges when considering large-scale data and increased feature complexity. Several parallel algorithms have been proposed for the fast computation of the 3D Morse-Smale complex, but the non-trivial structure of the saddle-saddle connections is not amenable to parallel computation. This paper describes a fine-grained parallel method for computing the Morse-Smale complex that is implemented on a GPU. The saddle-saddle reachability is first determined via a transformation into a sequence of vector operations, followed by the path traversal, which is achieved via a sequence of matrix operations. Computational experiments show that the method achieves up to 7x speedup over current shared-memory implementations.
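The reformulation of reachability as vector and matrix operations can be sketched on a tiny hypothetical graph: each step multiplies a boolean frontier vector with the adjacency matrix, and each such product is one traversal step of exactly the kind that maps well to a GPU (dense and sequential here for clarity).

#include <cstdio>
#include <vector>

// Reachability as repeated boolean matrix-vector products over a frontier.
int main() {
    const int n = 4;
    // adj[i][j] = 1 if there is an arc i -> j (hypothetical tiny graph).
    int adj[n][n] = {{0,1,0,0},{0,0,1,0},{0,0,0,1},{0,0,0,0}};
    std::vector<int> reach(n, 0), frontier(n, 0);
    frontier[0] = reach[0] = 1;               // start from node 0
    bool grew = true;
    while (grew) {
        grew = false;
        std::vector<int> next(n, 0);
        for (int j = 0; j < n; ++j)           // next = frontier * adj (boolean)
            for (int i = 0; i < n; ++i)
                if (frontier[i] && adj[i][j] && !reach[j]) next[j] = 1;
        for (int j = 0; j < n; ++j)
            if (next[j]) { reach[j] = 1; grew = true; }
        frontier.swap(next);
    }
    for (int j = 0; j < n; ++j)
        std::printf("node %d reachable: %d\n", j, reach[j]); // all 1 on this chain
}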
Matrix factorization is an efficient technique used for disclosing latent features of real-world data. It finds application in areas such as text mining, image analysis, social networks, and, more recently and most popularly, recommendation systems. Alternating Least Squares (ALS), Stochastic Gradient Descent (SGD), and Coordinate Descent (CD) are among the methods commonly used for factorizing large matrices. SGD-based factorization has proven to be the most successful among these methods after the Netflix and KDDCup competitions, where the winners' algorithms relied on SGD-based methods. Parallelization of SGD then became a hot topic and has been studied extensively in the literature in recent years. We focus on parallel SGD algorithms developed for shared-memory and distributed-memory systems. Shared-memory parallelizations include works such as HogWild, FPSGD, and MLGF-MF, and distributed-memory parallelizations include works such as DSGD, GASGD, and NOMAD. We present a survey containing an exhaustive analysis of these studies, and then particularly focus on DSGD by implementing it with the message-passing paradigm and testing its performance in terms of convergence and speedup. In contrast to existing works, our experiments use many real-world datasets that we produce from published raw data. We show that DSGD is a robust algorithm for large-scale datasets and achieves near-linear speedup with fast convergence rates.
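The SGD kernel that all of these parallelizations share can be written in a few lines; the sketch below is a plain sequential version for R ≈ P·Qᵀ with illustrative, untuned hyperparameters and a made-up rating list.

#include <cstdio>
#include <vector>

// Sequential sketch of the core SGD update for matrix factorization:
// for each observed rating, step both factor rows along the gradient
// of the regularized squared error.
struct Rating { int u, i; double r; };

int main() {
    const int kUsers = 3, kItems = 3, kRank = 2;
    const double lr = 0.05, reg = 0.02;          // illustrative hyperparameters
    std::vector<Rating> ratings = {{0,0,5},{0,1,3},{1,1,4},{2,2,1},{1,2,2}};
    // Small non-zero init so gradients flow.
    std::vector<std::vector<double>> P(kUsers, std::vector<double>(kRank, 0.1));
    std::vector<std::vector<double>> Q(kItems, std::vector<double>(kRank, 0.1));
    for (int epoch = 0; epoch < 200; ++epoch)
        for (const Rating& obs : ratings) {
            double pred = 0;
            for (int k = 0; k < kRank; ++k) pred += P[obs.u][k] * Q[obs.i][k];
            double err = obs.r - pred;
            for (int k = 0; k < kRank; ++k) {    // gradient step on both factors
                double pu = P[obs.u][k], qi = Q[obs.i][k];
                P[obs.u][k] += lr * (err * qi - reg * pu);
                Q[obs.i][k] += lr * (err * pu - reg * qi);
            }
        }
    double pred = 0;
    for (int k = 0; k < kRank; ++k) pred += P[0][k] * Q[0][k];
    std::printf("predicted r(0,0) ~ %.2f (observed 5)\n", pred);
}

DSGD's contribution, surveyed in the paper, is to partition the rating matrix into blocks so that disjoint blocks touch disjoint rows of P and Q and can run this same kernel concurrently without conflicts.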
ISBN (print): 9781509022991
Operating Systems (OS) is a course in undergraduate computer science curricula that teaches students concepts relating to the environment on which their applications run. In practice, OS software is very complicated, and the internal processes and mechanisms are often difficult for students to grasp, particularly those who still struggle with programming. Many OS courses are taught by describing high-level abstractions of structures and algorithms from a textbook, and then providing homework or project assignments that, in the interest of being tractable for the student, may be disconnected from the way an operating system actually performs its tasks. These methods present only a theoretical view of essential concepts, lacking concrete examples to anchor them. What many students need is a way to connect the low-level details of an operating system's implementation with the high-level abstractions provided in the class, all while remaining accessible to people who are still improving newly acquired programming skills. To bridge the gap between OS theory and implementation, we propose an interactive tutoring system that presents the concepts involved in process synchronization and shared-memory management. In this paper, first, we discuss the research performed to frame the requirements for the tool's development. Second, we describe the design architecture, concepts involved, and features of the tool. Third, we outline the test plan, user experiments, and future improvements planned for this system.
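A concrete example of the kind of low-level snippet such a tutoring system might step through (hypothetical, not taken from the tool described in the paper): two threads race on a shared counter, and a mutex is the difference between the broken and the correct version.

#include <cstdio>
#include <mutex>
#include <thread>

// Two threads increment a shared counter; the lock_guard makes the
// increment atomic. Remove it and the result becomes nondeterministic,
// which is exactly the synchronization lesson the tool aims to teach.
int counter = 0;
std::mutex counter_lock;

void worker() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> guard(counter_lock); // remove this: data race
        ++counter;
    }
}

int main() {
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    std::printf("counter = %d (expected 200000)\n", counter);
}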