Temporal graphs change with time and have a lifespan associated with each vertex and edge. They are well suited to time-respecting algorithms, in which traversed edges must have monotonic timestamps. The Interval-centric Computing Model (ICM) is a distributed programming abstraction for designing such temporal algorithms. There has been little work on supporting time-respecting algorithms at large scales for streaming graphs, which are updated continuously at high rates (millions of updates/s), such as in financial and social networks. In this article, we extend the windowed variant of ICM for incremental computing over streaming graph updates. We formalize the properties of temporal graph algorithms and prove that our model of incremental computing over streaming updates is equivalent to batch execution of ICM. We design TARIS, a novel distributed graph platform that implements these incremental computing features. We use efficient data structures to reduce memory accesses and enhance locality during graph updates. We also propose scheduling strategies to interleave updates with computation, and streaming strategies that adapt the execution window for incremental computing to variable input rates. Our detailed and rigorous evaluation of temporal algorithms on large-scale graphs with up to 2B edges shows that TARIS outperforms the contemporary baselines Tink and Gradoop by 3-4 orders of magnitude, and handles input rates of 83K-587M mutations/s with latencies on the order of seconds to minutes.
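To make the time-respecting constraint concrete, here is a minimal, self-contained sketch (not TARIS's API; the graph layout and names are invented for illustration) of an earliest-arrival traversal over a temporal graph, where an edge may only be taken at a timestamp no earlier than the arrival time at its source:

```cpp
// Earliest-arrival traversal on a temporal graph: edges are followed only
// with non-decreasing timestamps. Illustrative sketch, not TARIS code.
#include <cstdint>
#include <cstdio>
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

struct TemporalEdge { int dst; uint64_t time; };  // edge valid at 'time'
using Graph = std::unordered_map<int, std::vector<TemporalEdge>>;

// Earliest time each vertex is reachable from 'src' starting at 'start'.
std::unordered_map<int, uint64_t>
earliestArrival(const Graph& g, int src, uint64_t start) {
    std::unordered_map<int, uint64_t> arrival{{src, start}};
    using Item = std::pair<uint64_t, int>;        // (arrival time, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push({start, src});
    while (!pq.empty()) {
        auto [t, u] = pq.top(); pq.pop();
        auto it = g.find(u);
        if (it == g.end()) continue;
        for (const auto& e : it->second) {
            if (e.time < t) continue;             // would violate time order
            auto f = arrival.find(e.dst);
            if (f == arrival.end() || e.time < f->second) {
                arrival[e.dst] = e.time;
                pq.push({e.time, e.dst});
            }
        }
    }
    return arrival;
}

int main() {
    Graph g;
    g[1] = {{2, 3}, {3, 1}};
    g[3] = {{2, 2}};   // 1 -(t=1)-> 3 -(t=2)-> 2 beats the direct t=3 edge
    for (auto [v, t] : earliestArrival(g, 1, 0))
        std::printf("vertex %d reachable at t=%llu\n", v, (unsigned long long)t);
}
```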
Over the past decade, the widespread adoption of RNA-seq methodology for transcript-level monitoring has resulted in a surge of biological data requiring comprehensive analysis. The BioSkel project aims to develop a framework for RNA sequencing analysis on multi/many-core machines. This framework relies on generic and modular high-level parallel patterns, enabling biologists to customize their data processing to their specific needs while abstracting away the complexities of parallelization. In this study, we introduce the initial prototype of BioSkel for RNA sequencing analysis, which comprises three main steps: sequence alignment, feature counting, and differential expression analysis. This prototype leverages FastFlow as a back-end for parallelizing the execution, in either shared or distributed memory. We provide experimental validation of our approach, considering different architectures and dataset sizes. As a valuable byproduct, we introduce a distributed HPC version of the Bowtie2 tool, to our knowledge the first publicly available one.
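The three-step structure maps naturally onto FastFlow's building blocks. The sketch below composes three pipeline stages with FastFlow's ff_node_t/ff_Pipe API; the stage bodies (alignment, counting, differential expression) are placeholders, not BioSkel's actual code:

```cpp
// Three-stage FastFlow pipeline mirroring BioSkel's structure. Only the
// pattern composition is the point; the stage internals are stand-ins.
#include <ff/ff.hpp>
#include <iostream>
#include <string>
using namespace ff;

struct Read { std::string seq; };

struct Aligner : ff_node_t<Read> {
    Read* svc(Read*) override {
        for (int i = 0; i < 4; ++i)          // emit a few fake aligned reads
            ff_send_out(new Read{"ACGT"});
        return EOS;                          // end of stream
    }
};
struct Counter : ff_node_t<Read> {
    Read* svc(Read* r) override {
        // placeholder for feature counting on the aligned read
        return r;                            // forward downstream
    }
};
struct DiffExpr : ff_node_t<Read> {
    Read* svc(Read* r) override {
        std::cout << "processed read of length " << r->seq.size() << "\n";
        delete r;
        return GO_ON;                        // sink: consume and keep going
    }
};

int main() {
    Aligner a; Counter c; DiffExpr d;
    ff_Pipe<> pipe(a, c, d);                 // alignment -> counting -> DE
    if (pipe.run_and_wait_end() < 0) return 1;
}
```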
Scientific workflows are essential for many applications, enabling the configuration and execution of complex tasks across distributed resources. In this paper, we contribute an Ethereum blockchain-based scientific workflow execution manager, which distributes workflows to run on cluster computing providers that use the Slurm workload manager to execute them. We extended our blockchain-based autonomous resource broker eBlocBroker, a DAO-based decentralized coordinator, to provide distributed workflow execution via the blockchain. Through various tests, we demonstrate how the eBlocBroker autonomous organization, programmed as a smart contract, can manage scientific workflow submission, scheduling, and execution on cluster computing providers. Using a blockchain for distributed workflow execution is a new concept; our motivation comes from e-Science, where scientific workflows are widely utilized.
Similarity joins are recognized to be among the most used data processing and analysis operations. We introduce a C++-based high-level parallel pattern, implemented on top of the FastFlow Building Blocks, that provides the programmer with ready-to-use similarity join computations. The SimilarityJoin pattern is implemented according to the MapReduce paradigm enriched with locality-sensitive hashing (LSH) to optimize the whole computation. The new parallel pattern can be used with any C++ serializable data structure and executed on shared- and distributed-memory machines. We present experimental validation of the proposed solution on two different clusters, with both small and large input datasets, to evaluate in-core and out-of-core executions. The performance of the SimilarityJoin pattern has been assessed by comparing its execution time against those of the original hand-tuned Hadoop-based implementation of the LSH-based similarity join algorithms as well as a Spark-based version. The experiments show that the SimilarityJoin pattern (1) offers a significant performance improvement for small and medium datasets and (2) is also competitive for computations on large input datasets that produce out-of-core executions.
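The core of an LSH-enriched similarity join fits in a few lines: signatures are split into bands, records colliding in at least one band's bucket become candidate pairs, and only candidates are later verified against the actual similarity predicate. The sketch below (illustrative C++, not the SimilarityJoin pattern's API; the hash and banding parameters are arbitrary) shows that candidate-generation step:

```cpp
// LSH banding: records whose signatures collide in some band are candidates.
#include <cstdint>
#include <iostream>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

using Signature = std::vector<uint64_t>;  // e.g., MinHash values

std::set<std::pair<int,int>>
candidatePairs(const std::vector<Signature>& sigs,
               size_t bands, size_t rowsPerBand) {
    std::set<std::pair<int,int>> pairs;
    for (size_t b = 0; b < bands; ++b) {
        std::unordered_map<uint64_t, std::vector<int>> buckets;
        for (int id = 0; id < (int)sigs.size(); ++id) {
            uint64_t key = 1469598103934665603ull;     // FNV-1a over the band
            for (size_t r = 0; r < rowsPerBand; ++r) {
                key ^= sigs[id][b * rowsPerBand + r];
                key *= 1099511628211ull;
            }
            buckets[key].push_back(id);
        }
        for (auto& [key, ids] : buckets)               // same bucket => candidate
            for (size_t i = 0; i < ids.size(); ++i)
                for (size_t j = i + 1; j < ids.size(); ++j)
                    pairs.insert({ids[i], ids[j]});    // verify later
    }
    return pairs;
}

int main() {
    std::vector<Signature> sigs = {{1,2,3,4}, {1,2,9,9}, {7,8,3,4}};
    for (auto [a, b] : candidatePairs(sigs, 2, 2))
        std::cout << "candidate pair: " << a << "," << b << "\n";
}
```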
We present the new distributed-memory run-time system (RTS) of the C++-based open-source structured parallel programming library FastFlow. The new RTS enables the execution of FastFlow shared-memory applications written using its Building Blocks (BBs) on distributed systems with minimal changes to the original program. The required changes are all high-level and consist of introducing distributed groups (dgroups), i.e., logical partitions of the BBs composing the application streaming graph. A dgroup, which in turn is implemented using FastFlow's BBs, can be deployed and executed on a remote machine and communicate with other dgroups according to the original shared-memory FastFlow streaming programming model. We describe how distributed groups are defined and how we addressed data serialization and communication performance tuning through transparent message batching and scheduling. Finally, we present a study of the overhead introduced by dgroups on several benchmarks run on a sixteen-node cluster.
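For intuition, the plain-C++ snippet below illustrates the partitioning idea behind dgroups: the stages of a streaming pipeline are assigned to named logical groups, each deployable on a different host, with edges crossing a group boundary becoming network channels while edges inside a group remain shared-memory queues. This is a conceptual model only, not FastFlow's actual dgroup API:

```cpp
// Conceptual deployment model for dgroups (names invented for illustration).
#include <iostream>
#include <string>
#include <vector>

struct DGroup {
    std::string name;                 // logical group id, e.g. "G1"
    std::string host;                 // where the group is deployed
    std::vector<std::string> stages;  // building blocks assigned to it
};

int main() {
    // A 4-stage pipeline Source -> Map -> Reduce -> Sink split in two groups:
    std::vector<DGroup> deployment = {
        {"G1", "node01:9000", {"Source", "Map"}},
        {"G2", "node02:9000", {"Reduce", "Sink"}},
    };
    // The Map->Reduce edge crosses G1/G2, so it becomes a network channel;
    // all other edges stay in shared memory, as in the original program.
    for (const auto& g : deployment) {
        std::cout << g.name << " @ " << g.host << " runs:";
        for (const auto& s : g.stages) std::cout << " " << s;
        std::cout << "\n";
    }
}
```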
Skyline queries have been widely used in various application domains, including multi-criteria decision making, search pruning, and personalized recommendation systems. Given multiple criteria, skyline queries prune the search space of a large collection of multi-dimensional objects to a small set by returning the objects that are not dominated by any other, i.e., those superior to the rest. As an extension of traditional skyline queries, probabilistic skyline queries aim to cope with uncertain datasets. This paper presents a novel MapReduce-based framework, ProbSky, in support of fast parallel distributed evaluation of probabilistic skyline queries on large high-dimensional data. ProbSky efficiently evaluates exact p-skyline queries on large uncertain data without compromising the quality of query results. From the theoretical point of view, we formally prove two pruning lemmas integrated into ProbSky that strengthen its early pruning capacity. ProbSky builds on three optimization techniques: dominant-instance pruning, slab-based partitioning, and reference-point-based acceleration. Extensive experiments driven by both real and synthetic datasets reveal that, compared to state-of-the-art methods, ProbSky speeds up the evaluation of exact p-skyline queries on large high-dimensional data by at least one order of magnitude in most cases. Our experimental results also validate that, by balancing memory consumption and execution time across machines, ProbSky curbs the bottleneck effect that causes severe system performance deterioration.
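For reference, the sketch below computes the quantity a p-skyline query thresholds, under the common instance-based model of uncertain objects: the probability that some instance of an object is dominated by no instance of any other object (objects assumed independent). The brute-force evaluation shown here is exactly the cost that ProbSky's pruning and partitioning avoid; all names are illustrative:

```cpp
// Skyline probability of an uncertain object (instance-based model).
#include <iostream>
#include <vector>

struct Instance { std::vector<double> dim; double prob; };
using Object = std::vector<Instance>;

// a dominates b: no worse in every dimension, strictly better in one
// (smaller is better).
bool dominates(const std::vector<double>& a, const std::vector<double>& b) {
    bool strict = false;
    for (size_t d = 0; d < a.size(); ++d) {
        if (a[d] > b[d]) return false;
        if (a[d] < b[d]) strict = true;
    }
    return strict;
}

double skylineProb(const Object& obj, const std::vector<Object>& others) {
    double total = 0.0;
    for (const auto& inst : obj) {
        double survive = inst.prob;
        for (const auto& other : others) {
            double pDominated = 0.0;               // P(other dominates inst)
            for (const auto& oi : other)
                if (dominates(oi.dim, inst.dim)) pDominated += oi.prob;
            survive *= (1.0 - pDominated);         // objects independent
        }
        total += survive;
    }
    return total;                                  // in p-skyline iff >= p
}

int main() {
    Object a = {{{1, 2}, 0.5}, {{4, 4}, 0.5}};
    Object b = {{{2, 1}, 1.0}};
    std::cout << "P(a in skyline) = " << skylineProb(a, {b}) << "\n";  // 0.5
}
```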
Several distributed programming language solutions have been proposed to reason about the placement of data, computations, and peer interactions. Such solutions include, among others, multitier programming, choreographic programming, and various approaches based on behavioral types. These methods statically ensure safety properties thanks to complete knowledge of the placement of data and computation at compile time. In distributed systems, however, dynamic placement of computation and data is crucial to enable performance optimizations, e.g., driven by data locality, or in the presence of other constraints such as security and compliance regarding data storage location. Unfortunately, in existing programming languages, dynamic placement conflicts with static reasoning about distributed programs: the flexibility required by dynamic placement hinders statically tracking the location of data and computation. In this paper we present Dyno, a programming language that enables static reasoning about dynamic placement. Dyno features a type system where values are explicitly placed, but in contrast to existing approaches, placed values are also first class, ensuring that they can be passed around and referred to from other locations. Building on top of this mechanism, we provide a novel interpretation of dynamic placement as unions of placement types. We formalize type soundness, placement correctness (as part of type soundness), and architecture conformance. In case studies and benchmarks, our evaluation shows that Dyno enables static reasoning about programs even in the presence of dynamic placement, ensuring type safety and placement correctness at negligible performance cost. We reimplement an Android app of roughly 7K LOC in Dyno, find a bug in the existing implementation, and show that the app's approach is representative of a common way to implement dynamic placement found in over 100 apps in a large open-source app store.
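The toy C++ snippet below mimics the paper's two key ingredients outside of Dyno itself: placement recorded in the type of a value, and dynamic placement modeled as a union of placement types that must be inspected before any location-specific use. It is an analogy, not Dyno syntax:

```cpp
// Emulating placement types and their unions with templates and std::variant.
#include <iostream>
#include <string>
#include <type_traits>
#include <variant>

struct Client {}; struct Server {};           // peer/location tags

template <typename T, typename Loc>
struct Placed { T value; };                   // a T known to live at Loc

// "Dynamic" placement: the value is at one of several known locations.
template <typename T>
using ClientOrServer = std::variant<Placed<T, Client>, Placed<T, Server>>;

template <typename T>
void describe(const ClientOrServer<T>& v) {
    // The union must be inspected before location-specific use: this is the
    // static discipline that makes dynamic placement safe to reason about.
    std::visit([](auto&& p) {
        using P = std::decay_t<decltype(p)>;
        if constexpr (std::is_same_v<P, Placed<T, Client>>)
            std::cout << "on client: " << p.value << "\n";
        else
            std::cout << "on server: " << p.value << "\n";
    }, v);
}

int main() {
    ClientOrServer<std::string> v = Placed<std::string, Server>{"session-42"};
    describe(v);
}
```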
ISBN: 9798400704437 (print)
The complexity of implementing scalable data processing tasks in parallel and distributed computing environments has pushed the adoption of restricted programming models that simplify expressing such tasks. In this context, the dataflow model has emerged as the standard for building distributed data processing systems. The key insight of this model is to express computations as sequences of operations that do not share any state and can thus be deployed and executed independently on the same or on different machines. However, entirely excluding state sharing may be detrimental to performance, as it prevents operations executed on the same machine from accessing common resources. The effects of these limitations become more and more evident as the computational and memory resources of individual hosts increase, and even large-scale data analysis tasks can be performed with a few well-equipped machines. Starting from this observation, in this paper we present an extension to the classic dataflow model that enables disciplined state sharing across operators to improve performance. We implement the model in Renoir, a dataflow system written in Rust, and we exploit the characteristics of this programming language to ensure safe access to shared state by design. We present several use cases where our model may be beneficial, and we use them to evaluate the proposed model in terms of performance and code complexity. We show that our model empowers developers to better exploit the resources of machines, resulting in substantial performance improvements in several use cases. Programs written with our extended model may even be more concise and simpler than in the classic dataflow model.
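As a concept sketch (in C++ rather than Renoir's Rust API, with invented names), the snippet below shows the kind of disciplined sharing the extended model enables: operator instances co-located on a host share one immutable resource through a reference-counted handle instead of each holding a private copy:

```cpp
// Disciplined state sharing between co-located dataflow operators.
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

using Lookup = std::unordered_map<int, std::string>;

struct Enrich {                           // a dataflow operator...
    std::shared_ptr<const Lookup> table;  // ...with shared immutable state
    std::string apply(int key) const {
        auto it = table->find(key);
        return it != table->end() ? it->second : "unknown";
    }
};

int main() {
    auto table = std::make_shared<const Lookup>(
        Lookup{{1, "alice"}, {2, "bob"}});
    // Two operator instances on the same host share one copy of the table;
    // immutability makes concurrent reads safe by construction. (Rust encodes
    // the same guarantee in its type system, which is what Renoir exploits.)
    Enrich op1{table}, op2{table};
    std::cout << op1.apply(1) << " " << op2.apply(2) << "\n";
}
```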
In recent years, the need to work remotely, and consequently the need for available remote computer-based systems, has increased substantially. This trend accelerated dramatically with the onset of the 2020 pandemic. Often, local data is produced, stored, and processed in the cloud to meet this flood of computation and storage needs. Historically, HPC (high-performance computing) and big data technologies have been used for storing and processing large data. Both can serve analytical workloads, though the differences between them are not always obvious: both use parallel processing techniques, and both offer options for storing data in either a centralized or a distributed manner. Recent studies have focused on hybrid approaches that combine the two technologies, so the convergence between HPC and big data can be realized by distributed computing machines at this layer. This paper is motivated by the need for a distributed computing framework that can scale from SoC (system-on-chip) boards to desktop computers and servers. To this end, we propose a distributed computing environment that scales across devices with heterogeneous architectures, in which clusters can be set up from resource-limited nodes and applications run on top of them. The solution can be thought of as a minimalist hybrid of HPC and big data. Within the scope of this study, we not only detail the design of the proposed system but also implement its critical modules and subsystems as a proof of concept.
ISBN: 9798400704000 (print)
Reactive programming enables declarative descriptions of dependencies between, and computations over, signals: an abstraction of time-varying values. Signals have been extended to persistent signals (an abstraction of time-varying values together with their execution histories), which can be rolled back to any given point in time. Currently, this feature is only supported by SignalJ, an extension of Java with signals, which limits the use of persistent signals to Java-based applications. This is an undesirable restriction, because the mechanisms used to realize persistent signals are actually language-independent. To tackle this problem, we propose an implementation of persistent signals in JavaScript, which broadens their application areas to include the Web front end. Realizing persistent signals in JavaScript requires seamless connections between JavaScript programs running in restricted environments such as browsers and the time-series databases that serve the histories of persistent signals; it is also desirable to reuse the existing JavaScript ecosystem. To address these problems, we design a relay-server-based architecture and realize persistent signals in JavaScript as a DSL library.
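The language-independent mechanism the paper builds on can be sketched in a few lines: a persistent signal is a time-varying value whose every update is appended to a history, so it can be read "as of" any past time. In the paper that history lives in a time-series database behind a relay server; the illustrative C++ below uses an in-memory vector as a stand-in:

```cpp
// Core of a persistent signal: a value plus its timestamped history.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

template <typename T>
class PersistentSignal {
    // (timestamp, value) pairs; updates assumed to arrive in time order.
    std::vector<std::pair<uint64_t, T>> history;
public:
    void set(uint64_t t, T v) { history.emplace_back(t, std::move(v)); }

    // Current value: the last update, if any.
    std::optional<T> now() const {
        return history.empty() ? std::nullopt
                               : std::optional<T>(history.back().second);
    }
    // Value as of time t: the latest update not after t.
    std::optional<T> at(uint64_t t) const {
        auto it = std::upper_bound(history.begin(), history.end(), t,
            [](uint64_t x, const auto& e) { return x < e.first; });
        if (it == history.begin()) return std::nullopt;
        return std::prev(it)->second;
    }
};

int main() {
    PersistentSignal<int> temperature;
    temperature.set(10, 20);
    temperature.set(50, 23);
    std::cout << "at t=30: " << *temperature.at(30) << "\n";  // prints 20
    std::cout << "now:     " << *temperature.now() << "\n";   // prints 23
}
```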