Latency and throughput are often critical performance metrics in stream processing. An application's performance can fluctuate depending on the input stream; this unpredictability stems from variations in data arrival frequency, item size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios for further evaluating these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generates the frequency patterns most commonly used in related work for benchmarking stream processing, enabling the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow, two libraries that leverage stream parallelism on multi-core architectures. Our results show that our test cases did not benefit from micro-batching on multi-cores. Across different data stream frequency configurations, TBB achieved the lowest latency, while FastFlow achieved higher throughput in shorter pipelines.
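SPBench's own interface is not reproduced in the abstract; as a rough sketch of what micro-batching looks like on one of the evaluated runtimes, the following oneTBB pipeline groups stream items into batches before the parallel stage. `Item`, `next_item`, `process`, and the batch size are illustrative assumptions, not SPBench's API.

```cpp
// Illustrative sketch only: a serial source filter groups incoming items
// into micro-batches, a parallel middle filter processes whole batches,
// and a serial sink emits them in order.
#include <oneapi/tbb/parallel_pipeline.h>
#include <cstddef>
#include <optional>
#include <vector>

struct Item { /* application-defined payload */ };
using Batch = std::vector<Item>;

std::optional<Item> next_item();   // assumed stream source
void process(Item&);               // assumed per-item operator

void run_pipeline(std::size_t batch_size, std::size_t max_tokens) {
    tbb::parallel_pipeline(max_tokens,
        tbb::make_filter<void, Batch>(tbb::filter_mode::serial_in_order,
            [&](tbb::flow_control& fc) -> Batch {
                Batch b;
                while (b.size() < batch_size) {
                    auto it = next_item();
                    if (!it) break;            // stream exhausted
                    b.push_back(*it);
                }
                if (b.empty()) fc.stop();      // end of stream
                return b;
            }) &
        tbb::make_filter<Batch, Batch>(tbb::filter_mode::parallel,
            [](Batch b) {                      // one token = one micro-batch
                for (auto& item : b) process(item);
                return b;
            }) &
        tbb::make_filter<Batch, void>(tbb::filter_mode::serial_in_order,
            [](Batch) { /* emit results downstream */ }));
}
```

With `batch_size = 1` this degenerates to item-at-a-time processing, which is the baseline the batched configurations are compared against.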
With the development of heterogeneous systems, the demand for high-level programming methods that ease heterogeneous programming and produce portable applications has become more urgent. This paper proposes DACL, the Data Associated Computing Language. DACL introduces data partition patterns to achieve architecture-independent expression of parallelism. It also provides simplified language extensions and programming features such as serialization of the computing process, parameterization of data attributes, and modularity, thus reducing the difficulty of heterogeneous programming and improving productivity. The operational semantics show that DACL enables calculation of parallelism degrees at different levels and retains data access patterns, preserving optimization potential. To support cross-platform execution, the currently implemented source-to-source compilers employ OpenMP and OpenCL as backends. We reconstructed multiple benchmarks selected from the Parboil and Rodinia benchmark suites with DACL and conducted comparison tests on CPU, GPU, and MIC platforms. The code size of each rebuilt benchmark is roughly equivalent to that of the serial code, only 13%-64% of the benchmark's OpenCL code. With the support of the compilation system, the reconstructed code executes on different processors without modification, yielding performance competitive with or better than that of the manually written benchmark code.
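DACL's concrete syntax is not shown in the abstract, so no attempt is made to reproduce it here. As a purely hypothetical illustration of the underlying idea, a data partition pattern declared once and lowered by a backend, consider this OpenMP sketch; the function name and block granularity are invented for illustration:

```cpp
// Hypothetical illustration only: NOT DACL syntax. The programmer states
// how data is partitioned (a 1D block partition) and the per-element
// computation; a backend supplies the parallelism. On this OpenMP
// backend, blocks map to threads; an OpenCL backend could map the same
// pattern to work-groups instead.
#include <cstddef>

template <typename T, typename Body>
void block_partition_map(T* data, std::ptrdiff_t n,
                         std::ptrdiff_t block, Body body) {
    const std::ptrdiff_t n_blocks = (n + block - 1) / block;
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t b = 0; b < n_blocks; ++b) {
        const std::ptrdiff_t lo = b * block;
        const std::ptrdiff_t hi = (lo + block < n) ? lo + block : n;
        for (std::ptrdiff_t i = lo; i < hi; ++i)
            body(data[i]);                 // per-element computation
    }
}

// Usage: the computation is expressed once, independent of the backend.
// block_partition_map(v.data(), (std::ptrdiff_t)v.size(), 1024,
//                     [](float& x) { x *= 2.0f; });
```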
Multiple sequence alignment approaches refer to algorithmic solutions for the alignment of biological sequences. Since multiple sequence alignment has exponential time complexity when a dynamic programming approach is applied, a substantial number of parallel computing approaches have been implemented over the last two decades to improve performance. In this paper, we present a systematic literature review of parallel computing approaches applied to multiple sequence alignment algorithms for proteins, published in the open literature from 1988 to 2022. We extracted articles from four scientific databases: ACM Digital Library, IEEE Xplore, ScienceDirect, and SpringerLink, and four journals: Bioinformatics, PLOS Computational Biology, PLOS ONE, and Scientific Reports. Additionally, to cover other potential databases and journals, we performed a transversal search through Google Scholar. Our selection process yielded 106 research articles; we then analyzed these articles and defined a classification framework. Finally, we point out directions and trends for parallel computing approaches to multiple sequence alignment, as well as some unsolved problems.
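The exponential complexity the review starts from can be made concrete with a short, standard calculation (textbook material, not drawn from the reviewed articles): aligning k sequences of length n by dynamic programming fills a k-dimensional table.

```latex
% DP-based alignment of k sequences of length n:
%   table cells:            (n+1)^k
%   predecessors per cell:  2^k - 1  (every nonempty subset of the
%                                     sequences may advance one position)
T(n,k) \;=\; O\!\big((2^k - 1)\,(n+1)^k\big) \;=\; O\!\big(2^k\,n^k\big)
% For k = 2 this reduces to the familiar O(n^2) pairwise case; already
% for k = 10 and n = 100 it exceeds 10^{20} cell updates, which is why
% heuristics and parallelization are needed for realistic k.
```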
Several real-world parallel applications are becoming more dynamic and long-running, demanding online (at run-time) adaptations. Stream processing is a representative scenario that computes data items arriving in real time and where parallel executions are necessary. However, it is challenging for humans to continuously monitor and manually optimize complex, long-running parallel executions. Moreover, although high-level and structured parallel programming aims to facilitate parallelism, several issues still need to be addressed to improve the existing abstractions. In this paper, we extend self-adaptiveness to support autonomous, online changes of parallel pattern compositions. Online self-adaptation is achieved with an online profiler that characterizes the applications, combined with a new self-adaptive strategy and a model for smooth transitions during reconfigurations. The solution provides a new abstraction layer that enables application programmers to define non-functional requirements instead of hand-tuning complex configurations. Hence, we contribute additional abstractions and flexible self-adaptation for responsiveness at run-time. The proposed solution is evaluated with applications having different processing characteristics, workloads, and configurations. The results show that it is possible to provide additional abstractions, flexibility, and responsiveness while achieving performance comparable to the best static-configuration executions.
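The paper's strategy is not spelled out in the abstract; as a minimal sketch of the general mechanism (an online profiler feeding a reconfiguration decision), assuming hypothetical hooks `measure_tput` and `reconfigure_workers` and a throughput target standing in for the non-functional requirement:

```cpp
// Minimal sketch, not the paper's algorithm: a run-time control loop that
// samples throughput over a window and adapts the parallelism degree.
#include <chrono>
#include <functional>
#include <thread>

void adaptation_loop(double target_tput, int min_w, int max_w,
                     const std::function<double()>& measure_tput,       // items/s, last window (assumed hook)
                     const std::function<void(int)>& reconfigure_workers) // apply new degree (assumed hook)
{
    int workers = min_w;
    reconfigure_workers(workers);
    for (;;) {
        std::this_thread::sleep_for(std::chrono::seconds(1)); // sampling window
        const double tput = measure_tput();
        // Simple hysteresis keeps reconfigurations "smooth": scale up only
        // when clearly below target, scale down only when well above it.
        if (tput < 0.95 * target_tput && workers < max_w)
            reconfigure_workers(++workers);
        else if (tput > 1.20 * target_tput && workers > min_w)
            reconfigure_workers(--workers);
    }
}
```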
The Multilevel Monte Carlo (MLMC) method has proven to be an effective variance-reduction statistical method for Uncertainty Quantification (UQ) in Partial Differential Equation (PDE) models, combining model computations at different levels to create an accurate estimate. Still, the computational complexity of the resulting method is extremely high, particularly for 3D models, requiring advanced algorithms to exploit High Performance Computing (HPC) efficiently. In this article we present a new implementation of MLMC on massively parallel computer architectures, exploiting parallelism within and between each level of the hierarchy. The numerical approximation of the PDE is performed using the finite element method, but the algorithm is quite general and could be applied to other discretization methods. The two key ingredients of the implementation are a good processor partition scheme and a good scheduling algorithm to assign work to the different processors. We introduce a multiple partition of the set of processors that permits the simultaneous execution of different levels, and we develop a dynamic scheduling algorithm to exploit it. Finding the optimal schedule of distributed tasks on a parallel computer is an NP-complete problem. We propose and analyze a new greedy scheduling algorithm to assign samples, and we show that it is a 2-approximation, the best that may be expected under general assumptions. On top of this result, we design a distributed-memory implementation using the Message Passing Interface (MPI) standard. Finally, we present a set of numerical experiments illustrating its scalability properties.
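The greedy scheduler is described only at a high level; the sketch below shows the classic least-loaded-first (list scheduling) assignment consistent with the stated 2-approximation bound (Graham's bound for list scheduling is 2 - 1/m). Per-sample cost estimates and the processor-group granularity are simplifying assumptions:

```cpp
// Greedy list scheduling: each MLMC sample, with an estimated cost, is
// assigned to the currently least-loaded processor group (min-heap by load).
#include <cstddef>
#include <queue>
#include <vector>

struct Group { double load; std::size_t id; };
struct ByLoad {
    bool operator()(const Group& a, const Group& b) const {
        return a.load > b.load;            // inverted: makes the heap a min-heap
    }
};

// Returns assignment[i] = processor group that receives sample i.
std::vector<std::size_t> greedy_schedule(const std::vector<double>& sample_cost,
                                         std::size_t n_groups) {
    std::priority_queue<Group, std::vector<Group>, ByLoad> heap;
    for (std::size_t g = 0; g < n_groups; ++g) heap.push({0.0, g});

    std::vector<std::size_t> assignment(sample_cost.size());
    for (std::size_t i = 0; i < sample_cost.size(); ++i) {
        Group g = heap.top(); heap.pop();  // least-loaded group so far
        assignment[i] = g.id;
        g.load += sample_cost[i];          // cost grows with mesh level in MLMC
        heap.push(g);
    }
    return assignment;
}
```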
The Software Defect Prediction (SDP) method forecasts the occurrence of defects early in the software development process. Early fault detection decreases the overall cost of software and improves its dependability. However, defect prediction has not previously been addressed for high-performance software. The contribution of this paper is predicting and correcting software defects in Message Passing Interface (MPI) code using machine learning (ML). The system predicts defects including deadlocks, race conditions, and mismatches, dividing the model into three stages: training, testing, and prediction. The training phase extracts and combines the features and the label, then trains a classifier. During the testing phase, these features are extracted and classified. The prediction phase takes MPI code as input and determines whether it contains defects; if a defect is found, the correction subsystem corrects it. We collected 40 MPI codes in C++, covering all forms of MPI communication. Results show the NB (Naive Bayes) classifier achieves accuracy, precision, and recall values close to 1.
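To make the defect classes concrete, here is a textbook instance of one of them, not taken from the paper's dataset: a send-send deadlock between two ranks, with a standard correction noted in a comment.

```cpp
// Classic MPI deadlock: both ranks issue a blocking MPI_Send before
// posting any receive. For messages too large for the eager protocol,
// neither send can complete, so neither rank reaches MPI_Recv.
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int peer = 1 - rank;   // assumes the job runs with exactly 2 ranks
    int sendbuf = rank, recvbuf = -1;

    // DEFECT: matched blocking sends -> potential deadlock.
    MPI_Send(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // A safe correction pairs both operations in a single call:
    // MPI_Sendrecv(&sendbuf, 1, MPI_INT, peer, 0,
    //              &recvbuf, 1, MPI_INT, peer, 0,
    //              MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```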
Contemporary HPC hardware typically provides several levels of parallelism, e.g., multiple nodes, each with multiple cores (possibly with vectorization) and accelerators. Efficiently programming such systems usually requires skill in combining several low-level frameworks such as MPI, OpenMP, and CUDA, which overburdens programmers without substantial parallel programming experience. One way to overcome this problem and abstract from the details of parallel programming is to use algorithmic skeletons. In the present paper, we evaluate multi-node, multi-CPU, and multi-GPU implementations of the most essential skeletons: Map, Reduce, and Zip. Our main contribution is a discussion of the efficiency of using multiple parallelization levels and a consideration of which fine-tuning settings should be offered to the user.
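The abstract does not name the library's API, so the sketch below only fixes the sequential semantics of the three skeletons; an implementation like the one evaluated would keep these signatures and dispatch to MPI, OpenMP, or CUDA underneath, so the user never writes low-level parallel code.

```cpp
// Sequential reference semantics of the Map, Reduce, and Zip skeletons
// (illustrative, not the evaluated library's actual interface).
#include <cstddef>
#include <numeric>
#include <vector>

template <typename T, typename F>
std::vector<T> skeleton_map(const std::vector<T>& a, F f) {   // Map: f elementwise
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = f(a[i]);
    return out;
}

template <typename T, typename F>
T skeleton_reduce(const std::vector<T>& a, T init, F f) {     // Reduce: fold with
    return std::accumulate(a.begin(), a.end(), init, f);      // associative f
}

template <typename T, typename F>
std::vector<T> skeleton_zip(const std::vector<T>& a,
                            const std::vector<T>& b, F f) {   // Zip: combine pairwise
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = f(a[i], b[i]);
    return out;
}
```

Associativity of the Reduce operator is what lets a parallel backend split the fold across cores, nodes, and GPUs and combine partial results.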
Python is becoming increasingly popular in scientific computing. The package MPI for Python (mpi4py) allows writing efficient parallel programs that scale across multiple nodes. However, it does not support non-contiguous data expressed via slices, a well-known feature of NumPy. In this work, we therefore evaluate several methods to support the direct transfer of non-contiguous arrays in mpi4py. This significantly simplifies the code, while performance essentially stays the same. Using ping-pong, stencil, and Lattice-Boltzmann benchmarks, we compare the common manual-copying approach, a NumPy-copy design, and a design based on MPI derived datatypes. In one case, the MPI derived-datatype design achieved a speedup of 15% in the stencil benchmark on four compute nodes. Our designs are superior to naive manual copies, but for maximum performance, manual copies with pre-allocated buffers or MPI persistent communication remain the better choice.
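The paper works in Python with mpi4py, but the derived-datatype design it evaluates corresponds, at the C/C++ MPI level, to describing a strided slice with `MPI_Type_vector` and sending it without a packing copy. A minimal sketch (function name and layout are illustrative):

```cpp
// Send one column of a row-major matrix directly: the derived datatype
// describes `rows` blocks of 1 element, spaced `cols` elements apart,
// so MPI walks the stride itself and no manual copy is needed.
#include <mpi.h>
#include <vector>

void send_column(const std::vector<double>& matrix,  // row-major, rows x cols
                 int rows, int cols, int col, int dest, MPI_Comm comm) {
    MPI_Datatype column;
    MPI_Type_vector(rows, 1, cols, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    // One element of type `column`, starting at the column's first entry.
    MPI_Send(matrix.data() + col, 1, column, dest, 0, comm);

    MPI_Type_free(&column);
}
```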
Most modern technologies, such as social media, smart cities, and the Internet of Things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers get stuck in local optima. As a result, it is necessary to look into new methods for dealing with large-scale data classification. Several solutions have been proposed for overcoming this challenge, but the rapid growth of the available data threatens to limit the usefulness of many traditional methods. Techniques such as oversampling and undersampling have shown great promise in addressing the issue of class imbalance. Among all of these techniques, the Synthetic Minority Oversampling Technique (SMOTE) has produced the best results, generating synthetic samples for the minority class to create a balanced dataset. The issue is that its practical applicability is restricted to problems involving tens of thousands of instances or fewer. In this paper, we propose a parallel method using SMOTE and a MapReduce strategy, which distributes the operation of the algorithm among a group of computational nodes to address the aforementioned problem. The proposed solution is divided into three stages. The first stage splits the data into different blocks using a mapping function, followed by a pre-processing step for each map block that employs a hybrid SMOTE algorithm to solve the class imbalance problem. On each map block, a decision tree model is trained. Finally, the decision tree blocks are combined to create a classification model. We used numerous datasets with up to 4 million instances in our experiments to test the proposed scheme's efficiency. The results show that the hybrid SMOTE has good scalability within the proposed framework and also cuts down processing time.
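The MapReduce plumbing aside, the core of SMOTE is interpolation between a minority sample and a nearby minority neighbour: x_new = x + u * (x_nn - x) with u uniform in (0, 1). A minimal, self-contained sketch (k = 1 neighbour for brevity; the paper's hybrid variant and its distribution across map blocks are not reproduced):

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Sample = std::vector<double>;

// Brute-force nearest minority neighbour (real SMOTE samples among the
// k nearest; k = 1 here to keep the sketch short).
static std::size_t nearest(const std::vector<Sample>& m, std::size_t i) {
    std::size_t best = (i + 1) % m.size();
    double best_d = -1.0;
    for (std::size_t j = 0; j < m.size(); ++j) {
        if (j == i) continue;
        double d = 0.0;
        for (std::size_t t = 0; t < m[i].size(); ++t) {
            const double diff = m[i][t] - m[j][t];
            d += diff * diff;                    // squared Euclidean distance
        }
        if (best_d < 0.0 || d < best_d) { best_d = d; best = j; }
    }
    return best;
}

// Generate n_synthetic minority samples by feature-wise interpolation.
std::vector<Sample> smote(const std::vector<Sample>& minority,
                          std::size_t n_synthetic, std::mt19937& rng) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, minority.size() - 1);
    std::vector<Sample> synthetic;
    synthetic.reserve(n_synthetic);
    while (synthetic.size() < n_synthetic) {
        const std::size_t i = pick(rng);
        const Sample& x = minority[i];
        const Sample& z = minority[nearest(minority, i)];
        const double u = unit(rng);
        Sample s(x.size());
        for (std::size_t d = 0; d < x.size(); ++d)
            s[d] = x[d] + u * (z[d] - x[d]);     // point on the segment x -> z
        synthetic.push_back(std::move(s));
    }
    return synthetic;
}
```

In the paper's scheme this generation step runs independently inside each map block, which is what makes the approach scale to millions of instances.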