Abstract: This paper considers the automatic distribution of computations in the translation of NORMA language programs. The load balancing between the nodes of a multi-node computer system is out...
Parallel nonlinear models using radial kernels on local mesh support have been designed and implemented for application to real-world problems. Although this recently developed approach reduces the memory requirements compared with other methodologies suggested over the last few years, its computational cost makes parallelisation necessary, especially for big datasets with many instances or attributes. In this work, several strategies for the parallelisation of this methodology are proposed and compared. The MPI communication protocol and the OpenMP application programming interface are used to implement the algorithm. The performance of this methodology is compared with various machine learning methods, with particular consideration of techniques using radial basis functions (RBF). The different methods are applied to model the daily maximum air temperature from real meteorological data collected from the Agroclimatic Station Network of the Phytosanitary Alert and Information Network of Andalusia, an autonomous community of southern Spain. The obtained goodness-of-fit measures illustrate the effectiveness of this nonlinear methodology, and its training process is shown to be simpler than those of other powerful machine learning methods.
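The kind of radial-kernel evaluation at the heart of such models can be sketched as follows. The Gaussian kernel, one-dimensional points, and the OpenMP pragma here are illustrative assumptions, not the authors' exact formulation; the point is that the loop over evaluation points is independent per iteration, which is what makes OpenMP (intra-node) and MPI (inter-node) decompositions applicable.

```cpp
#include <cmath>
#include <vector>

// Gaussian radial basis function: phi(r) = exp(-(eps * r)^2).
double rbf(double r, double eps) { return std::exp(-(eps * r) * (eps * r)); }

// Evaluate an RBF model s(x) = sum_j w_j * phi(|x - c_j|) at many points.
// Each output value is independent, so the outer loop parallelises trivially.
std::vector<double> evaluate(const std::vector<double>& points,
                             const std::vector<double>& centres,
                             const std::vector<double>& weights,
                             double eps) {
    std::vector<double> out(points.size(), 0.0);
#pragma omp parallel for
    for (long i = 0; i < static_cast<long>(points.size()); ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < centres.size(); ++j)
            s += weights[j] * rbf(std::fabs(points[i] - centres[j]), eps);
        out[i] = s;
    }
    return out;
}
```

An MPI variant would distribute blocks of `points` across ranks instead of (or in addition to) splitting the loop across threads.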
ISBN:
(Digital) 9798350381993
ISBN:
(Print) 9798350382006
The main objective of this work is to bring supercomputing and parallel processing closer to non-specialized audiences by building a Raspberry Pi cluster, called Clupiter, which emulates the operation of a supercomputer. It consists of eight Raspberry Pi devices interconnected so that they can run jobs in parallel. To make it easier to show how the cluster works, a web application has been developed that allows launching parallel applications and accessing a monitoring system to observe resource usage while these applications are running. The NAS Parallel Benchmarks (NPB) are used as demonstration applications. A couple of educational videos, covering the concepts of supercomputing and parallel programming in a very accessible way, can also be accessed from this web application.
ISBN:
(Digital) 9781510647206
ISBN:
(Print) 9781510647206; 9781510647190
A fracture is a break in the continuity of bone tissue in any bone of the body. It occurs as a result of excessive stress that exceeds bone resistance, i.e. it is the consequence of a single or repeated overload, and it occurs in milliseconds. The development of magnetic resonance imaging and computerized tomography has made it possible to identify and evaluate the different pathologies of the human body more accurately. Edge detection is a fundamental tool in medical image processing, particularly in feature detection, which aims at identifying points in a digital image at which the image has discontinuities. To improve computing speed, parallel computing on NVIDIA GPUs was used. This work presents an improved methodology for processing bone fracture images before and after surgery using segmentation and graphics accelerator cards to help the medical specialist in the analysis and evaluation of the images.
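The edge-detection step described above is the classic candidate for GPU offload, since each output pixel is computed independently. A minimal sketch, assuming a Sobel operator (a common choice for gradient-based edge detection, though the abstract does not name the exact operator) on a row-major grayscale image; on a GPU the natural mapping is one CUDA thread per pixel:

```cpp
#include <cmath>
#include <vector>

// Sobel gradient magnitude on a grayscale image stored row-major.
// Each output pixel depends only on its 3x3 neighbourhood, so every
// pixel can be computed by an independent GPU thread or loop iteration.
std::vector<double> sobel(const std::vector<double>& img, int w, int h) {
    std::vector<double> out(img.size(), 0.0);
    auto at = [&](int x, int y) { return img[y * w + x]; };
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            // Horizontal and vertical Sobel kernels.
            double gx = -at(x-1,y-1) + at(x+1,y-1)
                        - 2*at(x-1,y) + 2*at(x+1,y)
                        - at(x-1,y+1) + at(x+1,y+1);
            double gy = -at(x-1,y-1) - 2*at(x,y-1) - at(x+1,y-1)
                        + at(x-1,y+1) + 2*at(x,y+1) + at(x+1,y+1);
            out[y * w + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

Pixels on a vertical intensity step produce a strong horizontal gradient, which is exactly the discontinuity a fracture line introduces in a radiograph.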
The wide adoption of SYCL as an open-standard API for accelerating C++ software in domains such as HPC, automotive, artificial intelligence, machine learning, and other areas necessitates efficient compiler and runtime support for a growing number of platforms. Existing SYCL implementations support various devices such as CPUs, GPUs, DSPs, and FPGAs, typically via OpenCL or CUDA backends. While accelerators have significantly increased the performance of user applications, employing CPU devices for further performance improvement is beneficial given the significant presence of CPUs in existing data centers. SYCL applications on CPUs currently go through an OpenCL backend. Though an OpenCL backend is valuable for supporting accelerators, it may introduce additional overhead on CPUs since the host and device are the same. Overheads such as run-time compilation of the kernel, transfer of input/output memory to/from the OpenCL device, and invocation of the OpenCL kernel may not be necessary when running on the CPU. While some of these overheads (such as data transfer) can be avoided by modifying the application, doing so can compromise the SYCL application's ability to achieve performance portability on other devices. In this article, we propose an alternative approach to running SYCL applications on CPUs. We bypass OpenCL and use a CPU-directed compilation flow, along with the integration of whole-function vectorization, to generate optimized host and device code together in the same translation unit. We compare the performance of our approach, the CPU-directed compilation flow, with an OpenCL backend for existing SYCL-based applications, with no code modification, on the BabelStream benchmark, Matmul from the ComputeCpp SDK, N-body simulation benchmarks, and SYCL-BLAS (Aliaga et al., Proceedings of the 5th International Workshop on OpenCL, 2017), on CPUs from different vendors and architectures. We report a performance improvement of
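The core idea of the CPU-directed flow can be illustrated without a SYCL toolchain: when the kernel lambda is compiled as an ordinary function in the same translation unit, there is no runtime kernel compilation and no host/device copy. The sketch below is a deliberately simplified stand-in (real SYCL uses `sycl::queue::submit`, accessors, and an nd-range; `parallel_for_host` is a hypothetical helper invented here), using the BabelStream-style triad mentioned in the abstract:

```cpp
#include <functional>
#include <vector>

// Sketch: a SYCL-like parallel_for lowered directly to a host loop in the
// same translation unit. No OpenCL runtime, no kernel JIT, no data transfer;
// a vectorising compiler can apply whole-function vectorisation to this loop.
void parallel_for_host(std::size_t n,
                       const std::function<void(std::size_t)>& kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// BabelStream-style triad: a[i] = b[i] + scalar * c[i].
std::vector<double> triad(const std::vector<double>& b,
                          const std::vector<double>& c, double scalar) {
    std::vector<double> a(b.size());
    parallel_for_host(b.size(),
                      [&](std::size_t i) { a[i] = b[i] + scalar * c[i]; });
    return a;
}
```

Because the "device" lambda and the host code are one compilation unit, the compiler sees both and can inline and vectorise across the boundary, which is the opportunity an OpenCL indirection hides.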
A major parallel programming challenge in scientific computing is to hide parallel computing details of data distribution and communication. Component-based approaches are often used in practice to encapsulate these c...
Attack graphs (AGs) are graphical tools for the security analysis of computer networks. They are especially useful in detecting the threats of multi-stage attacks on target networks. An AG is composed of nodes and directed edges. Nodes represent different network states, and directed edges represent the causal connections between these states. By reading AGs, we can acquire useful information such as whether multiple attack paths exist between any two nodes, the shortest or most likely paths, and the most valuable target in a network. Such information helps system administrators assess the relative importance of various elements in a network, allowing them to effectively allocate time and budget to patch vulnerabilities and proactively defend against possible attacks. This research addresses two primary concerns with AGs: efficient AG generation and effective AG analysis. First, AG generation faces the challenge of state-space explosion. As modern networks continue to grow and more vulnerabilities are discovered, the data to be processed during AG generation for target networks increases exponentially, which requires efficient AG generators. We design AG generators for the RAGE AG model based on parallel programming and high-performance computing (HPC). We optimize the performance of the parallel AG generators with respect to data structures, memory access patterns, and workload balance. We conduct a comprehensive performance evaluation on different HPC hardware. The testing dataset includes synthetic AGs and AGs converted from directed acyclic graphs. The results verify that the parallel strategy realized on HPC hardware can effectively handle the scalability issue of AG generation. Next, for effective AG analysis, we explore AG structures and apply probability theory to extract the underlying information from AGs. In structure analysis, we implement three centrality concepts from network science to study the importance of nodes and edges in AGs. Based on the centrality
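The "shortest attack path" query mentioned above reduces to breadth-first search over the AG's directed edges. A minimal sketch on a toy adjacency-list graph (the node layout here is hypothetical and unrelated to the RAGE model itself):

```cpp
#include <queue>
#include <vector>

// Shortest attack path, measured in number of exploited transitions,
// between two AG states. BFS over directed edges; returns -1 when the
// destination state is unreachable from the source.
int shortest_attack_path(const std::vector<std::vector<int>>& adj,
                         int src, int dst) {
    std::vector<int> dist(adj.size(), -1);  // -1 marks "not yet reached"
    std::queue<int> q;
    dist[src] = 0;
    q.push(src);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u])
            if (dist[v] == -1) { dist[v] = dist[u] + 1; q.push(v); }
    }
    return dist[dst];
}
```

On real AGs the same traversal, run from every node, also yields closeness-style centrality scores, which is one way the structural importance of nodes can be quantified.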
In this letter, we address the issue of automatic labeling of remote sensing datasets using a novel deep learning clustering algorithm. The proposed algorithm addresses the inherent susceptibility of the deep embedded clustering (DEC) algorithm to data imbalance using additional search and extraction steps. Furthermore, the proposed algorithm is highly parallelizable. A graphics processing unit (GPU) implementation is shown to achieve performance speedups of 40X to 2600X and improved clustering accuracy with respect to DEC and other clustering approaches.
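The parallelizable structure claimed here is typical of clustering workloads: the assignment of each sample to its nearest cluster representative is independent per sample. A simplified nearest-centroid sketch in one dimension (this is a generic assignment step, not the DEC objective or the authors' algorithm), where a GPU mapping of one thread per point is what yields the reported class of speedups:

```cpp
#include <cmath>
#include <vector>

// Assign each 1-D point to its nearest centroid by absolute distance.
// Every iteration of the outer loop is independent, so the step
// parallelises trivially across GPU threads or CPU cores.
std::vector<int> assign_clusters(const std::vector<double>& pts,
                                 const std::vector<double>& centroids) {
    std::vector<int> label(pts.size(), 0);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double best = std::fabs(pts[i] - centroids[0]);
        for (std::size_t k = 1; k < centroids.size(); ++k) {
            double d = std::fabs(pts[i] - centroids[k]);
            if (d < best) { best = d; label[i] = static_cast<int>(k); }
        }
    }
    return label;
}
```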
ISBN:
(Print) 9781450394451
Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). On the other hand, task parallelism has been shown to be an efficient and seamless programming model for clusters. This paper introduces OpenMP Cluster (OMPC), a task-parallel model that extends OpenMP for cluster programming. OMPC leverages OpenMP's offloading standard to distribute annotated regions of code across the nodes of a distributed system. To achieve this, it hides MPI-based data distribution and load-balancing mechanisms behind OpenMP task dependencies. Given its compliance with OpenMP, OMPC allows applications to use the same programming model to exploit intra- and inter-node parallelism, thus simplifying the development process and maintenance. We evaluated OMPC using Task Bench, a synthetic benchmark focused on task parallelism, comparing its performance against other distributed runtimes. Experimental results show that OMPC can deliver up to 1.53x and 2.43x better performance than Charm++ on CCR and scalability experiments, respectively. Experiments also show that OMPC performance weakly scales for both Task Bench and a real-world seismic imaging application.
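The OpenMP task dependencies that OMPC builds on look like the following. This is a minimal standard-OpenMP sketch, not OMPC-specific code; in OMPC, the same `depend` clauses would drive inter-node data movement. Compiled without OpenMP support the pragmas are ignored and the code runs serially with the same result:

```cpp
// Two dependent tasks: the consumer may not start before the producer
// has finished writing x. The runtime (or OMPC's distributed runtime)
// derives the ordering, and any needed data motion, from depend clauses.
int pipeline() {
    int x = 0, y = 0;
#pragma omp parallel
#pragma omp single
    {
#pragma omp task depend(out : x)
        x = 21;                       // producer task
#pragma omp task depend(in : x) depend(out : y)
        y = 2 * x;                    // consumer task, ordered after the producer
#pragma omp taskwait                  // wait for both tasks to complete
    }
    return y;
}
```

This is the sense in which OMPC keeps one programming model for intra- and inter-node parallelism: the application expresses only task dependencies, and the placement of tasks on cluster nodes stays the runtime's concern.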
Stream processing applications have seen increasing demand with the growing availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To deal with this demand, application programmers must consider parallelism to exploit the maximum performance of the underlying hardware resources. In this work, we introduce improvements to stream processing applications by exploiting fine-grained data parallelism (via Map and MapReduce) inside coarse-grained stream parallelism stages. The improvements include techniques for identifying data parallelism in sequential codes, a new language, semantic analysis, and a set of definition and transformation rules to perform source-to-source parallel code generation. Moreover, we investigate the feasibility of employing higher-level programming abstractions to support the proposed optimizations. To that end, we select the SPar programming model as a use case and extend it by adding two new attributes to its language and implementing our optimizations as a new algorithm in the SPar compiler. We conduct a set of experiments on representative stream processing and data-parallel applications. The results show that our new compiler algorithm is efficient and that performance improved by up to 108.4x in data-parallel applications. Furthermore, experiments evaluating stream processing applications towards the composition of stream and data parallelism revealed new insights. The results show that such composition may improve latencies by up to an order of magnitude. It also enables programmers to exploit different degrees of stream and data parallelism to strike a balance between throughput and latency according to their needs.
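The Map pattern inside a stream stage can be sketched as follows. The per-item squaring is a hypothetical stand-in for a real computation, and plain `std::transform` stands in for the parallel code SPar's compiler would generate from an annotated stage; the property being exploited is that every item of the incoming batch is transformed independently:

```cpp
#include <algorithm>
#include <vector>

// Fine-grained data parallelism inside one stream stage: each item of the
// batch is processed independently, so the transform can be split across
// threads without changing the stage's result (the Map pattern).
std::vector<int> map_stage(const std::vector<int>& batch) {
    std::vector<int> out(batch.size());
    std::transform(batch.begin(), batch.end(), out.begin(),
                   [](int v) { return v * v; });
    return out;
}
```

Replicating such a stage across threads raises throughput, while splitting each batch internally (as above) attacks per-item latency; composing the two is the throughput/latency trade-off the abstract describes.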