Search Results - Inner Mongolia University Library

DSMC: Fast direct simulation Monte Carlo solver for the Boltzmann equation by Multi-Chain Markov Chain and multicore programming

International Journal of Modeling, Simulation, and Scientific Computing, 2016, Vol. 7, No. 2, pp. 152-166

Authors: Di Zhao; Haiwu He. Affiliation: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, P.R. China

Direct Simulation Monte Carlo (DSMC) solves the Boltzmann equation with large Knudsen *** Boltzmann equation generally consists of three terms: the force term, the diffusion term and the collision *** the first two terms of the Boltzmann equation can be discretized by numerical methods such as the finite volume method, the third term can be approximated by DSMC, and DSMC simulates the physical behaviors of gas ***, because of the low sampling efficiency of Monte Carlo Simulation in DSMC, this part usually occupies a large portion of computational costs to solve the Boltzmann *** this paper, by Markov Chain Monte Carlo (MCMC) and multicore programming, we develop Direct Simulation Multi-Chain Markov Chain Monte Carlo (DSMC3): a fast solver to calculate the numerical solution for the Boltzmann *** results show that DSMC3 is significantly faster than the conventional method DSMC.

Keywords: Fast solver; direct simulation Multi-Chain Markov Chain Monte Carlo; DSMC; the Boltzmann equation; multicore programming

Efficient multicore Computing and High-Precision Arithmetic: A Comprehensive Guide to multicore and Big Number programming

13th IEEE International Conference on Communication Systems and Network Technologies, CSNT 2024

Authors: Hasija, Taniya; Ramkumar, K.R.; Kaur, Amanpreet; Mittal, Sudesh Kumar; Singh, Bhupendra. Affiliations: Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India; Centre for Artificial Intelligence & Robotics, Defence Research and Development Organization, Bangalore, India

ISBN: (print) 9798350305463

Today, system and application programming is moving toward concurrent and parallel programming with the development of multicore and multiprogramming architectures. In an effort to improve study performance, researchers are looking for more efficient methods to include multiprocessing and multicore programming in their simulation systems. This article provides an overview of multicore programming and illustrates how it can be implemented. The paper also focuses on the limitations of primitive data types for diverse applications, especially in the context of computer systems. The article delves into the necessity of big numbers and arithmetic on a significant scale. Focusing on C programming, the article showcases the implementation of big numbers, providing scholars with a comprehensive understanding of the concept and its practical realization. © 2024 IEEE.

Keywords: multicore programming

Panel summary: Finding safety in numbers new languages for safe multicore programming and modeling

ACM SIGAda's Annual International Conference High Integrity Language Technology, HILT 2014

Authors: Bocchino, Robert; Matsakis, Niko; Taft, Tucker; Larson, Brian; Seidewitz, Ed. Affiliations: Jet Propulsion Laboratory, United States; Mozilla Research, United States; AdaCore, United States; Kansas State University, United States; Model Driven Solutions, United States

ISBN: (print) 9781450332170

This panel brings together designers of both traditional programming languages, and designers of behavioral specification languages for modeling systems, in each case with a concern for the challenges of multicore programming. Furthermore, several of these efforts have attempted to provide data-race-free programming models, so that multicore programmers need not be faced with the added burden of trying to debug race conditions on top of the existing challenges of building reliable systems. Copyright is held by the owner/author(s).

Keywords: multicore programming

High performance additive manufacturing phase field simulation: Fortran Do Concurrent vs OpenMP

COMPUTATIONAL MATERIALS SCIENCE, 2025, Vol. 252

Authors: Maqbool, Shahid; Lee, Byeong-Joo. Affiliation: Pohang Univ Sci & Technol, Dept Mat Sci & Engn, Pohang 37673, North Gyeongsan, South Korea

Standard language parallelism is an alternate way to achieve the parallel performance of the code without using an external application programming interface (API). In this work, we present the Fortran Do Concurrent standard language parallel feature for additive manufacturing. We developed an open-source AMSimulator application and have implemented OpenMP and Fortran Do Concurrent in the phase field simulation. Performance has been measured across various platforms like Windows 10 and Linux and open-source compilers with Intel and NVIDIA. We found that using standard language parallel features, the same performance can be achieved without the need for an external API. This high-performance approach is useful for code development and portability across various platforms.

Keywords: Phase field; Dendrite formation; Additive manufacturing; Fortran; multicore programming; Standard language parallelism; Do Concurrent; OpenMP

WindFlow: High-Speed Continuous Stream Processing With Parallel Building Blocks

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, Vol. 32, No. 11, pp. 2748-2763

Authors: Mencagli, Gabriele; Torquati, Massimo; Cardaci, Andrea; Fais, Alessandra; Rinaldi, Luca; Danelutto, Marco. Affiliations: Univ Pisa, Dept Comp Sci, I-56126 Pisa, Italy; Univ Pisa, Dept Informat Engn, I-56126 Pisa, Italy

Nowadays, we are witnessing the diffusion of Stream Processing Systems (SPSs) able to analyze data streams in near realtime. Traditional SPSs like Storm and Flink target distributed clusters and adopt the continuous streaming model, where inputs are processed as soon as they are available while outputs are continuously emitted. Recently, there has been a great focus on SPSs for scale-up machines. Some of them (e.g., BriskStream) still use the continuous model to achieve low latency. Others optimize throughput with batching approaches that are, however, often inadequate to minimize latency for live-streaming applications. Our contribution is to show a novel software engineering approach to design the runtime system of SPSs targeting multicores, with the aim of providing a uniform solution able to optimize throughput and latency. The approach has a formal nature based on the assembly of components called building blocks, whose composition allows optimizations to be easily expressed in a compositional manner. We use this methodology to build a new SPS called WindFlow. Our evaluation showcases the benefits of WindFlow: it provides lower latency than SPSs for continuous streaming, and can be configured to optimize throughput, to perform similarly and even better than batch-based scale-up SPSs.

Keywords: Runtime; Throughput; Libraries; multicore processing; Storms; Algebra; Semantics; Data stream processing; multicore programming; parallel computing

Scalable Transactional Stream Processing on multicore Processors

IEEE Transactions on Knowledge and Data Engineering, 2025

Authors: Zhao, Jianjun; Mao, Yancan; Yang, Zhonghao; Liu, Haikun; Zhang, Shuhao. Affiliations: Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Wuhan 430074, China; National University of Singapore, 119077, Singapore; Nanyang Technological University, 639798, Singapore

Transactional stream processing engines (TSPEs) are central to modern stream applications handling shared mutable states. However, their full potential, particularly in adaptive scheduling, remains largely unexplored. We present MorphStream, a TSPE designed to optimize parallelism and performance for transactional stream processing on multicores. Through a unique three-stage execution paradigm (i.e., planning, scheduling, and execution), MorphStream enables adaptive scheduling under varying workload characteristics. Building on this foundation, MorphStream is further enhanced with support for non-deterministic state access, employing a stateful task precedence graph to handle undefined read/write sets at runtime while guaranteeing transaction semantics. Additionally, MorphStream incorporates a generalized framework for managing window-based operations, enabling efficient tracking and maintenance of overlapping windows using multi-versioned state management. These extensions enhance the system's ability to process dynamic and irregular workloads. Experimental results demonstrate up to 3.4 times higher throughput and 69.1% lower latency compared to state-of-the-art TSPEs, validating its scalability and adaptability in real-world streaming scenarios. © 1989-2012 IEEE.

Keywords: multicore programming

A Study of Performance Portability in Plasma Physics Simulations

arXiv, 2024

Authors: Ruzicka, Josef; Asch, Christian; Meneses, Esteban; Rampp, Markus; Laure, Erwin. Affiliations: National Advanced Computing Laboratory, National High Technology Center, San José, Costa Rica; School of Computing, Costa Rica Institute of Technology, Costa Rica; Max Planck Computing and Data Facility, Max Planck Society, Garching, Germany

The high-performance computing (HPC) community has recently seen a substantial diversification of hardware platforms and their associated programming models. From traditional multicore processors to highly specialized accelerators, vendors and tool developers back up the relentless progress of those architectures. In the context of scientific programming, it is fundamental to consider performance portability frameworks, i.e., software tools that allow programmers to write code once and run it on different computer architectures without sacrificing performance. We report here on the benefits and challenges of performance portability using a field-line tracing simulation and a particle-in-cell code, two relevant applications in computational plasma physics with applications to magnetically-confined nuclear-fusion energy research. For these applications we report performance results obtained on four HPC platforms with server-class CPUs from Intel (Xeon) and AMD (EPYC), and high-end GPUs from Nvidia and AMD, including the latest Nvidia H100 GPU and the novel AMD Instinct MI300A APU. Our results show that both Kokkos and OpenMP are powerful tools to achieve performance portability and decent "out-of-the-box" performance, even for the very latest hardware platforms. For our applications, Kokkos provided performance portability to the broadest range of hardware architectures from different *** Codes 68Q85 © 2024, CC BY-SA.

Keywords: multicore programming

Performance Evaluation on Work-Stealing Featured Parallel Programs on Asymmetric Performance multicore Processors

SSRN, 2023

Authors: Adnan. Affiliation: Dept. of Informatics, Universitas Hasanuddin, Jl. Poros Malino, Kab. Gowa, Sulawesi Selatan, Indonesia

This paper reports the performance evaluation of the OpenCilk parallel program on asymmetric performance multicore processors (AMPs). The results show that AMPs do not impose performance penalties on the OpenCilk program. Hereafter, we derive the speedup factor $sf$ simply from the execution time measurement of both P-CPU and E-CPU. The speedup factor extension on Amdahl's law equation shows the OpenCilk parallel programs may exhibit sublinear or superlinear speedup depending on which CPU is the basis of performance reference. While using P-CPU sequential execution time as a reference causes a sublinear speedup, using E-CPU results in a superlinear speedup. © 2023, The Authors. All rights reserved.

Keywords: multicore programming

Teaching Parallelism With Gamification in Cellular Automaton Environments

IEEE REVISTA IBEROAMERICANA DE TECNOLOGIAS DEL APRENDIZAJE-IEEE RITA, 2020, Vol. 15, No. 1, pp. 34-42

Authors: Hardasmal, Antonio J. Tomeu; Salguero, Alberto G. Affiliation: Univ Cadiz, Dept Comp Sci, Cadiz 11003, Spain

Parallel programming within the computer science degree is now mandatory. New hardware platforms, with multiple cores and the execution of concurrent threads, require it. Despite the above, the teaching of parallelism with the usual methods and classical algorithms, make this topic hard for our students to understand. On the other hand, teaching complex topics through the techniques of gamification has already demonstrated, in a reliable way, a positive reinforcement of the student in front of the learning of complex concepts. In this work we demonstrate a way to convey the teaching of parallelism to undergraduate students using gamification in microworlds. The results obtained by the students who followed this model, compared to a control group that followed the standard model, show a statistically significant advantage in favor of the teaching of parallelism, using a gamification with microworlds model.

Keywords: Cellular automaton; E-learning; gamification; microworlds; multicore programming; parallel programming

Data Stream Processing for Packet-Level Analytics

SENSORS, 2021, Vol. 21, No. 5, p. 1735

Authors: Fais, Alessandra; Lettieri, Giuseppe; Procissi, Gregorio; Giordano, Stefano; Oppedisano, Francesco. Affiliation: Univ Pisa, Dipartimento Ingn Informaz, I-56122 Pisa, Italy

One of the most challenging tasks for network operators is implementing accurate per-packet monitoring, looking for signs of performance degradation, security threats, and so on. Upon critical event detection, corrective actions must be taken to keep the network running smoothly. Implementing this mechanism requires the analysis of packet streams in a real-time (or close to) fashion. In a softwarized network context, Stream Processing Systems (SPSs) can be adopted for this purpose. Recent solutions based on traditional SPSs, such as Storm and Flink, can support the definition of general complex queries, but they show poor performance at scale. To handle input data rates in the order of gigabits per second, programmable switch platforms are typically used, although they offer limited expressiveness. With the proposed approach, we intend to offer high performance and expressive power in a unified framework by solely relying on SPSs for multicores. Captured packets are translated into a proper tuple format, and network monitoring queries are applied to tuple streams. Packet analysis tasks are expressed as streaming pipelines, running on general-purpose programmable network devices, and a second stage of elaboration can process aggregated statistics from different devices. Experiments carried out with an example monitoring application show that the system is able to handle realistic traffic at a 10 Gb/s speed. The same application scales almost up to 20 Gb/s speed thanks to the simple optimizations of the underlying framework. Hence, the approach proves to be viable and calls for the investigation of more extensive optimizations to support more complex elaborations and higher data rates.

Keywords: software defined networking; packet-level analysis; data stream processing; multicore programming
