Direct Simulation Monte Carlo (DSMC) solves the Boltzmann equation with large Knudsen *** Boltzmann equation generally consists of three terms: the force term, the diffusion term, and the collision *** the first two terms of the Boltzmann equation can be discretized by numerical methods such as the finite volume method, the third term can be approximated by DSMC, and DSMC simulates the physical behaviors of gas ***, because of the low sampling efficiency of Monte Carlo simulation in DSMC, this part usually accounts for a large portion of the computational cost of solving the Boltzmann *** this paper, by Markov Chain Monte Carlo (MCMC) and multicore programming, we develop Direct Simulation Multi-Chain Markov Chain Monte Carlo (DSMC3): a fast solver to calculate the numerical solution for the Boltzmann *** results show that DSMC3 is significantly faster than the conventional method DSMC.
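The abstract does not describe how DSMC3 builds or combines its chains, so the following is only a generic sketch of the multi-chain idea, not the authors' solver. It runs several independent Metropolis chains that sample a 1-D Maxwellian velocity distribution (unit thermal speed) and pools the samples. The function names, step size, burn-in length, and the choice of target density are all assumptions made for illustration.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_maxwellian(v):
    # Log-density, up to a constant, of a 1-D Maxwellian with unit thermal speed
    # (an illustrative target, not taken from the paper).
    return -0.5 * v * v


def run_chain(seed, n_steps, step=1.0, burn_in=500):
    # One Metropolis chain with a symmetric uniform proposal.
    rng = random.Random(seed)
    v = 0.0
    samples = []
    for i in range(n_steps + burn_in):
        proposal = v + rng.uniform(-step, step)
        log_ratio = log_maxwellian(proposal) - log_maxwellian(v)
        # Accept with probability min(1, ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_ratio:
            v = proposal
        if i >= burn_in:
            samples.append(v)
    return samples


def run_chains(n_chains, n_steps, max_workers=4):
    # Chains are independent, so they can be mapped over a worker pool.
    # Each chain gets its own seed (hypothetical seeding scheme: 0..n_chains-1).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chains = list(pool.map(lambda s: run_chain(s, n_steps), range(n_chains)))
    return [v for chain in chains for v in chain]


if __name__ == "__main__":
    pooled = run_chains(n_chains=4, n_steps=5000)
    mean = sum(pooled) / len(pooled)
    var = sum((v - mean) ** 2 for v in pooled) / len(pooled)
    print(len(pooled), round(mean, 3), round(var, 3))
```

A thread pool keeps the sketch portable, but CPython threads do not execute CPU-bound code in parallel; a process pool from the same `concurrent.futures` module would be the usual substitute to actually occupy multiple cores.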
Today, system and application programming are moving toward concurrent and parallel programming with the development of multicore and multiprogramming architectures. In an effort to improve study performance, researcher...
ISBN: (print) 9781450332170
This panel brings together designers of both traditional programming languages, and designers of behavioral specification languages for modeling systems, in each case with a concern for the challenges of multicore programming. Furthermore, several of these efforts have attempted to provide data-race-free programming models, so that multicore programmers need not be faced with the added burden of trying to debug race conditions on top of the existing challenges of building reliable systems. Copyright is held by the owner/author(s).
Standard language parallelism is an alternative way to achieve parallel performance without using an external application programming interface (API). In this work, we present the Fortran Do Concurrent standard language parallel feature for additive manufacturing. We developed an open-source application, AMSimulator, and implemented both OpenMP and Fortran Do Concurrent in its phase-field simulation. Performance was measured on platforms such as Windows 10 and Linux, using open-source compilers with Intel and NVIDIA. We found that standard language parallel features achieve the same performance without the need for an external API. This high-performance approach is useful for code development and for portability across platforms.
Nowadays, we are witnessing the diffusion of Stream Processing Systems (SPSs) able to analyze data streams in near real time. Traditional SPSs like Storm and Flink target distributed clusters and adopt the continuous streaming model, where inputs are processed as soon as they are available while outputs are continuously emitted. Recently, there has been a great focus on SPSs for scale-up machines. Some of them (e.g., BriskStream) still use the continuous model to achieve low latency. Others optimize throughput with batching approaches that are, however, often inadequate to minimize latency for live-streaming applications. Our contribution is a novel software engineering approach to designing the runtime system of SPSs targeting multicores, with the aim of providing a uniform solution able to optimize both throughput and latency. The approach is formal in nature and is based on assembling components called building blocks, whose composition allows optimizations to be expressed easily. We use this methodology to build a new SPS called WindFlow. Our evaluation showcases the benefits of WindFlow: it provides lower latency than SPSs for continuous streaming, and it can be configured to optimize throughput, performing on par with, or even better than, batch-based scale-up SPSs.
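To make the continuous streaming model concrete, here is a minimal sketch of a count-based sliding-window sum that emits each result as soon as its window is complete, rather than waiting for a batch. It is plain Python and does not use or imitate the WindFlow API; the function name and the `window` and `slide` parameters are assumptions chosen for this example.

```python
from collections import deque


def sliding_sum(stream, window, slide):
    # Keep only the most recent `window` items.
    buf = deque(maxlen=window)
    for i, x in enumerate(stream, start=1):
        buf.append(x)
        # Emit once the first window is full, then every `slide` items.
        if i >= window and (i - window) % slide == 0:
            yield sum(buf)


if __name__ == "__main__":
    print(list(sliding_sum(range(1, 7), window=3, slide=1)))  # [6, 9, 12, 15]
    print(list(sliding_sum(range(1, 7), window=3, slide=2)))  # [6, 12]
```

Because the operator is a generator, a downstream consumer receives each window result immediately, which is the property the continuous model trades against the higher throughput of batching.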
Transactional stream processing engines (TSPEs) are central to modern stream applications handling shared mutable states. However, their full potential, particularly in adaptive scheduling, remains largely unexplored....
The high-performance computing (HPC) community has recently seen a substantial diversification of hardware platforms and their associated programming models. From traditional multicore processors to highly specialized...
Author:
Adnan, Dept. of Informatics
Universitas Hasanuddin, Jl. Poros Malino, Kab. Gowa, Sulawesi Selatan, Indonesia
Highlights: Performance Evaluation on Work-stealing Featured Parallel Programs on Asymmetric Performance Multicore Processors. This paper reports the performance evaluation of the OpenCilk parallel program on asymmetric p...
Parallel programming is now a mandatory part of the computer science degree. New hardware platforms, with multiple cores and concurrently executing threads, require it. Nevertheless, teaching parallelism with the usual methods and classical algorithms makes the topic hard for our students to understand. On the other hand, teaching complex topics through gamification techniques has already been shown, reliably, to give students positive reinforcement when learning complex concepts. In this work we demonstrate a way to teach parallelism to undergraduate students using gamification in microworlds. The results obtained by the students who followed this model, compared with a control group that followed the standard model, show a statistically significant advantage for teaching parallelism with the gamification-with-microworlds model.
One of the most challenging tasks for network operators is implementing accurate per-packet monitoring, looking for signs of performance degradation, security threats, and so on. Upon critical event detection, corrective actions must be taken to keep the network running smoothly. Implementing this mechanism requires analyzing packet streams in real time, or close to it. In a softwarized network context, Stream Processing Systems (SPSs) can be adopted for this purpose. Recent solutions based on traditional SPSs, such as Storm and Flink, can support the definition of general complex queries, but they show poor performance at scale. To handle input data rates on the order of gigabits per second, programmable switch platforms are typically used, although they offer limited expressiveness. With the proposed approach, we intend to offer high performance and expressive power in a unified framework by relying solely on SPSs for multicores. Captured packets are translated into a proper tuple format, and network monitoring queries are applied to tuple streams. Packet analysis tasks are expressed as streaming pipelines running on general-purpose programmable network devices, and a second stage of elaboration can process aggregated statistics from different devices. Experiments carried out with an example monitoring application show that the system is able to handle realistic traffic at 10 Gb/s. The same application scales up to almost 20 Gb/s thanks to simple optimizations of the underlying framework. Hence, the approach proves to be viable and calls for the investigation of more extensive optimizations to support more complex elaborations and higher data rates.
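The pipeline described above (packets translated into tuples, then monitoring queries applied to the tuple stream) can be illustrated with a small sketch. This is not the paper's framework: the `PacketTuple` fields, the dictionary keys of the input packets, the tumbling-window length, and the byte threshold are all hypothetical, and timestamps are assumed to arrive in nondecreasing order.

```python
from collections import Counter, namedtuple

# Hypothetical tuple format for a captured packet.
PacketTuple = namedtuple("PacketTuple", ["ts", "src", "dst", "length"])


def to_tuples(packets):
    # Stage 1: translate raw packet records (dicts here) into tuples.
    for p in packets:
        yield PacketTuple(p["ts"], p["src"], p["dst"], p["len"])


def bytes_per_source(tuples, window_s):
    # Stage 2: per-source byte counts over tumbling windows of `window_s` seconds.
    current, counts = None, Counter()
    for t in tuples:
        w = int(t.ts // window_s)
        if current is not None and w != current:
            yield current, dict(counts)
            counts = Counter()
        current = w
        counts[t.src] += t.length
    if current is not None:
        yield current, dict(counts)  # flush the last open window


def heavy_hitters(windows, threshold):
    # Stage 3: flag sources whose byte count reaches the threshold in a window.
    for w, counts in windows:
        for src, total in counts.items():
            if total >= threshold:
                yield w, src, total


if __name__ == "__main__":
    packets = [
        {"ts": 0.1, "src": "a", "dst": "x", "len": 100},
        {"ts": 0.5, "src": "a", "dst": "y", "len": 200},
        {"ts": 0.7, "src": "b", "dst": "x", "len": 50},
        {"ts": 1.2, "src": "b", "dst": "x", "len": 400},
    ]
    stages = heavy_hitters(bytes_per_source(to_tuples(packets), 1.0), 300)
    print(list(stages))  # [(0, 'a', 300), (1, 'b', 400)]
```

Each stage is a generator, so the stages compose into a pipeline in which a tuple flows through all operators before the next packet is read.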