ISBN: 9781665414555 (print)
Thread-throttling and mapping strategies have been used together to make better use of hardware resources and improve the energy-delay product (EDP) of high-performance computing (HPC) systems. However, the design space grows significantly with the number of cores in those systems, which makes finding the ideal number of active threads and thread-mapping strategy challenging. On top of that, parallel applications present various patterns, such as irregularity, unbalanced computations, or high rates of communication. Given these considerations, we propose ETTM, an EDP-aware thread-throttling and mapping optimization strategy that automatically finds an ideal combination of number of threads and thread mapping strategy. With the execution of eighteen well-known benchmarks on three multicore architectures, we show that EDP can be significantly improved when running applications with the solution found by ETTM.
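As a rough illustration of the metric involved, the sketch below computes EDP as energy multiplied by execution time and picks the configuration with the lowest value by brute force. It is not ETTM's search procedure, which the abstract does not describe; the function names, the mapping labels (`compact`, `scatter`), and all measurements are hypothetical and exist only to show the arithmetic.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * time_seconds


def best_configuration(measurements):
    """Return the (threads, mapping) key with the lowest EDP.

    `measurements` maps (threads, mapping) -> (energy_joules, time_seconds).
    This is an exhaustive search, used here only as a simple baseline.
    """
    return min(measurements, key=lambda cfg: edp(*measurements[cfg]))


# Hypothetical measurements, invented for illustration only.
measurements = {
    (8, "compact"): (400.0, 10.0),
    (8, "scatter"): (420.0, 9.0),
    (16, "compact"): (600.0, 6.0),
    (16, "scatter"): (640.0, 6.5),
}

best = best_configuration(measurements)
print(best, edp(*measurements[best]))
```

An exhaustive search like this needs one run per combination, so its cost grows with the number of cores and mapping strategies, which is the design-space growth the abstract refers to.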
ISBN: 9798331524937 (digital)
ISBN: 9798331524944 (print)
Celestial objects are known to change in brightness over time, driven by a diverse combination of physical processes whose time scales range from sub-milliseconds to billions of years. Stingray is an open-source Python package that brings advanced time series analysis techniques to the astronomical community, with a focus on high-energy astrophysics, but built on top of general-purpose classes and methods that are designed to be easily adapted and extended to other use cases. We describe the work being done to adapt Stingray to the analysis of large data archives. In particular, we measure the performance and scalability of Stingray and use parallel computing to speed up selected parts of the code.
ISBN: 9798350317152 (digital)
ISBN: 9798350317169 (print)
This paper introduces ZEROTuNE, a novel cost model for parallel and distributed stream processing that can be used to effectively set initial parallelism degrees of streaming queries. Unlike existing models, which rely mainly on online learning statistics that are non-transferable, context-specific, and require extensive training, ZEROTuNE proposes data-efficient zero-shot learning techniques that enable very accurate cost predictions without having observed any query deployment. To overcome these challenges, we propose ZEROTuNE, a graph neural network architecture that can learn from the structural complexity of parallel distributed stream processing systems, enabling them to adapt to unseen workloads and hardware configurations. In our experiments, we show that when ZEROTuNE is integrated into a distributed streaming system such as Apache Flink, we can accurately set the degree of parallelism, with an average speed-up of around 5× in comparison to existing approaches.
ISBN: 9798331524937 (digital)
ISBN: 9798331524944 (print)
The C++ language continually evolves through formal specifications established by its standards committee, proposing new features to maintain C++ as a relevant programming language while improving usability, performance, and portability across platforms. With the addition of parallel Standard Template Library (STL) algorithms in C++17, programmers can now leverage parallel processing capabilities via vendor-neutral parallel execution policies. This study presents an adaptation of the NAS Parallel Benchmarks (NPB), a well-established suite of applications for evaluating parallel architectures, by porting its sequential C-style code to use C++ STL abstractions and performance-portable parallelism features. Our goals are to (1) assess the suitability of C++ STL for scientific applications like the ones in the NPB and (2) provide a comparative performance and portability analysis of STL algorithms' parallel execution policies across different multicore architectures (x86 and AArch64). Results indicate that the performance of parallel STL algorithms is often close to that of optimized handwritten versions (OpenMP, Intel TBB, and FastFlow) on different architectures, with notable shortfalls. Across all NPB benchmarks, the STL algorithms' geometric mean shows sequential execution times that are between 3.76% and 6.9% higher, while parallel executions may reach a geometric mean of up to 21.21% higher execution time.
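The abstract does not spell out how its geometric-mean overheads are computed. One common approach, shown here as a minimal sketch, is to take the geometric mean of per-benchmark time ratios against a baseline. The function name and the timing values are hypothetical and are not taken from the paper.

```python
from statistics import geometric_mean


def geomean_overhead_percent(baseline_times, candidate_times):
    """Geometric-mean overhead of candidate over baseline, in percent.

    Each pair of times belongs to the same benchmark; a positive result
    means the candidate is slower on (geometric) average.
    """
    ratios = [c / b for b, c in zip(baseline_times, candidate_times)]
    return (geometric_mean(ratios) - 1.0) * 100.0


# Hypothetical execution times in seconds, invented for illustration only.
baseline = [10.0, 20.0, 40.0]
candidate = [11.0, 20.0, 44.0]
print(f"{geomean_overhead_percent(baseline, candidate):.2f}% higher")
```

The geometric mean is the usual choice for ratios because it treats a 2× slowdown and a 2× speedup as cancelling out, which an arithmetic mean of ratios does not.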
ISBN: 9781665440660 (print)
Python has become a widely used programming language for research, not only for small one-off analyses, but also for complex application pipelines running at supercomputer scale. Modern parallel programming frameworks for Python present users with a more granular unit of management than traditional Unix processes and batch submissions: the Python function. We review the challenges involved in running native Python functions at scale, and present techniques for dynamically determining a minimal set of dependencies and for assembling a lightweight function monitor (LFM) that captures the software environment and manages resources at the granularity of single functions. We evaluate these techniques in a range of environments, from campus cluster to supercomputer, and show that our advanced dependency management planning and dynamic resource management methods provide superior performance and utilization relative to coarser-grained management approaches, achieving a several-fold decrease in execution time for several large Python applications.
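To give a feel for function-level dependency discovery, the sketch below statically scans a function's source for import statements using the standard `ast` module. This is a simplified stand-in, not the technique from the paper, which the abstract describes only as dynamic. The helper name `imported_modules` and the example function are hypothetical.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports, which have no absolute module name.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


# Hypothetical function source used only to exercise the helper.
EXAMPLE_SOURCE = '''
def analyze(path):
    import json
    from os import path as osp
    import xml.etree.ElementTree as ET
    return json.dumps({"exists": osp.exists(path)})
'''

print(sorted(imported_modules(EXAMPLE_SOURCE)))
```

A static scan like this misses modules loaded through `importlib` or other runtime mechanisms, which is one reason a dynamic approach can be preferable.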
ISBN: 9798400715648 (print)
Parallel stream processing plays a crucial role in efficiently handling massive workloads for modern data-driven applications. However, understanding the performance implications of operator parallelism in distributed environments remains a challenging task. Existing benchmarking systems primarily analyze Stream Processing Systems (SPS) using sequential operator pipelines in homogeneous, centralized environments, limiting their applicability to real-world distributed scenarios. In this demo paper, we demonstrate PDSP-Bench, a benchmarking system designed to systematically evaluate and understand the performance of parallel stream processing in distributed and heterogeneous environments. PDSP-Bench enables the evaluation of how SPS, such as Apache Flink and Apache Storm, leverage operator parallelism and available resources to process diverse workloads. Our benchmark includes parallel query structures derived from 15 real-world applications and 9 synthetic queries, allowing users to explore the impact of parallelism on performance by varying query parameters, data stream characteristics, and parallelism enumeration strategies, including different configurations and resource allocations in cloud-cluster environments. Our demo allows users to experience performance benchmarking of SPS using PDSP-Bench by providing interactive visualizations and performance insights into the effect of parallelism on query execution in heterogeneous resource environments.
The K-Means algorithm is one of the most common clustering algorithms and is widely applied in various data analysis applications. The Yinyang K-Means algorithm is a popular enhanced K-Means algorithm that avoids most unnecessary ca...
ISBN: 9798331542375 (digital)
ISBN: 9798331542382 (print)
Meta-heuristic techniques have been popular in solving highly complex optimization problems. Parallel processing is very important in signal representation and real-time analysis, as it lowers the time required to process data so that algorithms run faster. The biggest challenges with traditional parallel processing techniques are load balancing and high communication overhead, especially when the workload needs to scale up substantially. Meta-heuristic techniques make NP-complete or NP-hard problems tractable in practice by providing satisfactory, approximate solutions. This has helped in parallel processing for signal representation and real-time analysis. Parallel processing has also been useful in running meta-heuristic algorithms such as genetic algorithms, ant colony optimization, and particle swarm optimization. Meta-heuristic algorithms provide flexible, adaptive mechanisms for solving complex problems.
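As one concrete instance of the meta-heuristics named above, the following is a minimal, sequential sketch of particle swarm optimization minimizing a simple test function. The parameter values, the `sphere` objective, and the function names are illustrative assumptions, not details from the paper.

```python
import random


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=20, iterations=200, bounds=(-5.0, 5.0),
        inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_value = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_value.__getitem__)
    global_best, global_value = personal_best[g][:], personal_value[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Blend momentum, pull toward own best, pull toward swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move the particle and clamp it to the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_value[i]:
                personal_best[i], personal_value[i] = positions[i][:], value
                if value < global_value:
                    global_best, global_value = positions[i][:], value
    return global_best, global_value


best_position, best_value = pso(sphere, dim=2)
print(best_position, best_value)
```

The objective evaluations for different particles are independent of one another, so that inner step is the natural place to apply parallel processing; updating the shared global best is the point that then needs coordination.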
In this day and age of widespread multimedia content, it is quite common to listen to a song, wish to identify it and also continue to listen to it in sync, even in the absence of the original sound source. However, t...