ISBN: (Print) 9783030856649
The proceedings contain 38 papers. The special focus in this conference is on Parallel and Distributed Computing. The topics include: An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures; Pipelined Model Parallelism: Complexity Results and Memory Considerations; Efficient and Systematic Partitioning of Large and Deep Neural Networks for Parallelization; A GPU Architecture Aware Fine-Grain Pruning Technique for Deep Neural Networks; Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs; Smart Distributed DataSets for Stream Processing; Colony: Parallel Functions as a Service on the Cloud-Edge Continuum; Horizontal Scaling in Cloud Using Contextual Bandits; Geo-Distribute Cloud Applications at the Edge; Automatic Low-Overhead Load-Imbalance Detection in MPI Applications; A Fault Tolerant and Deadline Constrained Sequence Alignment Application on Cloud-Based Spot GPU Instances; Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach; Algorithm Design for Tensor Units; A Scalable Approximation Algorithm for Weighted Longest Common Subsequence; TSLQueue: An Efficient Lock-Free Design for Priority Queues; G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU; Accelerating Graph Applications Using Phased Transactional Memory; Efficient GPU Computation Using Task Graph Parallelism; Towards High Performance Resilience Using Performance Portable Abstractions; Enhancing Load-Balancing of MPI Applications with Workshare; Trace-Based Workload Generation and Execution; Particle-In-Cell Simulation Using Asynchronous Tasking; Exploiting Co-execution with OneAPI: Heterogeneity from a Modern Perspective; Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems; Fault-Tolerant LU Factorization Is Low Cost; Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs; GPU-Accelerated Mahalanobis-Average Hierarchical Clustering Analysis.
With the continuous improvement in the resolution of satellite and aerial remote sensing images, more and more useful data and information can be obtained from them. At the same tim...
ISBN: (Print) 9781728172835
A large amount of stream data is generated by devices such as sensors and cameras. These stream data must be processed in a timely manner so that real-time applications meet their data latency requirements. To process a large amount of data in a short time, stream processing on edge/fog computing is a promising technology. In a stream processing system, a snapshot of each process and replications of the stream data are stored on another server; when a server fault or a load spike occurs, processing continues using the stored snapshots and replicated data. However, in an edge computing environment with limited network bandwidth, process recovery takes a long time because the restored data must be transferred. In this paper, we propose a stream processing system architecture that decides which servers store snapshots and replicated data and where processes are redeployed, considering the load of each server and the network bandwidth. We also propose a semi-optimal algorithm that reduces the computational cost by appropriately sorting servers and tasks according to the network bandwidth and server load. The algorithm can find a solution over 1000 times faster than the COIN-OR Branch and Cut (CBC) solver.
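To make the placement idea concrete, the following is a minimal sketch of a greedy heuristic in the spirit of the semi-optimal algorithm described above: servers and tasks are sorted by bandwidth and load, and each task's snapshot is placed on the first server that fits. The class names, fields, and scoring rule are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a greedy snapshot/replica placement heuristic.
# Assumed data model and scoring rule; not the paper's exact algorithm.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    bandwidth_mbps: float   # available uplink bandwidth
    load: float             # current CPU load in [0, 1]
    free_storage_gb: float

@dataclass
class Task:
    name: str
    snapshot_gb: float      # size of snapshot + replicated stream data

def place_snapshots(servers, tasks):
    """Greedy assignment: heaviest tasks first, best-scored servers first."""
    # Largest snapshots first, so the biggest transfers get the best links.
    tasks = sorted(tasks, key=lambda t: t.snapshot_gb, reverse=True)
    # Simple assumed score: more bandwidth and less load is better.
    servers = sorted(servers, key=lambda s: s.bandwidth_mbps * (1.0 - s.load), reverse=True)
    placement = {}
    for task in tasks:
        for server in servers:
            if server.free_storage_gb >= task.snapshot_gb:
                placement[task.name] = server.name
                server.free_storage_gb -= task.snapshot_gb
                break
    return placement

if __name__ == "__main__":
    servers = [Server("edge-1", 100.0, 0.7, 32.0), Server("edge-2", 300.0, 0.2, 64.0)]
    tasks = [Task("camera-feed", 8.0), Task("sensor-agg", 2.0)]
    print(place_snapshots(servers, tasks))  # {'camera-feed': 'edge-2', 'sensor-agg': 'edge-2'}
```

A real solver would also account for redeployment targets and transfer times over each link, which is what the paper's formulation optimizes; the sketch only shows the sorting-based greedy structure.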
Emerging blockchain accounting mechanisms allow mutually distributed parties to transfer trusted information and ensure the correctness of data. Every blockchain node stores the complete block locally. Although this m...
ISBN: (Print) 9783030856656; 9783030856649
Although smart device markets are increasing their sales figures, their computing capabilities are not sufficient to provide good-enough-quality services. This paper proposes a solution to organize the devices within the Cloud-Edge Continuum in such a way that each one, as an autonomous individual (Agent), processes events/data on its embedded compute resources while offering its computing capacity to the rest of the infrastructure in a Function-as-a-Service manner. Unlike other FaaS solutions, the described approach transparently converts the logic of such functions into task-based workflows backed by task-based programming models; thus, agents hosting the execution of a method generate the corresponding workflow and offload part of the workload onto other agents to improve the overall service performance. In our prototype, the function-to-workflow transformation is performed by COMPSs; thus, developers can efficiently code applications for any of the three envisaged computing scenarios (sense-process-actuate, streaming, and batch processing) throughout the whole Cloud-Edge Continuum without struggling with different frameworks specifically designed for each of them.
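As an illustration of the function-to-workflow idea, here is a minimal PyCOMPSs-style sketch of a service function expressed as two tasks that an agent could schedule locally or offload to peers. The task bodies and names are invented for illustration, and Colony's actual agent API is not shown.

```python
# Hedged sketch: a service handler decomposed into PyCOMPSs tasks.
# Requires a PyCOMPSs runtime; task logic below is purely illustrative.
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)
def preprocess(frame):
    # e.g. decode / normalize an incoming event or sensor frame
    return frame.lower()

@task(returns=1)
def infer(clean_frame):
    # e.g. run a model on the preprocessed data
    return f"label({clean_frame})"

def handle_event(frame):
    # The agent turns this function into a workflow of two dependent tasks;
    # independent invocations can run on the local device or be offloaded.
    clean = preprocess(frame)
    result = infer(clean)
    return compss_wait_on(result)

if __name__ == "__main__":
    print(handle_event("SensorFrame42"))
```

The point of the sketch is that the developer writes ordinary sequential-looking code; the runtime builds the task graph and decides where each task executes.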
ISBN: (Print) 9783030856656; 9783030856649
Due to their fine-grained operations and low conflict rates, graph processing algorithms expose a large amount of parallelism that has been extensively exploited by various parallelization frameworks. Transactional Memory (TM) is a programming model that uses an optimistic concurrency control mechanism to improve the performance of irregular applications, making it a perfect candidate to extract parallelism from graph-based programs. Although fast Hardware TM (HTM) instructions are now available in the ISA extensions of some major processor architectures (e.g., Intel and ARM), balancing the usage of Software TM (STM) and HTM to compensate for capacity and conflict aborts is still a challenging task. This paper presents a Phased TM implementation for graph applications, called Graph-Oriented Transactional Memory (GoTM). It uses a three-state (HTM, STM, GLOCK) concurrency control automaton that leverages both HTM and STM implementations to speed up graph applications. Experimental results using seven well-known graph programs and real-life workloads show that GoTM can outperform other Phased TM systems and lock-based concurrency mechanisms such as the one present in Galois, a state-of-the-art framework for graph computations.
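The following sketch models only the phase-transition policy of such a three-state automaton (when to fall back from HTM to STM or to a global lock, and when to return to HTM). The abort thresholds and the back-to-HTM rule are assumptions for illustration; the actual HTM/STM execution paths and GoTM's exact policy are not shown.

```python
# Hedged sketch of an HTM/STM/GLOCK phase controller; thresholds are assumed.
from enum import Enum, auto

class Phase(Enum):
    HTM = auto()     # optimistic hardware transactions
    STM = auto()     # software transactions for capacity-limited work
    GLOCK = auto()   # global lock as a last resort

class PhaseController:
    def __init__(self, conflict_limit=8, capacity_limit=2):
        self.phase = Phase.HTM
        self.conflict_aborts = 0
        self.capacity_aborts = 0
        self.conflict_limit = conflict_limit
        self.capacity_limit = capacity_limit

    def on_abort(self, capacity_abort: bool):
        """Record an abort and decide whether to change phase."""
        if capacity_abort:
            self.capacity_aborts += 1
            if self.capacity_aborts >= self.capacity_limit:
                self.phase = Phase.STM      # transaction does not fit in HTM buffers
        else:
            self.conflict_aborts += 1
            if self.conflict_aborts >= self.conflict_limit:
                self.phase = Phase.GLOCK    # too much contention: serialize

    def on_commit(self):
        """Successful commits reset the counters and drift back toward HTM."""
        self.conflict_aborts = 0
        self.capacity_aborts = 0
        self.phase = Phase.HTM
```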
ISBN: (Digital) 9798331524937
ISBN: (Print) 9798331524944
This study evaluates the use of Quantum Convolutional Neural Networks (QCNNs) for identifying signals resembling Gamma-Ray Bursts (GRBs) within simulated astrophysical datasets in the form of light curves. The task addressed here focuses on distinguishing GRB-like signals from background noise in simulated Cherenkov Telescope Array Observatory (CTAO) data, the next-generation astrophysical observatory for very high-energy gamma-ray science. QCNNs, a quantum counterpart of classical Convolutional Neural Networks (CNNs), leverage quantum principles to process and analyze high-dimensional data efficiently. We implemented a hybrid quantum-classical machine learning technique using the Qiskit framework, with the QCNNs trained on a quantum simulator. Several QCNN architectures were tested, employing different encoding methods such as Data Reuploading and Amplitude encoding. Key findings include that QCNNs achieved accuracy comparable to classical CNNs, often surpassing 90%, while using fewer parameters, potentially leading to more efficient models in terms of computational resources. A benchmark study further examined how hyperparameters like the number of qubits and encoding methods affected performance, with more qubits and advanced encoding methods generally enhancing accuracy but increasing complexity. QCNNs showed robust performance on time-series datasets, successfully detecting GRB signals with high precision. The research is a pioneering effort in applying QCNNs to astrophysics, offering insights into their potential and limitations. This work sets the stage for future investigations to fully realize the advantages of QCNNs in astrophysical data analysis.
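To show what a data-reuploading encoding looks like in Qiskit, here is a minimal circuit-construction sketch for a small light-curve window: the input features are re-encoded between trainable layers. The layer structure, depth, entanglement topology, and parameter counts are illustrative assumptions, not the architectures evaluated in the study.

```python
# Hedged Qiskit sketch of a data-reuploading style parameterized circuit.
from qiskit.circuit import QuantumCircuit, ParameterVector

def reuploading_circuit(num_qubits: int, num_layers: int) -> QuantumCircuit:
    x = ParameterVector("x", num_qubits)                        # classical input features
    theta = ParameterVector("theta", num_layers * num_qubits)   # trainable weights
    qc = QuantumCircuit(num_qubits)
    for layer in range(num_layers):
        # Re-upload the data in every layer.
        for q in range(num_qubits):
            qc.ry(x[q], q)
        # Trainable single-qubit rotations.
        for q in range(num_qubits):
            qc.rz(theta[layer * num_qubits + q], q)
        # Simple assumed linear entangling layer.
        for q in range(num_qubits - 1):
            qc.cx(q, q + 1)
    qc.measure_all()
    return qc

circuit = reuploading_circuit(num_qubits=4, num_layers=3)
print(circuit.num_parameters)  # 4 input parameters + 12 trainable weights = 16
```

In a hybrid training loop the input parameters would be bound to each light-curve sample and the weights optimized classically against the measured expectation values; that loop is omitted here.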
ISBN: (Print) 9783030856656; 9783030856649
The ever-increasing gap between processor and main memory speeds requires careful utilization of the limited memory link. This is especially true for memory-bound applications. Prioritization of memory requests in the memory controller is one approach to improving the performance of such codes. However, current designs do not consider high-level information about parallel applications. In this paper, we propose a holistic approach to this problem, in which runtime system-level knowledge is made available in hardware. The processor exploits this information to better prioritize memory requests, while introducing negligible hardware cost. Our design is based on the notion of the critical path in the execution of a parallel code. The critical tasks are accelerated by prioritizing their memory requests within the on-chip memory hierarchy. As a result, we reduce the critical path and improve the overall performance by up to 1.19x compared to the baseline systems.
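The sketch below illustrates the critical-path notion the design relies on: given a task DAG with per-task costs, compute the longest path and flag its tasks. In the paper this information would flow from the runtime system to the hardware; here the DAG, the costs, and the "priority" marking are purely illustrative.

```python
# Hedged sketch: identify critical tasks as the longest-cost path in a task DAG.
from functools import lru_cache

# Toy task graph: task -> list of successor tasks (assumed example).
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": 4, "B": 10, "C": 3, "D": 5}

@lru_cache(maxsize=None)
def longest_from(task: str) -> int:
    """Length of the longest path starting at `task`, including its own cost."""
    return cost[task] + max((longest_from(s) for s in successors[task]), default=0)

def critical_tasks(entry: str) -> list[str]:
    """Walk the DAG greedily along the longest remaining path."""
    path, current = [entry], entry
    while successors[current]:
        current = max(successors[current], key=longest_from)
        path.append(current)
    return path

print(critical_tasks("A"))   # ['A', 'B', 'D'] -> their memory requests would be prioritized
print(longest_from("A"))     # 19 units of critical work in this toy example
```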
ISBN: (Print) 9781728165820
The increasing complexity of modern and future computing systems makes it challenging to develop applications that aim for maximum performance. Hybrid parallel programming models offer new ways to exploit the capabilities of the underlying infrastructure. However, the performance gain is sometimes accompanied by increased programming complexity. We introduce an extension to PyCOMPSs, a high-level task-based parallel programming model for Python applications, to support tasks that use MPI natively as part of the task model. Without compromising the application's programmability, using native MPI tasks in PyCOMPSs offers up to 3x improvement in total performance for compute-intensive applications and up to 1.9x improvement in total performance for I/O-intensive applications over a sequential implementation of the tasks.
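The following sketch shows what a PyCOMPSs task whose body uses MPI natively via mpi4py could look like. The @mpi decorator and its parameter names (runner, processes) follow PyCOMPSs' documented decorator style, but the exact signature of the extension described in the paper may differ; treat the decorator usage, the task body, and the return-value handling as assumptions.

```python
# Hedged sketch of a native MPI task in PyCOMPSs style (assumed decorator syntax).
from pycompss.api.task import task
from pycompss.api.mpi import mpi
from pycompss.api.api import compss_wait_on

@mpi(runner="mpirun", processes=4)   # assumed: the task is launched as 4 MPI ranks
@task(returns=1)
def partial_sums(chunk):
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    # Each rank sums a strided slice of the chunk; results are reduced on rank 0
    # (non-root ranks receive None from the reduce).
    local = sum(chunk[comm.Get_rank()::comm.Get_size()])
    return comm.reduce(local, op=MPI.SUM, root=0)

def main():
    result = partial_sums(list(range(1_000)))
    print(compss_wait_on(result))    # 499500

if __name__ == "__main__":
    main()
```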
ISBN: (Print) 9781728165820
HPC systems and parallel applications are increasing in complexity; therefore, the ability to easily study and project the performance of scientific applications at large scale is of paramount importance. In this paper, we describe a performance analysis method and apply it to four complex HPC applications. We perform our study on a pre-production HPC system powered by the latest Arm-based CPUs for HPC, the Marvell ThunderX2. For each application, we identify inefficiencies and factors that limit its scalability. The results show that in several cases the bottlenecks do not come from the hardware but from the way the applications are programmed or the way the system software is configured.
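As a reminder of the basic quantities such a scalability study relies on, the sketch below computes strong-scaling speedup S(p) = T(base)/T(p) and parallel efficiency E(p) = S(p) * base/p from measured runtimes. The runtimes are made-up example numbers, not measurements from the paper.

```python
# Hedged sketch: strong-scaling speedup and efficiency from a runtime table.
def scaling_report(runtimes: dict[int, float]) -> None:
    base_procs = min(runtimes)
    base_time = runtimes[base_procs]
    for procs in sorted(runtimes):
        speedup = base_time / runtimes[procs]
        efficiency = speedup * base_procs / procs
        print(f"{procs:5d} procs  speedup {speedup:6.2f}  efficiency {efficiency:5.1%}")

# Example: efficiency dropping well below ~70% flags a scalability bottleneck to investigate.
scaling_report({64: 1200.0, 128: 640.0, 256: 380.0, 512: 290.0})
```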