A widely used computationally intensive scientific kernel, the matrix multiplication algorithm is at the heart of many scientific routines. Resurging fields, such as artificial intelligence (AI), strongly benefit from...
ISBN (digital): 9798331523893
ISBN (print): 9798331523909
High-performance computing (HPC) has transformed the capacity to address complex computational tasks across various scientific fields by enabling the efficient processing of large datasets and intricate simulations. In hydrological modeling, a critical task is identifying the longest flow channel within a catchment, which is essential for understanding water flow patterns and managing resources. However, existing geographic information system (GIS) algorithms for flow path identification often suffer from inefficiencies and inaccuracies. To address these challenges, this paper introduces parallel methods based on Open Multi-Processing (OpenMP), a widely used API that supports multi-platform shared-memory parallel programming. This approach optimizes the analysis of flow direction data, resulting in faster and more accurate identification of flow channels. The results demonstrate that the proposed method outperforms current approaches in both performance and precision. These advancements have the potential to significantly enhance hydrological modeling practices and water resource management.
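To make the task in this abstract concrete, the following is a minimal serial sketch of finding the longest flow path from a flow direction grid. It is not the paper's OpenMP method. The D8 direction codes (an ESRI-style convention), the use of `0` for an outlet cell, the function names, and the assumption that the grid contains no cycles are all illustrative choices, not details taken from the abstract.

```python
import math

# Assumed D8 encoding (ESRI-style): code -> (row step, column step),
# with rows increasing southward and columns increasing eastward.
D8 = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
      16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def flow_lengths(fdir, cell_size=1.0):
    """Return, for every cell, the downstream distance to its outlet.

    Any code not in D8 (for example 0) marks an outlet or sink. A cell
    that drains off the grid is also treated as an outlet. The grid is
    assumed to be acyclic; a cycle would make the walk loop forever.
    """
    rows, cols = len(fdir), len(fdir[0])
    length = [[None] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            path = []          # cells visited whose length is still unknown
            r, c = r0, c0
            while True:
                if length[r][c] is not None:   # reuse an earlier result
                    base = length[r][c]
                    break
                step = D8.get(fdir[r][c])
                if step is None:               # outlet or sink
                    base = length[r][c] = 0.0
                    break
                nr, nc = r + step[0], c + step[1]
                if not (0 <= nr < rows and 0 <= nc < cols):
                    base = length[r][c] = 0.0  # drains off the grid
                    break
                # Diagonal steps are sqrt(2) cells long.
                path.append((r, c, math.hypot(*step) * cell_size))
                r, c = nr, nc
            # Unwind the walk, accumulating distance back upstream.
            for pr, pc, dist in reversed(path):
                base += dist
                length[pr][pc] = base
    return length


def longest_flow_path(fdir, cell_size=1.0):
    """Return (length, (row, col)) for the cell farthest from its outlet."""
    length = flow_lengths(fdir, cell_size)
    return max((length[r][c], (r, c))
               for r in range(len(fdir)) for c in range(len(fdir[0])))


if __name__ == "__main__":
    grid = [[2, 4, 8],
            [1, 4, 16],
            [1, 0, 16]]
    print(longest_flow_path(grid))
```

Because each cell's length is stored once it is known, every cell is walked through at most once as an unknown cell, so the sketch runs in time linear in the number of cells.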
ISBN (print): 0769509363
The proceedings contain 40 papers. The topics discussed include: parallel real-time computation: sometimes quantity means quality; a parallel tabu search and its hybridization with genetic algorithm; reconfigurable mesh-connected processor arrays using row-column bypassing and direct replacement; batched circuit switched routing for efficient service of requests; environment of multiprocessor simulator development; portable runtime support for graph-oriented parallel and distributed programming; comprehensive evaluation of an instruction reissue mechanism; an accurate analysis of reliability parameters in meshes with fault-tolerant adaptive routing; on the effect of link failures in fibre channel storage area networks; and an approximation algorithm for multiprocessor scheduling of trees with communication delays.
To address the limitations of energy-efficient but computationally limited ARM64-based AI edge devices and general-purpose edge servers, this paper proposes a Decentralized Collaborative Heterogeneous Edge Computing (...
The proceedings contain 92 papers from the 1996 International Symposium on Parallel Architectures, Algorithms and Networks. Topics discussed include: massively parallel processors; distributed memory parallel computers; multistage interconnection networks; Banyan switching fabrics; internetworking; transmission control protocol/Internet protocol networks; train traffic and event driven simulations; universal broadband network access devices; customer premises networks; and parallel random access machines.
ISBN (print): 0769515797
The proceedings contain 52 papers. The topics discussed include: data structures for one-dimensional packet classification using most-specific-rule matching; a self-routing topology for Bluetooth scatternets; computational aspects of distributed sensor networks; parallel hybrid adventures with simulated annealing and genetic algorithms; an algorithm for resolving the join component selection problem in parallel join optimization; automatic processor lower bound formulas for array computations; online real-time job scheduling with rate of progress guarantees; a survey on leader election protocols for radio networks; wireless multimedia networks; on locality of dominating set in ad hoc networks with switch-on/off operations; and energy efficient adaptation of multicast protocols in power controlled wireless ad hoc networks.
The proceedings contain 78 papers. The topics discussed include: a new general purpose parallel database system; a parallel pipelined renderer for time-varying volume data; wavelength division multiple access ring-virtual topology on a simple ring network; features of optical interconnects in distributed-shared memory organized MIMD architectures: the ultimate goal; optimal realization of hypercubes by three-dimensional space-invariant optical interconnections; critical sections and producer/consumer queues in weak memory systems; a scalable cache coherent architecture for large-scale mesh-connected multiprocessors; exploiting global data locality in non-blocking multithreaded architectures; an implementable dynamic automatic self-stabilizing protocol; cover model: a framework for design and execution of distributed applications; a new shortest path routing algorithm and embedding cycles of crossed cube; algorithms and lower bounds for p-gossiping in circulant networks; and fiber-ribbon pipeline ring network for high-performance distributed computing systems.
ISBN (digital): 9798331506476
ISBN (print): 9798331506483
Integer linear programming (ILP) is an important mathematical approach for solving time-sensitive real-life optimization problems, including network routing, map routing, traffic scheduling, etc. However, the algorithms for solving ILPs are typically sparse and branch-intensive, and not CPU/GPU friendly. In the paper “What could a million cores do to solve Integer programs”, Koch et al. [40] presented data illustrating that ILP applications take tens of hours of execution time even on the largest parallel computers. Long execution time is a problem because many real-life applications need a decision in seconds or minutes. Widely used ILP solvers, like Gurobi (optimized for CPUs), perform software-based optimizations to handle the inherent sparsity in ILPs but still do not meet the decision threshold because of the limited throughput of CPUs. GPUs are suited for large dot-product computations; however, GPU-based ILP solvers also do not meet decision thresholds, as (i) GPUs are not sparsity friendly and (ii) GPUs incur thread divergence on branches, resulting in under-utilization of streaming engines and periodic host-GPU interaction. We propose SPARK, a sparsity-aware, reuse-aware, energy-efficient, reconfigurable, near-cache ILP architecture that (i) reconfigures the existing L1 cache present in CPUs to perform near-cache acceleration with easy integration into the baseline CPU pipeline with minimal area overhead ($\sim 1.4\%$ of a CPU), (ii) performs near-cache sparsity detection and sparsity-aware compute, reducing the number of insignificant computations and data movement energy overheads, (iii) leverages the computational patterns present in algorithms used for solving ILP to realize a reuse-aware architecture, and (iv) is applicable to solving sparse and dense ILPs and LPs (linear programs). We observe $15\times / 20\times$ and $152\times / 740\times$ performance/energy improvement over AMD’s Zen3 CPU and Nvidia’s Tesla V100 GPU for sparse rea
ISBN (digital): 9798331506476
ISBN (print): 9798331506483
High-performance System-on-Chip (SoC) architectures are becoming increasingly complex and heterogeneous, and the days when a single application could utilize all of an SoC’s hardware resources are all but over. The SoC’s workload, i.e., the set of independent applications that the SoC typically executes, therefore has a significant impact on its efficiency. Accounting for workload-level parallelism (WLP) in early-stage design space exploration is thus critical, as later-stage analysis steps must focus on favorable design points to yield optimal results. Unfortunately, the state-of-the-art models MultiAmdahl and Gables fall short because they only model the extremes of minimal and maximal WLP. We hence propose HILP, the first early-stage design space exploration approach for heterogeneous SoCs that fully accounts for WLP. Our key observation is that scheduling a workload of independent multi-phase applications on a heterogeneous SoC is an instance of the classic job-shop scheduling optimization problem and thus can be solved using integer linear programming. HILP therefore uses a high-performance integer linear programming solver to find a near-optimal schedule that minimizes the overall execution time of the workload, i.e., it schedules the dependent phases of all applications in the workload on the cores and accelerators of the target SoC to maximize performance while respecting power consumption and memory bandwidth constraints. We validate HILP by demonstrating that it captures the performance effects of Amdahl’s law, the memory wall, and dark silicon, and then use it to explore the impact of WLP across a large SoC design space, yielding multiple insights. The key takeaway is that modeling WLP is necessary to ensure that more detailed, later-stage design tasks focus on the most favorable parts of the vast design space of heterogeneous SoCs.
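The scheduling view in this abstract can be illustrated with a toy example. The sketch below is not HILP: it swaps the integer linear programming solver for exhaustive search, omits the power and memory bandwidth constraints, and uses a made-up two-application workload with hypothetical device names and durations. It only shows what "schedule dependent phases of independent applications on heterogeneous devices to minimize overall execution time" means on a very small instance.

```python
from itertools import product

# Hypothetical workload: each application is a chain of dependent phases.
# Each phase maps a device name to its duration on that device.
WORKLOAD = [
    [{"cpu": 4, "gpu": 2}, {"cpu": 3}],             # application 0
    [{"cpu": 2, "gpu": 1}, {"cpu": 5, "gpu": 2}],   # application 1
]


def dispatch_orders(counts):
    """Yield every interleaving of application indices in which
    application i appears counts[i] times (phase order is preserved)."""
    if not any(counts):
        yield ()
        return
    for app, remaining in enumerate(counts):
        if remaining:
            rest = counts[:app] + (remaining - 1,) + counts[app + 1:]
            for tail in dispatch_orders(rest):
                yield (app,) + tail


def makespan(order, assignment, workload):
    """Place phases in dispatch order, each as early as possible."""
    device_free = {}                    # device -> time it becomes idle
    app_ready = [0] * len(workload)     # app -> end of its previous phase
    next_phase = [0] * len(workload)    # app -> index of its next phase
    for app in order:
        phase = next_phase[app]
        device = assignment[(app, phase)]
        start = max(app_ready[app], device_free.get(device, 0))
        end = start + workload[app][phase][device]
        app_ready[app] = end
        device_free[device] = end
        next_phase[app] += 1
    return max(app_ready)


def best_schedule(workload):
    """Try every device assignment and dispatch order; return the best
    (makespan, assignment, order). Exponential, for tiny inputs only."""
    phases = [(a, p) for a, app in enumerate(workload)
              for p in range(len(app))]
    options = [list(workload[a][p]) for a, p in phases]
    counts = tuple(len(app) for app in workload)
    best = None
    for choice in product(*options):
        assignment = dict(zip(phases, choice))
        for order in dispatch_orders(counts):
            span = makespan(order, assignment, workload)
            if best is None or span < best[0]:
                best = (span, assignment, order)
    return best


if __name__ == "__main__":
    span, assignment, order = best_schedule(WORKLOAD)
    print(span, assignment, order)
```

Exhaustive search is only feasible here because the instance has four phases. The number of combinations grows exponentially with workload size, which is why a formulation that a dedicated solver can handle matters for realistic design spaces.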