ISBN (print): 9780769538570
The proceedings contain 21 papers. The topics discussed include: LALP: a novel language to program custom FPGA-based architectures; profiling general purpose GPU applications; accelerating Kirchhoff migration by CPU and GPU cooperation; parallel LDPC decoding on a network-on-chip based multiprocessor platform; kD-tree traversal implementations for ray tracing on massive multiprocessors: a comparative study; analysis of performance dependencies in NUCA-based CMP systems; exploiting computational resources in distributed heterogeneous platforms; a paradigm change: from performance monitoring to performance analysis; performance and energy consumption evaluation of embedded applications: a method based on platform's behavioral model; TMT - A TLB tag management framework for virtualized platforms; composite confidence estimators for enhanced speculation control; and SPARC16: a new compression approach for the SPARC architecture.
ISBN (print): 9781728141947
The proceedings contain 32 papers. The topics discussed include: accelerating solution of generalized linear models by solving normal equation using GPGPU on a large real-world tall-skinny data set; performance analysis and optimization of automotive GPUs; non-uniform partitioning for collaborative execution on heterogeneous architectures; efficiency and scalability of multi-lane capsule networks (MLCN); towards a transprecision polymorphic floating-point unit for mixed-precision computing; and Monte-Carlo tree search and reinforcement learning for reconfiguring data stream processing on edge computing.
ISBN (print): 9780769542768
The proceedings contain 12 papers. The topics discussed include: ring pipelined algorithm for the algebraic path problem on the CELL broadband engine; performance evaluation of optimized implementations of finite difference method for wave propagation problems on GPU architecture; exploring data streaming to improve 3D FFT implementation on multiple GPUs; effective dynamic scheduling on heterogeneous multi/manycore desktop platforms; towards a power-aware application level scheduler for a multithreaded runtime environment; I/O performance evaluation on multicore clusters with atmospheric model environment; OpenMP-based parallel algorithms for solving Kronecker descriptors; parallel implementations of an immune network model using POSIX threads and OpenMP; and parallel implementation of a computational model of the HIS using OpenMP and MPI.
ISBN (print): 9798331506476
The proceedings contain 118 papers. The topics discussed include: ChameleonEC: exploiting tunability of erasure coding for low-interference repair; DPUaudit: DPU-assisted pull-based architecture for near-zero cost system auditing; delinquent loop pre-execution using predicated helper threads; architecting value prediction around in-order execution; efficient optimization with encoded Ising models; LegoZK: a dynamically reconfigurable accelerator for zero-knowledge proof; reuse-aware compilation for zoned quantum architectures based on neutral atoms; HATT: Hamiltonian adaptive ternary tree for optimizing fermion-to-qubit mapping; QuCLEAR: Clifford extraction and absorption for quantum circuit optimization; and gaze into the pattern: characterizing spatial patterns with internal temporal correlations for hardware prefetching.
ISBN (print): 9783031814037; 9783031814044
In high-performance computing (HPC), multi-threaded OpenMP applications are prone to hidden performance issues that are hard to identify and that often arise from resource conflicts, software inefficiencies, and hardware anomalies. These subtle issues can significantly degrade performance and reduce system reliability. This paper introduces an approach for uncovering such concealed issues in OpenMP multi-threaded applications. The proposed method combines a Random Forest classifier with anthropomorphic diagnosis to identify and diagnose performance-affecting problems. The approach detected 90% of the performance-affecting issues that are often obscured within complex HPC environments.
ISBN (print): 9783031879944
The proceedings contain 14 papers. The special focus in this conference is on Applied Reconfigurable Computing. The topics include: An MLIR-Based Compilation Framework for CGRA Application Deployment; Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA; RePAIR: Reconfigurable Platform for AI Resilience Within RISC-V Ecosystem; ROBoost: A Study of FPGA Logic-Based Power-Wasting Primitives; FLARE: An FPGA-Based Universal Large Flow Detection Engine; Out-of-the-Box Performance of FPGAs for ML Workloads Using Vitis AI; A Heterogeneous Embedded Platform for AI-Based Protocol Identification; Counting Heavy Items in Filtered Data Streams Using an HLS-Generated FPGA Kernel; Ultra-Low Latency and Extreme-Throughput Echo State Neural Networks on FPGA; A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks; Real-Time Multi-Object Tracking Using YOLOv8 and SORT on a SoC FPGA; Dynamic Function Exchange in FPGA to Redefine RISC-V Multicore Architectures at Runtime.
Authors: Cabello, Julia Garcia; Carbo-Garcia, S.
Affiliations: Univ Granada, Andalusian Res Inst Data Sci & Computat Intelligence, Dept Appl Math, Granada, Spain; Univ Granada, Andalusian Res Inst Data Sci & Computat Intelligence, Dept Comp Sci & Artificial Intelligence, Granada, Spain
ISBN (print): 9783031820724; 9783031820731
In recent years, the architecture and structure of Deep Neural Networks (DNNs) have become progressively more complex in order to address increasingly complex real-world problems. One strategy for handling this complexity during training is to partition DNN training, for example by distributing it among different components of a computer network. This requires that training, which is essentially the minimization of the loss function, be carried out in separate "smaller pieces". This paper offers an alternative to gradient-based DNN training from a Dynamic Programming (DP) perspective (DP is an optimization methodology based on dividing a complex problem into many problems of lower complexity). To this end, it studies the conditions under which the DNN minimization problem can be solved from a DP perspective. Along these lines, this work proves that any artificial neural network (ANN), and thus any DNN, with monotonic activation is separable. Furthermore, when ANNs are modeled as dynamical systems in the form of a network (known as coupled cell networks, CCNs), this work shows that the transmission function is separable provided the activation is non-decreasing.
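The abstract links separability to monotonic (non-decreasing) activations. As a generic illustration only, and not the paper's construction or proof, the sketch below checks the classical DP decomposition condition: minimizing f1(x1, f2(x2)) stage by stage (inner subproblem first) gives the joint minimum whenever f1 is non-decreasing in its second argument. All function names, the toy cost functions, and the tiny finite search spaces are hypothetical choices made for readability.

```python
"""Hypothetical illustration of the classical DP decomposition condition.

min over (x1, x2) of f1(x1, f2(x2)) equals min over x1 of f1(x1, min over x2 of f2(x2))
whenever f1 is non-decreasing in its second argument.
"""
import itertools
import math
from typing import Callable, Sequence

Outer = Callable[[float, float], float]  # f1(x1, t): outer stage
Inner = Callable[[float], float]         # f2(x2): inner subproblem


def sigmoid(t: float) -> float:
    """Logistic activation: strictly increasing, hence non-decreasing."""
    return 1.0 / (1.0 + math.exp(-t))


def brute_force_min(f1: Outer, f2: Inner, xs1: Sequence[float], xs2: Sequence[float]) -> float:
    """Joint minimization over every (x1, x2) pair."""
    return min(f1(x1, f2(x2)) for x1, x2 in itertools.product(xs1, xs2))


def staged_min(f1: Outer, f2: Inner, xs1: Sequence[float], xs2: Sequence[float]) -> float:
    """DP-style minimization: solve the inner subproblem, then the outer stage."""
    inner_best = min(f2(x2) for x2 in xs2)
    return min(f1(x1, inner_best) for x1 in xs1)


def layer_with_sigmoid(x1: float, t: float) -> float:
    # Hypothetical "layer": a non-decreasing activation applied to x1 + t.
    return sigmoid(x1 + t)


def layer_with_square(x1: float, t: float) -> float:
    # Hypothetical counterexample: (x1 + t) ** 2 is not monotone in t.
    return (x1 + t) ** 2


def inner_cost(x2: float) -> float:
    # Hypothetical inner cost, minimized at x2 = 1.
    return (x2 - 1) ** 2


if __name__ == "__main__":
    xs2 = [-1, 0, 1, 2]
    # Monotone case: the staged and joint minima agree.
    print(brute_force_min(layer_with_sigmoid, inner_cost, [-2, 0, 3], xs2),
          staged_min(layer_with_sigmoid, inner_cost, [-2, 0, 3], xs2))
    # Non-monotone case: staging fixes t = 0 and misses the joint optimum.
    print(brute_force_min(layer_with_square, inner_cost, [-4], xs2),
          staged_min(layer_with_square, inner_cost, [-4], xs2))
```

Running the script prints two identical values for the sigmoid case and `0 16` for the squared case: without monotonicity in the second argument, the inner subproblem cannot be solved in isolation.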
Server-based computing in space has recently been proposed because of its potential benefits in capability, latency, security, sustainability, and cost. Despite this, there has been no work asking the question: how s...
Existing dead block predictors have proven to be effective in reducing cache leakage power of conventional systems. However, prior work is significantly less effective in energy harvesting systems in that it does not ...