ISBN:
(Print) 9781424465347
The proceedings contain 239 papers. The topics discussed include: characterizing heterogeneous computing environments using singular value decomposition;statistical predictors of computing power in heterogeneous clusters;a first step to the evaluation of SimGrid in the context of a real application;dynamic adaptation of DAGs with uncertain execution times in heterogeneous computing systems;robust resource allocation of DAGs in a heterogeneous multicore system;decentralized dynamic scheduling across heterogeneous multi-core desktop grids;a configurable-hardware document-similarity classifier to detect web attacks;a configurable high-throughput linear sorter system;hardware implementation for scalable lookahead regular expression detection;reducing grid energy consumption through choice of resource allocation method;and scheduling parallel tasks on multiprocessor computers with efficient power management.
ISBN:
(Print) 0769555101
The proceedings contain 143 papers. The topics discussed include: bridging the gap between performance and bounds of Cholesky factorization on heterogeneous platforms;efficient message logging to support process replicas in a volunteer computing environment;early multi-node performance evaluation of a knights corner (KNC) Based NASA supercomputer;mini-NOVA: a lightweight ARM-based virtualization microkernel supporting dynamic partial reconfiguration;real-time multiprocessor architecture for sharing stream processing accelerators;relocation-aware floorplanning for partially-reconfigurable FPGA-based systems;experiences with compiler support for processors with exposed pipelines;performance modeling of matrix multiplication on 3d memory integrated FPGA;enhancing speedups for FPGA accelerated SPICE through frequency scaling and precision reduction;and an automated high-level design framework for partially reconfigurable FPGAs.
ISBN:
(Print) 9781424464432
The proceedings contain 128 papers. The topics discussed include: distributed advance network reservation with delay guarantees;a general algorithm for detecting faults under the comparison diagnosis model;broadcasting on large scale heterogeneous platforms under the bounded multi-port model;on the importance of bandwidth control mechanisms for scheduling on large scale heterogeneous platforms;scalable failure recovery for high-performance data aggregation;high performance comparison-based sorting algorithm on many-core GPUs;improving the performance of Uintah: a large-scale adaptive meshing computational framework;optimizing and tuning the fast multipole method for state-of-the-art multicore architectures;first experiences with congestion control in infiniband hardware;power-aware MPI task aggregation prediction for high-end computing systems;and a hybrid interest management mechanism for peer-to-peer networked virtual environments.
ISBN:
(Print) 9781665497473
The proceedings contain 148 papers. The topics discussed include: heterogeneous architecture for sparse data processing;combined application of approximate computing techniques in DNN hardware accelerators;highly efficient ALLTOALL and ALLTOALLV communication algorithms for GPU systems;implementing spatio-temporal graph convolutional networks on graphcore IPUs;the best of many worlds: scheduling machine learning inference on CPU-GPU integrated architectures;online learning RTL synthesis for automated design space exploration;machine learning aided hardware resource estimation for FPGA DNN implementations;optimal schedules for high-level programming environments on FPGAs with constraint programming;on how to push efficient medical semantic segmentation to the edge: the SENECA approach;and exploiting high-bandwidth memory for FPGA-acceleration of inference on sum-product networks.
ISBN:
(Digital) 9798331509422
ISBN:
(Print) 9798331509439
Gabor wavelets (GW) are widely adopted for feature extraction and representation, especially in medical image processing. The GW method typically excels at texture-based image analysis, including classification, segmentation, and edge detection tasks. However, the performance of the GW technique is limited by hardware complexity, heavy computation, and large memory access, which demand many mapped physical resources and substantial compute power. This paper introduces two highly generic and hardware-efficient multiplier architectures for Gabor image processing applications. The proposed designs were implemented on a Kintex-7 FPGA and on silicon using the gpdk 45 nm technology library. Comprehensive architectural implementations and comparisons were conducted against other state-of-the-art solutions. The two proposed designs, i) a parallel multiplexer-based multiplier (PMM) and ii) a serial multiplexer-based multiplier (SMM), demonstrate superior LUT savings of 30.43% and 82.66% and power reductions of 18% and 80%, respectively, over the most recent state-of-the-art designs. The PMM and SMM designs achieve maximum operating frequencies of 370 MHz and 357 MHz on silicon, with footprint savings of 50.07% and 89.70%, respectively. The performance of the proposed architecture was evaluated on publicly available datasets, namely NEMA and OASIS. The hardware design files and characteristics are made freely available to the researcher and designer community.
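For context on the computation the multiplier architectures above accelerate, the following is a minimal NumPy sketch of a standard 2-D Gabor kernel (a Gaussian envelope modulating a sinusoidal carrier) and a small orientation bank, as conventionally used for texture feature extraction. The parameter names (sigma, lam, gamma, psi) and default values are conventional choices, not taken from the paper, and this software sketch says nothing about the paper's hardware design.

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter: a Gaussian-windowed sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by the filter orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

# A small bank of orientations, as commonly used for texture features
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving an image with each kernel in the bank yields one response map per orientation; the per-pixel response magnitudes form the texture feature vector.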
ISBN:
(Print) 9780769541907
The proceedings contain 93 papers. The topics discussed include: earliest start time estimation for advance reservation-based resource brokering within computational grids;a web-based parallel file transferring system on grid and cloud environments;scalable hierarchical scheduling for multiprocessor systems using adaptive feedback-driven policies;parallel numerical computing of finite element model of conductors and floating potentials;energy-efficient sink location service protocol to support mobile sinks in wireless sensor networks;depth balancing and multipath transmission algorithms for P2P streaming media using PeerCast;direct mapping OFDM-based transmission scheme for underwater acoustic multimedia;experimental analysis of coordination strategies to support wireless sensor networks composed by static ground sensors and UAV-carried sensors;and implement a RFID-based indoor location sensing system using virtual signal mechanism.
ISBN:
(Digital) 9798331506476
ISBN:
(Print) 9798331506483
The rapid evolution and widespread adoption of generative large language models (LLMs) have made them a pivotal workload in various applications. Today, LLM inference clusters receive a large number of queries with strict Service Level Objectives (SLOs). To achieve the desired performance, these models execute on power-hungry GPUs, causing inference clusters to consume large amounts of energy and, consequently, result in substantial carbon emissions. Fortunately, we find that there is an opportunity to improve energy efficiency by exploiting the heterogeneity in inference compute properties and the fluctuations in inference workloads. However, the diversity and dynamicity of these environments create a large search space, where different system configurations (e.g., number of instances, model parallelism, and GPU frequency) translate into different energy-performance trade-offs. To address these challenges, we propose DynamoLLM, the first energy-management framework for LLM inference environments. DynamoLLM automatically and dynamically reconfigures the inference cluster to optimize the energy of LLM serving under the services' performance SLOs. We show that at a service level, on average, DynamoLLM conserves 52% of the energy and 38% of the operational carbon emissions, and reduces the cost to the customer by 61%, while meeting the latency SLOs.
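The configuration search the abstract describes can be illustrated with a toy sketch: enumerate (instance count, model parallelism, GPU frequency) tuples and pick the minimum-energy configuration whose predicted latency meets the SLO. The latency and power models below are hypothetical placeholders (a linear throughput model and an f-squared dynamic-power term), not DynamoLLM's actual models or search procedure.

```python
from itertools import product

# Hypothetical per-configuration models: latency and energy as simple
# functions of instance count, tensor-parallel degree, and GPU frequency.
def latency_ms(instances, tp, freq_ghz, load_qps):
    throughput = instances * tp * freq_ghz * 10.0   # toy throughput model (qps)
    return 1000.0 * load_qps / throughput            # queueing-free estimate

def energy_w(instances, tp, freq_ghz):
    # Static power plus ~f^2 dynamic power per GPU, summed over all GPUs
    return instances * tp * (50.0 + 100.0 * freq_ghz ** 2)

def best_config(load_qps, slo_ms):
    """Pick the minimum-energy configuration that meets the latency SLO."""
    feasible = []
    for inst, tp, f in product([1, 2, 4, 8], [1, 2, 4], [0.8, 1.0, 1.2, 1.4]):
        if latency_ms(inst, tp, f, load_qps) <= slo_ms:
            feasible.append((energy_w(inst, tp, f), inst, tp, f))
    return min(feasible) if feasible else None
```

Even this toy version shows the trade-off the paper exploits: at low load, small, low-frequency configurations satisfy the SLO at a fraction of the energy, and re-running the search as load fluctuates yields a dynamic reconfiguration policy.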
ISBN:
(Print) 9798350337662
The proceedings contain 95 papers. The topics discussed include: distributed sparse random projection trees for constructing K-nearest neighbor graphs;fast deterministic gathering with detection on arbitrary graphs: the power of many robots;accurate and efficient distributed covid-19 spread prediction based on a large-scale time-varying people mobility graph;accelerating packet processing in container overlay networks via packet-level parallelism;efficient hardware primitives for immediate memory reclamation in optimistic data structures;accelerating distributed deep learning training with compression assisted Allgather and reduce-scatter communication;accelerating CNN inference on long vector architectures via co-design;exploiting input tensor dynamics in activation checkpointing for efficient training on GPU;drill: log-based anomaly detection for large-scale storage systems using source code analysis;dynasparse: accelerating GNN inference through dynamic sparsity exploitation;exploiting sparsity in pruned neural networks to optimize large model training;SRC: mitigate I/O throughput degradation in network congestion control of disaggregated storage systems;boosting multi-block repair in cloud storage systems with wide-stripe erasure coding;on doorway egress by autonomous robots;and on the arithmetic intensity of distributed-memory dense matrix multiplication involving a symmetric input matrix (SYMM).
ISBN:
(Print) 3540747419
The proceedings contain 86 papers. The topics discussed include: self-stabilizing distributed algorithms for networks;feature extraction and coverage problems in distributed sensor networks;a self-stabilizing algorithm for 3-edge-connectivity;architecture-based optimization for mapping scientific applications to Imagine;implementation and optimization of sparse matrix-vector multiplication on the Imagine stream processor;a mutual exclusion algorithm for mobile agents-based applications;a distributed metaheuristic for solving a real-world scheduling-routing-loading problem;key-attributes based optimistic data consistency maintenance method;parallelization strategies for the points of interests algorithm on the cell processor;and RWA algorithm for scheduled lightpath demands in WDM networks.