ISBN: 9798400717901 (print)
The proceedings contain 22 papers. The topics discussed include: towards efficient OpenCL pipe specification for hardware accelerators; SimSYCL: a SYCL implementation targeting development, debugging, simulation and conformance; experiences with implementing Kokkos’ SYCL backend; optimization and evaluation of breadth first search with oneAPI/SYCL on Intel FPGAs: from describing algorithms to describing architectures; improving performance portability of the procedurally generated high energy physics event generator MadGraph using SYCL; unlocking performance portability on LUMI-G supercomputer: a virtual screening case study; evaluation of SYCL’s different data parallel kernels; smoothing the migration from CUDA to SYCL: SYCLomatic utility features; and optimization of fast Fourier transform (FFT) for Qualcomm Adreno graphics processing unit.
ISBN: 9783031304415 (print)
The proceedings contain 77 papers. The special focus in this conference is on Parallel Processing and Applied Mathematics. The topics include: Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs; Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications; Cost and Performance Analysis of MPI-Based SaaS on the Private Cloud Infrastructure; Building a Fine-Grained Analytical Performance Model for Complex Scientific Simulations; Evaluation of Machine Learning Techniques for Predicting Run Times of Scientific Workflow Jobs; Smart Clustering of HPC Applications Using Similar Job Detection Methods; Distributed Work Stealing in a Task-Based Dataflow Runtime; Task Scheduler for Heterogeneous Data Centres Based on Deep Reinforcement Learning; Shisha: Online Scheduling of CNN Pipelines on Heterogeneous Architectures; General Framework for Deriving Reproducible Krylov Subspace Algorithms: BiCGStab Case; Proactive Task Offloading for Load Balancing in Iterative Applications; Language Agnostic Approach for Unification of Implementation Variants for Different Computing Devices; High Performance Dataframes from Parallel Processing Patterns; Global Access to Legacy Data-Sets in Multi-cloud Applications with Onedata; MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms; Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software; GPU-Based Molecular Dynamics of Turbulent Liquid Flows with OpenMM; A Novel Parallel Approach for Modeling the Dynamics of Aerodynamically Interacting Particles in Turbulent Flows; Reliable Energy Measurement on Heterogeneous Systems–on–Chip Based Environments; Distributed Objective Function Evaluation for Optimization of Radiation Therapy Treatment Plans; A Generalized Parallel Prefix Sums Algorithm for Arbitrary Size Arrays; GPU4SNN: GPU-Based Acceleration for Spiking Neural Network Simulations; Ant System Inspired Heuristic Optimization of UAVs Depl
ISBN: 9783031304446 (print)
The proceedings contain 77 papers. The special focus in this conference is on Parallel Processing and Applied Mathematics. The topics include: Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs; Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications; Cost and Performance Analysis of MPI-Based SaaS on the Private Cloud Infrastructure; Building a Fine-Grained Analytical Performance Model for Complex Scientific Simulations; Evaluation of Machine Learning Techniques for Predicting Run Times of Scientific Workflow Jobs; Smart Clustering of HPC Applications Using Similar Job Detection Methods; Distributed Work Stealing in a Task-Based Dataflow Runtime; Task Scheduler for Heterogeneous Data Centres Based on Deep Reinforcement Learning; Shisha: Online Scheduling of CNN Pipelines on Heterogeneous Architectures; General Framework for Deriving Reproducible Krylov Subspace Algorithms: BiCGStab Case; Proactive Task Offloading for Load Balancing in Iterative Applications; Language Agnostic Approach for Unification of Implementation Variants for Different Computing Devices; High Performance Dataframes from Parallel Processing Patterns; Global Access to Legacy Data-Sets in Multi-cloud Applications with Onedata; MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms; Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software; GPU-Based Molecular Dynamics of Turbulent Liquid Flows with OpenMM; A Novel Parallel Approach for Modeling the Dynamics of Aerodynamically Interacting Particles in Turbulent Flows; Reliable Energy Measurement on Heterogeneous Systems–on–Chip Based Environments; Distributed Objective Function Evaluation for Optimization of Radiation Therapy Treatment Plans; A Generalized Parallel Prefix Sums Algorithm for Arbitrary Size Arrays; GPU4SNN: GPU-Based Acceleration for Spiking Neural Network Simulations; Ant System Inspired Heuristic Optimization of UAVs Depl
Effective and accurate detection of unmanned aerial vehicles (UAVs) is crucial for combating malicious UAV systems. However, adverse weather conditions, such as haze or low light, often degrade the quality of captured...
Bandit Convex Optimization (BCO) is an important analysis framework for sequential decision-making problems. To balance computational cost against regret bounds, in this paper, we propos...
In this paper, using the example of a finite-difference solution to the problem of seismic wave propagation in elastic 3D media, the features of the development and optimization of parallel codes for various multicore architect...
ISBN: 9781728127828 (digital); 9781728127828 (print)
Myers' bit-vector algorithm for approximate string matching (ASM) is a dynamic-programming-based approach that takes advantage of bit-parallel operations. It is one of the fastest algorithms for finding the edit distance between two strings. In computational biology, ASM is used at various stages of the computational pipeline, including proteomics and genomics. The computationally intensive nature of the underlying ASM algorithms, operating on large volumes of data, necessitates their acceleration. In this paper, we propose a novel ASM architecture based on Myers' bit-vector algorithm for parallel searching of multiple query patterns in biological databases. The proposed parallel architecture uses multiple processing engines and hardware/software codesign for an accelerated and energy-efficient hardware design of the ASM algorithm. In comparison with related literature, the proposed design achieves 22x better performance, with a demonstrated energy efficiency of approximately 500 × 10^9 cell updates per joule.
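As a rough illustration of the bit-parallel recurrence this abstract refers to, here is a minimal software sketch in Python. It is not the paper's hardware architecture: it handles a single pattern, computes the full (global) edit distance between two strings, and relies on Python's arbitrary-width integers instead of a fixed machine word. The function name `myers_edit_distance` and all variable names are hypothetical.

```python
def myers_edit_distance(pattern: str, text: str) -> int:
    """Edit distance via a Myers-style bit-vector recurrence (global variant).

    One bit per pattern position encodes the vertical +1 / -1 differences
    of a dynamic-programming column, so a whole column is updated with a
    handful of integer operations per text character.
    """
    m = len(pattern)
    if m == 0:
        return len(text)

    mask = (1 << m) - 1      # keeps every vector m bits wide
    high = 1 << (m - 1)      # bit for the last pattern position

    # peq[ch] has bit i set when pattern[i] == ch.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv = mask                # vertical +1 differences (first column is 0..m)
    mv = 0                   # vertical -1 differences
    score = m                # value in the bottom cell of the current column

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # horizontal +1 differences
        mh = pv & xh                    # horizontal -1 differences

        # The bottom row's horizontal difference updates the score.
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shift in a +1 at the top row: the top boundary grows 0, 1, 2, ...
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv

    return score


if __name__ == "__main__":
    print(myers_edit_distance("kitten", "sitting"))  # 3
```

A fixed-width implementation would typically split long patterns into word-sized blocks; the sketch sidesteps that by letting a single Python integer hold the whole column.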
Large language models have garnered significant attention and are widely utilized across different fields due to their impressive performance. However, centralized training of these models can pose privacy risks like ...
ISBN: 9798350356656 (digital); 9798350356663 (print)
For continuous-variable quantum key distribution (CV-QKD) protocols, information reconciliation (IR) is a key step that can significantly affect performance. Because multidimensional reconciliation is suitable for low signal-to-noise ratio (SNR) and long transmission distance scenarios, this paper proposes a parallel architecture for multidimensional reconciliation in CV-QKD implemented on a field programmable gate array (FPGA). Using a binarized d-dimensional spherical vector c and a parallel computation of the coefficient a and rotation matrix M, the parallel architecture achieves a processing throughput of 45.635 Mbps at a system clock of 250 MHz, with a frame error rate (FER) below 0.001 when the SNR exceeds 0.71 dB.
Bird population is essential to ecology and conservation initiatives. However, proper categorization and counting in mixed-species flocks present significant challenges because of the complexity due to increasi...