ISBN:
(Print) 9781665418379
The proceedings contain 7 papers. The topics discussed include: new challenges of benchmarking all-flash storage for HPC; understanding the I/O impact on the performance of high-throughput molecular docking; I/O bottleneck detection and tuning: connecting the dots using interactive log analysis; data-aware storage tiering for deep learning; SCTuner: an autotuner addressing dynamic I/O needs on supercomputer I/O subsystems; user-centric system fault identification using IO500 benchmark; and verifying IO synchronization from MPI traces.
ISBN:
(Print) 9781728171326
The proceedings contain 111 papers. The topics discussed include: a review of machine learning based recommendation approaches for cricket; operating of a drone using human intent recognition and characteristics of an EEG signal; role of Indian IT laws in smart healthcare devices in the intensive care unit in India; comparative analysis of different symmetric encryption techniques based on computation time; a study on analyzing the impact of feature selection on predictive machine learning algorithms; TCB minimization towards secured and lightweight IoT end device architecture using virtualization at fog node; a novel perspective to threat modelling using design thinking and agile principles; android malware detection using chi feature selection and ensemble learning method; and prediction and monitoring of air pollution using Internet of Things.
Preface: the 6th International Conference on Computing and Applied Informatics, AIP Conference Proceedings, Volume 2987, Issue 1, 19 April 2024, 010001, ht
Distributed databases are often used when scalability, fault tolerance, and high availability are crucial. They excel in scenarios where traditional, centralized databases may struggle to handle the increasing volume ...
The exponential growth in the amount of data generated by genomic studies of genetic diseases reflects the rapid development of this field. The limitations of traditional on-premises computing resources, in terms of c...
This paper proposes a novel parallel neighborhood-expansion-based algorithm for graph edge partitioning, aimed at addressing computational efficiency and scalability issues in large-scale graph data processing. The alg...
ISBN:
(Print) 9798350383638; 9798350383645
The inherent high sparsity of spiking neural networks (SNNs) and the low power consumption of their event-driven computing characteristics suit edge devices with extremely high energy-efficiency requirements. On resource-constrained mobile devices, memory savings are also required. Unlike conventional artificial neural networks, SNNs are suitable for processing complex temporal data. However, computing in the time dimension requires repeated access to the data across multiple time steps, resulting in high energy consumption. We propose Temporally Parallel Weight-Friendly (TPWF) dataflow, which reduces energy consumption through parallel computing across time steps. At the same time, considering the high sparsity of spike events, this paper proposes a sparse-aware strategy that realizes energy-efficient membrane-potential accumulation through a neuron burst weight-search circuit. Furthermore, this paper proposes an efficient synaptic memory structure to reduce hardware resource usage while maintaining performance and network size. Run-length encoding is used to record weights, realizing synaptic connections that can support different configurations, such as sparsely connected networks, and saving substantial memory. Using a fully connected 256-128-128-10 network to classify 16x16 MNIST training images achieves energy per synaptic operation (SOP) of 0.2 pJ, up to 1.9x speedup, and a 2x reduction in memory accesses.
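The run-length-encoded weight storage described in this abstract can be illustrated with a minimal sketch. The (zero-run, value) pair layout and the `rle_encode`/`rle_decode` helpers below are assumptions for illustration, not the paper's actual on-chip format:

```python
# Sketch of run-length encoding for a sparse weight row: zeros are not
# stored; each nonzero weight is recorded with the count of zeros before it.

def rle_encode(weights):
    """Encode a weight row as (zero_run_length, value) pairs."""
    encoded, run = [], 0
    for w in weights:
        if w == 0:
            run += 1
        else:
            encoded.append((run, w))
            run = 0
    return encoded

def rle_decode(encoded, length):
    """Reconstruct the dense row from (zero_run_length, value) pairs."""
    row, i = [0] * length, 0
    for run, w in encoded:
        i += run
        row[i] = w
        i += 1
    return row

row = [0, 0, 3, 0, 0, 0, -2, 1]
enc = rle_encode(row)          # [(2, 3), (3, -2), (0, 1)]
assert rle_decode(enc, len(row)) == row
```

For the highly sparse connectivity the paper targets, storing only the nonzero weights plus short run counts is what yields the memory savings: the denser the zeros, the fewer pairs are kept.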
In the current heterogeneous computing environment, different types of accelerated computing resources such as GPU (Graphics Processing Unit) and NPU (Neural Processing Unit) coexist in the cluster. However, because the...
ISBN:
(Print) 9798350383638; 9798350383645
Deep learning hardware accelerators commonly incorporate a substantial quantity of multiplier units. Yet the considerable complexity of multiplier circuits renders them a bottleneck, contributing to increased cost and latency. Approximate computing proves to be an effective strategy for mitigating the overhead associated with multipliers. This paper introduces an original approximation technique for signed multiplication on FPGAs. The approach involves a novel segmentation method applied to the Baugh-Wooley multiplication algorithm. Each segment is optimally accommodated within the look-up table resources of modern AMD-Xilinx FPGA families. The paper details the design of an INT8 multiplier using the proposed approach, presenting implementation results and accuracy assessments for the inference of benchmark deep learning models. The implementation results reveal significant savings of 53.6% in LUT utilization compared to the standard INT8 Xilinx multiplier. Accuracy measurements conducted on four popular deep learning benchmarks show an average accuracy degradation of 4.8% in post-training deployment and 0.7% after retraining. The source code for this work is available on GitHub(1).
This paper proposes a parallel random number generator (RNG) that uses a single linear feedback shift register (LFSR) to generate two distinct random numbers, achieving twice the operational speed of a traditional serial ...
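A minimal sketch of the two-steps-per-clock idea, assuming a 16-bit Fibonacci LFSR. The tap polynomial and the `step`/`step2` helpers are hypothetical, since the paper's actual design is not detailed in the abstract:

```python
# A 16-bit Fibonacci LFSR. In hardware, the "parallel" variant computes two
# successive feedback bits combinationally, so one clock of the parallel
# design advances the state as far as two clocks of the serial design.

TAPS = (15, 13, 12, 10)  # feedback taps for a maximal-length 16-bit LFSR

def step(state):
    """One serial LFSR step: shift left, feed XOR of tap bits into bit 0."""
    fb = 0
    for t in TAPS:
        fb ^= (state >> t) & 1
    return ((state << 1) | fb) & 0xFFFF

def step2(state):
    """Two LFSR steps per call: the second feedback bit is derived from the
    intermediate state, mirroring a two-steps-per-clock datapath."""
    return step(step(state))

state = 0xACE1
assert step2(state) == step(step(state))
```

The equivalence checked at the end is the core of the scheme: unrolling the feedback logic doubles the state advance per cycle, which is how a single LFSR can emit two random values per clock.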