Training large language models is becoming increasingly complex due to the rapid expansion in their size, resulting in significant computational costs. To address this challenge, various model growth methodologies hav...
Accurate wait-time prediction for HPC jobs contributes to a positive user experience but has historically been a challenging task. Previous models lack the accuracy needed for confident predictions, and many were deve...
Message aggregation is widely used with a goal to reduce communication cost in HPC applications. The difference in the order of overhead of sending a message and cost of per byte transferred motivates the need for mes...
Large-scale Computational Fluid Dynamics (CFD) simulations are typical HPC applications that require both high memory bandwidth and large memory capacity. However, it is difficult to achieve high performance for such ...
The Triton Shared Computing Cluster (TSCC) [1] is the San Diego Supercomputer Center's ("Center" in the remaining text) primary campus research computing system. This paper describes the transition from T...
We present a randomized differential testing approach to test OpenMP implementations. In contrast to previous work that manually creates dozens of verification and validation tests, our approach is able to randomly ge...
ISBN: 9783031506833; 9783031506840 (print)
This paper proposes a scalable and efficient architecture to accelerate random forest computation on FPGA devices targeting edge computing platforms. The proposed architecture, built from efficient decision tree units (DTUs), processes samples in a pipelined fashion to improve performance. A size-effective memory organization is also introduced to reduce on-chip block RAM usage, which lowers latency and improves the working frequency of the implementation on FPGA devices. Because edge computing platforms are constrained in resources and power consumption, the proposed architecture can reconfigure the number of DTUs according to the target platform's available resources. We build a system with a PYNQ Z2 FPGA board for testing, validating, and evaluating the proposed architecture. In this system, we vary the number of DTUs from 1 to 15 to test scalability. Experimental results with certified datasets show speed-ups of up to 170.39x and 90.27x compared to Intel Core i7 desktop and Core i9 high-performance computing processors, respectively.
ISBN: 9798350381603 (print)
The availability of computational resources has changed significantly due to cloud computing. In addition, we have witnessed efforts to execute high-performance computing (HPC) applications in the cloud, attracted by the advantages of cost savings and scalable/elastic resource allocation. Allocating more powerful hardware and exclusively allocating resources such as memory, storage, and CPU can improve performance in the cloud. For network interconnection, however, significant noise and other interference are generated by several simultaneous instances (multi-tenancy) communicating over the same network. As increasing the network bandwidth may be an alternative, we designed an evaluation model and conducted a performance analysis of NIC aggregation approaches in containerized private clouds. Experiments using the NAS Parallel Benchmarks revealed that the NIC aggregation approach outperforms the baseline in up to ~98% of the executions for applications characterized by intensive network use. Also, the Balance Round-Robin aggregation mode performed better than the 802.3ad aggregation mode in most assessments.
ISBN: 9798350355543 (print)
The proceedings contain 226 papers. The topics discussed include: analyzing HPC utilization with PIKA and Vampir; portable cross-facility workflows for X-ray ptychography; towards sustainable post-exascale leadership computing; SANReN’s 100 Gbps data transfer service: transferring data fast!; framework for integrating machine learning methods for path-aware source routing; an Ising-based decision method for intra prediction mode in video coding; LLM-inference-bench: inference benchmarking of large language models on AI accelerators; ActorProf: a framework for profiling and visualizing fine-grained asynchronous bulk synchronous parallel execution; LIDC: a location independent multi-cluster computing framework for data intensive science; and Parsl+CWL: towards combining the Python and CWL ecosystems.
ISBN: 9798350394665; 9798350394672 (print)
This paper introduces an efficient emotion detection method that integrates the wearable and affective computing paradigms. Our research contributes to advancing emotion detection technologies, with potential applications in diverse domains such as healthcare, human-computer interaction, and personalized computing experiences. Our approach addresses the increasing need for real-time emotion recognition while minimizing computational demands. By leveraging low-computation techniques, we propose a novel framework that achieves high accuracy in emotion detection. In addition, advanced data abstraction methods are developed to reduce the data workload while maintaining detection performance. Experimental results demonstrate an accuracy of 89.77%, affirming the efficacy of our proposed method.