ISBN: 9798400706455 (print)
The proceedings contain 3 papers. The topics discussed include: cost-efficient construction of performance models; using benchmarking and regression models for predicting CNN training time on a GPU; and benchmarking machine learning applications on heterogeneous architecture using ReFrame.
ISBN: 9798350381603 (print)
Cloud computing allows users to access large computing infrastructures quickly. In the high-performance computing (HPC) context, public cloud resources emerge as an economical alternative, allowing institutions and research groups to use highly parallel infrastructures in the cloud. However, parallel runtime systems and software optimizations proposed over the years to improve the performance and scalability of HPC applications targeted traditional on-premise HPC clusters, where developers have direct access to the underlying hardware without any kind of virtualization. In this paper, we analyze the performance and scalability of HPC applications from the NAS Parallel Benchmarks suite when running on a virtualized HPC cluster built on top of Amazon Web Services (AWS), contrasting them with the results obtained with the same applications running on a traditional on-premise HPC cluster from Grid'5000. Our results show that CPU-bound applications achieve similar results on both platforms, whereas communication-bound applications may be impacted by the limited network bandwidth in the cloud. The cloud infrastructure demonstrated better performance under workloads with moderate communication and medium-sized messages.
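The abstract's distinction between CPU-bound and communication-bound behaviour can be probed with a simple MPI ping-pong microbenchmark. The sketch below uses mpi4py, which is an assumption for illustration; the paper itself relies on the NAS Parallel Benchmarks. It measures effective point-to-point bandwidth for a medium-sized message, the regime the authors highlight.

```python
# Minimal MPI ping-pong bandwidth probe (illustrative sketch, not the
# NAS Parallel Benchmarks used in the paper). Run with, e.g.:
#   mpirun -np 2 python pingpong.py
from mpi4py import MPI
import numpy as np
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

msg_bytes = 1 << 20          # 1 MiB "medium-sized" message
reps = 100
buf = np.zeros(msg_bytes, dtype=np.uint8)

comm.Barrier()
start = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=1)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=1)
elapsed = time.perf_counter() - start

if rank == 0:
    # Each repetition moves the message twice (full round trip).
    gbps = (2 * reps * msg_bytes) / elapsed / 1e9
    print(f"effective bandwidth: {gbps:.2f} GB/s")
```

Running this on cloud instances versus an on-premise cluster gives a quick indication of how much headroom communication-bound applications lose to the virtualized network.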
Rotating machines, such as motors and pumps, are of crucial importance for industrial operations, but are prone to failure due to their increasing complexity. Condition-based monitoring and early fault diagnosis, espe...
ISBN: 9798350381603 (print)
The significant growth in the demand for Neural Network solutions has created an urgent need for efficient implementations across a wide array of environments and platforms. As industries increasingly rely on AI-driven technologies, optimizing the performance and effectiveness of these networks has become crucial. While numerous studies have achieved promising results in this field, the process of fine-tuning and identifying optimal architectures for specific problem domains remains a complex and resource-intensive task. As such, there is a pressing need to explore and evaluate techniques that can improve this optimization process, reducing costs and time-to-deployment while maximizing the overall performance of Neural Networks. This work focuses on evaluating the optimization process of NetAdapt for two neural networks on an Nvidia Jetson device. We observe a performance decay for the larger network when the algorithm tries to meet the latency constraint. Furthermore, we propose potential alternatives to optimize this tool. In particular, we propose an alternative configuration search procedure that allows us to enhance the optimization process, achieving speedups of up to ~7x.
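For readers unfamiliar with the kind of optimization the abstract evaluates, the sketch below outlines a generic NetAdapt-style latency-constrained search loop: at each iteration, the per-layer simplification that best preserves accuracy within a shrinking latency budget is kept. The helpers (measure_latency, simplify_layer, finetune_and_score) are hypothetical placeholders, not the paper's implementation or its proposed alternative search procedure.

```python
# Hedged sketch of a NetAdapt-style latency-constrained search loop.
# measure_latency, simplify_layer, and finetune_and_score are hypothetical
# placeholders for: profiling on the target device (e.g. a Jetson),
# shrinking one layer under a latency budget, and a short fine-tune that
# returns an accuracy score.
def netadapt_search(model, num_layers, target_latency_ms, budget_step_ms,
                    measure_latency, simplify_layer, finetune_and_score):
    current = model
    latency = measure_latency(current)
    while latency > target_latency_ms:
        budget = latency - budget_step_ms      # tighten the budget each round
        best, best_score = None, float("-inf")
        # Try simplifying each layer independently; keep the candidate that
        # meets the budget with the smallest accuracy loss.
        for layer_idx in range(num_layers):
            candidate = simplify_layer(current, layer_idx, budget)
            if candidate is None:              # nothing fits this budget
                continue
            score = finetune_and_score(candidate)
            if score > best_score:
                best, best_score = candidate, score
        if best is None:                       # constraint cannot be met
            break
        current, latency = best, measure_latency(best)
    return current
```

The performance decay reported for the larger network corresponds to the case where each tightening of the budget forces increasingly damaging per-layer simplifications.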
ISBN: 9798350393132; 9798350393149 (print)
Building efficient large-scale quantum computers is a significant challenge due to limited qubit connectivities and noisy hardware operations. Transpilation is critical to ensure that quantum gates are placed on physically linked qubits, while minimizing SWAP gates and simultaneously finding efficient decompositions into native basis gates. The goal of this multifaceted optimization step is typically to minimize circuit depth and to achieve the best possible execution fidelity. In this work, we propose MIRAGE, a collaborative design and transpilation approach to minimize SWAP gates while improving decomposition using mirror gates. Mirror gates utilize the same underlying physical interactions, but when their outputs are reversed, they realize a different or mirrored quantum operation. Given the recent attention to √iSWAP as a powerful basis gate with decomposition advantages over CNOT, we show how systems that implement the iSWAP family of gates can particularly benefit from mirror gates. Further, MIRAGE uses mirror gates to reduce routing pressure and true circuit depth instead of just minimizing SWAPs. We explore the benefits of decomposition for √iSWAP and ⁴√iSWAP using mirror gates, including both expanding Haar coverage and conducting a detailed fault rate analysis trading off circuit depth against approximate gate decomposition. We also describe a novel greedy approach that accepts mirror substitutions at different aggression levels within MIRAGE. For iSWAP systems that use square-lattice topologies, MIRAGE provides an average 29.6% reduction in circuit depth by eliminating an average of 59.9% of SWAP gates, with a relative decrease in infidelity of 28%. MIRAGE also improves circuit depth and decreases relative infidelity by 25% and 21% for CNOT-based machines and by 23% and 19% for SYC-based machines, respectively.
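The mirror-gate idea can be illustrated numerically: composing a two-qubit gate with a logical output swap yields a different entangling operation built from the same physical interaction. The sketch below uses plain NumPy (an assumption for illustration, not the MIRAGE transpiler) to check that the mirror of iSWAP, i.e. SWAP·iSWAP, is locally equivalent to CZ up to single-qubit S gates.

```python
# Numerical check of the mirror-gate idea: reversing the outputs of iSWAP
# (composing it with SWAP) yields a gate locally equivalent to CZ, so the
# same physical interaction realizes a "mirrored" two-qubit operation.
# Illustrative only; not the MIRAGE transpiler itself.
import numpy as np

S = np.diag([1, 1j])                        # single-qubit phase gate
CZ = np.diag([1, 1, 1, -1]).astype(complex)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)
iSWAP = np.array([[1, 0,  0,  0],
                  [0, 0,  1j, 0],
                  [0, 1j, 0,  0],
                  [0, 0,  0,  1]], dtype=complex)

mirror_iswap = SWAP @ iSWAP                 # iSWAP with outputs relabelled
local_cz = np.kron(S, S) @ CZ               # CZ dressed with local S gates

assert np.allclose(mirror_iswap, local_cz)
print("SWAP·iSWAP is locally equivalent to CZ (up to S ⊗ S)")
```

This is why accepting a mirror substitution can remove a SWAP outright: the routing permutation is absorbed into the relabelled outputs rather than executed as extra two-qubit gates.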
ISBN: 9798350381603 (print)
The availability of computational resources has changed significantly due to cloud computing. In addition, we have witnessed efforts to execute high-performance computing (HPC) applications in the cloud, attracted by the advantages of cost savings and scalable/elastic resource allocation. Allocating more powerful hardware and exclusively allocating resources such as memory, storage, and CPU can improve performance in the cloud. For network interconnection, however, significant noise and other interference are generated by several simultaneous instances (multi-tenants) communicating over the same network. Since increasing the network bandwidth may be an alternative, we designed an evaluation model and performance analysis of NIC aggregation approaches in containerized private clouds. The experiments using the NAS Parallel Benchmarks revealed that the NIC aggregation approach outperforms the baseline in up to ~98% of the executions with applications characterized by intensive network use. Also, the Balance Round-Robin aggregation mode performed better than the 802.3ad aggregation mode in most assessments.
ISBN: 9798350381603 (print)
This paper explores the Serverless First strategy in cloud application development. Serverless computing has gained popularity due to its flexibility and scalability. In our work, we provide a systematic review of the literature on the Serverless paradigm in cloud computing and an evaluation of the advantages of this approach by performing a comparative analysis among three ways of implementing an application: AWS Lambda, AWS Lambda with the Chalice framework, and the traditional form using the Flask framework. The literature review results show the gains in scaling, cost reduction, and ease of maintenance achieved with the Serverless First strategy. However, some limitations and challenges were also highlighted, such as the greater complexity of the environment, less control over resources, resource limitations imposed by the cloud provider, and difficulties in debugging and managing the infrastructure. The case study demonstrates in practice that the Chalice framework provided the most straightforward and rapid implementation, AWS Lambda without Chalice offered greater flexibility and control, and the Flask version allowed local testing and total control but required more manual setup and lacked automatic scalability.
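To make the comparison concrete, the sketch below contrasts a minimal endpoint written with Chalice (deployed to AWS Lambda behind API Gateway via `chalice deploy`) against the equivalent Flask route running on a self-managed server. The route name and payload are illustrative, not taken from the paper's case study.

```python
# Minimal "hello" endpoint in the two styles compared above.
# Route name and payload are illustrative placeholders.

# --- Serverless: Chalice (deployed with `chalice deploy`) ---------------
from chalice import Chalice

app = Chalice(app_name="hello-serverless")

@app.route("/hello")
def hello_serverless():
    # Chalice wires this handler to AWS Lambda + API Gateway automatically;
    # scaling is handled by the provider.
    return {"message": "hello from Lambda"}


# --- Traditional: Flask (self-managed server, e.g. `flask run`) ---------
# from flask import Flask, jsonify
#
# app = Flask(__name__)
#
# @app.route("/hello")
# def hello_flask():
#     # Runs wherever you host it; provisioning, patching, and scaling
#     # remain the developer's responsibility.
#     return jsonify(message="hello from Flask")
```

The Flask variant is kept in comments only so the module remains a valid single Chalice application; in practice the two would live in separate projects, which mirrors the trade-off between rapid managed deployment and full local control described in the abstract.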
ISBN: 9781538677698 (print)
The proceedings contain 63 papers. The topics discussed include: automated GPU grid geometry selection for OpenMP kernels; effect of network topology on the performance of ADMM-based SVMs; exploring the potential of next-generation software-defined in-memory frameworks; a fault-tolerant agent-based architecture for transient servers in fog computing; deep learning on large-scale multicore clusters; accelerating deep neural network training for action recognition on a cluster of GPUs; balancing load of GPU subsystems to accelerate image reconstruction in parallel beam tomography; high-performance ensembles of online sequential extreme learning machine for regression and time series forecasting; a machine learning approach for parameter screening in earthquake simulation; adaptive partitioning for iterated sequences of irregular OpenCL kernels; highly scalable stencil-based matrix-free stochastic estimator for the diagonal of the inverse; and adaptive scheduling of collocated applications using a task-based runtime system.
ISBN: 9798350381603 (print)
Geophysical exploration methods are important in discovering essential resources like oil and gas. However, traditional exploration often involves environmentally detrimental practices. To address this, software solutions simulate seismic imaging techniques for oil detection. In this scenario, the industry is now transitioning these applications to cloud-based Software as a Service (SaaS) models, offering benefits like resource optimization, eco-friendliness, and advanced data analytics. This shift, however, presents challenges in performance, scalability, and cost management. This paper presents a case study on the Fletcher modeling application as a SaaS in geophysical exploration, exploiting cloud hardware heterogeneity. Through an extensive set of experiments on Google Cloud instances with different multicore processors, we show that the Fletcher SaaS model scales with increased hardware resources, with AMD instances offering a better performance-cost trade-off than Intel.
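The performance-cost trade-off referred to above is usually judged by cost per completed job rather than raw runtime. The short sketch below shows that arithmetic with hypothetical runtimes and hourly prices; the figures and instance names are placeholders, not results or prices from the paper.

```python
# Cost-per-job comparison for cloud instances running the same workload.
# All runtimes and hourly prices are hypothetical placeholders, not
# measurements or prices reported in the paper.
instances = {
    # name: (runtime_hours_per_job, usd_per_hour)
    "amd-16vcpu":   (1.6, 0.68),
    "intel-16vcpu": (1.4, 0.92),
}

for name, (hours, price) in instances.items():
    cost = hours * price
    print(f"{name}: {hours:.1f} h/job at ${price:.2f}/h -> ${cost:.2f}/job")

# A faster instance can still lose on cost per job if its hourly price is
# proportionally higher; that is the sense in which one vendor offers a
# "better performance-cost trade-off".
```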
ISBN: 9781665476522 (print)
Recent research shows that artificial intelligence (AI) algorithms can dramatically improve the profitability of high-frequency trading (HFT) with accurate market prediction, overcoming the limitations of conventional latency-oriented approaches. However, it is challenging to integrate computationally intensive AI algorithms into the existing trading pipeline due to their excessively long latency and insufficient throughput, necessitating a breakthrough in hardware. Furthermore, harsh HFT environments such as bursty data traffic and stringent power constraints make it even more difficult to achieve system-level performance without missing crucial market signals. In this paper, we present LightTrader, the world's first AI-enabled HFT system that incorporates an FPGA and custom AI accelerators for short-latency, high-throughput trading systems. Leveraging the computing power of brand-new AI accelerators fabricated in TSMC's 7nm FinFET technology, LightTrader optimizes the tick-to-trade latency and response rate for stock market data. The AI accelerators, adopting a Coarse-Grained Reconfigurable Array (CGRA) architecture that maximizes hardware utilization through a flexible dataflow architecture, achieve a throughput of 16 TFLOPS and 64 TOPS. In addition, we propose both workload scheduling and dynamic voltage and frequency scaling (DVFS) scheduling algorithms to find an optimal offloading strategy under bursty market data traffic and limited power conditions. Finally, we build a reliable and rerunnable simulation framework that can back-test historical market data, such as from the Chicago Mercantile Exchange (CME), to evaluate the LightTrader system. We thoroughly explore the performance of LightTrader as the number of AI accelerators, power conditions, and complexity of deep neural network models change. As a result, LightTrader achieves 13.92x and 7.28x speed-up of AI algorithm processing compared to existing GPU-based and FPGA-based systems, respectively. LightTrader wit