Details
ISBN (Digital): 9781728142104
ISBN (Print): 9781728142111
Anomalies are unexpected instances that significantly deviate from the normal patterns formed by the majority of a dataset. The more an observation deviates from the normal pattern, the more likely it is an anomaly. The continuous increase in the number of car models and configuration possibilities has steadily increased the complexity of logistics supply chains and production. Consequently, it has become difficult to manage the whole IT landscape, and a small anomaly or failure somewhere in the system can lead to a huge loss of money. Therefore, identifying and ultimately resolving a problem in such a system quickly is highly important. This paper addresses the challenge of identifying anomalies in a scalable way. The newly collected data suffers from a lack of labels for training. The developed solution addresses this challenge by using multiple unsupervised algorithms and reporting as anomalies those observations that are commonly reported as anomalies by all the algorithms. The developed solution also tackles the problems of data heterogeneity and large data volume by using Spark underneath for scalable data processing. Scalability test results demonstrate an 80% reduction in the training time for 100 transactions when using 10 cores instead of 1 core. The results of the study also point out that increasing the number of cores does not necessarily mean a reduction in the overall execution time; other factors, such as communication between the cores and non-Spark-related processing tasks, can also influence the execution time.
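The consensus idea described above — report only observations that every unsupervised detector flags — can be sketched as follows. The abstract does not name the algorithms used, so the three simple detectors below (z-score, median absolute deviation, and quantile thresholds) are illustrative stand-ins, not the authors' method:

```python
import numpy as np

def zscore_detector(x, thresh=3.0):
    # Flag values far from the mean, in units of standard deviation.
    z = np.abs((x - x.mean()) / x.std())
    return z > thresh

def mad_detector(x, thresh=3.5):
    # Median absolute deviation is robust to the anomalies themselves.
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return np.abs(x - med) / (mad + 1e-12) > thresh

def quantile_detector(x, q=0.99):
    # Flag the extreme tails of the empirical distribution.
    lo, hi = np.quantile(x, 1 - q), np.quantile(x, q)
    return (x < lo) | (x > hi)

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0, 1, 500), [15.0, -12.0]])  # two injected anomalies

# Consensus vote: only observations flagged by ALL detectors are reported.
flags = zscore_detector(x) & mad_detector(x) & quantile_detector(x)
anomalies = np.flatnonzero(flags)
```

Requiring unanimity trades recall for precision: a point missed by any single detector is never reported, which suits settings where false alarms are expensive.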
Details
ISBN (Digital): 9798350395662
ISBN (Print): 9798350395679
Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients while keeping the data decentralized. Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies, particularly Function-as-a-Service (FaaS) for FL, can enhance resource efficiency, reduce training costs, and alleviate the complex infrastructure management burden on data holders. However, current serverless FL systems still suffer from the presence of stragglers, i.e., slow clients that impede the collaborative training process. While strategies aimed at mitigating stragglers in these systems have been proposed, they overlook the diverse hardware resource configurations among FL clients. To this end, we present Apodotiko, a novel asynchronous training strategy designed for serverless FL. Our strategy incorporates a scoring mechanism that evaluates each client’s hardware capacity and dataset size to intelligently prioritize and select clients for each training round, thereby minimizing the effects of stragglers on system performance. We comprehensively evaluate Apodotiko across diverse datasets, considering a mix of CPU and GPU clients, and compare its performance against five other FL training strategies. Results from our experiments demonstrate that Apodotiko outperforms other FL training strategies, achieving an average speedup of 2.75x and a maximum speedup of 7.03x. Furthermore, our strategy significantly reduces cold starts by a factor of four on average, demonstrating suitability in serverless environments.
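The abstract does not give Apodotiko's exact scoring formula, so the sketch below illustrates the general idea with a hypothetical score that weights relative hardware speed against relative dataset size and selects the top-k clients for a round; all names and weights are assumptions:

```python
def score(client, w_hw=0.5, w_data=0.5):
    # Hypothetical score: fast hardware and more data both raise priority.
    return w_hw * client["relative_speed"] + w_data * client["relative_data"]

clients = [
    {"id": "gpu-1", "relative_speed": 1.0, "relative_data": 0.4},
    {"id": "cpu-1", "relative_speed": 0.2, "relative_data": 1.0},
    {"id": "cpu-2", "relative_speed": 0.1, "relative_data": 0.2},
]

# Select the top-k clients for this round; low-scoring stragglers are deprioritized.
k = 2
selected = sorted(clients, key=score, reverse=True)[:k]
```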
Details
ISBN (Digital): 9798331528690
ISBN (Print): 9798331528706
Companies traditionally dependent on on-premise HPC clusters for simulations are increasingly migrating workloads to the cloud. Cloud computing offers greater flexibility in selecting processors, memory, and network bandwidth, along with enhanced resource availability and scalability. Automotive companies rely on computationally intensive numerical simulation tools for CAE (Computer-Aided Engineering), particularly with the growing demand for generative design, which uses algorithms to automatically explore a large solution space. This work addresses the gap between the growing runtime demands of these simulations and the limitations of static HPC infrastructure by representing iterative workflows as Directed Acyclic Graphs (DAGs) and optimizing their scheduling. We propose a unified hybrid infrastructure that leverages the elasticity of cloud resources along with existing HPC clusters to maximize computational efficiency, ensure timely completion of simulations, and optimize resource utilization and costs.
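Representing an iterative workflow as a DAG makes the available parallelism explicit: tasks with no unmet dependencies can run concurrently. A minimal sketch (Kahn-style level extraction on a toy CAE workflow; the task names are hypothetical, not from the paper):

```python
# Toy CAE workflow: mesh -> (solve_a, solve_b) -> postprocess
dag = {
    "mesh": ["solve_a", "solve_b"],
    "solve_a": ["postprocess"],
    "solve_b": ["postprocess"],
    "postprocess": [],
}

def topological_levels(dag):
    """Group tasks into levels; tasks within a level can run in parallel."""
    indegree = {n: 0 for n in dag}
    for deps in dag.values():
        for d in deps:
            indegree[d] += 1
    level = [n for n in dag if indegree[n] == 0]
    levels = []
    while level:
        levels.append(sorted(level))
        nxt = []
        for n in level:
            for d in dag[n]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        level = nxt
    return levels
```

A scheduler can then map each level onto whatever mix of on-premise and cloud nodes is currently cheapest or fastest.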
Details
ISBN (Print): 9781665480468
Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on using stateful, always-running components, recent work has shown that components in an FL system can greatly benefit from the use of serverless computing and Function-as-a-Service technologies. To this end, distributed training of models with serverless FL systems can be more resource-efficient and cheaper than conventional FL systems. However, serverless FL systems still suffer from the presence of stragglers, i.e., clients that are slow due to their resource and statistical heterogeneity. While several strategies have been proposed for mitigating stragglers in FL, most methodologies do not account for the particular characteristics of serverless environments, i.e., cold starts, performance variations, and the ephemeral stateless nature of the function instances. Towards this, we propose FedLesScan, a novel clustering-based semi-asynchronous training strategy specifically tailored for serverless FL. FedLesScan dynamically adapts to the behavior of clients and minimizes the effect of stragglers on the overall system. We implement our strategy by extending an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate our strategy using 2nd-generation Google Cloud Functions with four datasets and varying percentages of stragglers. Results from our experiments show that, compared to other approaches, FedLesScan reduces training time and cost by an average of 8% and 20% respectively while utilizing clients better, with an average increase in the effective update ratio of 17.75%.
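The abstract does not specify the clustering method FedLesScan uses, so the following is a deliberately simple stand-in: a greedy one-dimensional grouping of clients by observed round duration, which is enough to separate stragglers from fast clients:

```python
def cluster_by_duration(durations, gap=2.0):
    """Greedy 1-D clustering: start a new cluster when the next duration
    exceeds `gap` times the current cluster's mean. A stand-in for the
    paper's clustering step, whose exact method the abstract omits."""
    clusters = []
    for client, t in sorted(durations.items(), key=lambda kv: kv[1]):
        if clusters and t <= gap * (sum(d for _, d in clusters[-1]) / len(clusters[-1])):
            clusters[-1].append((client, t))
        else:
            clusters.append([(client, t)])
    return clusters

# Observed round times in seconds; "d" is a straggler.
times = {"a": 1.0, "b": 1.2, "c": 1.1, "d": 9.0}
clusters = cluster_by_duration(times)
```

A semi-asynchronous scheduler can then wait for the fast cluster each round while letting the straggler cluster contribute updates on its own cadence.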
Details
ISBN (Digital): 9798350365610
ISBN (Print): 9798350365627
The advent of generative design in the automotive sector, characterised by the automatic and iterative exploration of expansive solution spaces to discover optimal design configurations, has significantly increased the demand for computational resources to run intensive computer-aided engineering (CAE) simulations within constrained time frames. The inherent limitations of static high-performance computing (HPC) clusters have necessitated the adoption of cloud resources due to their flexible and elastic nature, thereby enhancing the capacity to accommodate the computational demands of these iterative workflows. These workflows, represented as Directed Acyclic Graphs (DAGs), involve the serial and parallel execution of tasks, which can dynamically share resources with other workflows during idle periods. In this paper, we propose an economy-based approach to exploit the gaps generated by these idle periods through a bidding system, thereby enabling more efficient resource utilisation and reducing the average wait time, makespan, cost and deadline misses by more than 40%, 6%, 13% and 45% respectively against certain infrastructures and baselines. Furthermore, we explore the potential for generating revenue by renting out idle resources in a hybrid cloud setup. This approach not only aims to optimise the use of computational resources but also seeks to provide cost-effective solutions to meet the escalating demands of generative design in the automotive sector.
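The bidding idea can be illustrated with a toy allocator: idle gaps in the schedule are offered to pending jobs, and the bid with the highest price per hour that fits each gap wins it. This is a simplified sketch of the economy-based mechanism, not the paper's actual auction; all names and numbers are hypothetical:

```python
gaps = [(0, 4), (10, 15)]  # idle windows as (start, end) in hours
bids = [
    {"job": "j1", "hours": 3, "price": 30.0},
    {"job": "j2", "hours": 2, "price": 25.0},
    {"job": "j3", "hours": 5, "price": 60.0},
]

def allocate(gaps, bids):
    """Award each idle gap to the highest price-per-hour bid that fits."""
    awarded = {}
    remaining = sorted(bids, key=lambda b: b["price"] / b["hours"], reverse=True)
    for start, end in gaps:
        length = end - start
        for bid in remaining:
            if bid["hours"] <= length:
                awarded[(start, end)] = bid["job"]
                remaining.remove(bid)
                break
    return awarded

result = allocate(gaps, bids)
```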
Details
ISBN (Digital): 9781728165820
ISBN (Print): 9781728165837
Computational fluid dynamics (CFD) can serve as a complementary approach to conventional wind tunnel testing to assess the wind flow around tall buildings. Being a clear High Performance Computing (HPC) task, CFD simulations conventionally run on supercomputers and compute clusters using specialized software such as OpenFOAM. The limited availability and high maintenance costs of supercomputers and clusters force small and medium companies to search for cost-efficient infrastructure that can run their simulations with appropriate performance. The on-demand compute capacity offered by cloud service providers is well suited to this task. However, engineers and researchers require extensive expertise and experience with cloud computing in order to benefit from running CFD simulations on a cloud. The contribution of the paper to the outlined problem is two-fold: 1) a unique Automated parallel Processing Application (APPA) tool that hides the cloud management details from the wind engineer and provides an intuitive user interface; 2) the estimation of the optimal number of cores (vCPUs) for virtual machine instances provided by AWS and Google Cloud, based on average run time and total cost metrics for a given number of cells of a CFD simulation. The n1-highcpu-96 Google Cloud VM met both goals: low cost and low runtime per timestep. For numbers of vCPUs below 16, the c4.8xlarge AWS VM type has the lowest runtime per timestep in all cases. Google Cloud instances with high vCPU counts are recommended for running the simulations if budget is a big concern.
A memory leak in an application deployed on the cloud can affect the availability and reliability of the application. Therefore, to identify and ultimately resolve it quickly is highly important. However, in the produ...
Details
Details
ISBN (Digital): 9798331541378
ISBN (Print): 9798331541385
In neutral atom quantum computers, readout and preparation of the atomic qubits are usually based on fluorescence imaging and subsequent analysis of the acquired image. For each atom site, the brightness or some comparable metric is estimated and used to predict the presence or absence of an atom. Across different setups, we can see a vast number of different approaches used to analyze these images. Often, the choice of detection algorithm is either not mentioned at all or it is not justified. We investigate several different algorithms and compare their performance in terms of both precision and execution run time. To do so, we rely on a set of synthetic images across different simulated exposure times with known occupancy states, which we generated using a previously validated imaging simulation. Since the use of simulation provides us with the ground truth of atom site occupancy, we can easily state precise error rates and variances of the reconstructed property. However, knowing the relative performance of these algorithms is not sufficient to justify their use, since better ones can exist that were not compared. To investigate this possibility, we calculated the Cramér-Rao bound in order to establish an upper limit that even a perfect estimator cannot outperform. As the metric of choice, we used the number of photoelectrons that can be attributed to a specific atom site. Every estimator that reconstructs a different property can simply be scaled accordingly. Since the bound depends on the occupancy of neighboring sites, we provide the best and worst cases, as well as a half-filled one, which should represent an averaged bound best. Our comparison shows that of our tested algorithms, a global nonlinear least-squares solver that uses the optical system's point spread function (PSF) to return a global bias and each site's number of photoelectrons performed the best, on average crossing the worst-case bound for longer exposure times. Its main drawback is its huge...
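To make the estimation setting concrete: when the PSF is known and fixed, the per-site photoelectron counts and a global bias enter the image model linearly, so the noise-free case can be solved with ordinary least squares. The paper's solver is nonlinear and handles real noise; the sketch below is only a simplified illustration with made-up site positions and counts:

```python
import numpy as np

def gaussian_psf(shape, center, sigma=1.5):
    # Normalized Gaussian spot standing in for the optical system's PSF.
    y, x = np.mgrid[: shape[0], : shape[1]]
    g = np.exp(-((x - center[1]) ** 2 + (y - center[0]) ** 2) / (2 * sigma**2))
    return g / g.sum()

shape = (16, 16)
sites = [(4, 4), (4, 11), (11, 4), (11, 11)]
psfs = [gaussian_psf(shape, c) for c in sites]

# Synthetic image: sites 0 and 3 occupied with 500 photoelectrons, flat bias of 10.
true_counts = np.array([500.0, 0.0, 0.0, 500.0])
image = sum(n * p for n, p in zip(true_counts, psfs)) + 10.0

# One column per site PSF plus a constant column for the global bias.
A = np.column_stack([p.ravel() for p in psfs] + [np.ones(image.size)])
params, *_ = np.linalg.lstsq(A, image.ravel(), rcond=None)
counts, bias = params[:-1], params[-1]
```

With shot noise and pixel-level camera effects added, the problem becomes the harder nonlinear estimation task that the Cramér-Rao bound benchmarks.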
Neutral atom quantum computers require accurate single atom detection for the preparation and readout of their qubits. This is usually done using fluorescence imaging. The occupancy of an atom site in these images is often somewhat ambiguous due to the stochastic nature of the imaging process. Further, the lack of ground truth makes it difficult to rate the accuracy of reconstruction algorithms. We introduce a bottom-up simulator that is capable of generating sample images of neutral atom experiments from a description of the actual state in the simulated system. Possible use cases include the creation of exemplary images for demonstration purposes, fast training iterations for deconvolution algorithms, and generation of labeled data for machine-learning-based atom detection approaches. The implementation is available through our GitHub as a C library or wrapped Python package. We show the modeled effects and implementation of the simulations at different stages of the imaging process. Not all real-world phenomena can be reproduced perfectly. The main discrepancies are that the simulator allows for only one characterization of optical aberrations across the whole image, supports only discrete atom locations, and does not model all effects of complementary metal-oxide-semiconductor (CMOS) cameras perfectly. Nevertheless, our experiments show that the generated images closely match real-world pictures to the point that they are practically indistinguishable and can be used as labeled data for training the next generation of detection algorithms.
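A bottom-up simulator of this kind boils down to composing an expected-photon image from the occupancy state and then sampling shot noise. The toy version below (Gaussian spots, flat background, Poisson noise; all parameters hypothetical) captures that pipeline, without the optical-aberration and CMOS camera effects the full simulator models:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_image(occupancy, shape=(16, 16), photons=200, background=2.0, sigma=1.5):
    """Place a Gaussian spot for each occupied site, add a flat background,
    then apply Poisson shot noise to the expected photon counts."""
    y, x = np.mgrid[: shape[0], : shape[1]]
    expected = np.full(shape, background, dtype=float)
    for (cy, cx), occupied in occupancy.items():
        if occupied:
            spot = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma**2))
            expected += photons * spot / spot.sum()
    return rng.poisson(expected)  # photon shot noise is Poissonian

# Known ground-truth occupancy: two filled sites, two empty ones.
occupancy = {(4, 4): True, (4, 11): False, (11, 4): True, (11, 11): False}
img = simulate_image(occupancy)
```

Because the occupancy dictionary is the ground truth, images generated this way come pre-labeled, which is exactly what training machine-learning detectors requires.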