ISBN: (Print) 9798350377149; 9798350377132
The autoscaling function dynamically adjusts the resource configuration of microservice applications in response to workload changes, thereby ensuring service quality. However, designing an effective scaling strategy for each service remains challenging due to the heterogeneity of services. To address this challenge, we introduce DCScaler, a distributed collaborative scaler that leverages spatiotemporal predictions to optimize Service Level Agreement (SLA) guarantees and resource utilization through proactive resource allocation adjustments. DCScaler adopts (i) a spatiotemporal graph attention network to learn the spatiotemporal dependencies among service metrics; and (ii) a multi-agent Deep Reinforcement Learning (DRL) based scaler to learn the optimal scaling strategy tailored to each service. DCScaler accurately predicts future workloads and adjusts resource allocation accordingly for each service. Experimental results obtained in a real microservice environment demonstrate that DCScaler effectively enhances resource utilization and reduces SLA violations.
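The abstract describes proactive scaling: predict the workload, then provision resources before the load arrives. The sketch below is a deliberately simplified stand-in for that idea; DCScaler's actual policy is learned by multi-agent DRL, whereas here a predicted request rate is mapped to a replica count under an assumed per-replica capacity (all names and parameters are illustrative).

```python
# Minimal proactive-scaling sketch in the spirit of DCScaler.
# `capacity_per_replica` is an assumed constant, not from the paper.
import math

def replicas_for(predicted_rps, capacity_per_replica=100.0,
                 min_replicas=1, max_replicas=20):
    """Provision enough replicas for the predicted load, within bounds."""
    need = math.ceil(predicted_rps / capacity_per_replica)
    return max(min_replicas, min(max_replicas, need))

print(replicas_for(250.0))   # 3  -- scale up before the load arrives
print(replicas_for(40.0))    # 1  -- respect the floor
```

A learned policy replaces the fixed `ceil` rule with per-service decisions, but the input (a workload forecast) and output (a resource allocation) are the same.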
Distributed databases in edge computing environments encounter challenges such as network latency, bandwidth limitations, and heterogeneous node resources. This paper examines a distributed database synchronization mec...
The growing volume and complexity of unstructured and semi-structured data pose a significant challenge in extracting meaningful and relevant information. Information Extraction (IE) emerges as a powerful technique to...
ISBN: (Print) 9798350319552; 9798350319545
The increasing number of available smart objects produces a huge amount of data that can be successfully managed using Federated Learning (FL) approaches. FL is a distributed deep learning framework in which several devices train a local model on local data, and a central server aggregates them into a global model. These strategies are successfully used in several contexts, ensuring privacy preservation and high effectiveness in data analysis. However, FL strategies show variable accuracy when a large share of the data is non-independent-and-identically-distributed (Non-IID). This challenge has been widely explored in recent years by researchers and developers proposing new FL strategies for Non-IID data. This study introduces a novel FL approach, called DQFed, that aggregates the local models based on weights computed according to a quality-driven model. The underlying idea is that the performance of the global model can be improved by giving higher importance, during aggregation, to clients that use high-quality data. The DQFed strategy is evaluated on five datasets (both Non-IID and IID) derived from a real dataset. The results show an improved F1-score compared to the considered baseline for both Non-IID and IID data.
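The core mechanism is quality-weighted aggregation: each client's model contributes in proportion to a per-client quality score rather than (as in plain FedAvg) its sample count. The sketch below is a hypothetical illustration of that weighting; the paper's actual quality-driven model and aggregation formula are not reproduced here.

```python
# Hypothetical quality-weighted aggregation in the spirit of DQFed.
# Each model is a dict mapping parameter name -> list of floats.

def aggregate(client_models, quality_scores):
    """Average parameters, weighting each client by its normalized quality score."""
    total = sum(quality_scores)
    weights = [q / total for q in quality_scores]
    agg = {}
    for name in client_models[0]:
        agg[name] = [
            sum(w * m[name][i] for w, m in zip(weights, client_models))
            for i in range(len(client_models[0][name]))
        ]
    return agg

# Two clients; the higher-quality client dominates the global model.
m1 = {"w": [1.0, 0.0]}
m2 = {"w": [0.0, 1.0]}
print(aggregate([m1, m2], quality_scores=[3.0, 1.0]))  # {'w': [0.75, 0.25]}
```

Setting all quality scores equal recovers an unweighted average, which makes the role of the quality model easy to isolate in experiments.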
ISBN: (Print) 9798350368543; 9798350368536
While numerous Content-Defined Chunking (CDC) algorithms exist for data deduplication, their relative performance has not been analyzed in the presence of low-entropy induced byte-shifting. This paper explores and evaluates hash-based and hashless CDC algorithms in the presence of low-entropy data regions, using synthetic datasets. Our evaluation shows that modern CDC algorithms are poor at handling low-entropy blocks when the block sizes are small and that their low-entropy detection ability depends upon the expected average chunk size. Contrary to previous studies focusing on conventional byte-shifting, hash-based algorithms achieve poor space savings compared to their hashless counterparts when low-entropy induced byte-shifting is involved. This can be explained by the greater variability in chunk sizes and the higher percentage of artificial boundaries they exhibit in the presence of these regions. All of these factors together highlight the need for specialized CDC algorithms to detect and eliminate low-entropy data blocks during the deduplication process.
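To make the failure mode concrete: hash-based CDC declares a chunk boundary wherever a rolling value over the content matches a mask, so the expected chunk size is set by the mask width. On a low-entropy run (e.g. a long zero region) the rolling value degenerates, and the algorithm falls back to minimum-size cuts or forced ("artificial") maximum-size boundaries. The toy chunker below is not any algorithm from the paper; the hash, mask, and size parameters are all illustrative.

```python
# Toy hash-based content-defined chunking; boundary when (hash & mask) == 0.
# Real CDC uses a windowed rolling hash (e.g. buzhash); this byte-wise
# accumulator is only meant to demonstrate the boundary-mask mechanism.

def cdc_chunks(data: bytes, mask: int = 0x3F, min_size: int = 4, max_size: int = 64):
    """Split data into content-defined chunks (expected avg size ~ mask+1 bytes)."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF
        if i - start + 1 >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1]); start, h = i + 1, 0
        elif i - start + 1 >= max_size:          # forced "artificial" boundary
            chunks.append(data[start:i + 1]); start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Low-entropy region: the hash stays 0, so every position past min_size is a
# "boundary" and the chunker degenerates into minimum-size chunks.
print(len(cdc_chunks(bytes(200))))   # 50 chunks of 4 bytes each
```

This degenerate behavior on the zero run is a small-scale version of the effect the evaluation measures: low-entropy regions distort chunk-size distributions and inflate the share of artificial boundaries.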
Gaussian-distributed random signals have broad prospects in various applications. Random signal generators make it easy to generate Gaussian-distributed random signals, greatly promoting the widespread application of Ga...
ISBN: (Print) 9798350368543; 9798350368536
Modern enterprises rely on data management systems to collect, store, and analyze vast amounts of data related to their operations. Nowadays, clusters and hardware accelerators (e.g., GPUs, TPUs) have become a necessity to scale with the data processing demands in many applications related to social media, bioinformatics, surveillance systems, remote sensing, and medical informatics. Given this new scenario, the architecture of data analytics engines must evolve to take advantage of these new technological trends. In this paper, we present ArcaDB: a disaggregated query engine that leverages container technology to place operators at compute nodes that fit their performance profile. In ArcaDB, a query plan is dispatched to worker nodes that have different computing characteristics. Each operator is annotated with the preferred type of compute node for execution, and ArcaDB ensures that the operator gets picked up by the appropriate workers. We have implemented a prototype version of ArcaDB using Java, Python, and Docker containers. We have also completed a preliminary performance study of this prototype, using images and scientific data. This study shows that ArcaDB can speed up query performance by a factor of 3.5 in comparison with a shared-nothing, symmetric arrangement.
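The placement idea is simple to state: every plan operator carries an annotation naming its preferred node type, and the dispatcher routes it to a worker of that type. The sketch below is a hypothetical illustration of that annotate-and-dispatch step; the names, the round-robin choice, and the data structures are assumptions, not ArcaDB's actual API.

```python
# Hypothetical operator-placement sketch in the spirit of ArcaDB:
# annotated operators are routed to workers matching their node type.
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    preferred: str   # e.g. "gpu" for model inference, "cpu" for scans

def dispatch(plan, workers, counters=None):
    """Assign each operator to a worker of its preferred type, round-robin."""
    counters = counters or {}
    assignment = {}
    for op in plan:
        pool = workers[op.preferred]
        k = counters.get(op.preferred, 0)
        assignment[op.name] = pool[k % len(pool)]
        counters[op.preferred] = k + 1
    return assignment

workers = {"cpu": ["cpu-node-1", "cpu-node-2"], "gpu": ["gpu-node-1"]}
plan = [Operator("scan", "cpu"), Operator("infer", "gpu"), Operator("join", "cpu")]
print(dispatch(plan, workers))
# {'scan': 'cpu-node-1', 'infer': 'gpu-node-1', 'join': 'cpu-node-2'}
```

In the real system the workers pull matching operators from a dispatch queue rather than being assigned centrally, but the annotation-to-node-type mapping is the same.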
ISBN: (Print) 9798350363029; 9798350363012
The advancing industrial digitalization enables evolved process control schemes that rely on accurate models learned through data-driven approaches. While they provide high control performance and are robust to smaller deviations, a larger change in process behavior can pose significant challenges, in the worst case even leading to a damaged process plant. Hence, it is important to frequently assess the fit between the model and the actual process behavior. As the number of controlled processes and associated data volumes increase, the need for lightweight, fast-reacting assessment solutions also increases. In this paper, we propose CIVIC, an in-network computing-based solution for Continuous In-situ Validation of Industrial Control models. In short, CIVIC monitors relevant process variables and detects different process states through comparison with a priori knowledge about the desired process behavior. This detection can then be leveraged to, e.g., shut down the process or trigger a reconfiguration. We prototype CIVIC on an Intel Tofino-based switch and apply it to a lab-scale water treatment plant. Our results show that we can achieve a high detection accuracy, proving that such monitoring systems are feasible and sensible.
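The validation logic itself is a lightweight comparison: monitored variables are checked against a priori bounds for the expected process state, and a mismatch triggers an action. The sketch below illustrates only that idea in plain Python; CIVIC's actual implementation runs in the data plane of a Tofino switch (P4), and the states, bounds, and actions here are invented for illustration.

```python
# Illustrative state-validation sketch in the spirit of CIVIC (not its
# P4/Tofino implementation). Bounds below are assumed a-priori knowledge.
EXPECTED = {
    "filling":  (10.0, 80.0),   # (low, high) tank level in cm, hypothetical
    "draining": (5.0, 60.0),
}

def validate(state, reading, expected=EXPECTED):
    """Return 'ok' if the reading fits the state's bounds, else 'reconfigure'."""
    lo, hi = expected[state]
    return "ok" if lo <= reading <= hi else "reconfigure"

print(validate("filling", 45.0))   # ok
print(validate("filling", 95.0))   # reconfigure: model/process mismatch
```

Because each check is a pair of comparisons against fixed thresholds, it maps naturally onto match-action tables, which is what makes in-network execution at line rate plausible.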
ISBN: (Print) 9798350389463; 9798350389470
This paper proposes a federated learning approach to address practical challenges faced by an emerging green-energy industry: Predictive Health Management (PHM) of wind turbines. Unlike many federated learning applications that are evaluated only in simulation, the application presented here targets real industrial problems using raw data collected in the field. A large volume of real data was collected by sensors on more than ten wind turbines across different regions of China and transmitted to storage for timely processing. The proposed framework, called TurboFed, handles this raw data and achieves good prediction performance in practical wind-power systems, helping to improve the efficiency of the wind turbines. The paper contributes three novel results. First, to the best of our knowledge, TurboFed is the first federated learning framework to address position adjustment of wind turbines in a real environment. Second, it deploys customized recurrent neural computing models to the wind turbines, which serve as the client devices under the federated learning paradigm. Finally, it incorporates new customized aggregation algorithms on the server side.
ISBN: (Print) 9798350375145; 9798350375138
The construction of a distributed heterogeneous data platform for power grid dispatching faces challenges of diversity, large scale, and high performance. However, existing data platform design methods in both the power and computer science fields struggle to meet practical production requirements effectively. This paper constructs a distributed data storage architecture model for power grid dispatching, defining the elements and their relationships within the architecture. Additionally, it proposes methods for managing massive source data and distributed heterogeneous database clusters. Based on these findings, a power grid dispatching business data platform is designed. Test results indicate that the proposed architecture effectively supports the efficient execution of power grid dispatching business, providing a specialized data platform design paradigm for the power industry.