ISBN (digital): 9798350362930
ISBN (print): 9798350362947
In geographically distributed systems, time series data often exhibit non-identical distributions (non-IID) across nodes due to diverse features and sample imbalances, posing significant challenges for centralized AI model training. The conventional federated learning paradigm, while mitigating data privacy and communication cost concerns, does not adequately address either the imbalance in sample importance or the presence of anomalous data in model training. This work introduces AIFed, an advanced federated learning framework designed to be both anomaly-resilient and imbalance-aware. AIFed judiciously assigns weights to client models based on data segment features, accounting for sample importance and anomaly levels. By innovatively incorporating local data characteristics into a global training process, AIFed significantly enhances model robustness and effectiveness in real-time learning scenarios. We demonstrate AIFed's efficacy through extensive experiments that show improved performance over traditional federated learning approaches.
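The abstract does not give AIFed's exact weighting rule. A minimal sketch of the general idea it describes, importance- and anomaly-aware weighted averaging of client updates, might look as follows; the function names, inputs, and the specific weighting formula are illustrative assumptions, not AIFed's actual method.

```python
import numpy as np

def aggregate(client_states, importance, anomaly_score):
    """Hypothetical anomaly-aware, importance-weighted averaging of client models.

    client_states : list of dicts mapping parameter name -> np.ndarray
    importance    : per-client sample-importance scores (e.g. weighted sample counts)
    anomaly_score : per-client anomaly level in [0, 1]; higher means more anomalous
    """
    # Down-weight clients whose local data looks anomalous (illustrative rule,
    # not the weighting actually used by AIFed).
    raw = np.asarray(importance, dtype=float) * (1.0 - np.asarray(anomaly_score, dtype=float))
    weights = raw / raw.sum()

    # Weighted average of each parameter tensor across clients.
    global_state = {}
    for name in client_states[0]:
        global_state[name] = sum(w * cs[name] for w, cs in zip(weights, client_states))
    return global_state

# Example: three clients with identical model shapes.
clients = [{"w": np.ones(2) * k} for k in (1.0, 2.0, 10.0)]
print(aggregate(clients, importance=[100, 80, 50], anomaly_score=[0.1, 0.2, 0.9]))
```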
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
The second-generation Robot Operating System, ROS 2, is a middleware framework for developing robot applications. As robot systems become increasingly complex, the demand for simultaneously handling mixed tasks (critical and ordinary tasks) has markedly risen; critical tasks are characterized by real-time constraints, while ordinary tasks emphasize fairness. However, existing research on ROS 2 executors is limited to employing either a real-time scheduling approach or a fair scheduling approach, failing to accommodate different scheduling strategies for varying task types. Moreover, ROS 2 executors cannot isolate critical tasks from the interference of ordinary tasks, which compromises the real-time performance of systems. In this paper, we propose a hybrid scheduling executor (HSE) for ROS 2, which mainly incorporates two schedulers (fair and real-time) and two thread types (shared and exclusive). The fair scheduler is responsible for managing ordinary tasks through shared threads, while the real-time scheduler handles critical tasks using both exclusive and shared threads. Additionally, we analyze the scheduling behavior of the HSE and the timing characteristics of task chains to design a chain-aware thread mapping strategy (CATMS). Given a fixed number of CPUs, CATMS maps threads to CPUs to optimize performance while meeting real-time constraints. To the best of our knowledge, this is the first work to address the challenge of integrating critical and ordinary tasks in robotic systems by enhancing the architecture of ROS 2 executors. Through experimental evaluation, we assess the performance of HSE and CATMS. The results show that the HSE achieves real-time performance comparable to the existing executors, while maintaining fairness comparable to the default executor. Furthermore, HSE enhances real-time performance by including exclusive threads, and CATMS can maximize performance while meeting real-time constraints when operating with a specified number of CPUs.
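The executor itself is part of ROS 2's C++ middleware, which the abstract does not reproduce. The following Python sketch only illustrates the dispatch policy described above: deadline-ordered critical tasks are served first, preferring exclusive threads and falling back to shared ones, while ordinary tasks are served in FIFO order on the remaining shared threads. Class, queue, and callback names are assumptions.

```python
import heapq
from collections import deque
from itertools import count

class HybridSchedulingExecutorSketch:
    """Illustrative model of the HSE policy described in the abstract (not ROS 2 code)."""

    def __init__(self, n_exclusive, n_shared):
        self.exclusive_idle = [f"excl-{i}" for i in range(n_exclusive)]
        self.shared_idle = [f"shared-{i}" for i in range(n_shared)]
        self.critical = []        # min-heap ordered by deadline (real-time scheduler)
        self.ordinary = deque()   # FIFO queue (fair scheduler)
        self._seq = count()       # tie-breaker for equal deadlines

    def submit(self, task, deadline=None):
        if deadline is not None:
            heapq.heappush(self.critical, (deadline, next(self._seq), task))
        else:
            self.ordinary.append(task)

    def dispatch(self):
        """Assign idle threads to queued tasks for one scheduling round."""
        assignments = []
        # Critical tasks first: exclusive threads, then shared ones,
        # so ordinary tasks cannot delay them.
        while self.critical and (self.exclusive_idle or self.shared_idle):
            pool = self.exclusive_idle or self.shared_idle
            _, _, task = heapq.heappop(self.critical)
            assignments.append((pool.pop(), task))
        # Remaining shared threads serve ordinary tasks fairly (FIFO).
        while self.ordinary and self.shared_idle:
            assignments.append((self.shared_idle.pop(), self.ordinary.popleft()))
        return assignments

hse = HybridSchedulingExecutorSketch(n_exclusive=1, n_shared=2)
hse.submit("lidar_callback", deadline=5)
hse.submit("logging_callback")
hse.submit("control_callback", deadline=2)
print(hse.dispatch())
```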
ISBN (digital): 9798350364262
ISBN (print): 9798350364279
With the rapid expansion of mobile communication users and the widespread adoption of smartphones and smart devices, telecommunication operators have amassed vast volumes of user data, communication data, and network data. This surge in data creates heightened demands for effective data management and analysis, especially in the domain of master data management (MDM). Master data stands out as the pivotal data asset for telecommunication operators, forming the bedrock for seamless data interaction across departments and systems. However, challenges persist, including multiple output points, complexities in cross-domain Business, Management, and Operations (BMO) connectivity, and the pressing need for data quality enhancement. In response to these challenges, this paper introduces a Master Data Quality Management Platform tailored for telecommunication operators. This comprehensive solution is crafted to optimize and ensure the quality of data within the organization. It brings forth the advantages of dismantling data silos, fostering data sharing, and enhancing the adaptability of IT system construction.
ISBN (print): 9781665435772
Parallel I/O is an essential part of scientific applications running on high-performance computing systems. Understanding an application's parallel I/O behavior and identifying sources of performance bottlenecks require a multi-layer view of the I/O. Typical parallel I/O stack layers offer many tunable parameters that, when set properly, can achieve the best possible I/O performance. However, scientific users often do not have the time or the experience to investigate the proper combination of these parameters for each application use case. Auto-tuning can help users by automatically and transparently tuning I/O parameters at various layers. A naive auto-tuning strategy, running an application with all possible combinations of tunable parameters across all layers of the I/O stack to find the best settings, amounts to an exhaustive search through a huge parameter space; this strategy is infeasible because of the long execution times of the trial runs. In this paper, we propose a genetic algorithm-based parallel I/O auto-tuning approach that can hide the complexity of the I/O stack from users and auto-tune a set of parameter values for an application on a given system to improve I/O performance. In particular, our approach tests a set of parameters and then modifies the combination of these parameters for further testing based on the measured I/O performance. We have validated our model using two I/O benchmarks, IOR and MPI-Tile-IO, and achieved an increase in I/O bandwidth of up to 7.74x over the default parameters for IOR and 5.59x over the default parameters for MPI-Tile-IO.
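The abstract does not list the specific parameters or GA operators used. The sketch below only illustrates the general loop the approach implies: evaluate a population of I/O-parameter combinations, keep the best, and recombine and mutate them for the next round. The parameter names (e.g. Lustre stripe settings, MPI-IO collective buffering) and the placeholder fitness function are assumptions; in the real approach the fitness would be the bandwidth measured by running IOR or MPI-Tile-IO with the candidate settings.

```python
import random

# Hypothetical tunable parameters from several I/O stack layers (placeholder values).
PARAM_SPACE = {
    "stripe_count":   [1, 4, 8, 16, 32],   # parallel file system layer
    "stripe_size_mb": [1, 2, 4, 8, 16],
    "cb_nodes":       [1, 2, 4, 8],        # MPI-IO collective buffering
    "cb_buffer_mb":   [4, 16, 64],
}

def random_config():
    return {k: random.choice(v) for k, v in PARAM_SPACE.items()}

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in PARAM_SPACE}

def mutate(cfg, rate=0.2):
    return {k: (random.choice(PARAM_SPACE[k]) if random.random() < rate else v)
            for k, v in cfg.items()}

def run_benchmark(cfg):
    """Placeholder fitness: the real approach would run IOR or MPI-Tile-IO
    with `cfg` applied and return the measured I/O bandwidth."""
    return random.random()

def tune(generations=10, pop_size=8, elite=2):
    population = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=run_benchmark, reverse=True)
        parents = scored[:elite]                       # keep the best configurations
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]  # recombine and perturb them
        population = parents + children
    return max(population, key=run_benchmark)

print(tune())
```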
ISBN (digital): 9798350364606
ISBN (print): 9798350364613
The aerospace industry is one of the largest users of numerical simulation, which is an essential tool in the field of aerodynamic engineering, where many fluid dynamics simulations are involved. In order to obtain the most accurate solutions, some of these simulations use unstructured finite volume solvers that cope with irregular meshes by using explicit time-adaptive integration methods. Modern parallel implementations of these solvers rely on task-based runtime systems to perform fine-grained load balancing and to avoid unnecessary synchronizations. Although such implementations greatly improve performance compared to classical fork-join MPI+OpenMP variants, it remains a challenge to keep all cores busy throughout the simulation loop. In this article, we first investigate the origins of this lack of parallelism. We emphasize that the irregular structure of the task graph plays a major role in the inefficiency of the computation distribution. Our main contribution is to improve the shape of the task graph by using a new mesh partitioning strategy. The originality of our approach is to take the temporal level of mesh cells into account during the mesh partitioning phase. We evaluate our approach by integrating our solution into an ArianeGroup production code used by Airbus. We show that our partitioning method leads to a more balanced task graph. The resulting task scheduling is up to two times faster for meshes ranging from 200,000 to 12,000,000 components.
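The partitioning strategy itself is integrated into a production solver and is not spelled out in the abstract. The sketch below only illustrates the underlying idea of making the partitioner aware of each cell's temporal level, e.g. by weighting cells by how often they are advanced per global time step so that per-step work, not cell count, is balanced. The weight convention and the toy greedy partitioner are assumptions; a real implementation would pass such weights to a graph partitioner (METIS, Scotch, ...) that also minimizes edge cut.

```python
def cell_weights(temporal_levels):
    """Weight each cell by how often it is updated under time-adaptive integration:
    here a cell at temporal level l is assumed to be advanced 2**(l_max - l) times
    per global step (an assumed convention; the paper may define levels differently)."""
    l_max = max(temporal_levels)
    return [2 ** (l_max - l) for l in temporal_levels]

def partition(temporal_levels, nparts):
    """Toy weight-balancing partitioner: assign cells (heaviest first) to the
    currently lightest part, so each part carries similar per-step work."""
    weights = cell_weights(temporal_levels)
    order = sorted(range(len(weights)), key=lambda c: -weights[c])
    part_load = [0] * nparts
    assignment = [0] * len(weights)
    for c in order:
        p = min(range(nparts), key=part_load.__getitem__)
        assignment[c] = p
        part_load[p] += weights[c]
    return assignment, part_load

# Example: 8 cells with mixed temporal levels, split across 2 parts.
print(partition([0, 0, 1, 1, 2, 2, 2, 2], nparts=2))
```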
ISBN (print): 9781665432818
The proceedings contain 17 papers. The topics discussed include: automatic traffic light preemption for intelligent transportation systems; towards an elastic lock-free Hash Trie design; a novel server-side aggregation strategy for federated learning in non-IID situations; an asynchronous distributed-memory optimization solver for two-stage stochastic programming problems; translation based self-reconfiguration algorithm for 6-lattice modular robots; curator - a system for creating data sets for behavioral malware detection; parallel and distributed task-based Kirchhoff seismic pre-stack depth migration application; periodicity detection algorithm and applications on IoT data; parallel cloud movement forecasting based on a modified boids flocking algorithm; and efficient real-time earliest deadline first based scheduling for Apache spark.
The basic parallel robotics principle of defining kinematic constraints as vector loops is transferred from the general 3T3R case to the 3T2R case by applying a nonlinear Tait-Bryan-angle rotation constraint using the...
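The preview above is truncated. For context, the vector-loop style of kinematic constraint it refers to is commonly written per leg chain as a stacked translational and rotational residual, with the rotational part expressed through Tait-Bryan angles; the notation below is an assumed illustration, not taken from the paper.

```latex
% Hedged sketch of a generic loop-closure constraint (notation assumed):
% translational residual plus a Tait-Bryan-angle rotational residual.
\Phi_i(\boldsymbol{q}_i,\boldsymbol{x}) \;=\;
\begin{pmatrix}
  \boldsymbol{r}_{0,E}(\boldsymbol{q}_i) - \boldsymbol{r}_{0,E}(\boldsymbol{x}) \\[2pt]
  \boldsymbol{\alpha}\!\left( \boldsymbol{R}_{0,E}^{\mathsf{T}}(\boldsymbol{x})\,\boldsymbol{R}_{0,E}(\boldsymbol{q}_i) \right)
\end{pmatrix}
\;=\; \boldsymbol{0}
```

Here \alpha(\cdot) extracts Tait-Bryan angles from a rotation matrix; in the 3T2R case the component corresponding to rotation about the tool axis is dropped, leaving a five-dimensional constraint per chain.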
Service mesh is a promising microservices architecture due to its excellent governance capabilities. Unlike traditional service invocation, configurations for governance need to be issued in the service mesh. However, we find that the control-plane traffic of governance is distributed in full by default, i.e., each service in the data plane receives all configurations. The vast majority of these configurations are redundant for a specific service. Hence, it is important and challenging to make the control plane aware of the calling relationships between services. In this paper, we propose a traffic management mechanism named DATM. Using this mechanism, the entire cluster can be dynamically controlled and services can be configured on demand. It is implemented through a dependency-aware controller and monitors. The controller first processes the information collected by the monitors and then analyzes the connection between the metrics and the service requests through intelligent algorithms. Finally, the control traffic for regulating the control plane is generated. Our proposed mechanism is experimentally compared with the default strategy and existing work across a wide set of load scenarios in a testbed based on the Istio service mesh and Kubernetes. Experimental results demonstrate that our mechanism can save the storage resources of a single agent by 40% to 60% and greatly reduce the number of cluster updates. From the perspective of the whole cluster, the optimization results are even better.
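The abstract does not detail DATM's data structures. The sketch below only illustrates the idea of dependency-aware distribution it describes: instead of pushing every configuration to every sidecar, the controller derives each service's dependency set from observed call relationships and scopes the configurations it pushes accordingly. All names are placeholders; a real deployment would enforce such scopes through Istio's Sidecar resources or an equivalent mechanism.

```python
from collections import defaultdict

def build_dependency_sets(observed_calls):
    """observed_calls: iterable of (caller, callee) pairs gathered by the monitors."""
    deps = defaultdict(set)
    for caller, callee in observed_calls:
        deps[caller].add(callee)
    return deps

def scope_configurations(all_configs, deps):
    """all_configs: mapping {service: configuration needed to call that service}.
    Returns, per service, only the configurations for its observed dependencies,
    instead of the full-cluster default of pushing everything to everyone."""
    scoped = {}
    for svc, callees in deps.items():
        scoped[svc] = {c: all_configs[c] for c in callees if c in all_configs}
    return scoped

calls = [("frontend", "cart"), ("frontend", "catalog"), ("cart", "payments")]
configs = {"cart": "route-rules-cart", "catalog": "route-rules-catalog", "payments": "route-rules-pay"}
print(scope_configurations(configs, build_dependency_sets(calls)))
```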
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the scientific community's broad response to it have forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team's work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture, coordinating tasks across federated HPC resources, with robust, secure and automated access to each of the resources. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is made available in a public repository.
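OSPREY's actual API is not shown in the abstract. The sketch below only illustrates the asynchronous submit-and-consume pattern such a platform implies, where model runs are farmed out (in practice to federated HPC resources) and the driving algorithm consumes results as soon as each run finishes. The function names, the local thread-pool stand-in, and the toy parameters are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random, time

def run_epi_model(params):
    """Stand-in for launching one epidemic-model run on a remote HPC resource."""
    time.sleep(random.uniform(0.01, 0.05))   # pretend remote execution time
    return {"params": params, "score": random.random()}

def asynchronous_sweep(param_sets, max_in_flight=4):
    """Submit runs asynchronously and hand each result to the driving algorithm
    as soon as it completes, instead of waiting for the whole batch."""
    results = []
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(run_epi_model, p) for p in param_sets]
        for fut in as_completed(futures):
            results.append(fut.result())      # an algorithm could refine params here
    return results

print(len(asynchronous_sweep([{"r0": r} for r in (1.1, 1.5, 2.0, 2.5)])))
```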
In this article, an upgraded version of CUDA-Quicksort, an iterative implementation of the quicksort algorithm suitable for highly parallel multicore graphics processors, is described and evaluated. Three key changes that lead to improved performance are proposed. The main goal, which was successfully achieved, was to provide an implementation that scales better with data-set size and core count on modern GPU architectures. The proposed changes also lead to a significant reduction in execution time. Execution times were measured on an NVIDIA graphics card, taking into account the possible distributions of the input data.
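The article's three changes concern the CUDA implementation itself and are not reproduced here. As background, the iterative (explicit-stack) control structure that GPU quicksorts adapt in place of recursion looks roughly like the CPU-side Python sketch below; it is an illustration of the iterative principle, not the authors' kernel.

```python
def iterative_quicksort(a):
    """In-place quicksort with an explicit stack of (lo, hi) sub-ranges instead of
    recursion - the control structure GPU implementations map onto batches of
    independent partitioning tasks."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:                  # Hoare-style partition around the pivot
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        stack.append((lo, j))          # both halves become new independent tasks
        stack.append((i, hi))
    return a

print(iterative_quicksort([5, 3, 8, 1, 9, 2, 7]))
```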