ISBN (digital): 9798350374889
ISBN (print): 9798350374896
Federated learning (FL), a distributed learning strategy, improves security and privacy by eliminating the need for clients to share their local data; however, FL struggles with non-IID (non-independent and identically distributed) data. Clustered FL aims to remedy this by grouping similar clients and training one model per group; nevertheless, it faces difficulties both in determining clusters without sharing local data and in conducting model evaluation. Clustered FL evaluation on unseen clients typically applies all models and selects the best performer for each client, an approach known as best-fit cluster evaluation. This paper challenges this evaluation process, arguing that it violates a fundamental machine learning principle: test dataset labels should be used only for performance calculation, not for model selection. We show that best-fit cluster evaluation results in significant accuracy overestimates. Moreover, we present an evaluation approach that maintains the separation between model selection and evaluation by reserving a portion of the target client data for model selection, while the remaining data is used for accuracy estimation. Experiments on four datasets, encompassing various IID and non-IID scenarios, demonstrate that best-fit cluster evaluation produces overestimates that are statistically different from the results of our evaluation.
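The difference between the two evaluation procedures described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 20% selection fraction, and the representation of a model as a plain callable are all assumptions made for illustration.

```python
import random


def accuracy(model, samples):
    """Fraction of (x, y) pairs for which model(x) equals y."""
    return sum(model(x) == y for x, y in samples) / len(samples)


def best_fit_eval(models, client_data):
    """Best-fit cluster evaluation: the reported accuracy is the best score
    over all cluster models, so the test labels also drive model selection."""
    return max(accuracy(m, client_data) for m in models)


def holdout_eval(models, client_data, selection_fraction=0.2, seed=0):
    """Reserve part of the client's data for model selection and report
    accuracy only on the remaining, untouched part.
    selection_fraction and seed are hypothetical parameters."""
    data = list(client_data)
    if len(data) < 2:
        raise ValueError("need at least two samples to split")
    random.Random(seed).shuffle(data)
    # Keep at least one sample on each side of the split.
    cut = min(max(1, int(len(data) * selection_fraction)), len(data) - 1)
    selection, test = data[:cut], data[cut:]
    chosen = max(models, key=lambda m: accuracy(m, selection))
    return accuracy(chosen, test)
```

In this sketch `best_fit_eval` takes a maximum over scores computed on the very labels it reports on, which is the source of the optimistic bias the abstract describes; `holdout_eval` never lets the reporting split influence which model is chosen.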
The paper analyzed the characteristics of digital library resources and the requirements for *** gave the technology and characteristics of grid storage, analyzed the advantages of grid storage, and discussed the application of grid storage technology to the storage of digital library resources from three aspects.
ISBN (digital): 9798350375718
ISBN (print): 9798350375725
Using information from stratigraphic columns, including soil type, composition, and physical properties such as strength, density, permeability, and water-carrying capacity, we constructed a dataframe. By indexing the dataframe along the distance and depth axes, we derived a non-uniform grid displaying soil layer thickness and soil code. Subsequent analysis involved the development of deterministic and probabilistic models to understand spatial variability. The application of deterministic models is found to diminish the reliability of geological calculations. In contrast, the probabilistic model, backed by diverse correlations, provides a more comprehensive understanding of the relationship between soil type and soil layer thickness along the distance and depth axes. To quantify the similarity between probability distributions of soil characteristics, we built a map of Hellinger distances between soil layer thickness and soil code.
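For reference, the Hellinger distance between two discrete distributions P and Q is H(P, Q) = sqrt( (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2 ), which ranges from 0 (identical) to 1 (disjoint support). A minimal Python sketch follows; the function name and the assumption that both inputs are already normalized histograms over the same bins are illustrative and not taken from the paper.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    p and q are assumed to be sequences of non-negative probabilities over
    the same bins, each summing to 1 (an assumption of this sketch).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)
```

Computing this value for every pair of grid cells or soil classes would give a distance map of the kind the abstract mentions; how the authors bin thickness and soil code is not stated here.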
Food safety incidents occur frequently for a variety of reasons, so building an aquatic product traceability system is imperative. Many countries have already established food traceability systems. However, existing solutions cannot be applied directly to aquatic product traceability because of a set of specific requirements. Based on the particularities of aquatic breeding, this paper puts forward an aquatic product traceability system that integrates RFID technology. The system covers the entire life cycle of seafood. To solve the data-sharing problem, this paper presents a model using a distributed database. On this basis, this paper analyzes the SFP (separating fish from one pool into another) process, presents an SFP model, and proposes a tracing algorithm.
Elevator maintenance plays an important role in the safety and economic aspects of daily life, and the location of elevator maintenance sites is crucial in this scenario. A reasonable location can help reduce economic expenses and ensure the healthy operation of the system of maintenance sites. However, because candidate location sets are lacking and data layers are unevenly distributed, existing coverage siting methods struggle with this problem and ignore the fairness of load among different sites. To address these problems, we propose a heuristic coverage siting method called MRM, which is based on multi-round MeanShift clustering with coverage radius variation. Experiments on a real, noisy elevator dataset show that MRM not only achieves 99.21% coverage with fewer sites but also achieves fairly good global fairness.
Is process migration useful for load balancing? We present experimental results indicating that the answer to this question depends largely on the characteristics of the applied workload. Experiments with our Shiva system, which supports remote execution and process migration, show that only those CPU-bound workloads that were generated using an unrealistic exponential distribution for execution times show improvements from dynamic load balancing. (We use the term 'dynamic' to indicate remote execution determined at, and not prior to, run time. The latter is known as 'static' load balancing.) Using a more realistic workload distribution and adding a number of short-lived tasks prevents dynamic algorithms from working. Migration is only useful with heterogeneous workloads. We find the migration of executing tasks to remote data to be effective for balancing I/O-bound workloads, and indicate the region of 'workload variable space' for which this migrate-to-data approach is useful.
ISBN (digital): 9798350367331
ISBN (print): 9798350367348
In this paper, we propose a blind synchronization method for signals with sampling rate offset (SRO) and missing data, both of which occasionally occur in distributed recording for acoustic scene classification. In our method, the correspondence between short-time frames is first estimated using cross-correlation and dynamic programming (DP) matching. Then, two methods for producing synchronized signals are compared. The first method is based on overlap-add along the DP path, while the second method uses the DP path only to identify missing data positions and compensates for the SRO with a linear phase model. The performance of these methods is evaluated through experiments. The results are promising, and further applications to acoustic scene classification are expected.
The need for fast and reliable communication is increasing, which leads us to look for new ways to enhance channel coding. In this paper, we study the case of distributed coding between two users that aim to transmit data to a common destination, where each user transmits partial redundancy to the destination and relies on the second user for the remainder. The purpose of distributing the creation and transmission of redundancy is to benefit from each user's channel quality for more accurate decoding. In our analysis, we use a rate-1/2 convolutional code between users and a distributed Turbo code for transmission to the destination. This study aims to highlight the key factors involved, as well as the advantage of choosing distributed encoding.
Bounds are developed on the probability that the Cartesian product of a given number of finite random sets does not intersect (avoids) a given fixed set. These bounds are then used to estimate the probability of data loss in a distributed storage system that uses erasure codes to protect against data loss when disks fail. These are the first bounds on the probability of data loss that we are aware of. We compare our upper bound on the probability of data loss to approximations that are used in the literature, and show that our bounds are tighter and the gap is significant in some cases. Our bounds also suggest that in some cases, a more efficient (higher rate) code will suffice to meet a data loss probability target than that predicted by approximations widely used in the industry.
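As background for the kind of quantity being bounded, the following Python sketch computes a simple textbook estimate, not the bounds developed in the paper: the probability that a single stripe protected by an (n, k) MDS erasure code becomes unrecoverable, assuming each of its n disks fails independently with probability p. The function name and the independence assumption are illustrative.

```python
from math import comb


def stripe_loss_probability(n, k, p):
    """Probability that more than n - k of n disks fail, when each disk
    fails independently with probability p. An (n, k) MDS code tolerates
    up to n - k erasures, so any larger number of failures loses data.
    This is an illustrative single-stripe model, not the paper's bound.
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    tolerated = n - k
    return sum(
        comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(tolerated + 1, n + 1)
    )
```

Comparing, say, `stripe_loss_probability(14, 10, p)` with `stripe_loss_probability(12, 10, p)` for a given p shows how the code rate trades off against loss probability in this simplified model; a full system with many stripes placed across many disks is what makes the problem in the abstract harder.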