One of the most difficult tasks in medicine is predicting cardiac disease. Heart disease is becoming more common at an alarming rate, and being able to predict it in advance is crucial. Becaus...
The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT). While the first two are reasonably well understood, the third component, whose role is to reduce the number of communication rounds needed to train the model, has resisted all attempts at a satisfactory theoretical explanation. Malinovsky et al. (2022) identified four distinct generations of LT methods based on the quality of the provided theoretical communication complexity guarantees. Despite considerable progress in this area, no existing work was able to show that, in the important heterogeneous data regime, it is theoretically better to employ multiple local gradient-type steps (i.e., to engage in LT) than to rely on a single local gradient-type step. In a recent breakthrough embodied in their ProxSkip method and its theoretical analysis, Mishchenko et al. (2022) showed that LT indeed leads to provable communication acceleration for arbitrarily heterogeneous data, thus jump-starting the 5th generation of LT methods. However, while these latest-generation LT methods are compatible with DS, none of them supports CS. We resolve this open problem in the affirmative. To do so, we had to base our algorithmic development on new algorithmic and theoretical foundations.
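As a rough illustration of the three FedAvg components named in this abstract, the toy Python sketch below marks where client sampling (CS), data sampling (DS) and local training (LT) occur in one plain FedAvg loop. It is not the method proposed in the abstract, nor ProxSkip. The scalar model, the quadratic loss, the function name `fedavg`, the example data and all hyperparameter values are assumptions made purely for illustration.

```python
import random


def fedavg(client_data, rounds=200, clients_per_round=2, batch_size=2,
           local_steps=5, lr=0.1, seed=0):
    """Toy FedAvg on a scalar model w with per-sample loss (w - y)^2 / 2.

    client_data: list of lists of floats, one list per client (hypothetical data).
    """
    rng = random.Random(seed)
    w = 0.0  # global model
    for _ in range(rounds):
        # CS: only a random subset of clients participates in this round.
        cohort = rng.sample(range(len(client_data)), clients_per_round)
        local_models = []
        for cid in cohort:
            w_local = w  # each client starts from the current global model
            # LT: several local gradient-type steps between communications.
            for _ in range(local_steps):
                # DS: each step uses a random minibatch of the client's data.
                batch = rng.sample(client_data[cid], batch_size)
                grad = sum(w_local - y for y in batch) / batch_size
                w_local -= lr * grad
            local_models.append(w_local)
        # Communication: the server averages the returned local models.
        w = sum(local_models) / len(local_models)
    return w


# Heterogeneous clients: each holds data centred on a different value.
clients = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0], [4.0, 5.0, 6.0]]
w_final = fedavg(clients)
print(w_final)
```

Because the client means differ, the local steps pull each local model towards its own client's optimum. That drift under heterogeneous data is the effect the theory discussed in the abstract has to account for.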
In the era of big data, data silos have become a pressing problem due to the difficulty of secure data sharing. Federated learning provides a favorable solution by allowing data holders to collaborate in training a mo...
ISBN: 9781665495967 (print)
The latest trend of incorporating various data-centric machine learning (ML) models in software-intensive systems has posed new challenges in the quality assurance practice of software engineering, especially in high-risk environments. ML experts are now focusing on explaining ML models to assure the safe behavior of ML-based systems. However, not enough attention has been paid to explaining the inherent uncertainty of the training data. The current practice of ML-based system engineering lacks transparency in the systematic fitness assessment of the training data before rigorous ML model training begins. We propose a method of assessing the collective confidence in the quality of a training dataset by using Dempster-Shafer theory and its modified combination rule (Yager's rule). With the example of training datasets for pedestrian detection in autonomous vehicles, we demonstrate how the proposed approach can be used by stakeholders with diverse expertise to combine their beliefs in the quality arguments and evidence about the data. Our results open up scope for future research on data requirements engineering that can facilitate evidence-based data assurance for ML-based safety-critical systems.
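To make the combination rule mentioned in this abstract concrete, here is a minimal Python sketch of Yager's rule for two mass functions. Unlike Dempster's normalised rule, it assigns the conflicting mass to the whole frame of discernment, that is, to ignorance. The frame labels ("adequate", "inadequate"), the two stakeholders' mass values and the function name `yager_combine` are hypothetical and do not come from the paper.

```python
from itertools import product


def yager_combine(m1, m2, frame):
    """Combine two mass functions (dict: frozenset -> mass) with Yager's rule."""
    frame = frozenset(frame)
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            # Agreeing evidence supports the intersection of the two focal sets.
            combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
        else:
            # Disjoint focal sets contribute to the conflict mass.
            conflict += mass_b * mass_c
    # Yager's rule: conflict is added to the full frame, not normalised away.
    combined[frame] = combined.get(frame, 0.0) + conflict
    return combined


# Hypothetical beliefs of two stakeholders about a dataset quality argument.
GOOD = frozenset({"adequate"})
BAD = frozenset({"inadequate"})
THETA = GOOD | BAD  # frame of discernment; mass on THETA means "don't know"

expert_a = {GOOD: 0.7, BAD: 0.1, THETA: 0.2}
expert_b = {GOOD: 0.5, BAD: 0.3, THETA: 0.2}

fused = yager_combine(expert_a, expert_b, THETA)
for focal_set, mass in fused.items():
    print(sorted(focal_set), round(mass, 2))
```

With these assumed inputs the fused masses are 0.59 for "adequate", 0.11 for "inadequate" and 0.30 for the full frame. The 0.26 of conflicting mass ends up as explicit uncertainty instead of inflating either hypothesis.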
A major challenge when developing machine learning (ML) sign language recognition using wearables is how to efficiently translate the gestures based on the acquired sensor data. The conventional method utilizes data fusio...
The latest advancements in wireless communication have prompted researchers to concentrate more on the expansion of Mobile Ad hoc Networks (MANETs), where nodes communicate with each other to provide the demanded real-time ...
Various modulated techniques of Content-Based Image Retrieval (CBIR) using deep learning provide better search outputs even though they are computationally challenging. These methods can be enhanced further if the se...
This research studies and compares the use of different machine learning tools - K-means clustering, Hierarchical clustering, Affinity propagation clustering, and the Random Forest model - with the Low-Correlation Str...
Most research on using pseudo-computed tomography (pCT) in brain-imaging techniques relies on in-house methods. As performance as a whole increases, they pay particular attention when using MRI. Methodologies f...
The aim of this research work is to classify fraudulent credit card transactions. Nowadays, online transactions have become a necessary part of our lives. Credit card fraud has skyrocketed. In fact, it is one of the m...