Data valuation quantifies the contribution of each data point to the performance of a machine learning model. Existing works typically define the value of data by its improvement of the validation performance of the trained model. However, this approach can be impractical to apply in collaborative machine learning and data marketplaces, since it is difficult for the parties/buyers to agree on a common validation dataset or to determine the exact validation distribution a priori. To address this, we propose a distributionally robust data valuation approach that performs data valuation without known/fixed validation distributions. Our approach defines the value of data by its improvement of the distributionally robust generalization error (DRGE), thus providing a worst-case performance guarantee without a known/fixed validation distribution. However, since computing DRGE directly is infeasible, we propose using model deviation as a proxy for the marginal improvement of DRGE (for kernel regression and neural networks) to compute data values. Furthermore, we identify a notion of uniqueness where low uniqueness characterizes low-value data. We empirically demonstrate that our approach outperforms existing data valuation approaches in data selection and data removal tasks on real-world datasets (e.g., housing price prediction, diabetes hospitalization prediction). Copyright 2024 by the author(s).
The main contribution of this research is better modularity and processing time in community detection by using K-1 coloring. Testing was performed on remittance transaction datasets on P2P platforms where the Louvain C...
In the Tech Renaissance, spotting network traffic anomalies has become a game changer for security. With the rapid growth of network traffic and the increasing frequency of cyberattacks, detecting anomalies and intrus...
Cloud computing solutions are becoming more and more popular as a way for organizations to improve productivity, save costs, and simplify procedures. The advantage of cloud services is that they enable users to store ...
India being an agricultural country, food quality tracking is a major challenge faced by common farmers across the country. This research presents an innovative integration of Convolutional Neural Networks (CNNs) to a...
Parameter control refers to the techniques that dynamically adapt the parameter values of the evolutionary algorithm during the optimization process, such as population size, crossover rate, or operator selection. Ada...
Given the damping factor α and precision tolerance ϵ, Andersen et al. [2] introduced Approximate Personalized PageRank (APPR), the de facto local method for approximating the PPR vector, with runtime bounded by Θ(1/...
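For orientation, the local push procedure behind APPR can be sketched as follows. This is a minimal illustration, not the code of the work listed above: it uses the common non-lazy variant of the push rule rather than the lazy-walk form of the original paper, it treats `alpha` as the teleport probability, and the function name `appr_push` and the adjacency-dict graph format are assumptions made here. Every node is assumed to have at least one neighbor.

```python
from collections import deque


def appr_push(graph, source, alpha=0.15, eps=1e-4):
    """Approximate the PPR vector of `source` by local pushes.

    graph: dict mapping each node to a list of its neighbors
           (hypothetical format; every node needs degree >= 1).
    Returns (p, r): the estimate vector and the residual vector.
    """
    p = {}                  # settled PPR estimate
    r = {source: 1.0}       # residual mass still to be distributed
    queue = deque([source])
    queued = {source}

    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg_u = len(graph[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue        # residual too small to be worth pushing

        # Push: keep an alpha fraction at u, spread the rest to neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = 0.0
        share = (1.0 - alpha) * ru / deg_u
        for v in graph[u]:
            r[v] = r.get(v, 0.0) + share
            if v not in queued and r[v] >= eps * len(graph[v]):
                queue.append(v)
                queued.add(v)
    return p, r


# Small undirected example: a triangle 0-1-2 with a tail 2-3.
example_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
estimate, residual = appr_push(example_graph, source=0)
```

On termination every node `u` satisfies `r[u] < eps * deg(u)`, and the total mass in `p` and `r` still sums to 1, which is the invariant the runtime analysis relies on.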
In project management, effective cost estimation is one of the most crucial activities to efficiently manage resources by predicting the required cost to fulfill a given ***, finding the best estimation results in software development is ***, accurate estimation of software development efforts is always a concern for many *** this paper, we proposed a novel software development effort estimation model based on both the constructive cost model II (COCOMO II) and the artificial neural network (ANN). An artificial neural network enhances the COCOMO model, and the value of the baseline effort constant A is calibrated to use it in the proposed model *** state-of-the-art publicly available datasets are used for *** backpropagation feed-forward procedure used a training set by iteratively processing and training a neural *** proposed model is tested on the test *** estimated effort is compared with the actual effort *** results show that the effort estimated by the proposed model is very close to the real effort, thus enhancing the reliability and improving the software effort estimation accuracy.
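To make the role of the baseline effort constant A concrete, the standard COCOMO II post-architecture effort equation can be sketched as below. This is only the textbook formula with the published COCOMO II.2000 defaults (A = 2.94, B = 0.91); the abstract does not state the calibrated value of A or how the neural network is combined with the equation, so neither is reproduced here. The function name `cocomo2_effort` and its parameter names are assumptions for illustration.

```python
from math import prod

A_DEFAULT = 2.94  # COCOMO II.2000 baseline effort constant (the value a calibration would replace)
B_DEFAULT = 0.91  # COCOMO II.2000 baseline scale exponent


def cocomo2_effort(ksloc, scale_factors, effort_multipliers,
                   a=A_DEFAULT, b=B_DEFAULT):
    """Effort in person-months: PM = A * Size^E * product(EM_i),
    with E = B + 0.01 * sum(SF_j) and Size in thousands of source lines."""
    exponent = b + 0.01 * sum(scale_factors)
    return a * ksloc ** exponent * prod(effort_multipliers)


# Hypothetical project: 10 KSLOC, five scale-factor ratings, nominal multipliers.
effort_pm = cocomo2_effort(
    ksloc=10.0,
    scale_factors=[3.72, 3.04, 4.24, 3.29, 4.68],
    effort_multipliers=[1.0] * 17,
)
```

Passing a different `a` is where a dataset-specific calibration of A, such as the one the abstract mentions, would plug in.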
In banking, maintaining customer retention and customer satisfaction are important. Effective customer segmentation can be a strategic tool to improve customer loyalty and business performance. This research can assis...
This work studies embedding of arbitrary VC classes in well-behaved VC classes, focusing particularly on extremal classes. Our main result expresses an impossibility: such embeddings necessarily require a significant ...