Cybersecurity plays an important role in protecting people and critical infrastructure. Sectors such as energy, defense and healthcare are increasingly at risk from cyber threats. To address these challenges, dedicate...
Data valuation quantifies the contribution of each data point to the performance of a machine learning model. Existing works typically define the value of a data point by how much it improves the validation performance of the trained model. However, this approach can be impractical in collaborative machine learning and data marketplaces, since it is difficult for the parties or buyers to agree on a common validation dataset or to determine the exact validation distribution a priori. To address this, we propose a distributionally robust data valuation approach that performs data valuation without a known or fixed validation distribution. Our approach defines the value of data by its improvement of the distributionally robust generalization error (DRGE), thus providing a worst-case performance guarantee without a known or fixed validation distribution. However, since computing DRGE directly is infeasible, we propose using model deviation as a proxy for the marginal improvement of DRGE (for kernel regression and neural networks) to compute data values. Furthermore, we identify a notion of uniqueness, where low uniqueness characterizes low-value data. We empirically demonstrate that our approach outperforms existing data valuation approaches in data selection and data removal tasks on real-world datasets (e.g., housing price prediction, diabetes hospitalization prediction). Copyright 2024 by the author(s).
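For orientation, the conventional validation-based valuation that the abstract contrasts itself with can be sketched as a leave-one-out computation. This is a minimal illustration, not the paper's DRGE or model-deviation method: the ridge regression model, the synthetic data, and the helper names `fit_ridge`, `validation_mse`, and `leave_one_out_values` are all assumptions made for the example. Note that it requires a fixed validation set (`X_val`, `y_val`), which is exactly the dependency the abstract describes as hard to agree on.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression weights (illustrative model choice)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def validation_mse(w, X_val, y_val):
    """Mean squared error of weights w on the fixed validation set."""
    return float(np.mean((X_val @ w - y_val) ** 2))

def leave_one_out_values(X, y, X_val, y_val, lam=1e-3):
    """Value of point i = validation loss without i minus loss with all points.

    A positive value means removing the point hurts (the point helps);
    a negative value means removing the point helps (the point hurts).
    """
    full_loss = validation_mse(fit_ridge(X, y, lam), X_val, y_val)
    values = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        loss_without_i = validation_mse(
            fit_ridge(X[keep], y[keep], lam), X_val, y_val
        )
        values[i] = loss_without_i - full_loss
    return values

# Synthetic, noiseless data (hypothetical) with one corrupted label.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(30, 2))
y = X @ w_true
y[0] += 10.0  # corrupt the label of point 0
X_val = rng.normal(size=(200, 2))
y_val = X_val @ w_true

values = leave_one_out_values(X, y, X_val, y_val)
lowest = int(np.argmin(values))  # index of the lowest-valued point
```

In this toy setup the corrupted point receives the lowest value, because removing it lets the model fit the remaining clean data almost exactly. The scores depend entirely on the chosen validation set, which is the limitation the abstract's distributionally robust formulation aims to remove.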
Cloud computing solutions are becoming more and more popular as a way for organizations to improve productivity, save costs, and simplify procedures. The advantage of cloud services is that they enable users to store ...
Promotional Short Message Service (SMS) messages, which provide frequent updates about deals and discounts to consumers, are crucial in developing countries like Sri Lanka, as they help alleviate financial pressure du...
Because India is an agricultural country, food quality tracking is a major challenge for farmers across the country. This research presents an innovative integration of Convolutional Neural Networks (CNNs) to a...
The Internet of Things (IoT), which has developed rapidly in recent years, has altered many aspects of daily life by giving rise to commercial tools, home automation, wearable technology, and the framework ...
While cost, time, and resources are considered to have a high impact on data science projects, risks are the key to the successful implementation of a project. The correct handling of risks has been proven to increas...
From a broader perspective, the objective of visual speech recognition (VSR) is to comprehend the speech spoken by an individual using visual deformations. However, some of the significant limitations of existing solu...
In today’s corporate landscape, the creation of questionnaires, surveys or evaluation forms for employees is a widespread practice. These tools are regularly used to check various aspects such as motivation, opportun...
In this article, we share the details of Dickinson College’s journey to establish a data analytics major. Designed to provide students with the technical proficiency required to become a data scientist, Dickinson’s ...