Author affiliations: Stanford University, Department of Computer Science, Stanford, USA (GRID:grid.168010.e) (ISNI:***); IBM Research – Africa, Nairobi, Kenya (GRID:grid.168010.e); Stanford University, Stanford Law School, Stanford, USA (GRID:grid.168010.e) (ISNI:***); ETH Zurich, Department of Computer Science, Zurich, Switzerland (GRID:grid.5801.c) (ISNI:0000 0001 2156 2780); Stanford University, Department of Computer Science, Stanford, USA (GRID:grid.168010.e) (ISNI:***); Stanford University, Department of Biomedical Data Science, Stanford, USA (GRID:grid.168010.e) (ISNI:***)
Publication: Nature Machine Intelligence (Nat. Mach. Intell.)
Year, volume, issue: 2022, vol. 4, no. 10
Pages: 904
Core indexing:
Funding: National Science Foundation (NSF)
Subject: Pipelines
Abstract: As artificial intelligence (AI) transitions from research to deployment, creating the appropriate datasets and data pipelines to develop and evaluate AI models is increasingly the biggest challenge. Automated AI model builders that are publicly available can now achieve top performance in many applications. In contrast, the design and sculpting of the data used to develop AI often rely on bespoke manual work, and they critically affect the trustworthiness of the model. This Perspective discusses key considerations for each stage of the data-for-AI pipeline, starting from data design to data sculpting (for example, cleaning, valuation and annotation) and data evaluation, to make AI more reliable. We highlight technical advances that help to make the data-for-AI pipeline more scalable and rigorous. Furthermore, we discuss how recent data regulations and policies can impact *** has become rapidly clear in the past few years that the creation, use and maintenance of high-quality annotated datasets for robust and reliable AI applications requires careful attention. This Perspective discusses challenges, considerations and best practices for various stages in the data-to-AI pipeline, to encourage a more data-centric approach.