Data preprocessing is a crucial step in any machine learning (ML) pipeline, as the quality of the data can greatly impact the accuracy and effectiveness of the final model. With the rise of automated machine learning ...
ISBN: 9781665410144 (print)
Data preprocessing is an important prerequisite for data mining and machine learning. In this paper, we introduce Preprocessy, a Python framework that provides customisable data preprocessing pipelines for processing structured data. Preprocessy pipelines come with sane defaults, and the framework also provides low-level functions for building custom pipelines. The paper gives a brief overview of the features and high-level APIs of Preprocessy, along with a performance comparison against Scikit-learn and Pandas on two datasets. Preprocessy provides functions for handling missing data and outliers, data normalisation, feature selection and data sampling. The goal of Preprocessy is to be easy to use, flexible and performant. Preprocessy helps beginners and experts alike by making data preprocessing easier and faster.
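The abstract describes pipelines that ship with defaults but can be rebuilt from lower-level functions covering missing data, outliers, normalisation, feature selection and sampling. The following is a minimal sketch of that general idea in standard-library Python. It does not reproduce Preprocessy's actual API: every name (`impute_mean`, `drop_outlier_rows`, `drop_constant_columns`, `min_max_normalise`, `sample_rows`, `run_pipeline`, `DEFAULT_STEPS`) is hypothetical. The specific techniques (mean imputation, the 1.5 IQR outlier rule, zero-variance feature removal, min-max scaling and seeded random sampling) are common choices assumed for illustration, not necessarily the ones Preprocessy implements.

```python
# A minimal sketch of a customisable preprocessing pipeline.
# NOT Preprocessy's API: every function name below is hypothetical.
# A "table" is a dict mapping column names to equal-length lists of
# floats, with None marking a missing value.
import random
from functools import partial
from statistics import mean, pvariance, quantiles
from typing import Callable, Dict, List, Optional, Sequence

Table = Dict[str, List[Optional[float]]]
Step = Callable[[Table], Table]


def impute_mean(table: Table) -> Table:
    """Missing data: replace None with the mean of the column's observed values."""
    out: Table = {}
    for name, col in table.items():
        observed = [v for v in col if v is not None]
        fill = mean(observed) if observed else 0.0
        out[name] = [fill if v is None else v for v in col]
    return out


def drop_outlier_rows(table: Table, k: float = 1.5) -> Table:
    """Outliers: drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR].

    quantiles() cannot handle None, so run impute_mean before this step.
    """
    n_rows = len(next(iter(table.values()), []))
    keep = [True] * n_rows
    for col in table.values():
        q1, _, q3 = quantiles(col, n=4)  # the three quartile cut points
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        for i, v in enumerate(col):
            if not low <= v <= high:
                keep[i] = False
    return {name: [v for v, ok in zip(col, keep) if ok] for name, col in table.items()}


def drop_constant_columns(table: Table) -> Table:
    """Feature selection: keep only columns with non-zero variance."""
    return {name: col for name, col in table.items() if col and pvariance(col) > 0}


def min_max_normalise(table: Table) -> Table:
    """Normalisation: rescale each column linearly onto [0, 1]."""
    out: Table = {}
    for name, col in table.items():
        lo, hi = min(col), max(col)
        span = hi - lo
        out[name] = [(v - lo) / span if span else 0.0 for v in col]
    return out


def sample_rows(table: Table, n: int, seed: int = 0) -> Table:
    """Sampling: draw n rows uniformly without replacement, reproducibly via seed."""
    n_rows = len(next(iter(table.values()), []))
    idx = sorted(random.Random(seed).sample(range(n_rows), n))
    return {name: [col[i] for i in idx] for name, col in table.items()}


# Default steps, applied in order when the caller supplies none.
DEFAULT_STEPS: Sequence[Step] = (
    impute_mean,
    drop_outlier_rows,
    drop_constant_columns,
    min_max_normalise,
)


def run_pipeline(table: Table, steps: Sequence[Step] = DEFAULT_STEPS) -> Table:
    """Apply each step to the output of the previous one."""
    for step in steps:
        table = step(table)
    return table


data = {
    "age": [25.0, None, 31.0, 29.0, 27.0, 30.0, 28.0, 26.0],
    "income": [40.0, 42.0, 41.0, 400.0, 39.0, 43.0, 44.0, 38.0],  # 400.0 is an outlier
    "flag": [1.0] * 8,  # constant column, removed by feature selection
}
cleaned = run_pipeline(data)  # default steps
custom = run_pipeline(data, steps=[impute_mean, partial(sample_rows, n=3, seed=0)])
print(cleaned)
print(custom)
```

Passing a different `steps` sequence replaces the defaults, which loosely mirrors the abstract's distinction between ready-made pipelines and custom pipelines assembled from low-level functions. Keeping each step a plain `Table -> Table` function is what makes the steps freely reorderable and composable.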