Optimizing Apache Spark MLlib: Predictive Performance of Large-Scale Models for Big Data Analytics

Authors: Theodorakopoulos, Leonidas; Karras, Aristeidis; Krimpas, George A.

Affiliations: Univ Patras, Dept Management Sci & Technol, Patras 26334, Greece; Univ Patras, Comp Engn & Informat Dept, Patras 26504, Greece

Publication: Algorithms

Year, volume, issue: 2025, Vol. 18, No. 2

Pages: 74

Subjects: Apache Spark MLlib; big data processing; big data analytics; performance prediction; machine learning; resource optimization; feature engineering; emerging technologies; decision-making

Abstract: In this study, we analyze the performance of three machine learning operators in Apache Spark MLlib: K-Means, Random Forest Regression, and Word2Vec. We ran them on a multi-node Spark cluster and collected detailed execution metrics across diverse datasets and parameter settings. These data were used to train predictive models that reached up to 98% accuracy in forecasting performance. By building actionable predictive models, our research provides a unique treatment of key challenges in hyperparameter tuning, scalability, and real-time resource allocation. Specifically, we show the practical value of traditional models in optimizing Apache Spark MLlib workflows, achieving up to 30% resource savings and a 25% reduction in processing time. These models enable system optimization, reduce computational overhead, and boost the overall performance of big data applications. Ultimately, this work not only closes significant gaps in predictive performance modeling but also paves the way for real-time analytics in distributed environments.
