版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Univ Texas Austin Dept Comp Sci Austin TX 78712 USA
出 版 物:《KNOWLEDGE AND INFORMATION SYSTEMS》 (知识和信息系统季刊)
年 卷 期:2014年第41卷第3期
页 面:793-819页
核心收录:
学科分类:0711[理学-系统科学] 07[理学] 08[工学] 070105[理学-运筹学与控制论] 081101[工学-控制理论与控制工程] 0701[理学-数学] 071101[理学-系统理论] 0811[工学-控制科学与工程] 0812[工学-计算机科学与技术(可授工学、理学学位)]
基 金:NSF [CCF-0916309, CCF-1117055] DOD Army Grant [W911NF-10-1-0529]
主 题:Recommender systems Missing value estimation Matrix factorization Low rank approximation Parallelization Distributed computing
摘 要:Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating least squares (ALS) and stochastic gradient descent (SGD) are two popular approaches to compute matrix factorization, and there has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to the parameters. Coordinate descent, a classical optimization approach, has been used for many other large-scale problems, but its application to matrix factorization for recommender systems has not been thoroughly explored. In this paper, we show that coordinate descent-based methods have a more efficient update rule compared to ALS and have faster and more stable convergence than SGD. We study different update sequences and propose the CCD++ algorithm, which updates rank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, with a synthetic dataset containing 14.6 billion ratings, on a distributed memory cluster with 64 processors, to deliver the desired test RMSE, CCD++ is 49 times faster than SGD and 20 times faster than ALS. When the number of processors is increased to 256, CCD++ takes only 16 s and is still 40 times faster than SGD and 20 times faster than ALS.