TY - GEN
T1 - GPU Accelerated Matrix Factorization for Recommender Systems
AU - Kilitcioglu, Doruk
AU - Greenquist, Nicholas
AU - Zahran, Mohamed
AU - Bari, Anasse
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/3/5
Y1 - 2021/3/5
N2 - Matrix Factorization (MF) is a popular algorithm used to power many recommender systems. Efficient and scalable MF algorithms are essential in order to train on the massive datasets that large scale recommender systems utilize. Graphics Processing Unit (GPU) technology has become very popular in recent years and has become widely used in machine learning. The massive parallelism GPUs offer creates an opportunity to develop an accelerated MF algorithm. This paper presents cu2rec, a matrix factorization algorithm written in CUDA. cu2rec implements a parallel version of Stochastic Gradient Descent (SGD) to solve large scale MF problems. cu2rec utilizes multiple advanced techniques to harness better performance from a GPU. These include aggressive use of constant memory for hyper-parameters and registers for heavily reused values, a sparse matrix data structure, a reduction sum total loss kernel, a novel approach to parallel lock-free updating of feature weights with minimized global memory writes, and fairness across weight updates using user index striding. With a single NVIDIA GPU, cu2rec can be 10x faster than state of the art sequential algorithms while reaching similar error metrics.
AB - Matrix Factorization (MF) is a popular algorithm used to power many recommender systems. Efficient and scalable MF algorithms are essential in order to train on the massive datasets that large scale recommender systems utilize. Graphics Processing Unit (GPU) technology has become very popular in recent years and has become widely used in machine learning. The massive parallelism GPUs offer creates an opportunity to develop an accelerated MF algorithm. This paper presents cu2rec, a matrix factorization algorithm written in CUDA. cu2rec implements a parallel version of Stochastic Gradient Descent (SGD) to solve large scale MF problems. cu2rec utilizes multiple advanced techniques to harness better performance from a GPU. These include aggressive use of constant memory for hyper-parameters and registers for heavily reused values, a sparse matrix data structure, a reduction sum total loss kernel, a novel approach to parallel lock-free updating of feature weights with minimized global memory writes, and fairness across weight updates using user index striding. With a single NVIDIA GPU, cu2rec can be 10x faster than state of the art sequential algorithms while reaching similar error metrics.
KW - CUDA
KW - GPU
KW - SGD
KW - matrix factorization
KW - recommender system
UR - http://www.scopus.com/inward/record.url?scp=85105255147&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105255147&partnerID=8YFLogxK
U2 - 10.1109/ICBDA51983.2021.9403110
DO - 10.1109/ICBDA51983.2021.9403110
M3 - Conference contribution
AN - SCOPUS:85105255147
T3 - 2021 IEEE 6th International Conference on Big Data Analytics, ICBDA 2021
SP - 272
EP - 281
BT - 2021 IEEE 6th International Conference on Big Data Analytics, ICBDA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE International Conference on Big Data Analytics, ICBDA 2021
Y2 - 5 March 2021 through 8 March 2021
ER -