International Journal of Networked and Distributed Computing

Volume 4, Issue 2, April 2016, Pages 96 - 105

Accelerated Diffusion-Based Recommendation Algorithm on Tripartite Graphs with GPU Clusters

Authors
Jingpeng Wang, Jie Huang, Mi Li
Corresponding Author
Jie Huang
Available Online 1 April 2016.
DOI
10.2991/ijndc.2016.4.2.3
Keywords
Diffusion-Based Recommendation Algorithms on Tripartite Graphs; CUDA; Stream Scheduling; Shared Memory; GPU Clusters.
Abstract

Exorbitant computation cost hinders the practical application of recommendation algorithms, especially in time-critical application scenarios. Although experiments show that a recommendation algorithm based on integrated diffusion on user-item-tag tripartite graphs can significantly improve the accuracy, diversification and novelty of recommendations, it is also very time-consuming. Therefore, a parallel solution is frequently needed to improve the execution speed of the algorithm. This paper presents a parallel implementation of the diffusion-based recommendation algorithm on weighted tripartite graphs using Compute Unified Device Architecture (CUDA), together with related optimizations: accelerated memory access with shared memory, stream scheduling, and GPU cluster optimization. Compared to the algorithm running on a single CPU core, the unoptimized GPU kernel achieves a 153.9x speedup on average on a GTX 980 when the input dataset consists of 30,000 records. With shared memory applied, the time spent on device memory access is reduced by about 50% on a dataset of 90,000 records, and with 2-way stream scheduling the kernel's performance improves by about 7% to 13%. Based on the optimized GPU kernel, we evaluate the performance of the recommendation algorithm with a customized socket communication mechanism on GPU clusters. Compared to a single GPU node, we achieve a 7.55x speedup on a cluster of 9 GPUs when recommending for 8,000 users. In addition, the speedup of the GPU cluster is 26.1 times that of our 9-node CPU cluster and 1586.28 times relative to the serial algorithm on one CPU core. These results show that GPU technology can dramatically improve the algorithm's performance.
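
For readers unfamiliar with the underlying method, the following is a minimal serial sketch of integrated diffusion on a user-item-tag graph. It is not the authors' implementation: it assumes the textbook unweighted mass-diffusion rule (each node splits its resource equally among its neighbors), whereas the paper targets weighted graphs and GPU execution. The function names `diffuse` and `recommend`, the 0/1 adjacency-matrix layout, and the mixing parameter `lam` are illustrative assumptions.

```python
def diffuse(adj, resource):
    """Two-step mass diffusion: items -> neighbors -> items.

    adj[i][n] is 1 if item i is linked to neighbor n (a user or a tag).
    Assumption: unweighted links, equal split at every node.
    """
    n_items, n_nbrs = len(adj), len(adj[0])
    item_deg = [sum(row) for row in adj]
    nbr_deg = [sum(adj[i][n] for i in range(n_items)) for n in range(n_nbrs)]

    # Step 1: each item splits its resource equally among its neighbors.
    nbr_res = [0.0] * n_nbrs
    for i in range(n_items):
        if item_deg[i] == 0:
            continue
        share = resource[i] / item_deg[i]
        for n in range(n_nbrs):
            if adj[i][n]:
                nbr_res[n] += share

    # Step 2: each neighbor splits what it received equally among its items.
    out = [0.0] * n_items
    for n in range(n_nbrs):
        if nbr_deg[n] == 0:
            continue
        share = nbr_res[n] / nbr_deg[n]
        for i in range(n_items):
            if adj[i][n]:
                out[i] += share
    return out


def recommend(item_user, item_tag, user, lam=0.5):
    """Rank the items a user has not collected by a blended diffusion score."""
    # One unit of resource on every item the target user has collected.
    initial = [float(row[user]) for row in item_user]
    via_users = diffuse(item_user, initial)
    via_tags = diffuse(item_tag, initial)
    # Linear blend of the two diffusion results (lam is a hypothetical knob).
    scores = [lam * a + (1 - lam) * b for a, b in zip(via_users, via_tags)]
    candidates = [i for i, row in enumerate(item_user) if not row[user]]
    ranked = sorted(candidates, key=lambda i: -scores[i])
    return ranked, scores


# Toy data: 4 items, 3 users, 2 tags.
item_user = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1]]
item_tag = [[1, 0], [1, 1], [0, 1], [0, 1]]
ranked, scores = recommend(item_user, item_tag, user=0)
print(ranked, scores)
```

Each target user's scores are computed independently of every other user's, which is the kind of structure that lends itself to the per-user parallelism the abstract describes.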

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Journal
International Journal of Networked and Distributed Computing
Volume-Issue
4 - 2
Pages
96 - 105
Publication Date
2016/04/01
ISSN (Online)
2211-7946
ISSN (Print)
2211-7938

Cite this article

TY  - JOUR
AU  - Jingpeng Wang
AU  - Jie Huang
AU  - Mi Li
PY  - 2016
DA  - 2016/04/01
TI  - Accelerated Diffusion-Based Recommendation Algorithm on Tripartite Graphs with GPU Clusters
JO  - International Journal of Networked and Distributed Computing
SP  - 96
EP  - 105
VL  - 4
IS  - 2
SN  - 2211-7946
UR  - https://doi.org/10.2991/ijndc.2016.4.2.3
DO  - 10.2991/ijndc.2016.4.2.3
ID  - Wang2016
ER  -