DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference

Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting a significant share of the compute demand on cloud infrastructure. Thus, improving the execution efficiency of recommendation directly translates into infrastructure capacity savings. In this paper, we propose DeepRecSched, a recommendation inference scheduler that maximizes latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, model architectures, and underlying hardware systems. By carefully optimizing task versus data-level parallelism, DeepRecSched improves system throughput on server-class CPUs by 2× across eight industry-representative models. Next, we deploy and evaluate this optimization in an at-scale production datacenter, where it reduces end-to-end tail latency across a wide variety of recommendation models by 30%. Finally, DeepRecSched demonstrates the role and impact of specialized AI hardware in optimizing system-level performance (QPS) and power efficiency (QPS/watt) of recommendation inference. To enable the design space exploration of customized recommendation systems shown in this paper, we design and validate an end-to-end modeling infrastructure, DeepRecInfra. DeepRecInfra enables studies over a variety of recommendation use cases, taking into account at-scale effects, such as query arrival patterns and recommendation query sizes, observed from a production datacenter, as well as industry-representative models and tail latency targets.

Original language: English (US)
Title of host publication: Proceedings - 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, ISCA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 982-995
Number of pages: 14
ISBN (Electronic): 9781728146614
State: Published - May 2020
Event: 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020 - Virtual, Online, Spain
Duration: May 30 2020 to Jun 3 2020

Publication series

Name: Proceedings - International Symposium on Computer Architecture
Volume: 2020-May
ISSN (Print): 1063-6897

Conference

Conference: 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020
Country: Spain
City: Virtual, Online
Period: 5/30/20 to 6/3/20

ASJC Scopus subject areas

  • Hardware and Architecture
