TY - GEN
T1 - Sirius
T2 - 20th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2015
AU - Hauswald, Johann
AU - Laurenzano, Michael A.
AU - Zhang, Yunqi
AU - Li, Cheng
AU - Rovinski, Austin
AU - Khurana, Arjun
AU - Dreslinski, Ronald G.
AU - Mudge, Trevor
AU - Petrucci, Vinicius
AU - Tang, Lingjia
AU - Mars, Jason
N1 - Publisher Copyright:
Copyright © 2015 ACM.
PY - 2015/3/14
Y1 - 2015/3/14
AB - As user demand scales for intelligent personal assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana, we are approaching the computational limits of current datacenter architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this paper, we present the design of Sirius, an open end-to-end IPA web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of 7 benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 10× and 16×. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of datacenters by 2.6× and 1.4×, respectively.
KW - Datacenters
KW - Emerging workloads
KW - Intelligent personal assistants
KW - Warehouse scale computers
UR - http://www.scopus.com/inward/record.url?scp=84939202658&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939202658&partnerID=8YFLogxK
U2 - 10.1145/2694344.2694347
DO - 10.1145/2694344.2694347
M3 - Conference contribution
AN - SCOPUS:84939202658
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 223
EP - 238
BT - ASPLOS 2015 - 20th International Conference on Architectural Support for Programming Languages and Operating Systems
PB - Association for Computing Machinery
Y2 - 14 March 2015 through 18 March 2015
ER -