Abstract
Large language models (LLMs) have grown exponentially in size, posing significant challenges to traditional memory architectures. Current high bandwidth memory (HBM) systems are limited by chiplet I/O bandwidth and by the small number of HBM stacks that packaging constraints allow to be integrated around a compute die. In this letter, we propose a novel memory system architecture that leverages silicon photonic interconnects to increase memory capacity and bandwidth for compute devices. By introducing optically connected multi-stack HBM modules, we extend the HBM memory system off the compute chip, significantly increasing the number of HBM stacks. Our evaluations show that this architecture can improve training efficiency for a trillion-parameter model by 1.4× compared to a modeled A100 baseline, while also improving inference performance by 4.2× when the L2 cache is modified to provide sufficient bandwidth.
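The core trade-off the abstract describes is that moving HBM stacks off the compute chip over photonic links swaps a packaging limit (how many stacks fit next to the die) for a link-bandwidth limit (how much traffic the optical I/O can carry). A minimal, purely illustrative sketch of that trade-off is given below; the stack counts, per-stack capacity and bandwidth figures, and photonic link bandwidth are assumptions chosen for illustration, not parameters taken from the letter.

```python
from typing import Optional

# Hypothetical back-of-envelope model (not the authors' evaluation framework):
# estimates aggregate HBM capacity and deliverable bandwidth when stacks are
# moved off-package behind an optical link.

def hbm_system(num_stacks: int,
               stack_capacity_gb: float = 24.0,     # assumed HBM3-class stack capacity
               stack_bandwidth_gbs: float = 819.0,  # assumed per-stack bandwidth (GB/s)
               link_bandwidth_gbs: Optional[float] = None):
    """Return (total capacity in GB, deliverable bandwidth in GB/s).

    If the stacks sit behind an optical link, deliverable bandwidth is
    capped by that link's bandwidth.
    """
    capacity = num_stacks * stack_capacity_gb
    bandwidth = num_stacks * stack_bandwidth_gbs
    if link_bandwidth_gbs is not None:
        bandwidth = min(bandwidth, link_bandwidth_gbs)
    return capacity, bandwidth

# On-package baseline: packaging limits the die to a handful of stacks.
base_cap, base_bw = hbm_system(num_stacks=6)

# Optically connected multi-stack modules: many more stacks can be attached,
# but the (assumed) photonic I/O bandwidth becomes the new ceiling.
opt_cap, opt_bw = hbm_system(num_stacks=32, link_bandwidth_gbs=16_000.0)

print(f"on-package : {base_cap:.0f} GB capacity, {base_bw:.0f} GB/s")
print(f"optical    : {opt_cap:.0f} GB capacity, {opt_bw:.0f} GB/s")
```

Under these illustrative numbers, the optically connected configuration multiplies capacity by the stack count while bandwidth saturates at the link limit, which is why the abstract notes that the L2 cache must supply sufficient bandwidth for the inference speedup to materialize.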
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 49-52 |
| Number of pages | 4 |
| Journal | IEEE Computer Architecture Letters |
| Volume | 24 |
| Issue number | 1 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Memory architecture
- silicon photonics
ASJC Scopus subject areas
- Hardware and Architecture