TY - GEN
T1 - Coala
T2 - 2025 Design, Automation and Test in Europe Conference, DATE 2025
AU - Gamil, Homer
AU - Mazonka, Oleg
AU - Maniatakos, Michail
N1 - Publisher Copyright:
© 2025 EDAA.
PY - 2025
Y1 - 2025
N2 - In this study, we introduce Coala, a novel framework designed to enhance the performance of finite field transformations for GPU environments. We have developed a GPU-optimized version of the Discrete Galois Transformation (DGT), a variant of the Number Theoretic Transform (NTT). We introduce a novel data access pattern scheme specifically engineered to enable coalesced accesses, significantly enhancing the efficiency of data transfers between global and shared memory. This enhancement not only boosts execution efficiency but also optimizes the interaction with the GPU's memory architecture. Additionally, Coala presents a comprehensive framework that optimizes the allocation of computational tasks across the GPU's architecture and execution kernels, thereby maximizing the use of GPU resources. Lastly, we provide a flexible method to adjust security levels and polynomial sizes through the incorporation of an in-kernel RNS method, and a flexible parameter generation approach. Comparative analysis against current state-of-the-art techniques reveals significant improvements. We observe performance gains of 2.82× - 17.18× against other DGT works on GPUs for different parameters, achieved concurrently with equal or lower memory utilization.
AB - In this study, we introduce Coala, a novel framework designed to enhance the performance of finite field transformations for GPU environments. We have developed a GPU-optimized version of the Discrete Galois Transformation (DGT), a variant of the Number Theoretic Transform (NTT). We introduce a novel data access pattern scheme specifically engineered to enable coalesced accesses, significantly enhancing the efficiency of data transfers between global and shared memory. This enhancement not only boosts execution efficiency but also optimizes the interaction with the GPU's memory architecture. Additionally, Coala presents a comprehensive framework that optimizes the allocation of computational tasks across the GPU's architecture and execution kernels, thereby maximizing the use of GPU resources. Lastly, we provide a flexible method to adjust security levels and polynomial sizes through the incorporation of an in-kernel RNS method, and a flexible parameter generation approach. Comparative analysis against current state-of-the-art techniques reveals significant improvements. We observe performance gains of 2.82× - 17.18× against other DGT works on GPUs for different parameters, achieved concurrently with equal or lower memory utilization.
KW - DGT
KW - FHE
KW - Finite Field Transformations
KW - GPU
KW - Hardware Acceleration
KW - NTT
KW - Polynomial Multiplication
UR - http://www.scopus.com/inward/record.url?scp=105006927880&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105006927880&partnerID=8YFLogxK
U2 - 10.23919/DATE64628.2025.10992695
DO - 10.23919/DATE64628.2025.10992695
M3 - Conference contribution
AN - SCOPUS:105006927880
T3 - Proceedings - Design, Automation and Test in Europe, DATE
BT - 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 31 March 2025 through 2 April 2025
ER -