TY - GEN
T1 - DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing
T2 - 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
AU - Dong, Zhenyuan
AU - Zhang, Sai Qian
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Diffusion Transformers (DiTs) have recently attracted significant interest from both industry and academia due to their enhanced capabilities in visual generation, surpassing the performance of traditional diffusion models that employ U-Nets. However, the improved performance of DiTs comes at the expense of higher parameter counts and implementation costs, which significantly limits their deployment on resource-constrained devices such as mobile phones. We propose DiTAS, a data-free post-training quantization (PTQ) method for efficient DiT inference. DiTAS relies on the proposed temporal-aggregated smoothing techniques to mitigate the impact of channel-wise outliers within the input activations, leading to much lower quantization error under extremely low bitwidths. To further enhance the performance of the quantized DiT, we adopt a layer-wise grid-search strategy to optimize the smoothing factor. Moreover, we integrate a training-free LoRA module for weight quantization, leveraging alternating optimization to minimize quantization errors without additional fine-tuning. Experimental results demonstrate that our approach enables 4-bit weight, 8-bit activation (W4A8) quantization for DiTs while maintaining performance comparable to the full-precision model. Code is available at https://github.com/DZY122/DiTAS
AB - Diffusion Transformers (DiTs) have recently attracted significant interest from both industry and academia due to their enhanced capabilities in visual generation, surpassing the performance of traditional diffusion models that employ U-Nets. However, the improved performance of DiTs comes at the expense of higher parameter counts and implementation costs, which significantly limits their deployment on resource-constrained devices such as mobile phones. We propose DiTAS, a data-free post-training quantization (PTQ) method for efficient DiT inference. DiTAS relies on the proposed temporal-aggregated smoothing techniques to mitigate the impact of channel-wise outliers within the input activations, leading to much lower quantization error under extremely low bitwidths. To further enhance the performance of the quantized DiT, we adopt a layer-wise grid-search strategy to optimize the smoothing factor. Moreover, we integrate a training-free LoRA module for weight quantization, leveraging alternating optimization to minimize quantization errors without additional fine-tuning. Experimental results demonstrate that our approach enables 4-bit weight, 8-bit activation (W4A8) quantization for DiTs while maintaining performance comparable to the full-precision model. Code is available at https://github.com/DZY122/DiTAS
KW - diffusion transformers
KW - post-training quantization
UR - http://www.scopus.com/inward/record.url?scp=105003628747&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105003628747&partnerID=8YFLogxK
U2 - 10.1109/WACV61041.2025.00452
DO - 10.1109/WACV61041.2025.00452
M3 - Conference contribution
AN - SCOPUS:105003628747
T3 - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
SP - 4606
EP - 4615
BT - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 February 2025 through 4 March 2025
ER -