2024 IEEE International Conference on Big Data (BigData)

Abstract

Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NYSACT, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NYSACT leverages an eigenvalue-shifted Nyström method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexities with minimal impact on test accuracy. Our experiments show that NYSACT not only achieves improved test accuracy compared to both first-order and second-order methods but also requires considerably fewer computational resources than existing second-order methods.
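To make the abstract's core idea concrete, the sketch below shows one way an eigenvalue-shifted Nyström approximation of an activation covariance matrix can be formed and applied as a preconditioner. This is a minimal NumPy illustration based on the commonly used stabilized randomized Nyström recipe combined with a Woodbury-style inverse; the function names, the damping parameter rho, and the choice of a Gaussian test matrix are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def nystrom_shifted(A, rank, rng=None):
    """Eigenvalue-shifted randomized Nystrom approximation of a PSD matrix A.

    Returns U (d x rank) and eigenvalues lam (rank,) such that
    A is approximated by U @ diag(lam) @ U.T. A small shift proportional to
    machine epsilon keeps the Cholesky factorization numerically stable.
    """
    rng = np.random.default_rng(rng)
    d = A.shape[0]
    Omega, _ = np.linalg.qr(rng.standard_normal((d, rank)))  # orthonormal test matrix
    Y = A @ Omega                                             # sketch of A
    shift = np.finfo(A.dtype).eps * np.linalg.norm(Y, "fro")  # eigenvalue shift
    Y_shifted = Y + shift * Omega
    L = np.linalg.cholesky(Omega.T @ Y_shifted)               # lower-triangular factor
    B = np.linalg.solve(L, Y_shifted.T).T                     # B = Y_shifted @ inv(L).T
    U, S, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(S**2 - shift, 0.0)                       # undo the shift
    return U, lam

def precondition(grad, U, lam, rho):
    """Apply (A_hat + rho*I)^{-1} to a gradient without forming the d x d matrix,
    using the identity U diag(1/(lam+rho)) U^T + (1/rho)(I - U U^T)."""
    Ug = U.T @ grad
    return U @ (Ug / (lam[:, None] + rho)) + (grad - U @ Ug) / rho

# Illustrative usage: X holds n activation vectors of dimension d,
# A is their empirical covariance, and grad is a d x k gradient block.
n, d, k = 512, 256, 64
X = np.random.randn(n, d)
A = X.T @ X / n
U, lam = nystrom_shifted(A, rank=32)
grad = np.random.randn(d, k)
precond_grad = precondition(grad, U, lam, rho=1e-3)
```

Because only the d x rank factor U and the rank eigenvalues are stored, applying the preconditioner costs O(d * rank) per gradient column instead of the O(d^2) needed for a dense inverse, which is the kind of time and memory saving the abstract refers to.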
