ISBN: 9781450384421 (print)
Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific models and hardware to improve performance and increase scalability. We quantify the tradeoffs between memory and communication cost and evaluate KAISA on large models, including ResNet-50, Mask R-CNN, U-Net, and BERT, on up to 128 NVIDIA A100 GPUs. Compared to the original optimizers, KAISA converges 18.1-36.3% faster across applications with the same global batch size. Under a fixed memory budget, KAISA converges 32.5% and 41.6% faster in ResNet-50 and BERT-Large, respectively. KAISA can balance memory and communication to achieve scaling efficiency equal to or better than the baseline optimizers.
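The memory and communication costs the abstract refers to come from the per-layer Kronecker factors that K-FAC maintains in addition to the gradients. Below is a minimal, illustrative sketch of a K-FAC preconditioning step for a single fully connected layer; it is not KAISA's implementation, and the function name, shapes, and damping value `eps` are assumptions made for illustration.

```python
import numpy as np

def kfac_precondition(grad_W, acts, grad_out, eps=1e-3):
    """Precondition grad_W (out_dim x in_dim) with Kronecker factors.

    acts:     (batch, in_dim)  layer inputs
    grad_out: (batch, out_dim) gradients w.r.t. layer outputs
    """
    batch = acts.shape[0]
    A = acts.T @ acts / batch          # input covariance        (in_dim x in_dim)
    G = grad_out.T @ grad_out / batch  # grad-output covariance  (out_dim x out_dim)

    # Damped inverses; storing and communicating A and G (or their inverses)
    # for every layer is the extra memory/communication cost of K-FAC.
    A_inv = np.linalg.inv(A + eps * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + eps * np.eye(G.shape[0]))

    # (A kron G)^{-1} vec(grad_W) == vec(G^{-1} @ grad_W @ A^{-1})
    return G_inv @ grad_W @ A_inv

# Tiny usage example with random data.
rng = np.random.default_rng(0)
acts = rng.standard_normal((32, 8))      # batch of 32, fan-in 8
grad_out = rng.standard_normal((32, 4))  # fan-out 4
grad_W = grad_out.T @ acts / 32
print(kfac_precondition(grad_W, acts, grad_out).shape)  # (4, 8)
```

Frameworks in this setting typically choose, per layer and per worker, where these factors and their inverses live and how they are exchanged; that placement is the memory-versus-communication tradeoff the abstract quantifies.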