V3.3.2
Change Log
Feature
- Implement SGDSaI optimizer. (#315, #316)
  * No More Adam: Learning Rate Scaling at Initialization is All You Need
Bug
- Clone exp_avg before calling apply_cautious so that exp_avg itself is not masked. (#316)
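
The fix above comes down to aliasing: if the masking step works in place and is handed the stored exp_avg directly, the mask is written into the optimizer state. Below is a minimal sketch of that effect in plain Python. The helper `apply_cautious_` and the list-based state are hypothetical stand-ins for illustration, not the library's actual implementation or signature.

```python
# Hypothetical stand-in for an in-place cautious mask (NOT the library's code):
# zero every entry of `update` whose sign disagrees with the gradient.
def apply_cautious_(update, grad):
    for i, (u, g) in enumerate(zip(update, grad)):
        if u * g <= 0:
            update[i] = 0.0


grad = [1.0, -1.0, 1.0]
exp_avg = [0.5, 0.5, -0.5]  # running first-moment state kept across steps

# Buggy pattern: the update aliases the state, so the mask leaks into it.
state_buggy = list(exp_avg)
update_buggy = state_buggy  # same object, no copy
apply_cautious_(update_buggy, grad)

# Fixed pattern: clone first, then mask only the clone.
state_fixed = list(exp_avg)
update_fixed = state_fixed.copy()
apply_cautious_(update_fixed, grad)

print("buggy state:", state_buggy)
print("fixed state:", state_fixed)
print("masked update:", update_fixed)
```

With tensors, the analogous step would be to call `exp_avg.clone()` and pass the clone to the masking routine, so the momentum buffer carried into the next step stays unmasked.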