V3.3.1
Change Log
Feature
- Support the `Cautious` variant for the `AdaShift` optimizer. (#310)
- Save the state of the `Lookahead` optimizer too. (#310)
- Implement the `APOLLO` optimizer. (#311, #312)
    * SGD-like Memory, AdamW-level Performance
- Rename the `Apollo` (An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization) optimizer to `ApolloDQN` so that it does not clash with the new `APOLLO` optimizer. (#312)
- Implement the `MARS` optimizer. (#313, #314)
    * Unleashing the Power of Variance Reduction for Training Large Models
- Support the `Cautious` variant for the `MARS` optimizer. (#314)
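
As a rough illustration of what a `Cautious` variant does, the sketch below masks out the components of a proposed update whose sign disagrees with the current gradient, then rescales the remaining components by the mean of the mask. This is a minimal pure-Python sketch of the general idea, not the library's implementation: the function name `apply_cautious`, the `eps` floor of `1e-3`, and the use of plain lists instead of tensors are all assumptions made for illustration.

```python
def apply_cautious(update, grad, eps=1e-3):
    """Hypothetical helper: keep only update components aligned with the gradient.

    `update` and `grad` are non-empty lists of floats of equal length.
    `eps` (an assumed value) floors the mask mean to avoid division by zero.
    """
    # 1.0 where the update and the gradient share a sign, 0.0 elsewhere
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Rescale by the fraction of kept components so the overall step size
    # is not shrunk just because some components were masked out
    mean = max(sum(mask) / len(mask), eps)
    return [u * m / mean for u, m in zip(update, mask)]


update = [0.5, -0.2, 0.1, -0.4]
grad = [1.0, 0.3, 2.0, -1.0]
masked = apply_cautious(update, grad)
print(masked)
```

Here the second component is zeroed because its update and gradient have opposite signs, and the other three are scaled up by 1 / 0.75.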
Bug
- Fix `bias_correction` in `AdamG` optimizer. (#305, #308)
- Fix a potential bug when loading the state for `Lookahead` optimizer. (#306, #310)
Docs
- Add more visualizations. (#310, #314)
Contributions
Thanks to @Vectorrent