Skip to content

V3.3.1

Change Log

Feature

  • Support Cautious variant to AdaShift optimizer. (#310)
  • Save the state of the Lookahead optimizer too. (#310)
  • Implement APOLLO optimizer. (#311, #312) * SGD-like Memory, AdamW-level Performance
  • Rename the Apollo (An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization) optimizer name to ApolloDQN not to overlap with the new optimizer name APOLLO. (#312)
  • Implement MARS optimizer. (#313, #314) * Unleashing the Power of Variance Reduction for Training Large Models
  • Support Cautious variant to MARS optimizer. (#314)

Bug

  • Fix bias_correction in AdamG optimizer. (#305, #308)
  • Fix a potential bug when loading the state for Lookahead optimizer. (#306, #310)

Docs

  • Add more visualizations. (#310, #314)

Contributions

thanks to @Vectorrent