V3.3.3
Change Log
Feature
- Implement `Grams` optimizer. (#317, #318)
  * Grams: Gradient Descent with Adaptive Momentum Scaling
- Support `stable_adamw` variant for `ADOPT` and `AdEMAMix` optimizers. (#321)
  * `optimizer = ADOPT(model.parameters(), ..., stable_adamw=True)`
- Implement an experimental optimizer `Ranger25` (not tested). (#321)
  * mixing `ADOPT + AdEMAMix + StableAdamW + Cautious + RAdam` optimizers.
- Implement `OrthoGrad` optimizer. (#321)
  * Grokking at the Edge of Numerical Stability
- Support `Adam-Atan2` feature for `Prodigy` optimizer when `eps` is None. (#321)
  * Scaling Exponents Across Parameterizations and Optimizers
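
To illustrate the idea behind the `Adam-Atan2` entry, here is a minimal scalar sketch of an Adam-style step in which `eps=None` switches the usual `m / (sqrt(v) + eps)` ratio to `atan2(m, sqrt(v))`. This is not the library's `Prodigy` code: the function name `adam_step`, its signature, and the default hyperparameters are hypothetical and chosen only for illustration.

```python
import math


def adam_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam-style step (hypothetical helper, not the library API).

    Passing eps=None replaces the division by (sqrt(v) + eps) with atan2.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Standard Adam bias correction.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)

    if eps is None:
        # atan2 form: bounded, needs no epsilon, and is unchanged when the
        # gradient is rescaled by a positive constant.
        update = math.atan2(m_hat, math.sqrt(v_hat))
    else:
        update = m_hat / (math.sqrt(v_hat) + eps)

    return param - lr * update, m, v


# With eps=None a very small gradient produces the same step as a large one;
# with a fixed eps the small-gradient step shrinks towards zero.
p_big, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, step=1, eps=None)
p_tiny, _, _ = adam_step(1.0, 2e-20, 0.0, 0.0, step=1, eps=None)
```

The design point is that the `atan2` form removes `eps` as a hyperparameter, which is presumably why the feature is tied to `eps` being None.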