V3.3.0
Change Log
Feature
- Support `PaLM` variant for `ScheduleFreeAdamW` optimizer. (#286, #288)
    * You can use this feature by setting `use_palm` to `True`.
- Implement `ADOPT` optimizer. (#289, #290)
    * Modified Adam Can Converge with Any β2 with the Optimal Rate
- Implement `FTRL` optimizer. (#291)
    * Follow The Regularized Leader
- Implement `Cautious optimizer` feature. (#294)
    * Improving Training with One Line of Code
    * You can use it by setting `cautious=True` for `Lion`, `AdaFactor` and `AdEMAMix` optimizers.
- Improve the stability of `ADOPT` optimizer. (#294)
    * Note
- Support a new projection type `random` for `GaLoreProjector`. (#294)
- Implement `DeMo` optimizer. (#300, #301)
    * Decoupled Momentum Optimization
- Implement `Muon` optimizer. (#302)
    * MomentUm Orthogonalized by Newton-schulz
- Implement `ScheduleFreeRAdam` optimizer. (#304)
- Implement `LaProp` optimizer. (#304)
    * Separating Momentum and Adaptivity in Adam
- Support `Cautious` variant to `LaProp`, `AdamP`, `Adopt` optimizers. (#304)
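
The `cautious` option above refers to the idea from "Improving Training with One Line of Code": drop the components of an update whose sign disagrees with the current gradient, then rescale what is left. The following is a minimal pure-Python sketch of that masking step, not the library's implementation. The function name `cautious_update` and the `1e-3` floor on the rescaling factor are assumptions made for illustration.

```python
from typing import List


def cautious_update(update: List[float], grad: List[float]) -> List[float]:
    """Mask out update entries whose sign disagrees with the gradient.

    Hypothetical helper for illustration only.
    """
    # Keep an entry only when update and gradient point the same way.
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Rescale by the fraction of kept entries so the overall step size
    # is roughly preserved; the 1e-3 floor (an assumption) avoids
    # dividing by zero when every entry is masked.
    scale = max(sum(mask) / len(mask), 1e-3)
    return [u * m / scale for u, m in zip(update, mask)]


if __name__ == "__main__":
    # Entries 2 and 4 disagree in sign with the gradient and are zeroed.
    print(cautious_update([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]))
    # prints [2.0, 0.0, 6.0, 0.0]
```

In the library itself no extra code is needed: per the notes above, the behavior is switched on with `cautious=True` on the supported optimizers.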
Refactor
- Big refactoring, removing direct import from `pytorch_optimizer.*`.
    * I removed some methods from the top-level `pytorch_optimizer.*` namespace because they are probably not used frequently and are not optimizers themselves, only utilities used by specific optimizers.
    * `pytorch_optimizer.[Shampoo stuff]` -> `pytorch_optimizer.optimizers.shampoo_utils.[Shampoo stuff]`.
        * `shampoo_utils` like `Graft`, `BlockPartitioner`, `PreConditioner`, etc. You can check the details here.
    * `pytorch_optimizer.GaLoreProjector` -> `pytorch_optimizer.optimizers.galore.GaLoreProjector`.
    * `pytorch_optimizer.gradfilter_ema` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ema`.
    * `pytorch_optimizer.gradfilter_ma` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ma`.
    * `pytorch_optimizer.l2_projection` -> `pytorch_optimizer.optimizers.alig.l2_projection`.
    * `pytorch_optimizer.flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.flatten_grad`.
    * `pytorch_optimizer.un_flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.un_flatten_grad`.
    * `pytorch_optimizer.reduce_max_except_dim` -> `pytorch_optimizer.optimizers.sm3.reduce_max_except_dim`.
    * `pytorch_optimizer.neuron_norm` -> `pytorch_optimizer.optimizers.nero.neuron_norm`.
    * `pytorch_optimizer.neuron_mean` -> `pytorch_optimizer.optimizers.nero.neuron_mean`.
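
When migrating existing code, the moves listed above can be kept in a small lookup table. The sketch below only builds the new import statement as a string and does not import the library. The names `MOVED` and `new_import` are hypothetical helpers, not part of `pytorch_optimizer`; the module paths are copied from the list above.

```python
# Old top-level name -> module it lives in as of this release,
# copied from the refactor notes above.
MOVED = {
    "Graft": "pytorch_optimizer.optimizers.shampoo_utils",
    "BlockPartitioner": "pytorch_optimizer.optimizers.shampoo_utils",
    "PreConditioner": "pytorch_optimizer.optimizers.shampoo_utils",
    "GaLoreProjector": "pytorch_optimizer.optimizers.galore",
    "gradfilter_ema": "pytorch_optimizer.optimizers.grokfast",
    "gradfilter_ma": "pytorch_optimizer.optimizers.grokfast",
    "l2_projection": "pytorch_optimizer.optimizers.alig",
    "flatten_grad": "pytorch_optimizer.optimizers.pcgrad",
    "un_flatten_grad": "pytorch_optimizer.optimizers.pcgrad",
    "reduce_max_except_dim": "pytorch_optimizer.optimizers.sm3",
    "neuron_norm": "pytorch_optimizer.optimizers.nero",
    "neuron_mean": "pytorch_optimizer.optimizers.nero",
}


def new_import(name: str) -> str:
    """Return the updated import statement for a relocated name.

    Hypothetical helper for illustration; raises KeyError for names
    that are not in the table.
    """
    module = MOVED[name]
    return f"from {module} import {name}"


if __name__ == "__main__":
    print(new_import("gradfilter_ema"))
    # prints from pytorch_optimizer.optimizers.grokfast import gradfilter_ema
```

A table like this can drive a search-and-replace over a codebase that still uses `from pytorch_optimizer import <name>` for any of the relocated utilities.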
Docs
- Add more visualizations. (#297)
Bug
- Add optimizer parameter to `PolyScheduler` constructor. (#295)
Contributions
Thanks to @tanganke