V3.2.0
Change Log
Feature
- Implement
SOAPoptimizer. (#275) * SOAP: Improving and Stabilizing Shampoo using Adam - Support
AdEMAMixvariants. (#276) *bnb_ademamix8bit,bnb_ademamix32bit,bnb_paged_ademamix8bit,bnb_paged_ademamix32bit - Support 8/4bit, fp8 optimizers. (#208, #281)
*
torchao_adamw8bit,torchao_adamw4bit,torchao_adamwfp8. - Support a module-name-level (e.g.
LayerNorm) weight decay exclusion forget_optimizer_parameters. (#282, #283) - Implement
CPUOffloadOptimizer, which offloads optimizer to CPU for single-GPU training. (#284) - Support a regex-based filter for searching names of optimizers, lr schedulers, and loss functions.
Bug
- Fix
should_grokfastcondition when initialization. (#279, #280)
Contributions
thanks to @Vectorrent