Skip to content

V3.2.0

Change Log

Feature

  • Implement SOAP optimizer. (#275) * SOAP: Improving and Stabilizing Shampoo using Adam
  • Support AdEMAMix variants. (#276) * bnb_ademamix8bit, bnb_ademamix32bit, bnb_paged_ademamix8bit, bnb_paged_ademamix32bit
  • Support 8/4bit, fp8 optimizers. (#208, #281) * torchao_adamw8bit, torchao_adamw4bit, torchao_adamwfp8.
  • Support a module-name-level (e.g. LayerNorm) weight decay exclusion for get_optimizer_parameters. (#282, #283)
  • Implement CPUOffloadOptimizer, which offloads optimizer to CPU for single-GPU training. (#284)
  • Support a regex-based filter for searching names of optimizers, lr schedulers, and loss functions.

Bug

  • Fix should_grokfast condition when initialization. (#279, #280)

Contributions

thanks to @Vectorrent