V3.6.0

Change Log

Implement Fira optimizer. (#376) * Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Implement RACS and Alice optimizers. (#376) * Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
Implement VSGD optimizer. (#377, #378) * Variational Stochastic Gradient Descent for Deep Neural Networks
Enable training with complex parameters. (#370, #380) * will raise NoComplexParameterError for unsupported optimizers, due to its design or not-yet-implemented.
Support maximize parameter. (#370, #380) * maximize: maximize the objective with respect to the params, instead of minimizing.
Implement copy_stochastic() method. (#381)

Support 2D< Tensor for RACS and Alice optimizers. (#380)
Remove the auxiliary variants from the default parameters of the optimizers and change the name of the state and parameter. (#380) * use_gc, adanorm, cautious, stable_adamw, and adam_debias will be affected. * You can still use these variants by passing the parameters to **kwargs. * Notably, in case of adanorm variant, you need to pass adanorm (and adanorm_r for r option) parameter(s) to use this variant, and the name of the state will be changed from exp_avg_norm to exp_avg_adanorm.
Refactor reset() to init_group() method in the BaseOptimizer class. (#380)
Refactor SAM optimizer family. (#380)
Gather AdamP, SGDP things into pytorch_optimizer.optimizer.adamp.*. (#381) * pytorch_optimizer.optimizer.sgdp.SGDP to pytorch_optimizer.optimizer.adamp.SGDP * pytorch_optimizer.optimizer.util.projection to pytorch_optimizer.optimizer.adamp.projection * pytorch_optimizer.optimizer.util.cosine_similarity_by_view to pytorch_optimizer.optimizer.adamp.cosine_similarity_by_view
Remove channel_view() and layer_view() from pytorch_optimizer.optimizer.util. (#381)

Fix shape mismatch issues in the Galore projection for reverse_std, right, and full projection types. (#376)