Learning Rate Scheduler
deberta_v3_large_lr_scheduler(model, layer_low_threshold=195, layer_middle_threshold=323, head_param_start=390, base_lr=2e-05, head_lr=0.0001, wd=0.01)
DeBERTa-v3 large layer-wise lr scheduler.
Reference: https://github.com/gilfernandes/commonlit
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | nn.Module. model based on Huggingface Transformers. | required |
| layer_low_threshold | int | int. start of the 12 layers. | 195 |
| layer_middle_threshold | int | int. end of the 24 layers. | 323 |
| head_param_start | int | int. where the backbone ends (and the head starts). | 390 |
| base_lr | float | float. base lr. | 2e-05 |
| head_lr | float | float. head lr. | 0.0001 |
| wd | float | float. weight decay. | 0.01 |
Source code in pytorch_optimizer/lr_scheduler/experimental/deberta_v3_lr_scheduler.py
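A minimal usage sketch. It assumes the function is exported at the package top level and that it returns a list of layer-wise parameter groups (each carrying its own lr and weight decay) to hand to an optimizer; the model checkpoint and the choice of AdamW below are illustrative, not prescribed by this API.

```python
import torch
from transformers import AutoModel

from pytorch_optimizer import deberta_v3_large_lr_scheduler

model = AutoModel.from_pretrained('microsoft/deberta-v3-large')

# Build layer-wise parameter groups: lower transformer layers get a reduced lr,
# upper layers a larger one, and the task head gets `head_lr`.
parameters = deberta_v3_large_lr_scheduler(model, base_lr=2e-5, head_lr=1e-4, wd=1e-2)

# The grouped parameters are passed to the optimizer like any other param groups.
optimizer = torch.optim.AdamW(parameters)
```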
get_chebyshev_lr(lr, epoch, num_epochs, is_warmup=False)
Get the Chebyshev learning rate.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| lr | float | float. learning rate. | required |
| epoch | int | int. current epoch. | required |
| num_epochs | int | int. number of total epochs. | required |
| is_warmup | bool | bool. whether it is the warm-up stage or not. | False |
Source code in pytorch_optimizer/lr_scheduler/chebyshev.py
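A hedged sketch of driving a training loop with this function. Since it returns a plain float rather than a scheduler object (an assumption based on the signature above), the value is written into the optimizer's param groups by hand; the warm-up cutoff of 5 epochs is arbitrary.

```python
import torch

from pytorch_optimizer import get_chebyshev_lr

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

base_lr, num_epochs = 1e-2, 100
for epoch in range(num_epochs):
    # Compute the Chebyshev lr for this epoch and apply it manually.
    lr = get_chebyshev_lr(lr=base_lr, epoch=epoch, num_epochs=num_epochs, is_warmup=epoch < 5)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # ... forward / backward over the epoch ...
    optimizer.step()
    optimizer.zero_grad()
```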
CosineAnnealingWarmupRestarts
Bases: _LRScheduler
Cosine annealing scheduler with linear warm-up and restarts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | OPTIMIZER | Optimizer. wrapped optimizer instance. | required |
| first_cycle_steps | int | int. first cycle step size. | required |
| cycle_mult | float | float. cycle steps magnification. | 1.0 |
| max_lr | float | float. maximum lr. | 0.0001 |
| min_lr | float | float. minimum lr. | 1e-06 |
| warmup_steps | int | int. number of warmup steps. | 0 |
| gamma | float | float. decrease rate of lr by cycle. | 0.9 |
| last_epoch | int | int. the index of the last epoch. | -1 |
Source code in pytorch_optimizer/lr_scheduler/cosine_anealing.py
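A minimal sketch using the constructor arguments listed above. Calling scheduler.step() once per optimizer update is an assumption about cadence; first_cycle_steps and warmup_steps are then counted in optimization steps rather than epochs.

```python
import torch

from pytorch_optimizer import CosineAnnealingWarmupRestarts

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

scheduler = CosineAnnealingWarmupRestarts(
    optimizer,
    first_cycle_steps=1000,  # length of the first cosine cycle
    cycle_mult=1.0,          # keep every cycle the same length
    max_lr=1e-4,
    min_lr=1e-6,
    warmup_steps=100,        # linear warm-up at the start of each cycle
    gamma=0.9,               # shrink max_lr by 10% at every restart
)

for step in range(3000):
    # ... forward / backward ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```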
LinearScheduler
Bases: BaseLinearWarmupScheduler
Linear LR Scheduler w/ linear warmup.
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
CosineScheduler
Bases: BaseLinearWarmupScheduler
Cosine LR Scheduler w/ linear warmup.
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
PolyScheduler
Bases: BaseLinearWarmupScheduler
Poly LR Scheduler.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| poly_order | float | float. order of the polynomial by which the lr decreases with steps. | 0.5 |
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
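A shared usage sketch for the three linear-warmup schedulers above. The constructor is not shown on this page, so the argument names below (t_max, max_lr, min_lr, init_lr, warmup_steps) are assumptions about the BaseLinearWarmupScheduler interface, as is the per-step step() call; consult linear_warmup.py for the exact signature.

```python
import torch

from pytorch_optimizer import CosineScheduler, LinearScheduler, PolyScheduler

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Assumed BaseLinearWarmupScheduler-style arguments: ramp linearly from init_lr
# to max_lr over warmup_steps, then decay towards min_lr until t_max.
scheduler = CosineScheduler(optimizer, t_max=1000, max_lr=1e-3, min_lr=1e-6, init_lr=0.0, warmup_steps=100)
# LinearScheduler takes the same arguments; PolyScheduler additionally accepts poly_order.

for step in range(1000):
    # ... forward / backward ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```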
ProportionScheduler
ProportionScheduler (Rho Scheduler of GSAM).
This scheduler outputs a value that evolves proportionally to the lr given by lr_scheduler.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| lr_scheduler | | learning rate scheduler. | required |
| max_lr | float | float. maximum lr. | required |
| min_lr | float | float. minimum lr. | 0.0 |
| max_value | float | float. maximum of rho. | 2.0 |
| min_value | float | float. minimum of rho. | 2.0 |
Source code in pytorch_optimizer/lr_scheduler/proportion.py
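A sketch of wiring the rho scheduler to an existing lr scheduler, as GSAM does. That step() returns the current rho, and that this value is what GSAM's rho scheduler argument consumes, are assumptions; the min_value of 0.05 below is illustrative and differs from the 2.0 default.

```python
import torch

from pytorch_optimizer import CosineAnnealingWarmupRestarts, ProportionScheduler

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

lr_scheduler = CosineAnnealingWarmupRestarts(
    optimizer, first_cycle_steps=1000, max_lr=1e-3, min_lr=1e-6, warmup_steps=100
)

# rho tracks the lr proportionally: rho == max_value when lr == max_lr,
# rho == min_value when lr == min_lr.
rho_scheduler = ProportionScheduler(lr_scheduler, max_lr=1e-3, min_lr=1e-6, max_value=2.0, min_value=0.05)

for step in range(1000):
    # ... GSAM-style training step using the current rho ...
    lr_scheduler.step()
    rho = rho_scheduler.step()  # assumed: step() advances and returns the current rho
```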
REXScheduler
Bases: _LRScheduler
Revisiting Budgeted Training with an Improved Schedule.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | OPTIMIZER | Optimizer. wrapped optimizer instance. | required |
| total_steps | int | int. number of steps to optimize. | required |
| max_lr | float | float. max lr. | 1.0 |
| min_lr | float | float. min lr. | 0.0 |
Source code in pytorch_optimizer/lr_scheduler/rex.py
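A minimal sketch using the parameters listed above. Because the class derives from _LRScheduler, the usual step-per-update pattern is assumed; how max_lr and min_lr interact with the optimizer's configured lr is not stated here, so check rex.py for the exact semantics.

```python
import torch

from pytorch_optimizer import REXScheduler

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

total_steps = 10_000
scheduler = REXScheduler(optimizer, total_steps=total_steps, max_lr=1.0, min_lr=0.0)

for step in range(total_steps):
    # ... forward / backward ...
    optimizer.step()
    scheduler.step()  # REX decay towards min_lr over total_steps
    optimizer.zero_grad()
```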