Learning Rate Scheduler
deberta_v3_large_lr_scheduler(model, layer_low_threshold=195, layer_middle_threshold=323, head_param_start=390, base_lr=2e-05, head_lr=0.0001, wd=0.01)
DeBERTa-v3 large layer-wise lr scheduler.
Reference: https://github.com/gilfernandes/commonlit
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | nn.Module. model based on Hugging Face Transformers. | required |
| layer_low_threshold | int | int. start of the 12 layers. | 195 |
| layer_middle_threshold | int | int. end of the 24 layers. | 323 |
| head_param_start | int | int. where the backbone ends (head starts). | 390 |
| base_lr | float | float. base lr. | 2e-05 |
| head_lr | float | float. head lr. | 0.0001 |
| wd | float | float. weight decay. | 0.01 |
Source code in pytorch_optimizer/lr_scheduler/experimental/deberta_v3_lr_scheduler.py
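As a rough illustration of the layer-wise idea, the sketch below splits a list of named parameters at the three index thresholds and gives each slice its own learning rate. It is not the library's code: `layerwise_groups` is a hypothetical helper, and the `low_mult` / `high_mult` multipliers and the "no weight decay on biases" rule are assumptions chosen for illustration. It accepts any list of `(name, param)` pairs, such as `list(model.named_parameters())`.

```python
def layerwise_groups(named_params, layer_low_threshold=195, layer_middle_threshold=323,
                     head_param_start=390, base_lr=2e-5, head_lr=1e-4, wd=0.01,
                     low_mult=0.4, high_mult=2.0):
    """Hypothetical helper: build optimizer param groups with per-slice learning rates."""
    named_params = list(named_params)
    backbone = named_params[:head_param_start]
    head = named_params[head_param_start:]

    # The head gets its own (usually larger) learning rate.
    groups = [{'params': [p for _, p in head], 'lr': head_lr}]

    for index, (name, param) in enumerate(backbone):
        if index >= layer_middle_threshold:
            lr = base_lr * high_mult  # assumed multiplier for the upper slice
        elif index >= layer_low_threshold:
            lr = base_lr
        else:
            lr = base_lr * low_mult  # assumed multiplier for the lower slice
        weight_decay = 0.0 if 'bias' in name else wd  # assumed: no decay on biases
        groups.append({'params': [param], 'lr': lr, 'weight_decay': weight_decay})
    return groups
```

The returned list has the shape PyTorch optimizers accept as parameter groups, so it could be passed as the first argument of an optimizer constructor.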
get_chebyshev_schedule(optimizer, num_epochs, is_warmup=False, last_epoch=-1)
Get chebyshev learning rate scheduler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | Optimizer. the optimizer for which to schedule the learning rate. | required |
| num_epochs | int | int. number of total epochs. | required |
| is_warmup | bool | bool. whether warm-up stage or not. | False |
| last_epoch | int | int. the index of the last epoch when resuming training. | -1 |
Source code in pytorch_optimizer/lr_scheduler/chebyshev.py
get_wsd_schedule(optimizer, num_warmup_steps, num_stable_steps, num_decay_steps, min_lr_ratio=0.0, num_cycles=0.5, cooldown_type='1-sqrt', last_epoch=-1)
Get Warmup-Stable-Decay learning rate scheduler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | Optimizer. the optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | int | int. the number of warmup steps. | required |
| num_stable_steps | int | int. the number of stable steps. | required |
| num_decay_steps | int | int. the number of decay steps. | required |
| min_lr_ratio | float | float. the minimum learning rate as a ratio of the initial learning rate. | 0.0 |
| num_cycles | float | float. the number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). | 0.5 |
| cooldown_type | COOLDOWN_TYPE | COOLDOWN_TYPE. cooldown type of the learning rate scheduler. | '1-sqrt' |
| last_epoch | int | int. the index of the last epoch when resuming training. | -1 |
Source code in pytorch_optimizer/lr_scheduler/wsd.py
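The three phases can be sketched as a plain learning-rate multiplier. This is a minimal illustration, not the library's implementation: `wsd_multiplier` is a hypothetical name, and the decay phase assumes that the `'1-sqrt'` cooldown means `1 - sqrt(progress)`; `num_cycles` and the other cooldown types are left out.

```python
import math


def wsd_multiplier(step, num_warmup_steps, num_stable_steps, num_decay_steps, min_lr_ratio=0.0):
    """Illustrative LR multiplier for a warmup-stable-decay schedule."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 to 1.
        return step / max(1, num_warmup_steps)
    if step < num_warmup_steps + num_stable_steps:
        # Stable: hold the peak learning rate.
        return 1.0
    if step < num_warmup_steps + num_stable_steps + num_decay_steps:
        # Decay: assumed 1-sqrt cooldown, rescaled so it ends at min_lr_ratio.
        progress = (step - num_warmup_steps - num_stable_steps) / max(1, num_decay_steps)
        value = 1.0 - math.sqrt(progress)
        return (1.0 - min_lr_ratio) * value + min_lr_ratio
    return min_lr_ratio
```

A function of this shape can be bound to fixed step counts and handed to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the returned value at each step.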
CosineAnnealingWarmupRestarts
Bases: LRScheduler
CosineAnnealingWarmupRestarts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | Optimizer. wrapped optimizer instance. | required |
| first_cycle_steps | int | int. first cycle step size. | required |
| cycle_mult | float | float. cycle steps magnification. | 1.0 |
| max_lr | float | float. | 0.0001 |
| min_lr | float | float. | 1e-06 |
| warmup_steps | int | int. number of warmup steps. | 0 |
| gamma | float | float. decrease rate of lr by cycle. | 0.9 |
| last_epoch | int | int. step size of the current cycle. | -1 |
Source code in pytorch_optimizer/lr_scheduler/cosine_anealing.py
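To make the interplay of `warmup_steps`, `gamma`, and the restarts concrete, here is a simplified sketch of the learning rate as a function of the step. It is an assumption-laden illustration rather than the class's code: `cosine_warmup_restart_lr` is a hypothetical function, it only covers equal-length cycles (`cycle_mult == 1.0`), and it assumes `warmup_steps < first_cycle_steps`.

```python
import math


def cosine_warmup_restart_lr(step, first_cycle_steps, max_lr=1e-4, min_lr=1e-6,
                             warmup_steps=0, gamma=0.9):
    """Illustrative LR at `step` for cycles of equal length (cycle_mult == 1.0)."""
    cycle, step_in_cycle = divmod(step, first_cycle_steps)
    peak = max_lr * gamma ** cycle  # each restart lowers the peak by a factor of gamma
    if step_in_cycle < warmup_steps:
        # Linear warmup from min_lr up to the current peak.
        return min_lr + (peak - min_lr) * step_in_cycle / warmup_steps
    # Cosine annealing from the peak back down to min_lr.
    progress = (step_in_cycle - warmup_steps) / (first_cycle_steps - warmup_steps)
    return min_lr + (peak - min_lr) * (1.0 + math.cos(math.pi * progress)) / 2.0
```

With `gamma=0.9`, the second cycle peaks at 90% of `max_lr`, the third at 81%, and so on.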
LinearScheduler
Bases: BaseLinearWarmupScheduler
Linear LR Scheduler w/ linear warmup.
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
CosineScheduler
Bases: BaseLinearWarmupScheduler
Cosine LR Scheduler w/ linear warmup.
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
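The difference between the linear and cosine variants is only the shape of the decay after warmup. The sketch below shows both shapes side by side. The parameters of `BaseLinearWarmupScheduler` are not listed on this page, so the names `t_max`, `max_lr`, `min_lr`, `init_lr`, and `warmup_steps` are assumptions, and `warmup_then_decay_lr` is a hypothetical function, not part of the library.

```python
import math


def warmup_then_decay_lr(step, t_max, max_lr, min_lr=0.0, init_lr=0.0,
                         warmup_steps=0, kind='linear'):
    """Illustrative LR with linear warmup followed by linear or cosine decay."""
    if step < warmup_steps:
        # Linear warmup from init_lr to max_lr.
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    phase = (step - warmup_steps) / (t_max - warmup_steps)
    if kind == 'linear':
        return max_lr + (min_lr - max_lr) * phase
    if kind == 'cosine':
        return min_lr + (max_lr - min_lr) * (1.0 + math.cos(math.pi * phase)) / 2.0
    raise ValueError(f'unknown kind: {kind}')
```

Both shapes start at `max_lr` and end at `min_lr`; the cosine curve stays higher in the first half of the decay and drops faster in the second half.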
PolyScheduler
Bases: BaseLinearWarmupScheduler
Poly LR Scheduler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| poly_order | float | float. lr scheduler decreases with steps. | 0.5 |
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
ProportionScheduler
ProportionScheduler (Rho Scheduler of GSAM).
This scheduler outputs a value that evolves proportional to lr_scheduler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lr_scheduler | | learning rate scheduler. | required |
| max_lr | float | float. maximum lr. | required |
| min_lr | float | float. minimum lr. | 0.0 |
| max_value | float | float. maximum of rho. | 2.0 |
| min_value | float | float. minimum of rho. | 2.0 |
Source code in pytorch_optimizer/lr_scheduler/proportion.py
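"Evolves proportional to lr_scheduler" can be read as a linear map from the learning-rate range onto the rho range. The sketch below shows that reading; it is an assumption about the mapping, not a copy of the class, and `proportional_value` is a hypothetical function name.

```python
def proportional_value(lr, max_lr, min_lr=0.0, max_value=2.0, min_value=2.0):
    """Illustrative linear map from [min_lr, max_lr] onto [min_value, max_value]."""
    if max_lr > min_lr:
        return min_value + (max_value - min_value) * (lr - min_lr) / (max_lr - min_lr)
    # Degenerate LR range: nothing to interpolate over, so fall back to the maximum.
    return max_value
```

Under this reading, the defaults `max_value=2.0` and `min_value=2.0` produce a constant rho of 2.0 regardless of the learning rate, so at least one of them has to be changed for rho to vary.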
REXScheduler
Bases: LRScheduler
Revisiting Budgeted Training with an Improved Schedule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | Optimizer. wrapped optimizer instance. | required |
| total_steps | int | int. number of steps to optimize. | required |
| max_lr | float | float. max lr. | 1.0 |
| min_lr | float | float. min lr. | 0.0 |
Source code in pytorch_optimizer/lr_scheduler/rex.py
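The REX schedule from the paper named above decays the learning rate by the factor `(1 - p) / (1 - p / 2)`, where `p` is the fraction of the training budget already used. A minimal sketch of that curve follows; `rex_lr` is a hypothetical function, and the way `min_lr` and `max_lr` rescale the curve is an assumption.

```python
def rex_lr(step, total_steps, max_lr=1.0, min_lr=0.0):
    """Illustrative REX learning rate at `step` out of `total_steps`."""
    if step >= total_steps:
        return min_lr
    progress = step / total_steps  # fraction of the budget used, in [0, 1)
    return min_lr + (max_lr - min_lr) * (1.0 - progress) / (1.0 - progress / 2.0)
```

Halfway through training the factor is `0.5 / 0.75`, about 0.667, so the curve sits above a plain linear decay (0.5 at the same point) and falls more steeply towards the end.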