Learning Rate Scheduler
deberta_v3_large_lr_scheduler(model, layer_low_threshold=195, layer_middle_threshold=323, head_param_start=390, base_lr=2e-05, head_lr=0.0001, wd=0.01)
DeBERTa-v3 large layer-wise learning rate scheduler.
Reference: https://github.com/gilfernandes/commonlit
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | Model based on Hugging Face Transformers. | required |
| layer_low_threshold | int | Parameter index that marks the end of the lower 12 layers. | 195 |
| layer_middle_threshold | int | Parameter index that marks the end of the middle layers. | 323 |
| head_param_start | int | Starting parameter index of the head (end of the backbone). | 390 |
| base_lr | float | Base learning rate for backbone layers. | 2e-05 |
| head_lr | float | Learning rate for head layers. | 0.0001 |
| wd | float | Weight decay. | 0.01 |
Source code in pytorch_optimizer/lr_scheduler/experimental/deberta_v3_lr_scheduler.py
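Layer-wise learning rates like these are typically expressed as optimizer parameter groups. The sketch below is a minimal, library-independent illustration of that idea, assuming parameters are bucketed by their position in the model's parameter list using the three thresholds above. The helper `layerwise_param_groups`, the `lr_scales` multipliers, and the no-decay rule for biases and LayerNorm weights are hypothetical choices for illustration, not the values used by `deberta_v3_large_lr_scheduler`.

```python
from typing import Any, Dict, List, Sequence, Tuple


def layerwise_param_groups(
    named_params: Sequence[Tuple[str, Any]],
    layer_low_threshold: int = 195,
    layer_middle_threshold: int = 323,
    head_param_start: int = 390,
    base_lr: float = 2e-5,
    head_lr: float = 1e-4,
    wd: float = 1e-2,
    lr_scales: Tuple[float, float, float] = (0.4, 1.0, 2.0),  # hypothetical lower/middle/upper multipliers
) -> List[Dict[str, Any]]:
    """Build one optimizer parameter group per parameter, with an LR chosen by its index."""
    groups: List[Dict[str, Any]] = []
    for idx, (name, param) in enumerate(named_params):
        if idx >= head_param_start:  # task head
            lr = head_lr
        elif idx >= layer_middle_threshold:  # upper backbone layers
            lr = base_lr * lr_scales[2]
        elif idx >= layer_low_threshold:  # middle backbone layers
            lr = base_lr * lr_scales[1]
        else:  # embeddings and lower backbone layers
            lr = base_lr * lr_scales[0]
        # Assumed convention: no weight decay on biases and LayerNorm weights.
        no_decay = name.endswith("bias") or "LayerNorm" in name
        groups.append({"params": [param], "lr": lr, "weight_decay": 0.0 if no_decay else wd})
    return groups
```

With PyTorch, the result could be passed straight to an optimizer, e.g. `torch.optim.AdamW(layerwise_param_groups(list(model.named_parameters())))`; because every group carries its own `lr`, the optimizer's default learning rate is overridden per group.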
get_chebyshev_schedule(optimizer, num_epochs, is_warmup=False, last_epoch=-1)
Get Chebyshev learning rate scheduler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | The optimizer whose learning rate is scheduled. | required |
| num_epochs | int | Total number of epochs. | required |
| is_warmup | bool | Whether it is the warm-up stage. | False |
| last_epoch | int | The index of the last epoch when resuming training. | -1 |
Source code in pytorch_optimizer/lr_scheduler/chebyshev.py
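A Chebyshev schedule, as in the fractal learning rate schedules of Agarwal et al. (2021), uses step sizes equal to the reciprocals of the Chebyshev nodes on an interval [m, M], visited in a fractal order that interleaves large and small steps. The following is a minimal pure-Python sketch of that construction under our own assumptions: `small_m`, `big_m`, and the helper names are hypothetical, the number of steps must be a power of two here, and it does not reproduce how `get_chebyshev_schedule` handles `is_warmup` or maps values onto epochs.

```python
import math
from typing import List


def chebyshev_steps(num_steps: int, small_m: float = 0.05, big_m: float = 1.0) -> List[float]:
    """Reciprocals of the Chebyshev nodes mapped onto [small_m, big_m]."""
    c, r = (big_m + small_m) / 2.0, (big_m - small_m) / 2.0
    return [1.0 / (c - r * math.cos(math.pi * (t + 0.5) / num_steps)) for t in range(num_steps)]


def fractal_permutation(num_steps: int) -> List[int]:
    """Fractal order: sigma_{2n} interleaves sigma_n with (2n - 1 - sigma_n)."""
    if num_steps < 1 or num_steps & (num_steps - 1):
        raise ValueError("num_steps must be a power of two in this sketch")
    perm = [0]
    while len(perm) < num_steps:
        n2 = 2 * len(perm)
        perm = [x for i in perm for x in (i, n2 - 1 - i)]
    return perm


def chebyshev_schedule(num_steps: int) -> List[float]:
    """Chebyshev step sizes reordered by the fractal permutation."""
    steps = chebyshev_steps(num_steps)
    return [steps[i] for i in fractal_permutation(num_steps)]
```

Because the values act as multipliers, one way to apply them in PyTorch would be `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda epoch: schedule[epoch])`.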
get_wsd_schedule(optimizer, num_warmup_steps, num_stable_steps, num_decay_steps, min_lr_ratio=0.0, num_cycles=0.5, cooldown_type='1-sqrt', last_epoch=-1)
Get Warmup-Stable-Decay (WSD) learning rate scheduler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | The optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | int | The number of warmup steps. | required |
| num_stable_steps | int | The number of stable steps. | required |
| num_decay_steps | int | The number of decay steps. | required |
| min_lr_ratio | float | The minimum learning rate as a ratio of the initial learning rate. | 0.0 |
| num_cycles | float | The number of waves in the cosine schedule (default is a half-cosine decay). | 0.5 |
| cooldown_type | COOLDOWN_TYPE | Cooldown type of the learning rate scheduler. | '1-sqrt' |
| last_epoch | int | The index of the last epoch when resuming training. | -1 |
Source code in pytorch_optimizer/lr_scheduler/wsd.py
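The three WSD phases can be written as a single multiplier on the initial learning rate. This sketch assumes linear warmup from 0, a constant stable phase, and a decay phase whose shape is chosen by `cooldown_type`, rescaled so that it ends at `min_lr_ratio`. The helper name is hypothetical, and only `'1-sqrt'` comes from the signature above; `'cosine'` (shaped by `num_cycles`) and `'linear'` are assumed options.

```python
import math


def wsd_lr_multiplier(
    step: int,
    num_warmup_steps: int,
    num_stable_steps: int,
    num_decay_steps: int,
    min_lr_ratio: float = 0.0,
    num_cycles: float = 0.5,
    cooldown_type: str = "1-sqrt",
) -> float:
    """Multiplier applied to the initial learning rate at `step` (hypothetical helper)."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)  # linear warmup from 0
    if step < num_warmup_steps + num_stable_steps:
        return 1.0  # stable phase
    if step < num_warmup_steps + num_stable_steps + num_decay_steps:
        progress = (step - num_warmup_steps - num_stable_steps) / max(1, num_decay_steps)
        if cooldown_type == "1-sqrt":
            value = 1.0 - math.sqrt(progress)
        elif cooldown_type == "cosine":  # assumed option
            value = max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
        elif cooldown_type == "linear":  # assumed option
            value = 1.0 - progress
        else:
            raise ValueError(f"unsupported cooldown_type: {cooldown_type}")
        # Rescale so the decay ends at min_lr_ratio instead of 0.
        return min_lr_ratio + (1.0 - min_lr_ratio) * value
    return min_lr_ratio  # after the decay phase
```

For example, with 10 warmup, 20 stable, and 10 decay steps, the multiplier is 0.5 at step 5, 1.0 from step 10 through step 30, and about 0.293 (1 - sqrt(0.5)) at step 35.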
CosineAnnealingWarmupRestarts
Bases: LRScheduler
Cosine annealing learning rate scheduler with linear warmup and warm restarts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | Wrapped optimizer instance. | required |
| first_cycle_steps | int | Number of steps in the first cycle. | required |
| cycle_mult | float | Multiplier applied to the cycle length at each restart. | 1.0 |
| max_lr | float | Maximum learning rate. | 0.0001 |
| min_lr | float | Minimum learning rate. | 1e-06 |
| warmup_steps | int | Number of warmup steps. | 0 |
| gamma | float | Factor by which the maximum learning rate is multiplied after each cycle. | 0.9 |
| last_epoch | int | The index of the last epoch when resuming training. | -1 |
Source code in pytorch_optimizer/lr_scheduler/cosine_anealing.py
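Each cycle warms up linearly from `min_lr` to the current peak over `warmup_steps`, then follows a half cosine back down to `min_lr`; when a cycle ends, the annealing part of the next one is lengthened by `cycle_mult` and the peak is multiplied by `gamma` once more. The pure-Python sketch below computes the learning rate for a given step under these assumptions. The function name is hypothetical, step counting starts at 0 rather than following `last_epoch`, and the input checks (including rejecting `cycle_mult < 1`) are our own.

```python
import math


def cosine_restarts_lr(
    step: int,
    first_cycle_steps: int,
    cycle_mult: float = 1.0,
    max_lr: float = 1e-4,
    min_lr: float = 1e-6,
    warmup_steps: int = 0,
    gamma: float = 0.9,
) -> float:
    """Learning rate at `step` (0-based) for cosine annealing with warmup and warm restarts."""
    if first_cycle_steps <= warmup_steps or cycle_mult < 1.0:
        raise ValueError("sketch requires first_cycle_steps > warmup_steps and cycle_mult >= 1")
    cycle, cycle_steps, step_in_cycle = 0, first_cycle_steps, step
    # Walk forward through completed cycles; each new annealing span is scaled by cycle_mult.
    while step_in_cycle >= cycle_steps:
        step_in_cycle -= cycle_steps
        cycle += 1
        cycle_steps = int((cycle_steps - warmup_steps) * cycle_mult) + warmup_steps
    peak = max_lr * gamma**cycle  # the peak shrinks by gamma after each restart
    if step_in_cycle < warmup_steps:
        return min_lr + (peak - min_lr) * step_in_cycle / warmup_steps
    progress = (step_in_cycle - warmup_steps) / (cycle_steps - warmup_steps)
    return min_lr + (peak - min_lr) * (1.0 + math.cos(math.pi * progress)) / 2.0
```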
CosineScheduler
Bases: BaseLinearWarmupScheduler
Cosine LR scheduler with linear warmup.
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
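Since this scheduler builds on a linear-warmup base class, the pattern can be sketched as a base class that handles warmup and defers the decay shape to a subclass. The names `WarmupSchedule`, `CosineDecay`, `lr_at`, `decay`, and `init_lr` are hypothetical, and the sketch is not the library's `BaseLinearWarmupScheduler`.

```python
import math
from abc import ABC, abstractmethod


class WarmupSchedule(ABC):
    """Hypothetical base: linear warmup from init_lr to max_lr, then a subclass-defined decay."""

    def __init__(self, total_steps: int, max_lr: float, min_lr: float = 0.0,
                 init_lr: float = 0.0, warmup_steps: int = 0) -> None:
        if not 0 <= warmup_steps < total_steps:
            raise ValueError("need 0 <= warmup_steps < total_steps")
        self.total_steps, self.max_lr, self.min_lr = total_steps, max_lr, min_lr
        self.init_lr, self.warmup_steps = init_lr, warmup_steps

    def lr_at(self, step: int) -> float:
        if step < self.warmup_steps:
            return self.init_lr + (self.max_lr - self.init_lr) * step / self.warmup_steps
        # Fraction of the post-warmup phase completed, clipped to 1.
        progress = min(1.0, (step - self.warmup_steps) / (self.total_steps - self.warmup_steps))
        return self.decay(progress)

    @abstractmethod
    def decay(self, progress: float) -> float: ...


class CosineDecay(WarmupSchedule):
    """Half-cosine decay from max_lr to min_lr after warmup."""

    def decay(self, progress: float) -> float:
        return self.min_lr + (self.max_lr - self.min_lr) * (1.0 + math.cos(math.pi * progress)) / 2.0
```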
get_supported_lr_schedulers(filters=None)
Return a list of available LR scheduler names, sorted alphabetically.
:param filters: Optional[Union[str, List[str]]]. Wildcard filter string(s) matched with fnmatch. If None, the whole list is returned.
Source code in pytorch_optimizer/lr_scheduler/__init__.py
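The filtering behaviour described above can be sketched with the standard-library `fnmatch` module. The helper `filter_names` and the scheduler names used with it are illustrative only; they are not the library's registered list.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Union


def filter_names(names: Iterable[str], filters: Optional[Union[str, List[str]]] = None) -> List[str]:
    """Return names sorted alphabetically, keeping only those matching any wildcard pattern."""
    ordered = sorted(names)
    if filters is None:
        return ordered
    if isinstance(filters, str):
        filters = [filters]
    return [name for name in ordered if any(fnmatch(name, pattern) for pattern in filters)]
```

For example, `filter_names(['rex', 'cosine', 'linear'], 'cos*')` returns `['cosine']`.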
LinearScheduler
Bases: BaseLinearWarmupScheduler
Linear LR scheduler with linear warmup.
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
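A linear-warmup, linear-decay learning rate can be computed per step as below. This is a minimal sketch with hypothetical names (`warmup_linear_lr`, `init_lr`), assuming the learning rate rises from `init_lr` to `max_lr` during warmup and then falls linearly to `min_lr` at `total_steps`, staying there afterwards.

```python
def warmup_linear_lr(step: int, total_steps: int, max_lr: float, min_lr: float = 0.0,
                     init_lr: float = 0.0, warmup_steps: int = 0) -> float:
    """Linear warmup to max_lr, then linear decay to min_lr (hypothetical helper)."""
    if step < warmup_steps:
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    # Fraction of the decay phase completed, clipped to 1 after total_steps.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return max_lr + (min_lr - max_lr) * progress
```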
load_lr_scheduler(lr_scheduler_name)
Load a learning rate scheduler by name.
:param lr_scheduler_name: str. Name of the learning rate scheduler to load.
Source code in pytorch_optimizer/lr_scheduler/__init__.py
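Name-based loading is commonly implemented with a registry that maps lowercase names to classes. The sketch below shows that pattern; `SCHEDULER_REGISTRY`, `register_scheduler`, `load_scheduler_by_name`, `ToyConstantSchedule`, and raising `NotImplementedError` for unknown names are all assumptions, not the library's internals.

```python
from typing import Callable, Dict, Type, TypeVar

T = TypeVar("T")

# Hypothetical registry: lowercase scheduler name -> scheduler class.
SCHEDULER_REGISTRY: Dict[str, type] = {}


def register_scheduler(name: str) -> Callable[[Type[T]], Type[T]]:
    """Class decorator that records a scheduler class under a lowercase name."""
    def decorator(cls: Type[T]) -> Type[T]:
        SCHEDULER_REGISTRY[name.lower()] = cls
        return cls
    return decorator


def load_scheduler_by_name(name: str) -> type:
    """Case-insensitive lookup; unknown names raise NotImplementedError in this sketch."""
    key = name.lower()
    if key not in SCHEDULER_REGISTRY:
        raise NotImplementedError(f"lr scheduler {name!r} is not registered")
    return SCHEDULER_REGISTRY[key]


@register_scheduler("toy_constant")
class ToyConstantSchedule:
    """Toy scheduler used only to demonstrate the registry."""

    def __init__(self, lr: float) -> None:
        self.lr = lr
```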
PolyScheduler
Bases: BaseLinearWarmupScheduler
Polynomial-decay LR scheduler with linear warmup.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| poly_order | float | Order of the polynomial decay applied to the learning rate over steps. | 0.5 |
Source code in pytorch_optimizer/lr_scheduler/linear_warmup.py
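A common form of polynomial decay scales the post-warmup learning rate by `(1 - progress) ** poly_order`, where `progress` runs from 0 to 1. The sketch below assumes that form together with linear warmup; the exact formula used by `PolyScheduler` may differ, and the function name and `init_lr` parameter are hypothetical.

```python
def warmup_poly_lr(step: int, total_steps: int, max_lr: float, min_lr: float = 0.0,
                   init_lr: float = 0.0, warmup_steps: int = 0, poly_order: float = 0.5) -> float:
    """Linear warmup, then polynomial decay from max_lr to min_lr (hypothetical helper)."""
    if poly_order <= 0:
        raise ValueError("poly_order must be positive")
    if step < warmup_steps:
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    # Fraction of the decay phase completed, clipped to 1 after total_steps.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + (max_lr - min_lr) * (1.0 - progress) ** poly_order
```

With `poly_order=0.5` and no warmup, the learning rate is still half of `max_lr` three quarters of the way through training, so this order decays slowly at first and quickly near the end.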
ProportionScheduler
ProportionScheduler (Rho Scheduler of GSAM).
This scheduler outputs a value (rho) that changes in proportion to the learning rate produced by a given learning rate scheduler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lr_scheduler | LRScheduler | Learning rate scheduler. | required |
| max_lr | float | Maximum learning rate. | required |
| min_lr | float | Minimum learning rate. | 0.0 |
| max_value | float | Maximum value of rho. | 2.0 |
| min_value | float | Minimum value of rho. | 2.0 |
Source code in pytorch_optimizer/lr_scheduler/proportion.py
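In GSAM, rho is tied to the learning rate by linear interpolation: when the learning rate is at `max_lr`, rho is `max_value`, and when it is at `min_lr`, rho is `min_value`. A minimal sketch of that mapping follows, assuming this interpolation; the function name is hypothetical. Under this formula, the defaults above (`max_value` and `min_value` both 2.0) keep rho constant.

```python
def proportional_value(lr: float, max_lr: float, min_lr: float = 0.0,
                       max_value: float = 2.0, min_value: float = 2.0) -> float:
    """Map the current learning rate linearly onto [min_value, max_value] (hypothetical helper)."""
    if max_lr == min_lr:
        return max_value  # degenerate range: no interpolation possible
    return min_value + (max_value - min_value) * (lr - min_lr) / (max_lr - min_lr)
```

In PyTorch, the current learning rate could be read with `lr_scheduler.get_last_lr()[0]` after each `step()` and passed in as `lr`.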
REXScheduler
Bases: LRScheduler
REX scheduler, from "Revisiting Budgeted Training with an Improved Schedule".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | Wrapped optimizer instance. | required |
| total_steps | int | Number of steps to optimize. | required |
| max_lr | float | Maximum learning rate. | 1.0 |
| min_lr | float | Minimum learning rate. | 0.0 |
Source code in pytorch_optimizer/lr_scheduler/rex.py
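REX decays the learning rate with the profile (1 - z) / (1/2 + (1 - z)/2), where z is the fraction of the training budget used so far; it stays above linear decay for most of training and reaches the minimum at the end. The sketch below assumes that profile, interpolated between `min_lr` and `max_lr`. The function name is hypothetical, and it may differ from `REXScheduler` in details such as step counting.

```python
def rex_lr(step: int, total_steps: int, max_lr: float = 1.0, min_lr: float = 0.0) -> float:
    """REX-shaped learning rate at `step` (hypothetical helper)."""
    if step >= total_steps:
        return min_lr
    z = step / total_steps  # fraction of the budget used
    # (1 - z) / (1/2 + (1 - z)/2) simplifies to (1 - z) / (1 - z/2).
    return min_lr + (max_lr - min_lr) * (1.0 - z) / (1.0 - z / 2.0)
```

Halfway through, for instance, the multiplier is 0.5 / 0.75, about 0.667, versus 0.5 for linear decay.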