fts¶
Classes
FinetuningScheduler | This callback enables flexible, multi-phase, scheduled fine-tuning of foundation models.
Fine-Tuning Scheduler.
Used to implement flexible fine-tuning training schedules
- class finetuning_scheduler.fts.FinetuningScheduler(ft_schedule=None, max_depth=-1, base_max_lr=1e-05, restore_best=True, gen_ft_sched_only=False, epoch_transitions_only=False, reinit_optim_cfg=None, reinit_lr_cfg=None, strategy_adapter_cfg=None, custom_strategy_adapters=None, allow_untested=False, apply_lambdas_new_pgs=False, logging_level=20, enforce_phase0_params=True, frozen_bn_track_running_stats=True, log_dir=None)[source]¶
This callback enables flexible, multi-phase, scheduled fine-tuning of foundation models. Gradual unfreezing/thawing can help maximize foundation model knowledge retention while allowing (typically upper layers of) the model to optimally adapt to new tasks during transfer learning.
FinetuningScheduler orchestrates the gradual unfreezing of models via a fine-tuning schedule that is either implicitly generated (the default) or explicitly provided by the user (more computationally efficient). Fine-tuning phase transitions are driven by FTSEarlyStopping criteria (a multi-phase extension of EarlyStopping), user-specified epoch transitions, or a composition of the two (the default mode). A FinetuningScheduler training session completes when the final phase of the schedule has its stopping criteria met. See Early Stopping for more details on that callback's configuration.
Schedule definition is facilitated via gen_ft_schedule(), which dumps a default fine-tuning schedule (by default using a naive, 2-parameters-per-level heuristic) that can be adjusted as desired by the user and subsequently passed to the callback. Implicit fine-tuning mode generates the default schedule and proceeds to fine-tune according to the generated schedule. Implicit fine-tuning will often be less computationally efficient than explicit fine-tuning, but can often serve as a good baseline for subsequent explicit schedule refinement and can marginally outperform many explicit schedules.
Example:
import lightning as L
from finetuning_scheduler import FinetuningScheduler

trainer = L.Trainer(callbacks=[FinetuningScheduler()])
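For explicit scheduling, the callback can also be handed a properly structured Dict instead of a .yaml path. The following is a minimal, illustrative sketch: the module/parameter names are hypothetical and the per-phase keys (params, max_transition_epoch, lr) follow the basic schedule format referenced in the ft_schedule argument description below.

# A minimal sketch of an explicitly provided fine-tuning schedule passed as a dict.
# The parameter names below are hypothetical placeholders for your model's layers.
import lightning as L
from finetuning_scheduler import FinetuningScheduler

ft_schedule = {
    0: {"params": ["model.classifier.bias", "model.classifier.weight"]},
    1: {
        "params": ["model.pooler.dense.bias", "model.pooler.dense.weight"],
        "max_transition_epoch": 3,  # epoch-driven transition cap for this phase
        "lr": 1e-05,  # per-phase lr overriding base_max_lr
    },
}
trainer = L.Trainer(callbacks=[FinetuningScheduler(ft_schedule=ft_schedule)])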
Note
Currently, FinetuningScheduler does not support the use of multiple FTSCheckpoint or FTSEarlyStopping callback instances.
Note
While FinetuningScheduler supports the use of ZeroRedundancyOptimizer, setting overlap_with_ddp to True is not supported because that optimizer mode only supports a single parameter group.
Arguments used to define and configure a scheduled fine-tuning training session:
- Parameters:
ft_schedule¶ (str | dict | None) – The fine-tuning schedule to be executed. Usually will be a .yaml file path but can also be a properly structured Dict. See Specifying a Fine-Tuning Schedule for the basic schedule format. See LR Scheduler Reinitialization for more complex schedule configurations (including per-phase LR scheduler reinitialization). If a schedule is not provided, will generate and execute a default fine-tuning schedule using the provided LightningModule. See the default schedule. Defaults to None.
max_depth¶ (int) – Maximum schedule depth to which the defined fine-tuning schedule should be executed. Specifying -1 or an integer greater than the number of defined schedule layers will result in the entire fine-tuning schedule being executed. Defaults to -1.
base_max_lr¶ (float) – The default maximum learning rate to use for the parameter groups associated with each scheduled fine-tuning depth if not explicitly specified in the fine-tuning schedule. If overridden to None, will be set to the lr of the first scheduled fine-tuning depth. Defaults to 1e-5.
restore_best¶ (bool) – If True, restore the best available checkpoint (as defined by the FTSCheckpoint) before fine-tuning depth transitions. Defaults to True.
gen_ft_sched_only¶ (bool) – If True, generate the default fine-tuning schedule to log_dir (it will be named after your LightningModule subclass with the suffix _ft_schedule.yaml) and exit without training. Typically used to generate a default schedule that will be adjusted by the user before training. Defaults to False.
epoch_transitions_only¶ (bool) – If True, use epoch-driven stopping criteria exclusively (rather than composing FTSEarlyStopping and epoch-driven criteria, which is the default). If using this mode, an epoch-driven transition (max_transition_epoch >= 0) must be specified for each phase. If unspecified, max_transition_epoch defaults to -1 for each phase, which signals the application of FTSEarlyStopping criteria only. epoch_transitions_only defaults to False.
reinit_optim_cfg¶ (dict | None) – An optimizer reinitialization configuration dictionary consisting of at minimum a nested optimizer_init dictionary with a class_path key specifying the class of the optimizer to be instantiated. Optionally, an init_args dictionary of arguments with which to initialize the optimizer may be included. A reinit_lr_cfg configuration can also be specified concurrently (see also the sketch after this parameter list). By way of example, one could configure this dictionary via the LightningCLI with the following:
reinit_optim_cfg:
  optimizer_init:
    class_path: torch.optim.SGD
    init_args:
      lr: 1.0e-05
      momentum: 0.9
      weight_decay: 1.0e-06
reinit_lr_cfg¶ (dict | None) – A lr scheduler reinitialization configuration dictionary consisting of at minimum a nested lr_scheduler_init dictionary with a class_path key specifying the class of the lr scheduler to be instantiated. Optionally, an init_args dictionary of arguments with which to initialize the lr scheduler may be included. Additionally, one may optionally include arguments to pass to PyTorch Lightning's lr scheduler configuration LRSchedulerConfig in the pl_lrs_cfg dictionary. A reinit_optim_cfg configuration can also be specified concurrently. By way of example, one could configure this dictionary via the LightningCLI with the following:
reinit_lr_cfg:
  lr_scheduler_init:
    class_path: torch.optim.lr_scheduler.StepLR
    init_args:
      step_size: 1
      gamma: 0.7
  pl_lrs_cfg:
    interval: epoch
    frequency: 1
    name: Implicit_Reinit_LR_Scheduler
  use_current_optimizer_pg_lrs: true
allow_untested¶ (bool) – If True, allows the use of custom or unsupported training strategies and lr schedulers (e.g. single_tpu, MyCustomStrategy, MyCustomLRScheduler). Defaults to False.
Note
Custom or officially unsupported strategies and lr schedulers can be used by setting allow_untested to True.
Some officially unsupported strategies may work unaltered and are only unsupported due to the Fine-Tuning Scheduler project's lack of CI/testing resources for that strategy (e.g. single_tpu). Most unsupported strategies and schedulers, however, are currently unsupported because they require varying degrees of modification to be compatible.
For instance, with respect to strategies, deepspeed will require a StrategyAdapter similar to the one written for FSDP (FSDPStrategyAdapter) to be written before support can be added (PRs welcome!), while tpu_spawn would require an override of the current broadcast method to include python objects.
Regarding lr schedulers, ChainedScheduler and SequentialLR are examples of schedulers not currently supported due to the configuration complexity and semantic conflicts supporting them would introduce. If a supported torch lr scheduler does not meet your requirements, one can always subclass a supported lr scheduler and modify it as required (e.g. LambdaLR is especially useful for this).
strategy_adapter_cfg¶ (dict | None) – A configuration dictionary that will be applied to the StrategyAdapter associated with the current training Strategy. See the relevant StrategyAdapter documentation for strategy-specific configuration options. Defaults to None.
custom_strategy_adapters¶ (dict[str, str] | None) – A dictionary mapping PyTorch Lightning strategy flags (canonical strategy names like "single_device", "auto", "ddp", etc.) to strategy adapter references. Multiple strategy_flag keys can be associated with the same adapter. The adapter reference can be: (1) an entry point name registered under finetuning_scheduler.strategy_adapters (see Strategy Adapter Entry Points), (2) a fully qualified StrategyAdapter subclass path in the format "module.path:ClassName", or (3) a fully qualified dot path in the format "module.path.ClassName". This is an experimental feature that is subject to change. Defaults to None.
apply_lambdas_new_pgs¶ (bool) – If True, applies the most recent lambda in the lr_lambdas list to newly added optimizer groups for lr schedulers that have a lr_lambdas attribute. Note this option only applies to phases without reinitialized lr schedulers. Phases with defined lr scheduler reinitialization configs will always apply the specified lambdas. Defaults to False.
logging_level¶ (int) – Sets the logging level for FinetuningScheduler. Defaults to INFO.
enforce_phase0_params¶ (bool) – Whether FinetuningScheduler will reconfigure the user-configured optimizer (configured via configure_optimizers) to optimize the parameters (and only those parameters) scheduled to be optimized in phase 0 of the current fine-tuning schedule. Reconfiguration will only take place if FTS discovers that the set of parameters to be initially thawed and present in the optimizer differs from the parameters specified in phase 0. Only the parameters included in the optimizer are affected; the choice of optimizer, lr scheduler, etc. remains unaltered. Defaults to True.
frozen_bn_track_running_stats¶ (bool) – When freezing torch.nn.modules.batchnorm._BatchNorm layers, whether FinetuningScheduler should set BatchNorm track_running_stats to True. Setting this to True overrides the default Lightning behavior that sets BatchNorm track_running_stats to False when freezing BatchNorm layers. Defaults to True.
log_dir¶ (str | PathLike | None) – Directory to use for FinetuningScheduler artifacts. Defaults to trainer.log_dir if None, or trainer.default_root_dir if trainer.log_dir is also None.
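Because the reinitialization configurations above are plain dictionaries, they can also be passed directly to the callback constructor rather than via the LightningCLI. The following is a minimal sketch, assuming the dictionary keys mirror the YAML examples shown in this parameter list:

# A sketch of passing reinit_optim_cfg and reinit_lr_cfg programmatically as dicts.
import lightning as L
from finetuning_scheduler import FinetuningScheduler

reinit_optim_cfg = {
    "optimizer_init": {
        "class_path": "torch.optim.SGD",
        "init_args": {"lr": 1.0e-05, "momentum": 0.9, "weight_decay": 1.0e-06},
    }
}
reinit_lr_cfg = {
    "lr_scheduler_init": {
        "class_path": "torch.optim.lr_scheduler.StepLR",
        "init_args": {"step_size": 1, "gamma": 0.7},
    },
    "pl_lrs_cfg": {"interval": "epoch", "frequency": 1, "name": "Implicit_Reinit_LR_Scheduler"},
}
trainer = L.Trainer(
    callbacks=[FinetuningScheduler(reinit_optim_cfg=reinit_optim_cfg, reinit_lr_cfg=reinit_lr_cfg)]
)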
- _fts_state¶
The internal FinetuningScheduler state.
- strategy_adapter_cfg¶
A configuration dictionary that will be applied to the StrategyAdapter associated with the current training Strategy.
- epoch_transitions_only¶
Whether to use epoch-driven stopping criteria exclusively.
- base_max_lr¶
The default maximum learning rate to use for the parameter groups associated with each scheduled fine-tuning depth if not explicitly specified in the fine-tuning schedule. If overridden to
None, will be set to the lr of the first scheduled fine-tuning depth. Defaults to 1e-5.
- freeze_before_training(pl_module)[source]¶
Freezes all model parameters so that parameter subsets can be subsequently thawed according to the fine-tuning schedule.
- Parameters:
pl_module¶ (LightningModule) – The target LightningModule to freeze parameters of
- Return type:
None
- load_state_dict(state_dict)[source]¶
After loading a checkpoint, load the saved
FinetuningScheduler callback state and update the current callback state accordingly.
- on_before_zero_grad(trainer, pl_module, optimizer)[source]¶
After the latest optimizer step, update the _fts_state, incrementing the global fine-tuning steps taken.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
optimizer¶ (ParamGroupAddable) – The supported optimizer instance to which parameter groups will be configured and added.
- Return type:
None
- on_fit_start(trainer, pl_module)[source]¶
Before beginning training, ensure an optimizer configuration supported by
FinetuningScheduler is present.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
- Raises:
MisconfigurationException – If more than one optimizer is configured, as this indicates a configuration error
- Return type:
None
- on_train_end(trainer, pl_module)[source]¶
Synchronize internal
_fts_state on end of training to ensure the final training state is consistent with epoch semantics.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
- Return type:
None
- on_train_epoch_start(trainer, pl_module)[source]¶
Before beginning a training epoch, configure the internal
_fts_state, prepare the next scheduled fine-tuning level and store the updated optimizer configuration before continuing training.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
- Return type:
None
- restore_best_ckpt()[source]¶
Restore the current best model checkpoint, according to
best_model_path.
- Return type:
None
- setup(trainer, pl_module, stage)[source]¶
Validate a compatible
Strategy is being used and ensure all FinetuningScheduler callback dependencies are met. If a valid configuration is present, then either dump the default fine-tuning schedule OR:
1. configure the FTSEarlyStopping callback (if relevant)
2. initialize the _fts_state
3. freeze the target LightningModule parameters
Finally, initialize the FinetuningScheduler training session in the training environment.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
stage¶ (str) – The RunningStage.{SANITY_CHECKING,TRAINING,VALIDATING}. Defaults to None.
- Raises:
SystemExit – Gracefully exit before training if only generating and not executing a fine-tuning schedule.
- Return type:
None
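To illustrate the schedule-generation-only path described above, the following is a minimal sketch of the generate-then-refine workflow; the schedule filename is a hypothetical placeholder following the naming convention described for gen_ft_sched_only:

import lightning as L
from finetuning_scheduler import FinetuningScheduler

# 1. Dump the default schedule (named <YourModule>_ft_schedule.yaml in log_dir) and
#    exit gracefully before training via the SystemExit described above.
trainer = L.Trainer(callbacks=[FinetuningScheduler(gen_ft_sched_only=True)])

# 2. After manually adjusting the generated schedule, execute it explicitly
#    ("MyModule_ft_schedule.yaml" is a hypothetical generated filename).
trainer = L.Trainer(callbacks=[FinetuningScheduler(ft_schedule="MyModule_ft_schedule.yaml")])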
- should_transition(trainer)[source]¶
Phase transition logic is contingent on whether we are composing
FTSEarlyStopping criteria with epoch-driven transition constraints or exclusively using epoch-driven transition scheduling (i.e., epoch_transitions_only is True).
- state_dict()[source]¶
Before saving a checkpoint, add the
FinetuningScheduler callback state to be saved.
- Returns:
The FinetuningScheduler callback state dictionary that will be added to the checkpoint
- Return type:
Dict[str, Any]
- step()[source]¶
Prepare and execute the next scheduled fine-tuning level:
1. Restore the current best model checkpoint if appropriate
2. Thaw model parameters according to the defined schedule
3. Synchronize the states of FitLoop and _fts_state
- Return type:
None
Note
The FinetuningScheduler callback initially only supports single-schedule/optimizer fine-tuning configurations
- step_pg(optimizer, depth, depth_sync=True, pre_reinit_state=None)[source]¶
Configure optimizer parameter groups for the next scheduled fine-tuning level, adding parameter groups beyond the restored optimizer state up to
current_depth and reinitializing the optimizer and/or learning rate scheduler as configured.
- Parameters:
optimizer¶ (ParamGroupAddable) – The supported optimizer instance to which parameter groups will be configured and added.
depth¶ (int) – The maximum index of the fine-tuning schedule for which to configure the optimizer parameter groups.
depth_sync¶ (bool) – If True, configure optimizer parameter groups for all depth indices greater than the restored checkpoint. If False, configure groups only for the specified depth. Defaults to True.
- Return type:
None
- property curr_depth¶
Index of the fine-tuning schedule depth currently being trained.
- Returns:
The index of the current fine-tuning training depth
- Return type:
int
- property depth_remaining¶
Remaining number of fine-tuning training levels in the schedule.
- Returns:
The number of remaining fine-tuning training levels
- Return type:
int
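For monitoring purposes, curr_depth and depth_remaining can be read from the callback instance attached to an existing trainer. A minimal sketch, assuming a FinetuningScheduler callback has already been added to the trainer's callbacks:

from finetuning_scheduler import FinetuningScheduler

# Locate the FinetuningScheduler callback on an existing `trainer` and inspect
# its schedule progress.
fts_callback = next(cb for cb in trainer.callbacks if isinstance(cb, FinetuningScheduler))
print(f"current depth: {fts_callback.curr_depth}, remaining levels: {fts_callback.depth_remaining}")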
- property log_dir¶
Directory to use for FinetuningScheduler artifacts.
- Returns:
The directory to use, falling back to
trainer.log_dir if FinetuningScheduler._log_dir is not set, and trainer.default_root_dir if trainer.log_dir is also None.
- Return type:
str | os.PathLike | None