fts¶
Classes
FinetuningScheduler | This callback enables flexible, multi-phase, scheduled fine-tuning of foundation models.
Fine-Tuning Scheduler.
Used to implement flexible fine-tuning training schedules
- class finetuning_scheduler.fts.FinetuningScheduler(ft_schedule=None, max_depth=-1, base_max_lr=1e-05, restore_best=True, gen_ft_sched_only=False, epoch_transitions_only=False, reinit_optim_cfg=None, reinit_lr_cfg=None, strategy_adapter_cfg=None, custom_strategy_adapters=None, allow_untested=False, apply_lambdas_new_pgs=False, logging_level=20, enforce_phase0_params=True, frozen_bn_track_running_stats=True, log_dir=None)[source]¶
This callback enables flexible, multi-phase, scheduled fine-tuning of foundation models. Gradual unfreezing/thawing can help maximize foundation model knowledge retention while allowing (typically upper layers of) the model to optimally adapt to new tasks during transfer learning.
FinetuningScheduler orchestrates the gradual unfreezing of models via a fine-tuning schedule that is either implicitly generated (the default) or explicitly provided by the user (more computationally efficient). Fine-tuning phase transitions are driven by FTSEarlyStopping criteria (a multi-phase extension of EarlyStopping), user-specified epoch transitions, or a composition of the two (the default mode). A FinetuningScheduler training session completes when the final phase of the schedule has its stopping criteria met. See Early Stopping for more details on that callback's configuration.
Schedule definition is facilitated via gen_ft_schedule(), which dumps a default fine-tuning schedule (by default using a naive, 2-parameters-per-level heuristic) that can be adjusted as desired by the user and subsequently passed to the callback. Implicit fine-tuning mode generates the default schedule and proceeds to fine-tune according to the generated schedule. Implicit fine-tuning will often be less computationally efficient than explicit fine-tuning, but can often serve as a good baseline for subsequent explicit schedule refinement and can marginally outperform many explicit schedules.
Example:
import lightning as L
from finetuning_scheduler import FinetuningScheduler

trainer = L.Trainer(callbacks=[FinetuningScheduler()])
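For explicit scheduling, the callback can also be handed a properly structured Dict instead of a .yaml path. The following is a minimal, illustrative sketch: the module/parameter names are hypothetical and the per-phase keys (params, max_transition_epoch, lr) follow the basic schedule format referenced in the ft_schedule argument description below.

# A minimal sketch of an explicitly provided fine-tuning schedule passed as a dict.
# The parameter names below are hypothetical placeholders for your model's layers.
import lightning as L
from finetuning_scheduler import FinetuningScheduler

ft_schedule = {
    0: {"params": ["model.classifier.bias", "model.classifier.weight"]},
    1: {
        "params": ["model.pooler.dense.bias", "model.pooler.dense.weight"],
        "max_transition_epoch": 3,  # epoch-driven transition cap for this phase
        "lr": 1e-05,  # per-phase lr overriding base_max_lr
    },
}
trainer = L.Trainer(callbacks=[FinetuningScheduler(ft_schedule=ft_schedule)])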
Note
Currently, FinetuningScheduler does not support the use of multiple FTSCheckpoint or FTSEarlyStopping callback instances.
Note
While FinetuningScheduler supports the use of ZeroRedundancyOptimizer, setting overlap_with_ddp to True is not supported because that optimizer mode only supports a single parameter group.
Arguments used to define and configure a scheduled fine-tuning training session:
- Parameters:
ft_schedule¶ (str | dict | None) – The fine-tuning schedule to be executed. Usually will be a .yaml file path but can also be a properly structured Dict. See Specifying a Fine-Tuning Schedule for the basic schedule format. See LR Scheduler Reinitialization for more complex schedule configurations (including per-phase LR scheduler reinitialization). If a schedule is not provided, will generate and execute a default fine-tuning schedule using the provided LightningModule. See the default schedule. Defaults to None.
max_depth¶ (int) – Maximum schedule depth to which the defined fine-tuning schedule should be executed. Specifying -1 or an integer greater than the number of defined schedule layers will result in the entire fine-tuning schedule being executed. Defaults to -1.
base_max_lr¶ (float) – The default maximum learning rate to use for the parameter groups associated with each scheduled fine-tuning depth if not explicitly specified in the fine-tuning schedule. If overridden to None, will be set to the lr of the first scheduled fine-tuning depth. Defaults to 1e-5.
restore_best¶ (bool) – If True, restore the best available checkpoint (as defined by the FTSCheckpoint) before fine-tuning depth transitions. Defaults to True.
gen_ft_sched_only¶ (bool) – If True, generate the default fine-tuning schedule to log_dir (it will be named after your LightningModule subclass with the suffix _ft_schedule.yaml) and exit without training. Typically used to generate a default schedule that will be adjusted by the user before training. Defaults to False.
epoch_transitions_only¶ (bool) – If True, use epoch-driven stopping criteria exclusively (rather than composing FTSEarlyStopping and epoch-driven criteria, which is the default). If using this mode, an epoch-driven transition (max_transition_epoch >= 0) must be specified for each phase. If unspecified, max_transition_epoch defaults to -1 for each phase, which signals the application of FTSEarlyStopping criteria only. epoch_transitions_only defaults to False.
reinit_optim_cfg¶ (dict | None) – An optimizer reinitialization configuration dictionary consisting of at minimum a nested optimizer_init dictionary with a class_path key specifying the class of the optimizer to be instantiated. Optionally, an init_args dictionary of arguments with which to initialize the optimizer may be included. A reinit_lr_cfg configuration can also be specified concurrently (see also the sketch after this parameter list). By way of example, one could configure this dictionary via the LightningCLI with the following:
reinit_optim_cfg:
  optimizer_init:
    class_path: torch.optim.SGD
    init_args:
      lr: 1.0e-05
      momentum: 0.9
      weight_decay: 1.0e-06
reinit_lr_cfg¶ (dict | None) – A lr scheduler reinitialization configuration dictionary consisting of at minimum a nested lr_scheduler_init dictionary with a class_path key specifying the class of the lr scheduler to be instantiated. Optionally, an init_args dictionary of arguments with which to initialize the lr scheduler may be included. Additionally, one may optionally include arguments to pass to PyTorch Lightning's lr scheduler configuration LRSchedulerConfig in the pl_lrs_cfg dictionary. A reinit_optim_cfg configuration can also be specified concurrently. By way of example, one could configure this dictionary via the LightningCLI with the following:
reinit_lr_cfg:
  lr_scheduler_init:
    class_path: torch.optim.lr_scheduler.StepLR
    init_args:
      step_size: 1
      gamma: 0.7
  pl_lrs_cfg:
    interval: epoch
    frequency: 1
    name: Implicit_Reinit_LR_Scheduler
  use_current_optimizer_pg_lrs: true
allow_untested¶ (bool) – If True, allows the use of custom or unsupported training strategies and lr schedulers (e.g. single_tpu, MyCustomStrategy, MyCustomLRScheduler). Defaults to False.
Note
Custom or officially unsupported strategies and lr schedulers can be used by setting allow_untested to True.
Some officially unsupported strategies may work unaltered and are only unsupported due to the Fine-Tuning Scheduler project's lack of CI/testing resources for that strategy (e.g. single_tpu). Most unsupported strategies and schedulers, however, are currently unsupported because they require varying degrees of modification to be compatible.
For instance, with respect to strategies, deepspeed will require a StrategyAdapter similar to the one written for FSDP (FSDPStrategyAdapter) to be written before support can be added (PRs welcome!), while tpu_spawn would require an override of the current broadcast method to include python objects.
Regarding lr schedulers, ChainedScheduler and SequentialLR are examples of schedulers not currently supported due to the configuration complexity and semantic conflicts supporting them would introduce. If a supported torch lr scheduler does not meet your requirements, one can always subclass a supported lr scheduler and modify it as required (e.g. LambdaLR is especially useful for this).
strategy_adapter_cfg¶ (dict | None) – A configuration dictionary that will be applied to the StrategyAdapter associated with the current training Strategy. See the relevant StrategyAdapter documentation for strategy-specific configuration options. Defaults to None.
custom_strategy_adapters¶ (dict[str, str] | None) – A dictionary mapping PyTorch Lightning strategy flags (canonical strategy names like "single_device", "auto", "ddp", etc.) to strategy adapter references. Multiple strategy_flag keys can be associated with the same adapter. The adapter reference can be: (1) an entry point name registered under finetuning_scheduler.strategy_adapters (see Strategy Adapter Entry Points), (2) a fully qualified StrategyAdapter subclass path in the format "module.path:ClassName", or (3) a fully qualified dot path in the format "module.path.ClassName". This is an experimental feature that is subject to change. Defaults to None.
apply_lambdas_new_pgs¶ (bool) – If True, applies the most recent lambda in the lr_lambdas list to newly added optimizer groups for lr schedulers that have a lr_lambdas attribute. Note this option only applies to phases without reinitialized lr schedulers. Phases with defined lr scheduler reinitialization configs will always apply the specified lambdas. Defaults to False.
logging_level¶ (int) – Sets the logging level for FinetuningScheduler. Defaults to INFO.
enforce_phase0_params¶ (bool) – Whether FinetuningScheduler will reconfigure the user-configured optimizer (configured via configure_optimizers) to optimize the parameters (and only those parameters) scheduled to be optimized in phase 0 of the current fine-tuning schedule. Reconfiguration will only take place if FTS discovers that the set of parameters to be initially thawed and present in the optimizer differs from the parameters specified in phase 0. Only the parameters included in the optimizer are affected; the choice of optimizer, lr scheduler, etc. remains unaltered. Defaults to True.
frozen_bn_track_running_stats¶ (bool) – When freezing torch.nn.modules.batchnorm._BatchNorm layers, whether FinetuningScheduler should set BatchNorm track_running_stats to True. Setting this to True overrides the default Lightning behavior that sets BatchNorm track_running_stats to False when freezing BatchNorm layers. Defaults to True.
log_dir¶ (str | PathLike | None) – Directory to use for FinetuningScheduler artifacts. Defaults to trainer.log_dir if None, or trainer.default_root_dir if trainer.log_dir is also None.
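Because the reinitialization configurations above are plain dictionaries, they can also be passed directly to the callback constructor rather than via the LightningCLI. The following is a minimal sketch, assuming the dictionary keys mirror the YAML examples shown in this parameter list:

# A sketch of passing reinit_optim_cfg and reinit_lr_cfg programmatically as dicts.
import lightning as L
from finetuning_scheduler import FinetuningScheduler

reinit_optim_cfg = {
    "optimizer_init": {
        "class_path": "torch.optim.SGD",
        "init_args": {"lr": 1.0e-05, "momentum": 0.9, "weight_decay": 1.0e-06},
    }
}
reinit_lr_cfg = {
    "lr_scheduler_init": {
        "class_path": "torch.optim.lr_scheduler.StepLR",
        "init_args": {"step_size": 1, "gamma": 0.7},
    },
    "pl_lrs_cfg": {"interval": "epoch", "frequency": 1, "name": "Implicit_Reinit_LR_Scheduler"},
}
trainer = L.Trainer(
    callbacks=[FinetuningScheduler(reinit_optim_cfg=reinit_optim_cfg, reinit_lr_cfg=reinit_lr_cfg)]
)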
- _fts_state¶
The internal FinetuningScheduler state.
- strategy_adapter_cfg¶
A configuration dictionary that will be applied to the StrategyAdapter associated with the current training Strategy.
- epoch_transitions_only¶
Whether to use epoch-driven stopping criteria exclusively.
- base_max_lr¶
The default maximum learning rate to use for the parameter groups associated with each scheduled fine-tuning depth if not explicitly specified in the fine-tuning schedule. If overridden to
None, will be set to the lr of the first scheduled fine-tuning depth. Defaults to 1e-5.
- freeze_before_training(pl_module)[source]¶
Freezes all model parameters so that parameter subsets can be subsequently thawed according to the fine-tuning schedule.
- Parameters:
pl_module¶ (LightningModule) – The target LightningModule to freeze parameters of
- Return type:
None
- load_state_dict(state_dict)[source]¶
After loading a checkpoint, load the saved
FinetuningScheduler callback state and update the current callback state accordingly.
- on_before_zero_grad(trainer, pl_module, optimizer)[source]¶
After the latest optimizer step, update the _fts_state, incrementing the global fine-tuning steps taken.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
optimizer¶ (ParamGroupAddable) – The supported optimizer instance to which parameter groups will be configured and added.
- Return type:
None
- on_fit_start(trainer, pl_module)[source]¶
Before beginning training, ensure an optimizer configuration supported by
FinetuningScheduler is present.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
- Raises:
MisconfigurationException – If more than one optimizer is configured, as this indicates a configuration error
- Return type:
None
- on_train_end(trainer, pl_module)[source]¶
Synchronize internal
_fts_state on end of training to ensure the final training state is consistent with epoch semantics.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
- Return type:
None
- on_train_epoch_start(trainer, pl_module)[source]¶
Before beginning a training epoch, configure the internal
_fts_state, prepare the next scheduled fine-tuning level and store the updated optimizer configuration before continuing training.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
- Return type:
None
- restore_best_ckpt()[source]¶
Restore the current best model checkpoint, according to
best_model_path.
- Return type:
None
- setup(trainer, pl_module, stage)[source]¶
Validate a compatible
Strategy is being used and ensure all FinetuningScheduler callback dependencies are met. If a valid configuration is present, then either dump the default fine-tuning schedule OR:
1. configure the FTSEarlyStopping callback (if relevant)
2. initialize the _fts_state
3. freeze the target LightningModule parameters
Finally, initialize the FinetuningScheduler training session in the training environment.
- Parameters:
pl_module¶ (LightningModule) – The LightningModule object
stage¶ (str) – The RunningStage.{SANITY_CHECKING,TRAINING,VALIDATING}. Defaults to None.
- Raises:
SystemExit – Gracefully exit before training if only generating and not executing a fine-tuning schedule.
- Return type:
None
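To illustrate the schedule-generation-only path described above, the following is a minimal sketch of the generate-then-refine workflow; the schedule filename is a hypothetical placeholder following the naming convention described for gen_ft_sched_only:

import lightning as L
from finetuning_scheduler import FinetuningScheduler

# 1. Dump the default schedule (named <YourModule>_ft_schedule.yaml in log_dir) and
#    exit gracefully before training via the SystemExit described above.
trainer = L.Trainer(callbacks=[FinetuningScheduler(gen_ft_sched_only=True)])

# 2. After manually adjusting the generated schedule, execute it explicitly
#    ("MyModule_ft_schedule.yaml" is a hypothetical generated filename).
trainer = L.Trainer(callbacks=[FinetuningScheduler(ft_schedule="MyModule_ft_schedule.yaml")])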
- should_transition(trainer)[source]¶
Phase transition logic is contingent on whether we are composing
FTSEarlyStopping criteria with epoch-driven transition constraints or exclusively using epoch-driven transition scheduling (i.e., epoch_transitions_only is True).
- state_dict()[source]¶
Before saving a checkpoint, add the
FinetuningScheduler callback state to be saved.
- Returns:
The FinetuningScheduler callback state dictionary that will be added to the checkpoint
- Return type:
Dict[str, Any]
- step()[source]¶
Prepare and execute the next scheduled fine-tuning level:
1. Restore the current best model checkpoint if appropriate
2. Thaw model parameters according to the defined schedule
3. Synchronize the states of FitLoop and _fts_state
- Return type:
None
Note
The FinetuningScheduler callback initially only supports single-schedule/optimizer fine-tuning configurations
- step_pg(optimizer, depth, depth_sync=True, pre_reinit_state=None)[source]¶
Configure optimizer parameter groups for the next scheduled fine-tuning level, adding parameter groups beyond the restored optimizer state up to
current_depth and reinitializing the optimizer and/or learning rate scheduler as configured.
- Parameters:
optimizer¶ (ParamGroupAddable) – The supported optimizer instance to which parameter groups will be configured and added.
depth¶ (int) – The maximum index of the fine-tuning schedule for which to configure the optimizer parameter groups.
depth_sync¶ (bool) – If True, configure optimizer parameter groups for all depth indices greater than the restored checkpoint. If False, configure groups only for the specified depth. Defaults to True.
- Return type:
None
- property curr_depth¶
Index of the fine-tuning schedule depth currently being trained.
- Returns:
The index of the current fine-tuning training depth
- Return type:
int
- property depth_remaining¶
Remaining number of fine-tuning training levels in the schedule.
- Returns:
The number of remaining fine-tuning training levels
- Return type:
int
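For monitoring purposes, curr_depth and depth_remaining can be read from the callback instance attached to an existing trainer. A minimal sketch, assuming a FinetuningScheduler callback has already been added to the trainer's callbacks:

from finetuning_scheduler import FinetuningScheduler

# Locate the FinetuningScheduler callback on an existing `trainer` and inspect
# its schedule progress.
fts_callback = next(cb for cb in trainer.callbacks if isinstance(cb, FinetuningScheduler))
print(f"current depth: {fts_callback.curr_depth}, remaining levels: {fts_callback.depth_remaining}")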
- property log_dir¶
Directory to use for FinetuningScheduler artifacts.
- Returns:
The directory to use, falling back to
trainer.log_dir if FinetuningScheduler._log_dir is not set, and trainer.default_root_dir if trainer.log_dir is also None.
- Return type:
str | os.PathLike | None