# Pre-training CT-FM
Before you begin, ensure you have downloaded your data as explained in the Data Instructions. It is also a good idea to review the lighter documentation since our training configurations are based on its guidelines.
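If `lighter` is not yet installed in your environment, it can be installed from PyPI (we assume the `project-lighter` package name here; check the lighter documentation if your version differs):

```bash
pip install project-lighter
```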
## Pre-training Experiment Configurations
Pre-training configuration files are organized in the `experiments` directory. Explore the key folders below to find the setup that best meets your needs:
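For orientation, here is the layout implied by the configuration paths used in the training command below (a sketch; your checkout may contain additional folders and files):

```text
experiments/
└── fm/
    ├── base.yaml                    # core trainer, system, and adapter settings
    ├── frameworks/
    │   └── intrasample_simclr.yaml  # self-supervised SimCLR framework
    └── backbones/
        └── segresenc.yaml           # SegResNet encoder backbone
```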
## Running the Pre-training
After all adjustments have been made, navigate to the root directory of the CT-FM project and execute the following command to begin pre-training:
```bash
lighter fit --config=./experiments/fm/base.yaml,\ #(1)!
            ./experiments/fm/frameworks/intrasample_simclr.yaml,\ #(2)!
            ./experiments/fm/backbones/segresenc.yaml #(3)!
```
1.  `base.yaml` establishes the core settings for pre-training the CT-FM model by defining:

    - Variables: Core parameters such as voxel spacing.
    - Trainer Settings: Parameters including 500 epochs, batch limits, GPU configuration, mixed precision, logging via WandB, and checkpoint callbacks.
    - System Settings: The model placeholder, the optimizer (AdamW), the learning rate scheduler (WarmupCosineSchedule), and the dataloader setup for your dataset.
    - Adapters: Methods for batch processing and loss computation.

    In essence, `base.yaml` serves as the foundation upon which the entire pre-training process is built.
2.  `intrasample_simclr.yaml` configures the self-supervised SimCLR framework used during pre-training. It includes:

    - Model & Criterion: Defines the CT-FM model and applies a contrastive loss function with a specified temperature.
    - Data Augmentation Pipeline: Implements a series of transformations (such as random crops, flips, and intensity adjustments) to generate multiple augmented views from each input image.

    This configuration augments the base setup with specialized self-supervised learning components.
3.  `segresenc.yaml` sets up the backbone for the CT-FM model. It includes:

    - Backbone Identification: Sets the variable `BACKBONE_NAME` to `"SegResNetDS"`.
    - Architectural Details: Configures the SegResNet encoder (via `monai.networks.nets.segresnet_ds.SegResEncoder`) by specifying parameters like spatial dimensions, input channels, initial filters, and block structures.
    - Integration with Base Config: Uses shared variable mappings (such as `NUM_FTRS_BY_BACKBONE`) and logger identifiers from `base.yaml` to ensure smooth integration.

    This configuration provides the essential backbone architecture for complete model training.
Click on the symbols to learn more about each YAML file.
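To make the composition concrete, here is a minimal sketch of the kind of content the framework and backbone files carry, written in the `_target_`-based instantiation style that lighter configs use. The class choices (for example, `lightly.loss.NTXentLoss` as the contrastive loss), the key nesting, and all parameter values are illustrative assumptions drawn from the descriptions above, not copied from the actual files:

```yaml
# Illustrative sketch only -- consult the real files under experiments/fm/
# for the actual keys and values.

# frameworks/intrasample_simclr.yaml -- self-supervised components
system:
  criterion:
    _target_: lightly.loss.NTXentLoss        # assumed temperature-scaled contrastive loss
    temperature: 0.5                         # the "specified temperature"
  dataloaders:
    train:
      dataset:
        transform:
          _target_: monai.transforms.Compose
          transforms:
            - _target_: monai.transforms.RandSpatialCropd    # random crops -> multiple views
              keys: ["image"]
              roi_size: [96, 96, 96]
            - _target_: monai.transforms.RandFlipd           # random flips
              keys: ["image"]
              prob: 0.5
            - _target_: monai.transforms.RandScaleIntensityd # intensity adjustment
              keys: ["image"]
              factors: 0.1
              prob: 0.5

# backbones/segresenc.yaml -- backbone definition
BACKBONE_NAME: "SegResNetDS"                 # shared variable referenced by base.yaml
system:
  model:
    _target_: monai.networks.nets.segresnet_ds.SegResEncoder
    spatial_dims: 3                          # 3D CT volumes
    in_channels: 1                           # single-channel CT intensities
    init_filters: 32                         # illustrative initial filter count
    blocks_down: [1, 2, 2, 4]                # illustrative block structure
```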
## Customization Before Training
Before running the experiment, update your `base.yaml` configuration using the guidelines below:
### Directory Paths Update
- Set the paths for `save_dir` and `dirpath` to your preferred locations for saving logs and checkpoints.
- Update the path to `scan_list.pkl` to reflect the file produced during the data preparation phase.
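In practice, these edits look something like the following sketch; the exact nesting of the keys in your copy of `base.yaml` may differ, and the `scan_list` key name shown is hypothetical:

```yaml
trainer:
  logger:
    _target_: pytorch_lightning.loggers.WandbLogger
    save_dir: /your/log/dir                   # <- your preferred log location
  callbacks:
    - _target_: pytorch_lightning.callbacks.ModelCheckpoint
      dirpath: /your/checkpoint/dir           # <- your preferred checkpoint location

system:
  dataloaders:
    train:
      dataset:
        scan_list: /your/data/scan_list.pkl   # hypothetical key; point it at the file
                                              # produced during data preparation
```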
### Training Parameter Adjustments
Modify the settings under the `trainer:` key (such as the number of GPUs, batch size, and training duration) to align with your system’s resources and experimental needs.
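For example, assuming the standard PyTorch Lightning `Trainer` arguments that lighter forwards to the underlying trainer (values are illustrative):

```yaml
trainer:
  max_epochs: 500            # training duration
  devices: 2                 # number of GPUs to use
  accelerator: gpu
  precision: 16-mixed        # or 16, depending on your Lightning version
  limit_train_batches: 1.0   # lower this fraction to subsample batches per epoch
```

Depending on your lighter version, the batch size itself may be configured on the dataloader under the `system:` key rather than directly under `trainer:`.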
After applying these customizations, execute the pre-training command to initiate the process with your updated configurations.