Train Module
This module contains functions to train a model from pytable files of segmented nuclei. Before you can train, you may need to generate a dataset with HoverNet. We provide a Docker container for this step, since installing HoverNet locally can be tricky.
Dataset Generation
Structure Your Data Directory
Structure your data directory as follows:
└── dir
    ├── config.ini
    └── slides/
        ├── slide_1.svs
        ├── ...
        └── slide_n.svs
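Before launching the container, it can help to confirm that the directory matches the layout above. The following is a minimal sketch, not part of HoverFast: `check_data_dir` is a hypothetical helper name, and it assumes the slides use the `.svs` extension shown in the tree.

```python
from pathlib import Path


def check_data_dir(root):
    """Return a list of problems found in the data directory layout.

    Hypothetical helper (not part of HoverFast); assumes .svs slides.
    """
    root = Path(root)
    problems = []
    # The config file is expected at the top level of the directory.
    if not (root / "config.ini").is_file():
        problems.append("missing config.ini")
    # Slides are expected in a slides/ subdirectory.
    slides_dir = root / "slides"
    if not slides_dir.is_dir():
        problems.append("missing slides/ directory")
    elif not any(slides_dir.glob("*.svs")):
        problems.append("no .svs files in slides/")
    return problems
```

An empty list means the layout matches the tree above; otherwise each entry names one missing piece.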
Generate Dataset
To generate the dataset, run the following command:
docker run --gpus all -it -v /path/to/dir/:/HoverFastData petroslk/data_generation_hovernet:latest hoverfast_data_generation -c '/HoverFastData/config.ini'
This should generate two files in the directory, data_train.pytable and data_test.pytable, which you can then use to train the model.
Config File for Data Generation
Here is an example of what the config.ini file should look like:
[General]
dataset = data
classes = 0, 1
gpu_id = 0
seed = 3981674709
[Dataset_train_test]
test_set_size = 0.2
num_tiles = 10
tile_size = 1024
level = 0
mask_level = 3
mask_th = 0.2
save_tiles = False
rois = False
Explanation:
[General]:
dataset: Name of the dataset.
classes: Classes to segment.
gpu_id: GPU ID to use for data generation.
seed: Random seed for reproducibility.
[Dataset_train_test]:
test_set_size: Proportion of the dataset to use for testing.
num_tiles: Number of tiles to select from each WSI.
tile_size: Size of each tile.
level: Level of the WSI to use.
mask_level: Level of the mask to use.
mask_th: Threshold for the mask.
save_tiles: Whether to save the extracted tiles.
rois: Whether to use ROIs.
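To make the value types in the example concrete, here is a minimal sketch that parses the same config with Python's standard `configparser`. It is an illustration only: `load_config` is a hypothetical helper, and the data generation container may read the file differently.

```python
import configparser

# The example config from above, embedded so the sketch is self-contained.
EXAMPLE_CONFIG = """
[General]
dataset = data
classes = 0, 1
gpu_id = 0
seed = 3981674709

[Dataset_train_test]
test_set_size = 0.2
num_tiles = 10
tile_size = 1024
level = 0
mask_level = 3
mask_th = 0.2
save_tiles = False
rois = False
"""


def load_config(text):
    """Parse the config text into a dict of typed values (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["General"]
    split = parser["Dataset_train_test"]
    return {
        "dataset": general.get("dataset"),
        # "0, 1" becomes [0, 1]; int() tolerates surrounding whitespace.
        "classes": [int(c) for c in general.get("classes").split(",")],
        "gpu_id": general.getint("gpu_id"),
        "seed": general.getint("seed"),
        "test_set_size": split.getfloat("test_set_size"),
        "num_tiles": split.getint("num_tiles"),
        "tile_size": split.getint("tile_size"),
        "level": split.getint("level"),
        "mask_level": split.getint("mask_level"),
        "mask_th": split.getfloat("mask_th"),
        "save_tiles": split.getboolean("save_tiles"),
        "rois": split.getboolean("rois"),
    }


config = load_config(EXAMPLE_CONFIG)
# test_set_size is a proportion, so it should lie strictly between 0 and 1.
assert 0.0 < config["test_set_size"] < 1.0
```

Parsing the file this way before starting a long generation run is a cheap way to catch a misspelled key or a non-numeric value.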
Train Parser
The train parser allows you to configure various parameters for training the model. Below are the details of each argument:
dataname (positional argument): Dataset name. It corresponds to the pytable file names, which follow the format (dataname)_(phase).pytable, for example data_train.pytable.
-o, --outdir: Output directory path for TensorBoard logs and the trained model. Default is ./output/.
-p, --dataset_path: Path to the directory that contains the pytables. Default is ./.
-b, --batch_size: Batch size for the dataloader. Default is 5.
-n, --n_worker: Number of workers for the dataloader. Default is min(batch_size, os.cpu_count()).
-e, --epoch: Number of epochs. Default is 100.
-d, --depth: Depth of the model. Default is 3.
-w, --width: Width of the model. Sets the number of filters in the first layer to 2**w; the filter count then grows exponentially with the depth of the model. Default is 4.
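The arguments above can be mirrored with Python's standard `argparse`, which also shows how the `n_worker` default depends on the batch size. This is an illustrative reconstruction from the documented flags and defaults, not the HoverFast source; `build_train_parser` and `parse_train_args` are hypothetical names.

```python
import argparse
import os


def build_train_parser():
    """Build a parser mirroring the documented train arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="HoverFast train")
    parser.add_argument("dataname", help="dataset name used in the pytable file names")
    parser.add_argument("-o", "--outdir", default="./output/")
    parser.add_argument("-p", "--dataset_path", default="./")
    parser.add_argument("-b", "--batch_size", type=int, default=5)
    # None means "not given"; the real default is resolved after parsing.
    parser.add_argument("-n", "--n_worker", type=int, default=None)
    parser.add_argument("-e", "--epoch", type=int, default=100)
    parser.add_argument("-d", "--depth", type=int, default=3)
    parser.add_argument("-w", "--width", type=int, default=4)
    return parser


def parse_train_args(argv):
    """Parse argv and fill in the n_worker default described above."""
    args = build_train_parser().parse_args(argv)
    if args.n_worker is None:
        # os.cpu_count() may return None, so fall back to a single worker.
        args.n_worker = min(args.batch_size, os.cpu_count() or 1)
    return args


args = parse_train_args(["dataset_name", "-p", "/path/to/dataset", "-b", "10"])
print(args.batch_size, args.n_worker, args.epoch)
```

Because `n_worker` is capped by `batch_size`, raising the batch size with `-b` also raises the default worker count, up to the number of available CPUs.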
Usage Examples
Basic Usage
This example demonstrates the basic usage of the train command with minimal arguments, using default settings.
HoverFast train dataset_name -p /path/to/dataset -o /path/to/outdir
Explanation:
Dataset: Name of the dataset, which corresponds to the pytables name.
Dataset Path: Directory that contains the pytables.
Output directory: Directory to save TensorBoard logs and trained model.
Custom Batch Size
In this example, we specify a custom batch size for the dataloader.
HoverFast train dataset_name -p /path/to/dataset -b 10 -o /path/to/outdir
Explanation:
Dataset: Name of the dataset.
Dataset Path: Directory that contains the pytables.
Batch Size: Set the batch size to 10.
Output directory: Directory to save TensorBoard logs and trained model.
Custom Number of Workers
This example sets a custom number of workers for the dataloader to optimize data loading.
HoverFast train dataset_name -p /path/to/dataset -n 8 -o /path/to/outdir
Explanation:
Dataset: Name of the dataset.
Dataset Path: Directory that contains the pytables.
Number of Workers: Set to 8 workers for the dataloader.
Output directory: Directory to save TensorBoard logs and trained model.
Adjusting Model Depth
This example adjusts the depth of the model for more complex training scenarios.
HoverFast train dataset_name -p /path/to/dataset -d 5 -o /path/to/outdir
Explanation:
Dataset: Name of the dataset.
Dataset Path: Directory that contains the pytables.
Model Depth: Set the depth of the model to 5.
Output directory: Directory to save TensorBoard logs and trained model.
Adjusting Model Width
This example customizes the width of the model, which defines the number of filters in the first layer.
HoverFast train dataset_name -p /path/to/dataset -w 6 -o /path/to/outdir
Explanation:
Dataset: Name of the dataset.
Dataset Path: Directory that contains the pytables.
Model Width: Set the width of the model to 6.
Output directory: Directory to save TensorBoard logs and trained model.
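To get a feel for what a width of 6 means, the sketch below computes filter counts per level. The first-layer count of 2**w comes from the parser description; the doubling at every further level is an assumption based on common U-Net-style designs, and `filters_per_level` is a hypothetical helper, so the actual model may differ.

```python
def filters_per_level(width, depth):
    """Return the assumed number of filters at each level of the model.

    The first level has 2**width filters (as documented); doubling at each
    deeper level is an assumption, not confirmed by the documentation.
    """
    return [2 ** (width + level) for level in range(depth)]


# Default settings: width 4, depth 3.
print(filters_per_level(4, 3))  # [16, 32, 64]
# The width from this example with the default depth of 3.
print(filters_per_level(6, 3))  # [64, 128, 256]
```

Under this assumption, each increment of the width doubles the filter count at every level, so memory use and training time grow quickly as `-w` increases.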
Custom Epochs
This example sets a custom number of epochs for the training process.
HoverFast train dataset_name -p /path/to/dataset -e 200 -o /path/to/outdir
Explanation:
Dataset: Name of the dataset.
Dataset Path: Directory that contains the pytables.
Epochs: Set the number of epochs to 200.
Output directory: Directory to save TensorBoard logs and trained model.