Train Module

This module contains functions to train a model from pytable files of segmented nuclei. Before training, you may need to generate a dataset with HoverNet. We provide a Docker container for this step, since installing HoverNet locally can be tricky.

Dataset Generation

Structure Your Data Directory

Structure your data directory as follows:

└── dir
    ├── config.ini
    └── slides/
        ├── slide_1.svs
        ├── ...
        └── slide_n.svs

Generate Dataset

To generate the dataset, run the following command:

docker run --gpus all -it -v /path/to/dir/:/HoverFastData petroslk/data_generation_hovernet:latest hoverfast_data_generation -c '/HoverFastData/config.ini'

This should generate two files in the directory, data_train.pytable and data_test.pytable, which you can then use to train the model.

Config File for Data Generation

Here is an example of what the config.ini file should look like:

[General]
dataset = data
classes = 0, 1
gpu_id = 0
seed = 3981674709

[Dataset_train_test]
test_set_size = 0.2
num_tiles = 10
tile_size = 1024
level = 0
mask_level = 3
mask_th = 0.2
save_tiles = False
rois = False

Explanation:

  • [General]:

    • dataset: Name of the dataset.

    • classes: Classes to segment.

    • gpu_id: GPU ID to use for data generation.

    • seed: Random seed for reproducibility.

  • [Dataset_train_test]:

    • test_set_size: Proportion of the dataset to use for testing.

    • num_tiles: Number of tiles to select from each whole slide image (WSI).

    • tile_size: Size of each tile.

    • level: Level of the WSI to use.

    • mask_level: Level of the mask to use.

    • mask_th: Threshold for the mask.

    • save_tiles: Whether to save the extracted tiles.

    • rois: Whether to use ROIs.
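Before launching the container, it can help to check that config.ini parses and that each value has the expected type. The following is a minimal sketch using Python's standard configparser module. The helper names load_config and check_config are hypothetical and not part of HoverFast, and the sanity rules (a proportion strictly between 0 and 1, positive tile counts and sizes) are assumptions based on the descriptions above rather than the tool's own validation.

```python
import configparser

# Example config copied from this section.
EXAMPLE_CONFIG = """\
[General]
dataset = data
classes = 0, 1
gpu_id = 0
seed = 3981674709

[Dataset_train_test]
test_set_size = 0.2
num_tiles = 10
tile_size = 1024
level = 0
mask_level = 3
mask_th = 0.2
save_tiles = False
rois = False
"""


def load_config(text):
    """Parse the config text into a dict of typed values (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["General"]
    tiles = parser["Dataset_train_test"]
    return {
        "dataset": general["dataset"],
        # "0, 1" -> [0, 1]; int() tolerates the surrounding spaces
        "classes": [int(c) for c in general["classes"].split(",")],
        "gpu_id": general.getint("gpu_id"),
        "seed": general.getint("seed"),
        "test_set_size": tiles.getfloat("test_set_size"),
        "num_tiles": tiles.getint("num_tiles"),
        "tile_size": tiles.getint("tile_size"),
        "level": tiles.getint("level"),
        "mask_level": tiles.getint("mask_level"),
        "mask_th": tiles.getfloat("mask_th"),
        "save_tiles": tiles.getboolean("save_tiles"),
        "rois": tiles.getboolean("rois"),
    }


def check_config(cfg):
    """Sanity checks. These rules are assumptions, not HoverFast's own validation."""
    if not 0.0 < cfg["test_set_size"] < 1.0:
        raise ValueError("test_set_size should be a proportion between 0 and 1")
    if cfg["num_tiles"] <= 0 or cfg["tile_size"] <= 0:
        raise ValueError("num_tiles and tile_size should be positive")
    return cfg


cfg = check_config(load_config(EXAMPLE_CONFIG))
print(cfg["dataset"], cfg["classes"], cfg["tile_size"])
```

To check a real file, read its contents and pass the text to load_config in place of EXAMPLE_CONFIG.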

Train Parser

The train parser allows you to configure various parameters for training the model. Below are the details of each argument:

  • dataname (positional argument): Dataset name, corresponds to the pytables name under the following format: (dataname)_(phase).pytables.

  • -o, --outdir: Output directory path for tensorboard and trained model. Default is ./output/.

  • -p, --dataset_path: Path to the directory that contains the pytables. Default is ./.

  • -b, --batch_size: Batch size for the dataloader. Default is 5.

  • -n, --n_worker: Number of workers for the dataloader. Default is min(batch_size, os.cpu_count()).

  • -e, --epoch: Number of epochs. Default is 100.

  • -d, --depth: Depth of the model. Default is 3.

  • -w, --width: Width of the model. Defines the number of filters in the first layer (2**w) with an exponential growth rate respective to the depth of the model. Default is 4.
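To make the effect of the width and depth options concrete, here is a small sketch of how the filter counts could scale. It assumes that the filter count doubles at each level and that depth counts the number of levels. The text above only states that the first layer has 2**w filters and that growth is exponential with depth, so the exact growth rule is an assumption, and filters_per_level is a hypothetical helper rather than a HoverFast function.

```python
def filters_per_level(width=4, depth=3):
    """Return assumed filter counts per level: 2**width, doubling at each level.

    The doubling rule is an assumption made for illustration; only the
    first-layer count (2**width) is stated in the documentation.
    """
    return [2 ** (width + level) for level in range(depth)]


# Defaults (-w 4 -d 3)
print(filters_per_level())            # [16, 32, 64]
# Wider first layer (-w 6)
print(filters_per_level(width=6))     # [64, 128, 256]
# Deeper model (-d 5)
print(filters_per_level(depth=5))     # [16, 32, 64, 128, 256]
```

Under this assumption, raising the width by one doubles the filter count at every level, while raising the depth adds one more level with twice the filters of the previous one.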

Usage Examples

Basic Usage

This example demonstrates the basic usage of the train command with minimal arguments, using default settings.

HoverFast train dataset_name -p /path/to/dataset -o /path/to/outdir

Explanation:

  • Dataset: Name of the dataset, which corresponds to the pytables name.

  • Dataset Path: Directory that contains the pytables.

  • Output directory: Directory to save TensorBoard logs and trained model.

Custom Batch Size

In this example, we specify a custom batch size for the dataloader.

HoverFast train dataset_name -p /path/to/dataset -b 10 -o /path/to/outdir

Explanation:

  • Dataset: Name of the dataset.

  • Dataset Path: Directory that contains the pytables.

  • Batch Size: Set the batch size for the dataloader to 10.

  • Output directory: Directory to save TensorBoard logs and trained model.

Custom Number of Workers

This example sets a custom number of workers for the dataloader to optimize data loading.

HoverFast train dataset_name -p /path/to/dataset -n 8 -o /path/to/outdir

Explanation:

  • Dataset: Name of the dataset.

  • Dataset Path: Directory that contains the pytables.

  • Number of Workers: Set to 8 workers for the dataloader.

  • Output directory: Directory to save TensorBoard logs and trained model.
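When -n is omitted, the train parser documents the default as min(batch_size, os.cpu_count()). The sketch below reproduces that rule so you can see what value a given machine would get. The function name default_n_worker is hypothetical, and the fallback to 1 when os.cpu_count() returns None is an assumption added for safety, not documented HoverFast behavior.

```python
import os


def default_n_worker(batch_size=5):
    """Mirror the documented default: min(batch_size, os.cpu_count())."""
    # os.cpu_count() can return None when the count cannot be determined;
    # falling back to 1 is an assumption made for this sketch.
    cpu_count = os.cpu_count() or 1
    return min(batch_size, cpu_count)


print(default_n_worker())     # at most 5, fewer on machines with under 5 CPUs
print(default_n_worker(10))   # at most 10
```

With this rule, the default never exceeds the batch size, so passing -n explicitly is the only way to use more workers than the batch size.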

Adjusting Model Depth

This example adjusts the depth of the model for more complex training scenarios.

HoverFast train dataset_name -p /path/to/dataset -d 5 -o /path/to/outdir

Explanation:

  • Dataset: Name of the dataset.

  • Dataset Path: Directory that contains the pytables.

  • Model Depth: Set the depth of the model to 5.

  • Output directory: Directory to save TensorBoard logs and trained model.

Adjusting Model Width

This example customizes the width of the model, which defines the number of filters in the first layer (2**w, so 2**6 = 64 filters here).

HoverFast train dataset_name -p /path/to/dataset -w 6 -o /path/to/outdir

Explanation:

  • Dataset: Name of the dataset.

  • Dataset Path: Directory that contains the pytables.

  • Model Width: Set the width of the model to 6.

  • Output directory: Directory to save TensorBoard logs and trained model.

Custom Epochs

This example sets a custom number of epochs for the training process.

HoverFast train dataset_name -p /path/to/dataset -e 200 -o /path/to/outdir

Explanation:

  • Dataset: Name of the dataset.

  • Dataset Path: Directory that contains the pytables.

  • Epochs: Set the number of epochs to 200.

  • Output directory: Directory to save TensorBoard logs and trained model.