Train Module
============

This module contains functions to train a model from PyTables files of segmented nuclei. Before training, you may need to generate a dataset with HoverNet.
We provide a Docker container for this, since installing HoverNet locally can be tricky.

Dataset Generation
------------------

Structure Your Data Directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Structure your data directory as follows:

.. code-block:: none

    dir/
    ├── config.ini
    └── slides/
        ├── slide_1.svs
        ├── ...
        └── slide_n.svs
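
Before launching the container, it can help to confirm that the directory matches this layout. The following is a minimal sketch and not part of HoverFast: the helper name `check_data_dir` is hypothetical, and it assumes that all slides use the `.svs` extension shown above.

.. code-block:: python

    from pathlib import Path


    def check_data_dir(data_dir):
        """Return the sorted list of .svs slides, or raise if the layout is wrong."""
        data_dir = Path(data_dir)
        config = data_dir / "config.ini"
        slides_dir = data_dir / "slides"
        if not config.is_file():
            raise FileNotFoundError(f"Expected a config file at {config}")
        if not slides_dir.is_dir():
            raise FileNotFoundError(f"Expected a slides directory at {slides_dir}")
        # Assumption: slides are stored as .svs files, as in the tree above.
        slides = sorted(slides_dir.glob("*.svs"))
        if not slides:
            raise FileNotFoundError(f"No .svs slides found in {slides_dir}")
        return slides

Calling `check_data_dir("/path/to/dir")` either returns the slide paths or raises a `FileNotFoundError` naming the missing piece.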


Generate Dataset
^^^^^^^^^^^^^^^^^

To generate the dataset, run the following command:

.. code-block:: sh

    docker run --gpus all -it -v /path/to/dir/:/HoverFastData petroslk/data_generation_hovernet:latest hoverfast_data_generation -c '/HoverFastData/config.ini'

This should generate two files in the mounted directory, `data_train.pytable` and `data_test.pytable`, which you can then use to train the model.

Config File for Data Generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example of what the `config.ini` file should look like:

.. code-block:: ini

    [General]
    dataset = data
    classes = 0, 1
    gpu_id = 0
    seed = 3981674709

    [Dataset_train_test]
    test_set_size = 0.2
    num_tiles = 10
    tile_size = 1024
    level = 0
    mask_level = 3
    mask_th = 0.2
    save_tiles = False
    rois = False

Explanation:

- **[General]**:

  - **dataset**: Name of the dataset.
  - **classes**: Classes to segment.
  - **gpu_id**: GPU ID to use for data generation.
  - **seed**: Random seed for reproducibility.

- **[Dataset_train_test]**:

  - **test_set_size**: Proportion of the dataset to use for testing.
  - **num_tiles**: Number of tiles to select from each whole slide image (WSI).
  - **tile_size**: Size of each tile, in pixels.
  - **level**: Level of the WSI to use.
  - **mask_level**: Level of the mask to use.
  - **mask_th**: Threshold for the mask.
  - **save_tiles**: Whether to save the extracted tiles.
  - **rois**: Whether to use ROIs.
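
To make the expected value types concrete, the example file can be read with Python's standard `configparser`. This is only a sketch: the function name `load_config` is hypothetical, the types are inferred from the example values above, and the data generation container may parse the file differently.

.. code-block:: python

    import configparser

    EXAMPLE_CONFIG = """
    [General]
    dataset = data
    classes = 0, 1
    gpu_id = 0
    seed = 3981674709

    [Dataset_train_test]
    test_set_size = 0.2
    num_tiles = 10
    tile_size = 1024
    level = 0
    mask_level = 3
    mask_th = 0.2
    save_tiles = False
    rois = False
    """


    def load_config(text):
        """Parse the INI text into typed Python values (types are assumed)."""
        parser = configparser.ConfigParser()
        parser.read_string(text)
        general = parser["General"]
        tiles = parser["Dataset_train_test"]
        return {
            "dataset": general["dataset"],
            # "0, 1" becomes [0, 1]
            "classes": [int(c) for c in general["classes"].split(",")],
            "gpu_id": general.getint("gpu_id"),
            "seed": general.getint("seed"),
            "test_set_size": tiles.getfloat("test_set_size"),
            "num_tiles": tiles.getint("num_tiles"),
            "tile_size": tiles.getint("tile_size"),
            "level": tiles.getint("level"),
            "mask_level": tiles.getint("mask_level"),
            "mask_th": tiles.getfloat("mask_th"),
            "save_tiles": tiles.getboolean("save_tiles"),
            "rois": tiles.getboolean("rois"),
        }


    config = load_config(EXAMPLE_CONFIG)
    # test_set_size is a proportion, so it should lie strictly between 0 and 1.
    assert 0.0 < config["test_set_size"] < 1.0

A check like the final assertion catches a mistyped proportion (for example `20` instead of `0.2`) before a long data generation run starts.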


Train Parser
------------

The `train` parser allows you to configure various parameters for training the model. Below are the details of each argument:

- **dataname** (positional argument): Dataset name, corresponds to the pytables name under the following format: `(dataname)_(phase).pytables`.
- **-o, --outdir**: Output directory path for tensorboard and trained model. Default is `./output/`.
- **-p, --dataset_path**: Path to the directory that contains the pytables. Default is `./`.
- **-b, --batch_size**: Batch size for the dataloader. Default is `5`.
- **-n, --n_worker**: Number of workers for the dataloader. Default is `min(batch_size, os.cpu_count())`.
- **-e, --epoch**: Number of epochs. Default is `100`.
- **-d, --depth**: Depth of the model. Default is `3`.
- **-w, --width**: Width of the model. Sets the number of filters in the first layer to `2**w`; the number of filters then grows exponentially with the depth of the model. Default is `4`.
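
The arguments and defaults listed above can be summarized as an `argparse` declaration. This is an illustrative sketch rather than HoverFast's actual source: the function names `build_train_parser` and `parse_train_args` are hypothetical, and resolving the `n_worker` default after parsing is an assumption about how `min(batch_size, os.cpu_count())` could be applied.

.. code-block:: python

    import argparse
    import os


    def build_train_parser():
        """Declare the documented arguments with their documented defaults."""
        parser = argparse.ArgumentParser(prog="HoverFast train")
        parser.add_argument("dataname", help="dataset name")
        parser.add_argument("-o", "--outdir", default="./output/")
        parser.add_argument("-p", "--dataset_path", default="./")
        parser.add_argument("-b", "--batch_size", type=int, default=5)
        parser.add_argument("-n", "--n_worker", type=int, default=None)
        parser.add_argument("-e", "--epoch", type=int, default=100)
        parser.add_argument("-d", "--depth", type=int, default=3)
        parser.add_argument("-w", "--width", type=int, default=4)
        return parser


    def parse_train_args(argv):
        args = build_train_parser().parse_args(argv)
        if args.n_worker is None:
            # Documented default: min(batch_size, os.cpu_count()).
            # os.cpu_count() may return None, so fall back to 1.
            args.n_worker = min(args.batch_size, os.cpu_count() or 1)
        return args


    args = parse_train_args(["dataset_name", "-p", "/path/to/dataset", "-b", "10"])
    # With the default width of 4, the first layer has 2**4 = 16 filters.
    first_layer_filters = 2 ** args.width

Here only `-p` and `-b` are overridden, so every other value falls back to its default.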

Usage Examples
--------------

Basic Usage
^^^^^^^^^^^

This example demonstrates the basic usage of the `train` command with minimal arguments, using default settings.

.. code-block:: sh

    HoverFast train dataset_name -p /path/to/dataset -o /path/to/outdir

Explanation:

- Dataset: Name of the dataset, which corresponds to the pytables name.
- Dataset Path: Directory that contains the pytables.
- Output directory: Directory to save TensorBoard logs and trained model.

Custom Batch Size
^^^^^^^^^^^^^^^^^

In this example, we specify a custom batch size for the dataloader.

.. code-block:: sh

    HoverFast train dataset_name -p /path/to/dataset -b 10 -o /path/to/outdir

Explanation:

- Dataset: Name of the dataset.
- Dataset Path: Directory that contains the pytables.
- Batch Size: Set the batch size for the dataloader to 10.
- Output directory: Directory to save TensorBoard logs and trained model.

Custom Number of Workers
^^^^^^^^^^^^^^^^^^^^^^^^

This example sets a custom number of dataloader workers to speed up data loading.

.. code-block:: sh

    HoverFast train dataset_name -p /path/to/dataset -n 8 -o /path/to/outdir

Explanation:

- Dataset: Name of the dataset.
- Dataset Path: Directory that contains the pytables.
- Number of Workers: Set to 8 workers for the dataloader.
- Output directory: Directory to save TensorBoard logs and trained model.

Adjusting Model Depth
^^^^^^^^^^^^^^^^^^^^^

This example increases the depth of the model from its default of 3 to 5.

.. code-block:: sh

    HoverFast train dataset_name -p /path/to/dataset -d 5 -o /path/to/outdir

Explanation:

- Dataset: Name of the dataset.
- Dataset Path: Directory that contains the pytables.
- Model Depth: Set the depth of the model to 5.
- Output directory: Directory to save TensorBoard logs and trained model.

Adjusting Model Width
^^^^^^^^^^^^^^^^^^^^^

Customizing the width of the model, which defines the number of filters in the first layer.

.. code-block:: sh

    HoverFast train dataset_name -p /path/to/dataset -w 6 -o /path/to/outdir

Explanation:

- Dataset: Name of the dataset.
- Dataset Path: Directory that contains the pytables.
- Model Width: Set the width of the model to 6.
- Output directory: Directory to save TensorBoard logs and trained model.

Custom Epochs
^^^^^^^^^^^^^

Setting a custom number of epochs for the training process.

.. code-block:: sh

    HoverFast train dataset_name -p /path/to/dataset -e 200 -o /path/to/outdir

Explanation:

- Dataset: Name of the dataset.
- Dataset Path: Directory that contains the pytables.
- Epochs: Set the number of epochs to 200.
- Output directory: Directory to save TensorBoard logs and trained model.
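
The options shown in the examples above can also be combined in a single call. As a sketch, the command line can be assembled from Python, which is convenient when launching several training runs from a script. The helper `build_train_command` is hypothetical; it only uses the flags documented in the Train Parser section.

.. code-block:: python

    import shlex

    # Short flags as documented in the Train Parser section.
    FLAGS = {
        "batch_size": "-b",
        "n_worker": "-n",
        "epoch": "-e",
        "depth": "-d",
        "width": "-w",
    }


    def build_train_command(dataname, dataset_path, outdir, **options):
        """Build the argument list for a `HoverFast train` call."""
        cmd = ["HoverFast", "train", dataname, "-p", dataset_path, "-o", outdir]
        for name, value in options.items():
            if name not in FLAGS:
                raise ValueError(f"Unknown option: {name}")
            cmd += [FLAGS[name], str(value)]
        return cmd


    cmd = build_train_command(
        "dataset_name", "/path/to/dataset", "/path/to/outdir",
        batch_size=10, epoch=200,
    )
    print(shlex.join(cmd))

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start training, assuming the `HoverFast` executable is on the `PATH`.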