How to Configure Early Stopping

Early Stopping overview for Katib Experiments

This guide shows how to use early stopping to reduce the cost of your Katib Experiments. Early stopping helps you avoid overfitting when you train your model during Katib Experiments. It also saves computing resources and reduces Experiment execution time by stopping the Experiment’s Trials when the target metric(s) no longer improve, before the training process is complete.

The major advantage of using early stopping in Katib is that you don’t need to modify your training container package. All you have to do is make the necessary changes to your Experiment’s YAML file.

Early stopping works in the same way as Katib’s metrics collector. It analyzes the required metrics from StdOut or from an arbitrary output file, and an early stopping algorithm decides whether the Trial needs to be stopped. Currently, early stopping works only with the StdOut or File metrics collectors.

Note: Your training container must print training logs with timestamps, because early stopping algorithms need to know the order in which metrics were reported. Check the PyTorch example to learn how to add a date format to your logs.
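As an illustration of the note above, here is a minimal sketch of a training script that prefixes every log line with a timestamp, using only Python's standard logging module. The logger name, the metric name accuracy, the name=value message layout, and the exact date format are assumptions made for this example. Compare them with the PyTorch example before relying on them.

```python
import logging
import sys
import time


def build_metric_logger(stream=sys.stdout):
    """Return a logger that prefixes every line with a UTC timestamp."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",  # assumed date format for this sketch
    )
    # The trailing "Z" claims UTC, so render the time in UTC as well.
    formatter.converter = time.gmtime

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)

    logger = logging.getLogger("training")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_metric_logger()
    # Made-up metric values, reported once per epoch.
    for epoch, accuracy in enumerate([0.71, 0.78, 0.80], start=1):
        log.info("epoch=%d accuracy=%.4f", epoch, accuracy)
```

Writing to StdOut keeps the sketch compatible with the StdOut metrics collector. For the File metrics collector, the same handler could be pointed at an open file instead.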

Configure the Experiment with early stopping

As a reference, you can use the YAML file of the early stopping example.

  1. Follow the guide to configure your Katib Experiment.

  2. Next, to apply early stopping for your Experiment, specify the .spec.earlyStopping parameter, similar to the .spec.algorithm.

    • .earlyStopping.algorithmName - the name of the early stopping algorithm.

    • .earlyStopping.algorithmSettings - the settings for the early stopping algorithm.
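The two fields above can be sketched as a small Python helper that builds the .spec.earlyStopping fragment and prints it as JSON, which you can translate into your Experiment's YAML. The helper name early_stopping_spec is hypothetical. The layout of algorithmSettings as a list of name/value pairs with string values is an assumption based on the statement that the parameter is similar to .spec.algorithm, so verify it against the early stopping example.

```python
import json


def early_stopping_spec(algorithm_name, settings):
    """Build the .spec.earlyStopping fragment of an Experiment as a plain dict.

    Assumption: settings are name/value pairs with string values,
    mirroring how .spec.algorithm settings are written.
    """
    return {
        "algorithmName": algorithm_name,
        "algorithmSettings": [
            {"name": name, "value": str(value)}
            for name, value in settings.items()
        ],
    }


if __name__ == "__main__":
    fragment = {
        "spec": {
            "earlyStopping": early_stopping_spec(
                "medianstop",
                {"min_trials_required": 3, "start_step": 4},
            )
        }
    }
    print(json.dumps(fragment, indent=2))
```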

Here is what happens: your Experiment’s Suggestion produces new Trials. After that, the early stopping algorithm generates early stopping rules for the created Trials. Once a Trial satisfies all of the rules, it is stopped and its status changes to EarlyStopped. Then, Katib calls the Suggestion again to ask for new Trials.

Early Stopping Algorithms

Katib currently supports the following algorithms for early stopping:

    • Median Stopping Rule

More algorithms are under development.

Median Stopping Rule

The early stopping algorithm name in Katib is medianstop.

The median stopping rule stops a pending Trial X at step S if the Trial’s best objective value by step S is worse than the median of the running averages of all completed Trials’ objectives reported up to step S.

To learn more about it, check Google Vizier: A Service for Black-Box Optimization.
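To make the rule concrete, the following is a minimal sketch of the decision as worded above. It is not Katib's implementation. The function names, the list-of-floats representation of reported objectives, and the maximize flag are all assumptions made for the example.

```python
from statistics import median


def running_average(values, step):
    """Mean of the objective values a Trial reported in its first `step` reports."""
    window = values[:step]
    return sum(window) / len(window)


def should_stop(trial_values, completed_trials, step, maximize=True):
    """Apply the median stopping rule to one Trial at a given step."""
    if step < 1 or not trial_values[:step]:
        return False  # the Trial has not reported anything yet
    averages = [running_average(v, step) for v in completed_trials if v[:step]]
    if not averages:
        return False  # nothing to compare against yet
    threshold = median(averages)
    reported = trial_values[:step]
    best = max(reported) if maximize else min(reported)
    # "Worse" means lower when maximizing and higher when minimizing.
    return best < threshold if maximize else best > threshold


if __name__ == "__main__":
    # Made-up accuracy curves for three completed Trials.
    completed = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8], [0.4, 0.5, 0.6]]
    print(should_stop([0.30, 0.35, 0.40], completed, step=3))  # True
    print(should_stop([0.50, 0.70, 0.90], completed, step=3))  # False
```

In the first call the Trial's best value (0.40) is below the median running average of the completed Trials (about 0.6), so the rule stops it. In the second call the best value (0.90) is above that median, so the Trial continues.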

Katib supports the following early stopping settings:

Setting Name | Description | Default Value
min_trials_required | Minimal number of successful Trials to compute median value | 3
start_step | Number of reported intermediate results before stopping the Trial | 4
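One way to read the two settings is as gates that must both be open before the median comparison is applied to a Trial. The sketch below encodes that reading with the default values from the table. The function name and the inclusive thresholds (greater than or equal) are assumptions, so check Katib's behavior before relying on the exact boundary.

```python
def median_rule_is_active(succeeded_trials, reported_results,
                          min_trials_required=3, start_step=4):
    """Return True when the median stopping rule may be applied to a Trial.

    succeeded_trials: number of Trials that finished successfully so far.
    reported_results: number of intermediate results this Trial has reported.
    Assumption: both thresholds are inclusive.
    """
    enough_history = succeeded_trials >= min_trials_required
    past_warmup = reported_results >= start_step
    return enough_history and past_warmup


if __name__ == "__main__":
    print(median_rule_is_active(succeeded_trials=2, reported_results=10))  # False
    print(median_rule_is_active(succeeded_trials=3, reported_results=3))   # False
    print(median_rule_is_active(succeeded_trials=3, reported_results=4))   # True
```

Raising min_trials_required makes the median more reliable before any Trial is stopped, and raising start_step gives each Trial more reports to warm up.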


Last modified April 26, 2024: Fix links in other pages (7333160)