How to Configure Algorithms

List of supported algorithms for hyperparameter tuning

This page describes hyperparameter (HP) tuning algorithms that Katib supports and how to configure them.

HP Tuning Algorithms

Katib currently supports several search algorithms for NAS:

The algorithm name in Katib is grid.

Grid sampling is useful when all variables are discrete (as opposed to continuous) and the number of possibilities is low. A grid search performs an exhaustive combinatorial search over all possibilities, making the search process extremely long even for medium sized problems.

Katib uses the Optuna optimization framework for its grid search.

Setting nameDescriptionExample
random_state[int]: Set random_state to something other than None for reproducible results.10

The algorithm name in Katib is random.

Random sampling is an alternative to grid search and is used when the number of discrete variables to optimize is large and the time required for each evaluation is long. When all parameters are discrete, random search performs sampling without replacement. Random search is therefore the best algorithm to use when combinatorial exploration is not possible. If the number of continuous variables is high, you should use quasi random sampling instead.

Katib uses the Hyperopt, Goptuna or Optuna optimization framework for its random search.

Katib supports the following algorithm settings:

Setting nameDescriptionExample
random_state[int]: Set random_state to something other than None for reproducible results.10

Bayesian optimization

The algorithm name in Katib is bayesianoptimization.

The Bayesian optimization method uses Gaussian process regression to model the search space. This technique calculates an estimate of the loss function and the uncertainty of that estimate at every point in the search space. The method is suitable when the number of dimensions in the search space is low. Since the method models both the expected loss and the uncertainty, the search algorithm converges in a few steps, making it a good choice when the time to complete the evaluation of a parameter configuration is long.

Katib uses the Scikit-Optimize optimization framework for its Bayesian search. Scikit-Optimize is also known as skopt.

Katib supports the following algorithm settings:

Setting NameDescriptionExample
base_estimator[“GP”, “RF”, “ET”, “GBRT” or sklearn regressor, default=“GP”]: Should inherit from sklearn.base.RegressorMixin. The predict method should have an optional return_std argument, which returns std(Y | x) along with E[Y | x]. If base_estimator is one of [“GP”, “RF”, “ET”, “GBRT”], the system uses a default surrogate model of the corresponding type. Learn more information in the skopt documentation.GP
n_initial_points[int, default=10]: Number of evaluations of func with initialization points before approximating it with base_estimator. Points provided as x0 count as initialization points. If len(x0) < n_initial_points, the system samples additional points at random. Learn more information in the skopt documentation.10
acq_func[string, default="gp_hedge"]: The function to minimize over the posterior distribution. Learn more information in the skopt documentation.gp_hedge
acq_optimizer[string, “sampling” or “lbfgs”, default=“auto”]: The method to minimize the acquisition function. The system updates the fit model with the optimal value obtained by optimizing acq_func with acq_optimizer. Learn more information in the skopt documentation.auto
random_state[int]: Set random_state to something other than None for reproducible results.10

Hyperband

The algorithm name in Katib is hyperband.

Katib supports the Hyperband optimization framework. Instead of using Bayesian optimization to select configurations, Hyperband focuses on early stopping as a strategy for optimizing resource allocation and thus for maximizing the number of configurations that it can evaluate. Hyperband also focuses on the speed of the search.

Tree of Parzen Estimators (TPE)

The algorithm name in Katib is tpe.

Katib uses the Hyperopt, Goptuna or Optuna optimization framework for its TPE search.

This method provides a forward and reverse gradient-based search.

Katib supports the following algorithm settings:

Setting nameDescriptionExample
n_EI_candidates[int]: Number of candidate samples used to calculate the expected improvement.25
random_state[int]: Set random_state to something other than None for reproducible results.10
gamma[float]: The threshold to split between l(x) and g(x), check equation 2 in this Paper. Value must be in (0, 1) range.0.25
prior_weight[float]: Smoothing factor for counts, to avoid having 0 probability. Value must be > 0.1.1

Multivariate TPE

The algorithm name in Katib is multivariate-tpe.

Katib uses the Optuna optimization framework for its Multivariate TPE search.

Multivariate TPE is improved version of independent (default) TPE. This method finds dependencies among hyperparameters in search space.

Katib supports the following algorithm settings:

Setting nameDescriptionExample
n_ei_candidates[int]: Number of Trials used to calculate the expected improvement.25
random_state[int]: Set random_state to something other than None for reproducible results.10
n_startup_trials[int]: Number of initial Trials for which the random search algorithm generates hyperparameters.5

Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

The algorithm name in Katib is cmaes.

Katib uses the Goptuna or Optuna optimization framework for its CMA-ES search.

The Covariance Matrix Adaptation Evolution Strategy is a stochastic derivative-free numerical optimization algorithm for optimization problems in continuous search spaces. You can also use IPOP-CMA-ES and BIPOP-CMA-ES, variant algorithms for restarting optimization when converges to local minimum.

Katib supports the following algorithm settings:

Setting nameDescriptionExample
random_state[int]: Set random_state to something other than None for reproducible results.10
sigma[float]: Initial standard deviation of CMA-ES.0.001
restart_strategy[string, "none", "ipop", or "bipop", default="none"]: Strategy for restarting CMA-ES optimization when converges to a local minimum."ipop"

Sobol Quasirandom Sequence

The algorithm name in Katib is sobol.

Katib uses the Goptuna optimization framework for its Sobol’s quasirandom search.

The Sobol’s quasirandom sequence is a low-discrepancy sequence. And it is known that Sobol’s quasirandom sequence can provide better uniformity properties.

Population Based Training

The algorithm name in Katib is pbt.

Review the population based training paper for more details about the algorithm.

The PBT service requires a Persistent Volume Claim with RWX access mode to share resources between Suggestion and Trials. Currently, Katib Experiments should have resumePolicy: FromVolume to run the PBT algorithm. Learn more about resume policies in this guide.

Katib supports the following algorithm settings:

Setting nameDescriptionExample
suggestion_trial_dirThe location within the Trial container where checkpoints are saved/var/log/katib/checkpoints/
n_populationNumber of Trial seeds per generation40
resample_probabilitynull (default): perturbs the hyperparameter by 0.8 or 1.2. 0-1: resamples the original distribution by the specified probability0.3
truncation_thresholdExploit threshold for pruning low performing seeds0.4

Use Custom Algorithm in Katib

You can add an HP tuning algorithm to Katib yourself. The design of Katib follows the ask-and-tell pattern:

  1. Ask for a new set of parameters
  2. Walk to the Experiment and program in the new parameters
  3. Observe the outcome of running the Experiment
  4. Walk back to your laptop and tell the optimizer about the outcome 1. go to step 1

When an Experiment is created, one algorithm service as Kubernetes Deployment will be created. Then Katib asks for new sets of parameters via GetSuggestions gRPC call. After that, Katib creates new Trials according to the sets and observe the outcome. When the Trials are finished, Katib tells the metrics of the finished Trials to the algorithm, and ask another new sets.

Create a new Algorithm Service

The new algorithm needs to implement Suggestion service defined in api.proto.

One sample algorithm looks like:

from pkg.apis.manager.v1beta1.python import api_pb2
from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.suggestion.v1beta1.internal.search_space import HyperParameter, HyperParameterSearchSpace
from pkg.suggestion.v1beta1.internal.trial import Trial, Assignment
from pkg.suggestion.v1beta1.hyperopt.base_service import BaseHyperoptService
from pkg.suggestion.v1beta1.internal.base_health_service import HealthServicer


# Inherit SuggestionServicer and implement GetSuggestions.
class HyperoptService(
        api_pb2_grpc.SuggestionServicer, HealthServicer):
    def ValidateAlgorithmSettings(self, request, context):
        # Optional, it is used to validate algorithm settings defined by users.
        pass
    def GetSuggestions(self, request, context):
        # Convert the Experiment in GRPC request to the search space.
        # search_space example:
        #   HyperParameterSearchSpace(
        #       goal: MAXIMIZE,
        #       params: [HyperParameter(name: param-1, type: INTEGER, min: 1, max: 5, step: 0),
        #                HyperParameter(name: param-2, type: CATEGORICAL, list: cat1, cat2, cat3),
        #                HyperParameter(name: param-3, type: DISCRETE, list: 3, 2, 6),
        #                HyperParameter(name: param-4, type: DOUBLE, min: 1, max: 5, step: )]
        #   )
        search_space = HyperParameterSearchSpace.convert(request.experiment)
        # Convert the Trials in GRPC request to the Trials in algorithm side.
        # Trials example:
        #   [Trial(
        #       assignment: [Assignment(name=param-1, value=2),
        #                    Assignment(name=param-2, value=cat1),
        #                    Assignment(name=param-3, value=2),
        #                    Assignment(name=param-4, value=3.44)],
        #       target_metric: Metric(name="metric-2" value="5643"),
        #       additional_metrics: [Metric(name=metric-1, value=435),
        #                            Metric(name=metric-3, value=5643)],
        #   Trial(
        #       assignment: [Assignment(name=param-1, value=3),
        #                    Assignment(name=param-2, value=cat2),
        #                    Assignment(name=param-3, value=6),
        #                    Assignment(name=param-4, value=4.44)],
        #       target_metric: Metric(name="metric-2" value="3242"),
        #       additional_metrics: [Metric(name=metric=1, value=123),
        #                            Metric(name=metric-3, value=543)],
        trials = Trial.convert(request.trials)
        #--------------------------------------------------------------
        # Your code here
        # Implement the logic to generate new assignments for the given current request number.
        # For example, if request.current_request_number is 2, you should return:
        # [
        #   [Assignment(name=param-1, value=3),
        #    Assignment(name=param-2, value=cat2),
        #    Assignment(name=param-3, value=3),
        #    Assignment(name=param-4, value=3.22)
        #   ],
        #   [Assignment(name=param-1, value=4),
        #    Assignment(name=param-2, value=cat4),
        #    Assignment(name=param-3, value=2),
        #    Assignment(name=param-4, value=4.32)
        #   ],
        # ]
        list_of_assignments = your_logic(search_space, trials, request.current_request_number)
        #--------------------------------------------------------------
        # Convert list_of_assignments to
        return api_pb2.GetSuggestionsReply(
            trials=Assignment.generate(list_of_assignments)
        )

Build Docker Image for Algorithm Service

You should build Docker image for your Algorithm service, for that add a new Docker image under cmd/suggestion, for example: cmd/suggestion/hyperopt. The new gRPC server should serve in port 6789.

After that you can build Docker image from your algorithm:

docker build . -f cmd/suggestion/<PATH_TO_DOCKER> -t <DOCKER_IMAGE>

Update the Katib Config with

Update the Katib config with the new algorithm entity:

  runtime:
    suggestions:
      - algorithmName: random
        image: docker.io/kubeflowkatib/suggestion-hyperopt:$(KATIB_VERSION)
      - algorithmName: tpe
        image: docker.io/kubeflowkatib/suggestion-hyperopt:$(KATIB_VERSION)
+     - algorithmName: <new-algorithm-name>
+       image: <DOCKER_IMAGE>

Contribute the Algorithm to Katib

If you want to contribute the algorithm to Katib, you could add unit test and/or e2e test for it in the CI and submit a PR.

Add Unit Tests for the Algorithm

Here is an example test_hyperopt_service.py:

import grpc
import grpc_testing
import unittest

from pkg.apis.manager.v1beta1.python import api_pb2_grpc
from pkg.apis.manager.v1beta1.python import api_pb2

from pkg.suggestion.v1beta1.hyperopt.service import HyperoptService

class TestHyperopt(unittest.TestCase):
    def setUp(self):
        servicers = {
            api_pb2.DESCRIPTOR.services_by_name['Suggestion']: HyperoptService()
        }

        self.test_server = grpc_testing.server_from_dictionary(
            servicers, grpc_testing.strict_real_time())


if __name__ == '__main__':
    unittest.main()

You can setup the gRPC server using grpc_testing, then define your own test cases.

Feedback

Was this page helpful?


Last modified April 26, 2024: Fix links in other pages (7333160)