Finding an Effective Strategy for AutoML Pipeline Optimization

07/29/2021, 1:00 PM - 1:30 PM UTC
Green

Abstract:

One of the main problems in AutoML implementation is finding an effective strategy to search for the optimal pipeline in prediction or classification tasks. This problem is commonly known as CASH (Combined Algorithm Selection and Hyperparameter Optimization). This talk shows that focusing the search on model selection and pipeline structure, without hyperparameter optimization, can achieve competitive results in significantly shorter computation time.

Description:

The CASH problem can be decomposed into three major components:

  • searching for the optimal model m over a search space of size n(m)
  • searching for the optimal ordering p of preprocessing elements over a search space of size n(p)
  • searching for the optimal hyperparameters h over a search space of size n(h)

The most popular approaches search these three components simultaneously, over a combined search space of size n(p) × n(m) × n(h). An alternative is to search sequentially: first search for m using surrogate p and h, then search for p using the optimal m and surrogate h, and finally search for h using the optimal p and m already found. This alternative involves a search space of only n(p) + n(m) + n(h), which is significantly smaller than the one for searching p, m, and h simultaneously. In our experiments with the AutoMLPipeline package, we find that in many cases it is sufficient to search only for m and p to achieve performance competitive with that of other algorithms that search all three components simultaneously.
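
The counting argument above can be illustrated with a minimal sketch. It is written in Python rather than Julia, does not use AutoMLPipeline, and is not the method from the talk or the paper. The candidate names, the toy `score` function, and the surrogate choices are all hypothetical. The toy score is deliberately additive, so the sequential search recovers the same pipeline as the exhaustive one; with real data the components interact, and that agreement is not guaranteed.

```python
from itertools import product

# Hypothetical candidates and toy score contributions (illustration only).
MODEL_SCORE = {"linear": 0.60, "tree": 0.65, "forest": 0.75}
PREP_BONUS = {"none": 0.00, "scale": 0.03, "pca": 0.01, "scale+pca": 0.05}
HYPER_BONUS = {0.01: 0.00, 0.1: 0.01, 1.0: 0.02, 10.0: 0.01}


def score(p, m, h):
    """Toy stand-in for a cross-validated pipeline score (higher is better)."""
    return MODEL_SCORE[m] + PREP_BONUS[p] + HYPER_BONUS[h]


def joint_search(preps, models, hypers):
    """Evaluate every (p, m, h) combination: n(p) * n(m) * n(h) evaluations."""
    evaluations = 0
    best, best_score = None, float("-inf")
    for p, m, h in product(preps, models, hypers):
        evaluations += 1
        s = score(p, m, h)
        if s > best_score:
            best, best_score = (p, m, h), s
    return best, evaluations


def sequential_search(preps, models, hypers, surrogate_p, surrogate_h):
    """Search m, then p, then h: n(m) + n(p) + n(h) evaluations."""
    # Stage 1: choose m while p and h are held at surrogate values.
    best_m = max(models, key=lambda m: score(surrogate_p, m, surrogate_h))
    # Stage 2: choose p using the chosen m and the surrogate h.
    best_p = max(preps, key=lambda p: score(p, best_m, surrogate_h))
    # Stage 3: choose h using the chosen p and m.
    best_h = max(hypers, key=lambda h: score(best_p, best_m, h))
    evaluations = len(models) + len(preps) + len(hypers)
    return (best_p, best_m, best_h), evaluations


preps, models, hypers = list(PREP_BONUS), list(MODEL_SCORE), list(HYPER_BONUS)
joint_best, joint_evals = joint_search(preps, models, hypers)
seq_best, seq_evals = sequential_search(
    preps, models, hypers, surrogate_p="none", surrogate_h=0.01
)
print(joint_best, joint_evals)
print(seq_best, seq_evals)
```

With 4 preprocessing options, 3 models, and 4 hyperparameter settings, the joint search scores 4 × 3 × 4 = 48 pipelines while the sequential search scores 3 + 4 + 4 = 11. Dropping the third stage, as the talk suggests is often sufficient, would leave only the 7 evaluations for m and p.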

Relevant paper: https://arxiv.org/abs/2107.01253

Relevant Julia Packages used in the talk:
