One of the main problems in implementing AutoML is finding the best strategy to search for the optimal pipeline in prediction or classification tasks. This problem is commonly known as CASH (Combined Algorithm Selection and Hyperparameter optimization). This talk will show competitive results with significantly shorter computation time by focusing the search only on model selection and pipeline structure, without the need for hyperparameter optimization.
The CASH problem can be decomposed into three major components:
- pipeline structure (p)
- model selection (m)
- hyperparameter optimization (h)
The most popular approaches search all three components simultaneously, so the number of candidate configurations grows as n(p) × n(m) × n(h), where n(x) is the number of choices for component x. An alternative is to search them sequentially: first search for m using surrogate values for p and h, then search for p using the best m and a surrogate h, and finally search for h using the best p and m found. This reduces the search space to n(p) + n(m) + n(h) candidates, which is significantly smaller than searching p, m, and h simultaneously. In our experiments with the AutoMLPipeline package, we find that in many cases it is sufficient to search only for m and p to achieve performance competitive with other algorithms that search all three components simultaneously.
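The difference in search cost can be illustrated with a minimal Python sketch. It is not the AutoMLPipeline API (a Julia package); it assumes small hypothetical candidate sets, a toy additive scoring function in place of real cross-validation, and fixed placeholder values `p0` and `h0` as the surrogates. Because the toy score has no interactions between components, the sequential search happens to recover the joint optimum here; with real pipelines that is not guaranteed.

```python
from itertools import product

# Hypothetical candidate sets for the three CASH components.
PIPELINES = ["raw", "scaled", "pca+scaled"]   # p: pipeline structure
MODELS = ["tree", "forest", "svm", "knn"]      # m: model/learner
HYPERS = ["default", "shallow", "deep"]        # h: hyperparameter setting

# Toy additive scores standing in for cross-validated performance (assumption).
P_SCORE = {"raw": 0.00, "scaled": 0.05, "pca+scaled": 0.03}
M_SCORE = {"tree": 0.70, "forest": 0.80, "svm": 0.78, "knn": 0.72}
H_SCORE = {"default": 0.00, "shallow": -0.01, "deep": 0.01}


class Evaluator:
    """Scores a (p, m, h) configuration and counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, p, m, h):
        self.calls += 1
        return P_SCORE[p] + M_SCORE[m] + H_SCORE[h]


def joint_search(evaluate):
    """Simultaneous search over all combinations: n(p) * n(m) * n(h) evaluations."""
    return max(product(PIPELINES, MODELS, HYPERS), key=lambda c: evaluate(*c))


def sequential_search(evaluate, p0="raw", h0="default", tune_h=True):
    """Search m, then p, then optionally h: n(m) + n(p) [+ n(h)] evaluations.

    p0 and h0 are hypothetical surrogate values held fixed until their turn.
    """
    m = max(MODELS, key=lambda c: evaluate(p0, c, h0))       # best m with surrogate p, h
    p = max(PIPELINES, key=lambda c: evaluate(c, m, h0))     # best p with best m, surrogate h
    h = max(HYPERS, key=lambda c: evaluate(p, m, c)) if tune_h else h0
    return p, m, h


if __name__ == "__main__":
    strategies = [
        ("joint", joint_search),
        ("sequential", sequential_search),
        ("m and p only", lambda ev: sequential_search(ev, tune_h=False)),
    ]
    for name, search in strategies:
        ev = Evaluator()
        best = search(ev)
        print(f"{name:>12}: best={best}, evaluations={ev.calls}")
```

With these 3 pipelines, 4 models, and 3 hyperparameter settings, the joint search makes 36 evaluations, the full sequential search makes 10, and searching only m and p makes 7, keeping the surrogate hyperparameter setting.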
Relevant paper: https://arxiv.org/abs/2107.01253
Relevant Julia packages used in the talk: