Hyperparameter tuning: Random search vs. Bayesian optimization

So, we know that random search works better than grid search, but a more recent approach is Bayesian optimization (using Gaussian processes). I looked for a comparison between the two and found nothing. I know that Stanford’s cs231n mentions only random search, but it is possible that they wanted to keep things simple.

My question is: which approach is generally better, and if the answer is “sometimes random search, sometimes Bayesian”, when should I prefer one method over the other?

Answer

I think that the answer here is the same as everywhere in data science: it depends on the data 🙂

One method can outperform the other on a given problem: here (https://arimo.com/data-science/2016/bayesian-optimization-hyperparameter-tuning/) Bayesian hyperparameter optimization achieves a better result on the San Francisco crime Kaggle challenge than random search. However, I doubt that there is a general rule. There is a nice gif here (http://blog.revolutionanalytics.com/2016/06/bayesian-optimization-of-machine-learning-models.html) that shows the ‘path’ Bayesian optimization takes through the landscape of hyperparameters; in particular, it does not look as if it outperforms random search in general…

I think the reason people tend to use Bayesian hyperparameter optimization is that it needs fewer training runs to reach a result comparable to random search with a sufficiently high number of experiments: each new configuration is chosen using the results of the previous ones instead of being drawn independently.
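To make the idea concrete, here is a minimal sketch of Bayesian optimization with a Gaussian process surrogate and the expected-improvement acquisition function, in pure Python. It is only an illustration under assumptions of mine: a single hyperparameter scaled to [0, 1], an RBF kernel with a fixed length scale, and a toy `objective` standing in for "train the model and return the validation score". None of these names or values come from the linked posts, and in practice you would probably reach for an existing library instead.

```python
import math
import random


def rbf(a, b, length=0.1):
    """Squared-exponential kernel; the length scale is an assumed value."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L * L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper(L, z):
    """Back substitution: solve L^T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a


def gp_fit(xs, ys, noise=1e-4):
    """Fit a GP with constant mean (the average of ys) to the observed scores."""
    n = len(xs)
    mean_y = sum(ys) / n
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, [y - mean_y for y in ys]))
    return L, alpha, mean_y


def gp_predict(model, xs, x_new):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    L, alpha, mean_y = model
    k_star = [rbf(x, x_new) for x in xs]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Expected improvement over the best score so far (maximization)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bayes_opt(f, n_evals, rng, n_init=3, n_candidates=200):
    """Spend n_evals evaluations of f on [0, 1]; return all points and scores."""
    xs = [rng.random() for _ in range(n_init)]
    ys = [f(x) for x in xs]
    while len(xs) < n_evals:
        model = gp_fit(xs, ys)
        best = max(ys)

        def acquisition(c):
            mu, sigma = gp_predict(model, xs, c)
            return expected_improvement(mu, sigma, best)

        # Cheap step: score random candidates with the surrogate only.
        x_next = max((rng.random() for _ in range(n_candidates)), key=acquisition)
        # Expensive step: one real evaluation (a full training run in practice).
        xs.append(x_next)
        ys.append(f(x_next))
    return xs, ys


def objective(x):
    """Toy stand-in for a validation score with a narrow peak near x = 0.7."""
    return math.exp(-(x - 0.7) ** 2 / 0.02) + 0.5 * math.exp(-(x - 0.2) ** 2 / 0.01)


if __name__ == "__main__":
    xs, ys = bayes_opt(objective, n_evals=15, rng=random.Random(0))
    print("best score after 15 evaluations:", max(ys))
```

The point of the sketch is the split inside the loop: the surrogate is queried many times per iteration because that is cheap, while the real objective is evaluated only once. Whether that pays off depends on how expensive one training run is compared to the bookkeeping above.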

Summarising in one sentence:

*When training time is critical, use Bayesian hyperparameter optimization; if time is not an issue, pick either one…*

Usually I am too lazy to implement the Bayesian stuff with Gaussian processes if I can achieve the same result with random search… I just train gradient boosting ensembles on ‘little’ data, so for me, time is not an issue…
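For comparison, random search really is only a few lines, which is a large part of its appeal. The sketch below is an assumption-laden illustration rather than my actual setup: the search space (parameter names and ranges) is made up for a generic gradient boosting model, and `toy_score` is a hypothetical stand-in that you would replace with a function that trains the model and returns a cross-validated score.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration; names and ranges are illustrative assumptions."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform over [0.001, 1]
        "n_estimators": rng.randint(50, 500),       # inclusive integer range
        "max_depth": rng.randint(2, 8),
        "subsample": rng.uniform(0.5, 1.0),
    }


def random_search(score_fn, n_trials, seed=0):
    """Evaluate n_trials independent random configurations; keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, -math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = score_fn(cfg)  # in practice: train and validate a model
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_score(cfg):
    """Hypothetical stand-in for a validation score (higher is better)."""
    lr_term = (math.log10(cfg["learning_rate"]) + 1.0) ** 2
    depth_term = 0.05 * (cfg["max_depth"] - 4) ** 2
    return -(lr_term + depth_term)


if __name__ == "__main__":
    cfg, score = random_search(toy_score, n_trials=50)
    print(cfg, score)
```

Because the trials are independent, they can also be run in parallel without any coordination, which is harder to do with a sequential Bayesian loop.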

Attribution
Source: Link, Question Author: Yoni Keren, Answer Author: abunickabhi
