# Hyperparameter tuning in Gaussian Process Regression

I am trying to tune the hyperparameters of the Gaussian process regression algorithm I have implemented. I simply want to maximize the log marginal likelihood, given by the formula

$$\log p(\mathbf{y} \mid X) = -\frac{1}{2}\mathbf{y}^\top K^{-1}\mathbf{y} - \frac{1}{2}\log\lvert K\rvert - \frac{n}{2}\log 2\pi,$$

where $K$ is the covariance matrix with the elements

$$K_{ij} = a \exp\!\left(-\frac{1}{2}(\mathbf{x}_i - \mathbf{x}_j)^\top M (\mathbf{x}_i - \mathbf{x}_j)\right) + b\,\delta_{ij},$$

where $M = lI$ and $a$, $b$ and $l$ are hyperparameters.
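For concreteness, here is a minimal NumPy sketch of evaluating this objective at a single parameter setting, assuming the squared-exponential form above with $M = lI$ (so the quadratic form reduces to $l\,\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^2$); the function names are illustrative, and a Cholesky factorization stands in for the explicit inverse:

```python
import numpy as np

def kernel_matrix(X, a, b, l):
    # K_ij = a * exp(-0.5 * l * ||x_i - x_j||^2) + b * delta_ij,
    # i.e. M = l * I substituted into the quadratic form above
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return a * np.exp(-0.5 * l * sq_dists) + b * np.eye(len(X))

def log_marginal_likelihood(X, y, a, b, l):
    # Evaluates -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi) through a
    # Cholesky factorization K = L L^T rather than an explicit inverse
    n = len(y)
    K = kernel_matrix(X, a, b, l)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))  # = 1/2 log|K|
            - 0.5 * n * np.log(2.0 * np.pi))
```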

The partial derivative of the log marginal likelihood with respect to a parameter $\theta_k \in \{a, b, l\}$ is given by the following:

$$\frac{\partial}{\partial \theta_k}\log p(\mathbf{y} \mid X) = \frac{1}{2}\operatorname{tr}\!\left(\left(\boldsymbol{\alpha}\boldsymbol{\alpha}^\top - K^{-1}\right)\frac{\partial K}{\partial \theta_k}\right), \qquad \boldsymbol{\alpha} = K^{-1}\mathbf{y}.$$

Since the entries of $K$ depend on the parameters, so do its derivatives and its inverse. This means that when a gradient-based optimizer is employed, evaluating the gradient at a given point (parameter value) requires recomputing the covariance matrix. In my application this is not feasible, because building the covariance matrix from scratch and inverting it in every iteration of gradient ascent is too expensive.

My question is: what are my options for finding a fairly good combination of these three parameters? I also don't know which parameter to optimize first, and I would appreciate any pointers on that issue as well.
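To make the per-iteration cost concrete: one optimizer step amounts to a single $O(n^3)$ Cholesky factorization that can be shared between the objective and all three partial derivatives. A minimal sketch of such a step, assuming a log-parameterization to keep $a$, $b$, $l$ positive and bounds to keep them in a numerically safe range (both assumptions on my part), fed to `scipy.optimize.minimize`:

```python
import numpy as np
from scipy.optimize import minimize

def neg_lml_and_grad(theta, X, y):
    # theta holds (log a, log b, log l): an assumed log-parameterization
    # so that all three hyperparameters stay positive during optimization
    a, b, l = np.exp(theta)
    n = len(y)
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    E = np.exp(-0.5 * l * D)
    K = a * E + b * np.eye(n)
    L = np.linalg.cholesky(K)  # the single O(n^3) factorization per iteration
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    W = np.outer(alpha, alpha) - K_inv
    grad = np.array([0.5 * np.trace(W @ dK)
                     for dK in (E, np.eye(n), a * E * (-0.5 * D))])
    # chain rule for the log-parameterization; negate for minimization
    return -lml, -grad * np.exp(theta)

# usage: all three hyperparameters optimized jointly from one start point
# res = minimize(neg_lml_and_grad, np.zeros(3), args=(X, y), jac=True,
#                method="L-BFGS-B", bounds=[(-8.0, 8.0)] * 3)
```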