# Extreme learning machine: what’s it all about?

I’ve been thinking about, implementing and using the Extreme Learning Machine (ELM) paradigm for more than a year now, and the longer I do, the more I doubt that it is really a good thing. My opinion, however, seems to be at odds with the scientific community, where (judging by citations and new publications) it appears to be a hot topic.

The ELM was introduced by Huang et al. around 2003. The underlying idea is rather simple: start with a 2-layer artificial neural network and randomly assign the coefficients in the first layer. This way, one transforms the non-linear optimization problem, which is usually handled via backpropagation, into a simple linear regression problem. In more detail, for $\mathbf x \in \mathbb R^D$ and $N$ hidden nodes, the model is

$$f(\mathbf x) = \sum_{i=1}^{N} w_i \, \sigma\left(v_{i0} + \sum_{k=1}^{D} v_{ik} x_k\right),$$

where $\sigma$ is the activation function (e.g. the sigmoid).

Now, only the $w_i$ are adjusted (in order to minimize the squared-error loss), whereas the $v_{ik}$ are all chosen randomly. To compensate for the loss in degrees of freedom, the usual suggestion is to use a rather large number of hidden nodes (i.e. free parameters $w_i$).
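To make the procedure concrete, here is a minimal NumPy sketch of how I understand it. The names `elm_fit` and `elm_predict`, the standard-normal draw for the first-layer coefficients, and the toy sine data are my own assumptions for illustration, not something prescribed by the ELM papers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit output weights w by least squares on randomly drawn hidden units."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    # Assumption: first-layer weights v_ik and biases v_i0 are standard normal.
    V = rng.normal(size=(D, n_hidden))
    b = rng.normal(size=n_hidden)
    H = sigmoid(X @ V + b)                      # hidden activations, shape (n, n_hidden)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)   # the only trained parameters
    return V, b, w

def elm_predict(X, V, b, w):
    return sigmoid(X @ V + b) @ w

# Toy 1-D regression problem (hypothetical data, for illustration only).
X = np.linspace(-3.0, 3.0, 200)[:, None]
y = np.sin(X[:, 0])
V, b, w = elm_fit(X, y, n_hidden=50)
y_hat = elm_predict(X, V, b, w)
train_mse = np.mean((y_hat - y) ** 2)
```

Note that no gradient step appears anywhere: the whole "training" is the single `np.linalg.lstsq` call.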

From another perspective (not the one usually promoted in the literature, which comes from the neural network side), the whole procedure is simply linear regression, but one where you choose your basis functions $\phi_i$ randomly, for example

$$\phi_i(\mathbf x) = \sigma\left(v_{i0} + \sum_{k=1}^{D} v_{ik} x_k\right).$$

(Many other choices besides the sigmoid are possible for the random functions. For instance, the same principle has also been applied using radial basis functions.)
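Seen as linear regression, swapping the basis only changes how the design matrix is built. A rough sketch with Gaussian radial basis functions follows; the uniformly drawn centers, the fixed `width`, and the toy target are assumptions of mine, chosen only to illustrate the idea.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian bumps phi_i(x) = exp(-||x - c_i||^2 / (2 * width^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

rng = np.random.default_rng(0)

# Hypothetical 2-D toy problem.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) * X[:, 1]

# Assumption: centers are drawn uniformly over the input range.
centers = rng.uniform(-1.0, 1.0, size=(40, 2))
Phi = rbf_features(X, centers, width=0.5)       # design matrix, shape (200, 40)

# Same linear least-squares step as with sigmoid units.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w
```

Only `rbf_features` differs from the sigmoid case; the regression step is unchanged.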

From this viewpoint, the whole method becomes almost too simplistic, and this is also the point where I start to doubt that the method is really a good one (… whereas its scientific marketing certainly is). So, here are my questions:

• The idea of covering the input space with random basis functions is, in my opinion, good for low dimensions. In high dimensions, I think it is just not possible to find a good choice by random selection with a reasonable number of basis functions. Therefore, does the ELM degrade in high dimensions (due to the curse of dimensionality)?

• Do you know of experimental results supporting or contradicting this opinion? In the linked paper there is only one 27-dimensional regression data set (PYRIM), on which the method performs similarly to SVMs (whereas I would rather like to see a comparison to a backpropagation ANN).