A limitation of standard neural net algorithms (like backprop) is that you have to decide in advance how many hidden layers and neurons per layer to use. The learning rate and generalization are usually highly sensitive to these choices. This is why neural net algorithms like cascade correlation have generated interest: cascade correlation starts with a minimal topology (just input and output units) and recruits new hidden units as learning progresses.
The CC-NN algorithm was introduced by Fahlman in 1990, and the recurrent version in 1991. What are some more recent (post-1992) neural net algorithms that start with a minimal topology?
The implicit question here is how to determine the topology/structure of a neural network or machine learning model so that it is “of the right size”, neither overfitting nor underfitting.
Since cascade correlation back in 1990, a whole host of methods for doing this have appeared, many of them with much better statistical or computational properties:
- boosting: train one weak learner at a time, giving each new learner a reweighted training set so that it learns the things past learners haven’t learnt.
- sparsity-inducing regularization, like the lasso or automatic relevance determination: start with a large model/network, and use a regularizer that drives the weights of unneeded units to zero, leaving only the useful ones active.
- Bayesian nonparametrics: forget trying to find the “right” model size. Just use one big model, and regularize carefully (or be properly Bayesian) so you don’t overfit. For example, Neal showed that a neural network with Gaussian priors converges, as the number of hidden units goes to infinity, to a Gaussian process, which turns out to be much simpler to train.
- Deep learning: as noted in another answer, train a deep network one layer at a time. This doesn’t actually solve the problem of determining the number of units per layer – often this is still set by hand or by cross-validation.
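To make the boosting point concrete, here is a minimal hand-rolled AdaBoost sketch on a 1-D toy problem (decision stumps as the weak learners; the dataset and all names are illustrative, not from any library). Each round refits a stump on reweighted data, so later stumps concentrate on the examples earlier stumps got wrong:

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=10):
    """Toy AdaBoost: each round fits the best 1-D threshold stump on a
    reweighted training set, then upweights the examples it misclassified."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # example weights, uniform at first
    stumps = []                        # list of (threshold, sign, alpha)
    for _ in range(n_rounds):
        best = None
        for thr in np.unique(X):       # exhaustive search over stumps
            for sign in (1, -1):
                pred = sign * np.where(X >= thr, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, thr, sign, pred)
        err, thr, sign, pred = best
        err = max(err, 1e-10)
        if err >= 0.5:                 # no weak learner left that helps
            break
        alpha = 0.5 * np.log((1 - err) / err)   # this stump's vote weight
        w *= np.exp(-alpha * y * pred)          # upweight the mistakes
        w /= w.sum()
        stumps.append((thr, sign, alpha))
    return stumps

def predict(stumps, X):
    score = sum(a * s * np.where(X >= t, 1, -1) for t, s, a in stumps)
    return np.sign(score)

# a pattern no single stump can fit: positives only in the middle interval
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, 1, 1, -1, -1])
model = adaboost_stumps(X, y, n_rounds=20)
print(predict(model, X))
```

No single threshold separates the middle interval, but a weighted vote of three stumps does, which is exactly the kind of capacity growth boosting buys you one weak learner at a time.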
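The lasso idea can be sketched in a few lines with proximal gradient descent (ISTA) on synthetic data — this is a hand-rolled illustration, not a library call, and the data, `lam`, and `lr` values are made up for the example. Only 3 of 10 features actually matter, and the soft-thresholding step drives the other 7 weights to exactly zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.5, 1.0]          # only the first 3 features are useful
y = X @ true_w + 0.01 * rng.normal(size=n)

lam, lr = 0.1, 0.01                     # L1 strength and step size
w = np.zeros(d)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / n        # gradient of mean squared error
    w = w - lr * grad
    # proximal step for the L1 penalty: shrink toward zero, clip at zero
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

print(np.nonzero(w)[0])                 # indices of surviving features
```

The point of the exercise: you never specified how many features the model should use — the regularizer “turned off” the unneeded ones, which is the same mechanism ARD applies to hidden units of a network.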
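The layer-at-a-time recipe can be sketched with linear autoencoders standing in for the RBMs/autoencoders of the original deep learning literature — a toy illustration, with made-up data and layer sizes (note that `n_hidden` still has to be chosen by hand, which is exactly the caveat above):

```python
import numpy as np

rng = np.random.default_rng(1)

def train_autoencoder(X, n_hidden, lr=0.01, steps=1000):
    """Fit a one-hidden-layer linear autoencoder by gradient descent on
    squared reconstruction error; return the learned encoder weights."""
    n, d = X.shape
    W1 = 0.1 * rng.normal(size=(d, n_hidden))   # encoder
    W2 = 0.1 * rng.normal(size=(n_hidden, d))   # decoder
    for _ in range(steps):
        H = X @ W1                     # codes
        G = 2.0 * (H @ W2 - X) / n     # gradient of mean squared error
        gW2 = H.T @ G
        gW1 = X.T @ G @ W2.T
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1

# toy data: 3 underlying factors observed in 8 dimensions
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 8))

# greedy layer-wise pretraining: train layer 1, freeze it,
# then train layer 2 on layer 1's codes
W1 = train_autoencoder(X, n_hidden=5)
H1 = X @ W1
W2 = train_autoencoder(H1, n_hidden=3)
H2 = H1 @ W2          # second-layer codes, e.g. for supervised fine-tuning
print(H2.shape)       # (100, 3)
```

Each layer is trained on a well-posed local objective, which is what made deep networks trainable at the time — but notice that nothing in the procedure tells you the 5 and the 3.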