I have come across some basic ways to measure the complexity of neural networks:

- Naive and informal: count the number of neurons, hidden neurons, layers, or hidden layers
- VC dimension (Eduardo D. Sontag [1998], “VC dimension of neural networks”)
- A coarse-grained, asymptotic computational complexity measure via equivalence to TC^0_d.
Are there other alternatives?
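For concreteness, the naive counts in the first bullet are trivial to compute. A minimal sketch, assuming a plain fully connected feed-forward architecture described by its layer widths (the function name and the parameter count including biases are my own conventions):

```python
def naive_complexity(layer_sizes):
    """Naive complexity counts for a fully connected feed-forward net.

    `layer_sizes` lists the width of each layer, input first, output last,
    e.g. [4, 8, 8, 1] for 4 inputs, two hidden layers of 8, and 1 output.
    """
    neurons = sum(layer_sizes[1:])            # all non-input units
    hidden_neurons = sum(layer_sizes[1:-1])   # exclude input and output layers
    # weight matrices between consecutive layers, plus one bias per non-input unit
    parameters = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:])) + neurons
    return {
        "layers": len(layer_sizes),
        "hidden_layers": len(layer_sizes) - 2,
        "neurons": neurons,
        "hidden_neurons": hidden_neurons,
        "parameters": parameters,
    }

counts = naive_complexity([4, 8, 8, 1])
```

As the question notes, these counts are only comparable between networks that share the same architecture family and activation function, which is exactly their weakness.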

It is preferred:

- If the complexity metric could be used to measure neural networks from different paradigms (backprop networks, dynamic neural nets, cascade correlation, etc.) on the same scale. For instance, VC dimension can be used for different types of networks (or even for things other than neural networks), while the number of neurons is only useful for comparing very specific models that share the same activation function, signals (basic sums vs. spikes), and other properties.
- If it has nice correspondences to standard measures of complexity of functions learnable by the network
- If it is easy to compute the metric on specific networks (this last one is not a must, though).
## Notes

This question is based on a more general question on CogSci.SE.

**Answer**

You might want to have a look at the paper “(Not) Bounding the True Error” by John Langford & Rich Caruana (NIPS, 2001).

The abstract states:

We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound. In this paper we demonstrate the method on artificial neural networks with results of a 2-3 order of magnitude improvement vs. the best deterministic neural net bounds.

They show that you can apply PAC-Bayes style bounds to stochastic neural networks. However, the analysis only applies to 2-layer feed-forward neural networks with a sigmoidal transfer function. In this case the complexity term depends only on the number of nodes and the variance of the weights. They show that, for this setting, the bound effectively predicts when over-training will occur. Unfortunately it doesn’t really hit any of your “preferred” properties!
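To give a flavour of the sensitivity idea, here is a minimal numerical sketch. This is *not* Langford & Caruana's actual algorithm (they search noise levels per parameter and feed the resulting distribution into a PAC-Bayes bound); a single shared noise scale, the tolerance threshold, and the toy data below are my simplifications for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(W1, W2, X):
    # 2-layer feed-forward net with sigmoidal transfer (the paper's setting)
    return sigmoid(sigmoid(X @ W1) @ W2).ravel()

def error_rate(W1, W2, X, y):
    return float(np.mean((predict(W1, W2, X) > 0.5) != y))

def max_tolerated_noise(W1, W2, X, y,
                        sigmas=(0.01, 0.05, 0.1, 0.5), n_draws=20, tol=0.05):
    # Largest noise scale whose stochastic (noise-perturbed) classifier stays
    # within `tol` of the deterministic empirical error: a crude stand-in for
    # the per-parameter sensitivity analysis described in the abstract.
    base = error_rate(W1, W2, X, y)
    best = 0.0
    for s in sigmas:
        errs = [error_rate(W1 + rng.normal(0, s, W1.shape),
                           W2 + rng.normal(0, s, W2.shape), X, y)
                for _ in range(n_draws)]
        if np.mean(errs) <= base + tol:
            best = s
    return best

# Toy setup: random weights, labels from a simple linear rule.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
```

The intuition carried by the paper is that weights tolerating a lot of noise contribute little effective complexity, which is what lets the PAC-Bayes term depend on weight variances rather than raw parameter counts.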

**Attribution:** *Source: Link, Question Author: Artem Kaznatcheev, Answer Author: tdc*