As we delve into Neural Networks literature, we get to identify other methods with neuromorphic topologies (“Neural-Network”-like architectures). And I’m not talking about the Universal Approximation Theorem. Examples are given below.
Then, it makes me wonder: what is the definition of an artificial Neural Network? Its topology appears to cover everything.
One of the first identifications we make is between PCA and a linear Autoencoder with tied-weights in the encoder and decoder and thresholded activations in the bottleneck layer.
Also, a common identification is done between linear models (logistic regression in special) and a Neural Network with no hidden layer and a single output layer. This identification opens several doors.
And so, just as easily, we can incorporate arbitrary regularized versions with specialized loss functions of these algorithms into a Neural Network framework.
But the more we dig, the more similarities appear. I just stumbled into Deep Neural Decision Trees, which makes the identification of a specific ANN architecture with decision trees, allowing these to be learned by ANN methods (such as Gradient Descent backpropagation). From this we can construct Random Forests and Gradient Boosted Decision Trees from solely Neural Network topologies.
If everything can be expressed as an Artificial Neural Network, what defines an Artificial Neural Network?
Jürgen Schmidhuber, “Deep Learning in Neural Networks: An Overview” traces the history of key concepts in neural networks and deep learning. In his view, neural networks would appear to encompass essentially any model which can be characterized as a directed graph where each node represents some computational unit. Schmidhuber is a prominent neural networks researcher, and wrote the original paper on LSTM networks with Sepp Hochreiter.
Which modifiable components of a learning system are responsible for its success or failure? What changes to them improve performance? This has been called the fundamental credit assignment problem (Minsky, 1963). There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses (Sec. 6.8). The present survey, however, will focus on the narrower, but now commercially important, subfield of Deep Learning (DL) in Artificial Neural Networks (NNs).
A standard neural network (NN) consists of many simple, connected processors called neurons, each producing a sequence of real-valued activations. Input neurons get activated through sensors perceiving the environment, other neurons get activated through weighted connections from previously active neurons (details in Sec. 2). Some neurons may influence the environment by triggering actions. Learning or credit assignment is about finding weights that make the NN exhibit desired behavior, such as driving a car. Depending on the problem and how the neurons are connected, such behavior may require long causal chains of computational stages (Sec. 3), where each stage transforms (often in a non-linear way) the aggregate activation of the network. Deep Learning is about accurately assigning credit across many such stages.
Shallow NN-like models with few such stages have been around for many decades if not centuries (Sec. 5.1). Models with several successive nonlinear layers of neurons date back at least to the 1960s (Sec. 5.3) and 1970s (Sec. 5.5). An efficient gradient descent method for teacher-based Supervised Learning (SL) in discrete, differentiable networks of arbitrary depth called backpropagation (BP) was developed in the 1960s and 1970s, and applied to NNs in 1981 (Sec. 5.5). BP-based training of deep NNs with many layers, however, had been found to be difficult in practice by the late 1980s (Sec. 5.6), and had become an explicit research subject by the early 1990s (Sec. 5.9). DL became practically feasible to some extent through the help of Unsupervised Learning (UL), e.g., Sec. 5.10 (1991), Sec. 5.15 (2006). The 1990s and 2000s also saw many improvements of purely supervised DL (Sec. 5). In the new millennium, deep NNs have finally attracted wide-spread attention, mainly by outperforming alternative machine learning methods such as kernel machines (Vapnik, 1995; Scholkopf et al., 1998) in numerous important applications. In fact, since 2009, supervised deep NNs have won many official international pattern recognition competitions (e.g., Sec. 5.17, 5.19, 5.21, 5.22), achieving the first superhuman visual pattern recognition results in limited domains (Sec. 5.19, 2011). Deep NNs also have become relevant for the more general field of Reinforcement Learning (RL) where there is no supervising teacher (Sec. 6).
On the other hand, I’m not sure that it’s necessarily profitable to try and construct a taxonomy of mutually-exclusive buckets for machine learning strategies. I think we can say that there are perspectives from which models can be viewed as neural networks. I don’t think that perspective is necessarily the best or useful in all contexts. For example, I’m still planning to refer to random forests and gradient boosted trees as “tree ensembles” instead of abstracting away their distinctions and calling them “neural network trees”. Moreover, Schmidhuber distinguishes NNs from kernel machines — even though kernel machines have some connections to NNs — when he writes “In the new millennium, deep NNs have finally attracted wide-spread attention, mainly by outperforming alternative machine learning methods such as kernel machines … in numerous important applications. “