I have recently become interested in Kalman filters and recurrent neural networks, and the two appear to me to be closely related, yet I can’t find much relevant literature:
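In a Kalman filter, the set of equations (in the standard linear state-space form) is:

$$x_k = A x_{k-1} + B u_k + w_{k-1}$$
$$z_k = H x_k + v_k$$

with x the state and z the measurement (u is the control input, and w and v are the process and measurement noise).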
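In an Elman RNN (from here), the relation between the layers (in the usual notation) is:

$$h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h)$$
$$y_t = \sigma_y(W_y h_t + b_y)$$

with x the input layer, h the hidden layer and y the output layer; the σ are the activation functions for the layers, W and U the weight matrices, and b the bias vectors.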
It’s clear that the two sets of equations have the same form, modulo the activations. The analogy seems to be the following: the output layer corresponds to the measured state, and the hidden layer to the true state, which is driven by a process x, the input layer.
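To make the analogy concrete, here is a minimal sketch, assuming identity activations, zero biases and no noise. The function names, dimensions and random matrices are made up for illustration. Under those assumptions an Elman-style step h_t = σ(W x_t + U h_{t-1} + b), y_t = σ(W_y h_t + b_y) should reproduce a linear state-space step exactly, with U playing the role of A, W the role of the input matrix, and W_y the role of the measurement matrix.

```python
import numpy as np

def elman_step(x_t, h_prev, W_h, U_h, b_h, W_y, b_y, sigma_h, sigma_y):
    """One Elman RNN step: returns (y_t, h_t)."""
    h_t = sigma_h(W_h @ x_t + U_h @ h_prev + b_h)
    y_t = sigma_y(W_y @ h_t + b_y)
    return y_t, h_t

def state_space_step(u_k, x_prev, A, B, H):
    """One noise-free linear state-space step: returns (z_k, x_k)."""
    x_k = A @ x_prev + B @ u_k
    z_k = H @ x_k
    return z_k, x_k

# Hypothetical dimensions and matrices, chosen only for the demonstration.
rng = np.random.default_rng(0)
n_state, n_in, n_out, n_steps = 3, 2, 1, 20
A = 0.5 * rng.standard_normal((n_state, n_state))
B = rng.standard_normal((n_state, n_in))
H = rng.standard_normal((n_out, n_state))
inputs = rng.standard_normal((n_steps, n_in))

def identity(v):
    return v

h = np.zeros(n_state)  # RNN hidden layer
x = np.zeros(n_state)  # state-space state
max_gap = 0.0
for u in inputs:
    # Weights mapped as W_h = B, U_h = A, W_y = H, with zero biases.
    y, h = elman_step(u, h, B, A, np.zeros(n_state), H, np.zeros(n_out),
                      identity, identity)
    z, x = state_space_step(u, x, A, B, H)
    max_gap = max(max_gap, float(np.max(np.abs(y - z))))

print("largest output difference:", max_gap)
```

With a nonlinear activation such as tanh in place of `identity`, the two recursions no longer coincide, which is where the analogy stops being exact.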
First question: is the analogy viable? And how can we interpret the activations?
Second question: in a Kalman filter, the A matrix describes the underlying dynamics of the state x. Since training an RNN learns the W matrices, are RNNs able to learn the dynamics of the underlying state? I.e., once my RNN is trained, can I look at the coefficients of my network to infer the dynamics behind my data?
(I’m going to try to do the experiment on artificially generated data, to see if this works, and will update as soon as it’s done)
EDIT: I wish I had access to this paper
Yes, they are indeed related, because both are used to predict $y_n$ and $s_n$ at time step $n$ from the current observation $x_n$ and the previous state $s_{n-1}$, i.e. they both represent a function $F$ such that $F(x_n, s_{n-1}) = (y_n, s_n)$.
The advantage of the RNN over the Kalman filter is that the RNN architecture can be arbitrarily complex (number of layers and neurons) and its parameters are learnt, whereas the algorithm of the Kalman filter (including its parameters) is fixed.
Recurrent neural networks are more general than Kalman filters. One could actually train an RNN to simulate a Kalman filter.
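As a rough illustration of why this is plausible, here is a sketch for a scalar model with made-up parameters `a`, `h`, `q`, `r`. Once the Kalman gain has converged to its steady-state value K, the filter update reduces to the linear recurrence s_n = (1 - K h) a s_{n-1} + K z_n, which has the shape of a one-unit linear RNN fed with the measurements. The weights below are computed directly rather than trained, and the filter is started from the steady-state covariance so that its gain is constant from the first step.

```python
import numpy as np

# Hypothetical scalar model: x_k = a x_{k-1} + w,  z_k = h x_k + v
a, h = 0.95, 1.0
q, r = 0.1, 1.0  # process and measurement noise variances (assumed)

# Iterate the Riccati recursion until the gain settles.
P = 1.0
for _ in range(1000):
    P_pred = a * a * P + q
    K = P_pred * h / (h * h * P_pred + r)
    P = (1.0 - K * h) * P_pred
K_ss, P_ss = K, P

# Simulate noisy measurements from the model.
rng = np.random.default_rng(2)
x_true, measurements = 0.0, []
for _ in range(200):
    x_true = a * x_true + np.sqrt(q) * rng.standard_normal()
    measurements.append(h * x_true + np.sqrt(r) * rng.standard_normal())

# Standard Kalman filter, started at the steady-state covariance.
x_hat, P = 0.0, P_ss
kalman_estimates = []
for z in measurements:
    x_pred = a * x_hat
    P_pred = a * a * P + q
    K = P_pred * h / (h * h * P_pred + r)
    x_hat = x_pred + K * (z - h * x_pred)
    P = (1.0 - K * h) * P_pred
    kalman_estimates.append(x_hat)

# One-unit linear "RNN": hidden state s, input z, fixed weights.
U = (1.0 - K_ss * h) * a  # recurrent weight
W = K_ss                  # input weight
s = 0.0
rnn_estimates = []
for z in measurements:
    s = U * s + W * z
    rnn_estimates.append(s)

max_gap = max(abs(k - n) for k, n in zip(kalman_estimates, rnn_estimates))
print("steady-state gain:", K_ss, "largest difference:", max_gap)
```

During the transient, before the gain has converged, the Kalman gain changes from step to step, so a fixed-weight recurrence of this kind would only approximate the filter.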
Neural nets are largely black-box models, and their weights and activations are very often not interpretable (especially in the deeper layers).
In the end, neural nets are optimized only to make the best predictions, not to have “interpretable” parameters.
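One concrete reason, sketched here for the purely linear, noise-free case with made-up matrices: any invertible change of basis T of the hidden state gives different weights (T A T⁻¹, T B, H T⁻¹) that produce exactly the same outputs. A model fitted only on inputs and outputs therefore cannot be expected to recover the original A entry by entry. What does survive the change of basis is the set of eigenvalues of A.

```python
import numpy as np

def simulate(A, B, H, inputs):
    """Roll out x_k = A x_{k-1} + B u_k, z_k = H x_k from a zero initial state."""
    x = np.zeros(A.shape[0])
    outputs = []
    for u in inputs:
        x = A @ x + B @ u
        outputs.append(H @ x)
    return np.array(outputs)

# Hypothetical "true" system, chosen only for illustration.
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
B = np.array([[1.0],
              [0.5]])
H = np.array([[1.0, -1.0]])

rng = np.random.default_rng(1)
inputs = rng.standard_normal((50, 1))

# Re-express the hidden state in another basis: x' = T x.
T = np.array([[1.0, 2.0],
              [3.0, 4.0]])
T_inv = np.linalg.inv(T)
A2, B2, H2 = T @ A @ T_inv, T @ B, H @ T_inv

out1 = simulate(A, B, H, inputs)
out2 = simulate(A2, B2, H2, inputs)

same_outputs = np.allclose(out1, out2)
same_weights = np.allclose(A, A2)
eig1 = np.sort(np.linalg.eigvals(A).real)
eig2 = np.sort(np.linalg.eigvals(A2).real)

print("same outputs:", same_outputs)
print("same weights:", same_weights)
print("eigenvalues:", eig1, eig2)
```

With nonlinear activations the picture is less clean, so this should be read as an illustration of the linear case only, not as a general statement about trained RNNs.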
Nowadays, if you work on time series, have enough data and want the best accuracy, an RNN is often the preferred approach.