I was reading Bishop's *Pattern Recognition and Machine Learning* and got confused by a derivation for the linear dynamical system (LDS). In an LDS the latent variables are assumed to be continuous. If $Z$ denotes the latent variables and $X$ the observed variables, the model is

$p(z_n|z_{n-1}) = N(z_n|Az_{n-1},\tau)$

$p(x_n|z_n) = N(x_n|Cz_n,\Sigma)$

$p(z_1) = N(z_1|u_0,V_0)$
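To make the generative model concrete, here is a minimal NumPy sketch that samples latent states and observations from it. The function name `sample_lds` and its argument names are hypothetical; `tau` plays the role of the transition-noise covariance $\tau$ above, and `u0`, `V0` are $u_0$, $V_0$.

```python
import numpy as np


def sample_lds(A, tau, C, Sigma, u0, V0, N, seed=0):
    """Sample z_1..z_N and x_1..x_N from the linear dynamical system.

    A, tau   : transition matrix and transition-noise covariance
    C, Sigma : emission matrix and emission-noise covariance
    u0, V0   : mean and covariance of the initial state p(z_1)
    """
    rng = np.random.default_rng(seed)
    z = np.zeros((N, A.shape[0]))
    x = np.zeros((N, C.shape[0]))
    z[0] = rng.multivariate_normal(u0, V0)                      # z_1 ~ N(u0, V0)
    for n in range(N):
        if n > 0:
            z[n] = rng.multivariate_normal(A @ z[n - 1], tau)   # z_n ~ N(A z_{n-1}, tau)
        x[n] = rng.multivariate_normal(C @ z[n], Sigma)         # x_n ~ N(C z_n, Sigma)
    return z, x


# Example (assumed values): 2-D latent state (position, velocity), only position observed
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
z, x = sample_lds(A, 0.01 * np.eye(2), C, np.array([[0.5]]),
                  np.zeros(2), np.eye(2), N=50)
```

Fixing the seed makes the draw reproducible, which is convenient when checking a filter against known latent states.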

As with hidden Markov models, the LDS uses alpha-beta (forward-backward) message passing to calculate the posterior distribution of the latent variables, i.e. $p(z_n|X)$, where

$\alpha(z_n) = p(x_1,\dots,x_n,z_n)$

$\hat\alpha(z_n) = \alpha(z_n)/p(x_1,\dots,x_n)$

My first question: the book states that

$\hat\alpha(z_n) = N(z_n|u_n,V_n)$

How do we get this? That is, why does $\hat\alpha(z_n)$ take the Gaussian form $N(z_n|u_n,V_n)$?
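A sketch of why $\hat\alpha(z_n)$ is Gaussian, by induction on $n$ (following the normalized forward recursion in Bishop's chapter 13; the shorthand $P_{n-1}$ is introduced here). Since $\hat\alpha(z_n) = p(z_n|x_1,\dots,x_n)$, dividing the unnormalized recursion for $\alpha(z_n)$ by $p(x_1,\dots,x_n) = c_n\,p(x_1,\dots,x_{n-1})$ gives

$$c_n\,\hat\alpha(z_n) = p(x_n|z_n)\int \hat\alpha(z_{n-1})\,p(z_n|z_{n-1})\,dz_{n-1}, \qquad c_n = p(x_n|x_1,\dots,x_{n-1}).$$

For $n = 1$ there is no integral: $c_1\hat\alpha(z_1) = N(z_1|u_0,V_0)\,N(x_1|Cz_1,\Sigma)$, a product of Gaussian factors in $z_1$. For $n > 1$, assume $\hat\alpha(z_{n-1}) = N(z_{n-1}|u_{n-1},V_{n-1})$. The integral is the marginal of a linear-Gaussian model, so

$$\int N(z_{n-1}|u_{n-1},V_{n-1})\,N(z_n|Az_{n-1},\tau)\,dz_{n-1} = N(z_n|Au_{n-1},P_{n-1}), \qquad P_{n-1} = AV_{n-1}A^T + \tau,$$

$$c_n\,\hat\alpha(z_n) = N(x_n|Cz_n,\Sigma)\,N(z_n|Au_{n-1},P_{n-1}).$$

The right-hand side is the exponential of a quadratic form in $z_n$, so after normalization $\hat\alpha(z_n)$ is again Gaussian, which the book writes as $N(z_n|u_n,V_n)$. The normalizer $c_n$ is what remains after integrating out $z_n$.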

My next question concerns the derivation itself, shown in the attached screenshots of the book's pages. I don't understand where $K_n$ comes from or what the Kalman filter gain is. The book gives

$u_n = Au_{n-1} + K_n(x_n - CAu_{n-1})$

$V_n = (I - K_nC)P_{n-1}$

$c_n = N(x_n|CAu_{n-1},CP_{n-1}C^T + \Sigma)$

where $K_n = P_{n-1}C^T(CP_{n-1}C^T + \Sigma)^{-1}$ is the Kalman gain matrix and $P_{n-1} = AV_{n-1}A^T + \tau$.
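A minimal NumPy sketch of these update equations, using the same symbols ($A$, $\tau$, $C$, $\Sigma$, $u_0$, $V_0$). The names `kalman_step` and `kalman_filter` are illustrative, not from the book. The first step is handled by assuming $A = I$ and $\tau = 0$, so that $P_0 = V_0$; the sum of $\log c_n$ is $\log p(x_1,\dots,x_N)$.

```python
import numpy as np


def kalman_step(u_prev, V_prev, x_n, A, tau, C, Sigma):
    """One forward step: (u_{n-1}, V_{n-1}) -> (u_n, V_n, log c_n)."""
    P = A @ V_prev @ A.T + tau                # P_{n-1} = A V_{n-1} A^T + tau
    x_pred = C @ A @ u_prev                   # predicted observation C A u_{n-1}
    S = C @ P @ C.T + Sigma                   # its covariance C P C^T + Sigma
    # K_n = P C^T S^{-1}; S and P are symmetric, so (S^{-1} C P)^T = P C^T S^{-1}
    K = np.linalg.solve(S, C @ P).T
    r = x_n - x_pred                          # innovation x_n - C A u_{n-1}
    u = A @ u_prev + K @ r
    V = (np.eye(len(u_prev)) - K @ C) @ P
    # log c_n = log N(x_n | C A u_{n-1}, S)
    log_c = -0.5 * (np.linalg.slogdet(2 * np.pi * S)[1] + r @ np.linalg.solve(S, r))
    return u, V, log_c


def kalman_filter(X, A, tau, C, Sigma, u0, V0):
    """Filter observations X (shape N x d_x); return means, covariances, log p(X)."""
    d = len(u0)
    u, V, log_lik = u0, V0, 0.0
    means, covs = [], []
    for n, x_n in enumerate(X):
        # n = 1 has no transition: A = I, tau = 0 gives P_0 = V_0
        A_n, tau_n = (np.eye(d), np.zeros((d, d))) if n == 0 else (A, tau)
        u, V, log_c = kalman_step(u, V, x_n, A_n, tau_n, C, Sigma)
        means.append(u)
        covs.append(V)
        log_lik += log_c
    return np.array(means), np.array(covs), log_lik
```

Solving linear systems rather than forming $S^{-1}$ explicitly is the usual numerically safer choice; the result is the same gain.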

How are these equations derived? In particular, why is

$u_n = Au_{n-1} + K_n(x_n - CAu_{n-1})$?

I am confused about how this derivation is carried out.
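One way to get these equations (a sketch; the book may arrange the steps differently): start from $c_n\hat\alpha(z_n) = N(x_n|Cz_n,\Sigma)\,N(z_n|Au_{n-1},P_{n-1})$ with $P_{n-1} = AV_{n-1}A^T + \tau$, i.e. a Gaussian prior on $z_n$ with a linear-Gaussian observation. Bayes' theorem for Gaussian variables (Bishop §2.3.3) gives the posterior in information form, and the normalizer is the marginal of $x_n$:

$$V_n = \left(P_{n-1}^{-1} + C^T\Sigma^{-1}C\right)^{-1}, \qquad u_n = V_n\left(C^T\Sigma^{-1}x_n + P_{n-1}^{-1}Au_{n-1}\right), \qquad c_n = N(x_n|CAu_{n-1},\,CP_{n-1}C^T + \Sigma).$$

The Woodbury matrix identity rewrites the covariance in terms of $K_n = P_{n-1}C^T(CP_{n-1}C^T + \Sigma)^{-1}$:

$$V_n = P_{n-1} - P_{n-1}C^T(CP_{n-1}C^T + \Sigma)^{-1}CP_{n-1} = (I - K_nC)P_{n-1}.$$

For the mean, two identities are needed. First, $V_nP_{n-1}^{-1} = I - K_nC$ (multiply the previous line on the right by $P_{n-1}^{-1}$). Second, $V_nC^T\Sigma^{-1} = K_n$, which follows from $(P_{n-1}^{-1} + C^T\Sigma^{-1}C)P_{n-1}C^T = C^T\Sigma^{-1}(CP_{n-1}C^T + \Sigma)$. Substituting both:

$$u_n = K_nx_n + (I - K_nC)Au_{n-1} = Au_{n-1} + K_n(x_n - CAu_{n-1}).$$

So $K_n$ is what the precision-weighted combination of prior and measurement becomes once the inverses are rearranged, and $x_n - CAu_{n-1}$ is the gap between the observation and its prediction.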

**Answer**

There is a nice derivation (several, actually) in the following book:

http://amzn.com/0470173661

This is a good book on the subject as well:

http://amzn.com/0471708585

The complete derivation, together with the simplifications that lead to the shortened textbook form you present, is neither short nor clean, so it is often omitted or left as an exercise for the reader.

You can think of the Kalman gain as a mixing proportion that forms a weighted sum of an analytic/symbolic model and a noisy real-world measurement. If you have crappy measurements but a good model, a properly set Kalman gain should favor the model. If you have a junk model but pretty good measurements, the Kalman gain should favor the measurements. If you don't have a good handle on what your uncertainties are, it can be hard to set up your Kalman filter properly.
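The mixing-proportion view is easiest to see in the scalar case with $C = 1$, where the gain reduces to $K = P/(P + \sigma^2)$ and the update is $(1 - K)\cdot\text{prediction} + K\cdot\text{measurement}$. A minimal sketch; `scalar_update` is a hypothetical helper, `prior_var` stands in for $P_{n-1}$ (model uncertainty) and `meas_var` for $\Sigma$ (measurement noise).

```python
def scalar_update(prior_mean, prior_var, x, meas_var):
    """Scalar Kalman update with C = 1: blend model prediction and measurement."""
    K = prior_var / (prior_var + meas_var)     # Kalman gain, always between 0 and 1
    post_mean = (1 - K) * prior_mean + K * x   # same as prior_mean + K * (x - prior_mean)
    post_var = (1 - K) * prior_var             # posterior variance shrinks
    return K, post_mean, post_var


# Good model, noisy measurement: gain near 0, estimate stays near the prediction
print(scalar_update(prior_mean=0.0, prior_var=0.1, x=10.0, meas_var=100.0))
# Poor model, good measurement: gain near 1, estimate moves to the measurement
print(scalar_update(prior_mean=0.0, prior_var=100.0, x=10.0, meas_var=0.1))
```

If either variance is misstated, the gain tilts toward the wrong source, which is why characterizing the uncertainties matters.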

If you set the inputs properly, it is an optimal estimator. Its derivation rests on a number of assumptions, and if any one of them does not hold, it becomes a pretty good suboptimal estimator. For example, a lag plot will show that the one-step Markov assumption implicit in the Kalman filter does not hold for a cosine function. Likewise, the extended Kalman filter is built on a Taylor-series approximation, so it is approximate, not exact. If you can take in information from two previous states instead of one, you can use a block Kalman filter and regain optimality. Bottom line: it is not a bad tool, but it is not “the silver bullet”, and your mileage will vary. Make sure that you characterize it well before using it in the real world.
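To illustrate the Markov point with a sampled cosine $x_n = \cos(\omega n)$ (an example chosen here, with an assumed $\omega$): a lag-1 plot of $x_n$ against $x_{n-1}$ traces a closed curve, so the same $x_{n-1}$ value is followed by different $x_n$ on the rising and falling parts of the wave. Two lags do determine the next value, through the identity $x_n = 2\cos(\omega)\,x_{n-1} - x_{n-2}$, which is one way to read the remark about using two previous states.

```python
import numpy as np

omega = 2 * np.pi / 20            # assumed frequency: 20 samples per period
n = np.arange(200)
x = np.cos(omega * n)

# Two lags suffice: cos(a + b) + cos(a - b) = 2 cos(a) cos(b)
pred_two_lag = 2 * np.cos(omega) * x[1:-1] - x[:-2]
max_err = np.max(np.abs(x[2:] - pred_two_lag))
print("max two-lag prediction error:", max_err)

# One lag is not enough: x[1] and x[19] both equal cos(omega) (up to rounding),
# but their successors differ: cos(2*omega) versus cos(2*pi) = 1
print("same previous value:", x[1], x[19])
print("different next value:", x[2], x[20])
```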

**Attribution** *Source: Link, Question Author: user34790, Answer Author: EngrStudent*