# Bootstrap filter / particle filter algorithm (understanding)

I have a poor understanding of how the bootstrap filter works. I roughly know the concepts but I fail to grasp certain details. This question is for me to clear up the clutter. Here I will use the popular filter algorithm from a reference by Doucet (so far I think this is the easiest reference). Let me first tell you that my problem is understanding which distributions are known and which are unknown.

These are my questions:

1. In 2), what is the distribution $p(x_t|x^{(i)}_{t-1})$? Is this distribution known? Do we know this distribution for all $t$? If so, what happens if we can’t sample from it? It’s funny that they call this the importance sampling step, yet I see no proposal distribution.
2. Also in 2), is $p(y_t|\tilde{x}^{(i)}_{t})$ a known distribution? Does “normalise importance weights” mean $w^{(i)}_{t}=\frac{\tilde{w}^{(i)}_{t}}{\sum_{i=1}^{N}\tilde{w}^{(i)}_{t}}$? What do the tildes on $x$ and $w$ mean? Do they mean something like un-resampled and un-normalised, respectively?
3. I would appreciate it if anyone could give a simple toy example, using well-known distributions, of this bootstrap filter in action. The end goal of the bootstrap filter is not clear to me.

1. That is the transition density of the state ($x_t$), which is part of your model and therefore known. You do need to sample from it in the basic algorithm, but approximations are possible. $p(x_t|x_{t-1})$ is the proposal distribution in this case. It is used because the distribution $p(x_t|x_{0:t-1},y_{1:t})$ is generally not tractable.
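It may help to see why the importance weight in the bootstrap filter reduces to the observation likelihood (a standard derivation, sketched here for completeness). The generic sequential importance weight is

$$\tilde{w}_t^{(i)} \propto w_{t-1}^{(i)} \, \frac{p(y_t \mid \tilde{x}_t^{(i)}) \, p(\tilde{x}_t^{(i)} \mid x_{t-1}^{(i)})}{q(\tilde{x}_t^{(i)} \mid x_{t-1}^{(i)}, y_t)},$$

and choosing the proposal $q = p(x_t \mid x_{t-1})$ cancels the transition density, leaving $\tilde{w}_t^{(i)} \propto w_{t-1}^{(i)} \, p(y_t \mid \tilde{x}_t^{(i)})$. Since resampling at every step resets the weights to $w_{t-1}^{(i)} = 1/N$, the unnormalized weight is simply the likelihood $p(y_t \mid \tilde{x}_t^{(i)})$.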

2. Yes, that’s the observation density, which is also part of the model, and therefore known. Yes, that’s what normalization means. The tilde is used to signify something like “preliminary”: $\tilde{x}$ is $x$ before resampling, and $\tilde{w}$ is $w$ before renormalization. I would guess that it is done this way so that the notation matches up between variants of the algorithm that don’t have a resampling step (i.e. $x$ is always the final estimate).

3. The end goal of the bootstrap filter is to estimate the sequence of conditional distributions $p(x_t|y_{1:t})$ (the unobservable state at $t$, given all observations until $t$).

Consider the simple model:

$$X_t = X_{t-1} + \eta_t, \quad \eta_t \sim N(0,1)$$
$$X_0 \sim N(0,1)$$
$$Y_t = X_t + \varepsilon_t, \quad \varepsilon_t \sim N(0,1)$$

This is a random walk observed with noise (you only observe $Y$, not $X$). You can compute $p(X_t|Y_1, …, Y_t)$ exactly with the Kalman filter, but we’ll use the bootstrap filter at your request. We can restate the model in terms of the state transition distribution, the initial state distribution, and the observation distribution (in that order), which is more useful for the particle filter:

$$X_t | X_{t-1} \sim N(X_{t-1},1)$$
$$X_0 \sim N(0,1)$$
$$Y_t | X_t \sim N(X_t,1)$$
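For later comparison: because this model is linear and Gaussian, the exact filtering distribution is available in closed form via the scalar Kalman recursions (standard formulas, stated here for reference). With $m_0 = 0$ and $P_0 = 1$,

$$P_{t|t-1} = P_{t-1} + 1, \qquad K_t = \frac{P_{t|t-1}}{P_{t|t-1} + 1},$$
$$m_t = m_{t-1} + K_t \,(y_t - m_{t-1}), \qquad P_t = (1 - K_t)\, P_{t|t-1},$$

so $p(X_t \mid Y_{1:t}) = N(m_t, P_t)$. This is the exact answer that the particle filter approximates.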

Applying the algorithm:

1. Initialization. We generate $N$ particles (independently) according to $X_0^{(i)} \sim N(0,1)$.

2. We simulate each particle forward independently by generating $X_1^{(i)} | X_0^{(i)} \sim N(X_0^{(i)},1)$, for each $i = 1, \dots, N$.

We then compute the likelihood $\tilde{w}_t^{(i)} = \phi(y_t; \tilde{x}_t^{(i)},1)$, where $\phi(x; \mu, \sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$ (our observation density). We want to give more weight to particles that are more likely to have produced the observation $y_t$ that we recorded. Finally, we normalize these weights so that they sum to 1.

3. We resample the particles according to these weights $w_t^{(i)}$. Note that a particle is a full path of $x$: don’t just resample the last point, resample the whole path, which they denote $x_{0:t}^{(i)}$.

Go back to step 2, moving forward with the resampled version of the particles, until we’ve processed the whole series.

An implementation in R follows:

```r
# Simulate some fake data
set.seed(123)

tau <- 100
x <- cumsum(rnorm(tau))
y <- x + rnorm(tau)

# Begin particle filter
N <- 1000
x.pf <- matrix(rep(NA, (tau + 1) * N), nrow = tau + 1)

# 1. Initialize
x.pf[1, ] <- rnorm(N)

m <- rep(NA, tau)
for (t in 2:(tau + 1)) {
  # 2. Importance sampling step
  x.pf[t, ] <- x.pf[t - 1, ] + rnorm(N)

  # Likelihood
  w.tilde <- dnorm(y[t - 1], mean = x.pf[t, ])

  # Normalize
  w <- w.tilde / sum(w.tilde)

  # NOTE: This step isn't part of your description of the algorithm, but I'm
  # going to compute the mean of the particle distribution here to compare
  # with the Kalman filter later. Note that this is done BEFORE resampling.
  m[t - 1] <- sum(w * x.pf[t, ])

  # 3. Resampling step
  s <- sample(1:N, size = N, replace = TRUE, prob = w)

  # Note: resample the WHOLE path, not just x.pf[t, ]
  x.pf <- x.pf[, s]
}

plot(x)
lines(m, col = "red")

# Run the Kalman filter to compare
library(dlm)
lines(dropFirst(dlmFilter(y, dlmModPoly(order = 1))$m), col = "blue")

legend("topleft",
       legend = c("Actual x", "Particle filter (mean)", "Kalman filter"),
       col = c("black", "red", "blue"), lwd = 1)
```
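One diagnostic worth mentioning (not part of the algorithm above) is the effective sample size (ESS), which warns of weight degeneracy: when a few particles carry nearly all the weight, resampling gives many duplicates and the approximation degrades. A common estimator is $1/\sum_i (w^{(i)})^2$. A minimal sketch in R, assuming a normalized weight vector like `w` inside the loop above:

```r
# Effective sample size of a set of normalized importance weights.
# Ranges from 1 (all weight on one particle) to N (perfectly uniform weights).
ess <- function(w) 1 / sum(w^2)

# Uniform weights give the maximum ESS, N:
w.uniform <- rep(1 / 100, 100)
ess(w.uniform)      # 100

# Fully degenerate weights give the minimum ESS, 1:
w.degenerate <- c(1, rep(0, 99))
ess(w.degenerate)   # 1
```

A common variant of the algorithm resamples only when the ESS drops below a threshold such as $N/2$, rather than at every step as in the bootstrap filter above.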


The resulting graph (not reproduced here) overlays the actual $x$ (black), the particle-filter mean (red), and the Kalman-filter mean (blue).

A useful tutorial is “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later” by Doucet and Johansen.