Particle approximations to probability densities are often introduced as a weighted sum of Dirac functions

$$p(x) \approx \sum_{i=1}^N \omega^i \delta(x-x^i)$$

with the weights

$$\omega^i \propto \frac{p(x^i)}{q(x^i)}$$

normalized so that they sum to unity, where $q(\cdot)$ is the importance density. I understand that the Dirac function becomes infinitely large at a point $p$, that is, $\delta(p) = \infty$, and that it is zero elsewhere, that is, $\delta(x) = 0 ~\forall x \neq p$. I also understand that the Dirac function integrated over the mass point takes the value of unity.

My questions are:

- What is the relationship between the support of the particle approximation and the Dirac function?
- Why is a summation sign used when evaluating $\delta$ can only ever yield a value of 0 or infinity? Shouldn’t this be an integral instead?
- How can the notion of the support of a function be extended to a set of points (e.g., $x_t^{(i)}$), which isn’t itself a function?
- How can a representation of a probability density function arise from a weighted sum of $\delta(\cdot)$s that themselves take only values of either zero or infinity?

Thank you for any clarifications you may be able to provide.

**Answer**

*@user20160* has already given you a nice answer to your questions (1)-(3), but the last one does not seem to be fully answered yet.

> How can a representation of a probability density function arise from a weighted sum of $\delta(\cdot)$s that themselves take only values of either zero or infinity?

Let me start by quoting Wikipedia, as it provides a pretty clear description in this case (notice the bold text I added):

> The Dirac delta can be loosely thought of as a function on the real line which is zero everywhere except at the origin, where it is infinite,
>
> $$\delta(x) = \begin{cases} +\infty, & x = 0 \\ 0, & x \ne 0 \end{cases}$$
>
> and which is also constrained to satisfy the identity
>
> $$\int_{-\infty}^\infty \delta(x) \, dx = 1$$
>
> This is merely a heuristic characterization. The Dirac delta is not a **function** in the traditional sense as **no function defined on the real numbers has these properties**. The Dirac delta function can be rigorously defined either as a distribution or as a measure.

Further on, Wikipedia provides a more formal definition and many worked examples, so I’d recommend reading the whole article. Let me quote one example from it:

> In probability theory and statistics, the Dirac delta function is often used to represent a discrete distribution, or a partially discrete, partially continuous distribution, using a probability density function (which is normally used to represent fully continuous distributions). For example, the probability density function $f(x)$ of a discrete distribution consisting of points $x = \{x_1, \dots, x_n\}$, with corresponding probabilities $p_1, \dots, p_n$, can be written as
>
> $$ f(x) = \sum_{i=1}^n p_i \delta(x-x_i) $$

What this equation says is that we take a weighted sum of $n$ distributions $\delta_{x_i} = \delta(x-x_i)$, written as if they were continuous densities, each of which has *all its mass* at $x_i$. If you try to picture the $\delta_{x_i}$ distributions in terms of cumulative distribution functions, each one has to be the step function

$$
F_{x_i}(x) =
\begin{cases}
0 & \text{if } x < x_i \\
1 & \text{if } x \ge x_i
\end{cases}
$$

So we can rewrite the previous density as a cumulative distribution function

$$ F(x) = \sum_{i=1}^n p_i F_{x_i}(x) = \sum_{i=1}^n p_i \mathbf{1}_{x \ge x_i} $$

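where $\mathbf{1}_{x \ge x_i}$ is an indicator function that equals 1 when $x \ge x_i$ and 0 otherwise, so it marks the position of $x_i$. Notice that this is basically a categorical distribution in disguise. Moreover, you can define the Dirac delta by how it acts on an arbitrary continuous function $f$ under the integral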

$$ \int_{-\infty}^\infty f(x) \delta(x-x_i) dx = f(x_i) $$

so it “works” as a continuous version of an indicator function.
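To connect this back to the particle approximation in the question: applying the identity above term by term gives $\int f(x) \sum_i \omega^i \delta(x-x^i) \, dx = \sum_i \omega^i f(x^i)$, and the cumulative distribution function becomes $\sum_i \omega^i \mathbf{1}_{x \ge x^i}$, so in practice the deltas are never evaluated pointwise. Below is a minimal sketch of this, under assumptions that are mine and not part of the question: the target $p$ is a standard normal, the importance density $q$ is a normal with standard deviation 2, and the names `particle_cdf` and `particle_expectation` are purely illustrative.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Assumed setup: target p = N(0, 1), importance density q = N(0, 2^2)
N = 100_000
particles = rng.normal(loc=0.0, scale=2.0, size=N)  # x^i drawn from q

# Weights proportional to p(x^i) / q(x^i), normalized to sum to one
weights = normal_pdf(particles, 0.0, 1.0) / normal_pdf(particles, 0.0, 2.0)
weights /= weights.sum()

def particle_cdf(x):
    """F(x) = sum_i w_i * 1[x >= x_i]: add up the weights of particles at or below x."""
    return np.sum(weights[particles <= x])

def particle_expectation(f):
    """Integral of f against sum_i w_i * delta(x - x_i), i.e. sum_i w_i * f(x_i)."""
    return np.sum(weights * f(particles))

cdf_at_1 = particle_cdf(1.0)
exact_cdf_at_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))  # standard normal CDF at 1
second_moment = particle_expectation(lambda x: x ** 2)

print(cdf_at_1, exact_cdf_at_1)  # the two values should be close
print(second_moment)             # should be close to 1, the variance of N(0, 1)
```

Everything here is a finite weighted sum over the particle locations; the "density" notation with deltas is just a compact way of writing that down.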

The take-away message is that the Dirac delta is not a standard function. It’s also *not equal* to infinity at zero: if it were, it would be useless, because infinity is not a number, so we couldn’t perform any arithmetic operations on it. You can think of the Dirac delta simply as an indicator function pointing at some $x_i$ that is continuous and integrates to unity. No black magic involved, it is just a way to hack the calculus to deal with discrete values.
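One way to see why the value "at the peak" is beside the point is to replace the delta by a very narrow Gaussian bump centered at $x_i$ and watch what happens as its width shrinks. This is a common approximation, not something taken from the quoted article, and the choices below ($x_i = 0.5$, $f = \cos$, the grid, and the helper name `gaussian_bump`) are assumptions made only for illustration. The peak height grows without bound, but the integral against $f$ stays finite and approaches $f(x_i)$.

```python
import numpy as np

def gaussian_bump(x, center, sigma):
    """Normal density with standard deviation sigma, used as a stand-in for delta."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x_i = 0.5   # assumed location of the mass point
f = np.cos  # assumed test function

grid = np.linspace(-5.0, 5.0, 200_001)  # fine grid for a simple Riemann sum
dx = grid[1] - grid[0]

results = []
for sigma in (1.0, 0.1, 0.01, 0.001):
    bump = gaussian_bump(grid, x_i, sigma)
    peak = bump.max()                        # grows like 1 / sigma
    integral = np.sum(f(grid) * bump) * dx   # approximates the integral of f times the bump
    results.append((sigma, peak, integral))
    print(f"sigma={sigma:<6} peak={peak:10.3f} integral={integral:.6f}")

print("f(x_i) =", f(x_i))
```

The only quantity that matters in the limit is the integral, which is exactly what the distribution or measure definition of the delta encodes.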

**Attribution**

*Source: Link, Question Author: Constantin, Answer Author: Tim*