I am trying to compute the 95% credible interval of the following posterior distribution. I could not find the function in R for it but is the approach below correct?

`x <- seq(0.4,12,0.4) px <- c(0,0, 0, 0, 0, 0, 0.0002, 0.0037, 0.018, 0.06, 0.22 ,0.43, 0.64,0.7579, 0.7870, 0.72, 0.555, 0.37, 0.24, 0.11, 0.07, 0.02, 0.009, 0.005, 0.0001, 0,0.0002, 0, 0, 0) plot(x,px, type="l") mm <- sum(x*px)/sum(px) var <- (sum((x)^2*px)/sum(px)) - (mm^2) cat("95% credible interval: ", round(mm -1.96*sqrt(var),3), "-", round(mm + 1.96*sqrt(var),3),"\n")`

**Answer**

As noted by *Henry*, you are assuming normal distribution and it’s perfectly ok if your data follows normal distribution, but will be incorrect if you cannot assume normal distribution for it. Below I describe two different approaches that you could use for *unknown* distribution given only datapoints `x`

and accompanying density estimates `px`

.

The first thing to consider is what exactly do you want to summarize using your intervals. For example, you could be interested in intervals obtained using quantiles, but you could also be interested in highest density region (see here, or here) of your distribution. While this should not make much (if any) difference in simple cases like symmetric, unimodal distributions, this will make a difference for more “complicated” distributions. Generally, quantiles will give you interval containing probability mass concentrated around the *median* (the middle 100α% of your distribution), while highest density region is a region around the *modes* of the distribution. This will be more clear if you compare the two plots on the picture below — quantiles “cut” the distribution vertically, while highest density region “cuts” it horizontally.

Next thing to consider is how to deal with the fact that you have incomplete information about the distribution (assuming that we are talking about continuous distribution, you have only a bunch of points rather then a function). What you could do about it is to take the values “as is”, or use some kind of interpolation, or smoothing, to obtain the “in between” values.

One approach would be to use linear interpolation (see `?approxfun`

in R), or alternatively something more smooth like splines (see `?splinefun`

in R). If you choose such approach you have to remember that interpolation algorithms have no domain knowledge about your data and can return invalid results like values below zero etc.

```
# grid of points
xx <- seq(min(x), max(x), by = 0.001)
# interpolate function from the sample
fx <- splinefun(x, px) # interpolating function
pxx <- pmax(0, fx(xx)) # normalize so prob >0
```

Second approach that you could consider is to use kernel density/mixture distribution to approximate your distribution using the data you have. The tricky part in here is to decide about optimal bandwidth.

```
# density of kernel density/mixture distribution
dmix <- function(x, m, s, w) {
k <- length(m)
rowSums(vapply(1:k, function(j) w[j]*dnorm(x, m[j], s[j]), numeric(length(x))))
}
# approximate function using kernel density/mixture distribution
pxx <- dmix(xx, x, rep(0.4, length.out = length(x)), px) # bandwidth 0.4 chosen arbitrary
```

Next, you are going to find the intervals of interest. You can either proceed numerically, or by simulation.

*1a) Sampling to obtain quantile intervals*

```
# sample from the "empirical" distribution
samp <- sample(xx, 1e5, replace = TRUE, prob = pxx)
# or sample from kernel density
idx <- sample.int(length(x), 1e5, replace = TRUE, prob = px)
samp <- rnorm(1e5, x[idx], 0.4) # this is arbitrary sd
# and take sample quantiles
quantile(samp, c(0.05, 0.975))
```

*1b) Sampling to obtain highest density region*

```
samp <- sample(pxx, 1e5, replace = TRUE, prob = pxx) # sample probabilities
crit <- quantile(samp, 0.05) # boundary for the lower 5% of probability mass
# values from the 95% highest density region
xx[pxx >= crit]
```

*2a) Find quantiles numerically*

```
cpxx <- cumsum(pxx) / sum(pxx)
xx[which(cpxx >= 0.025)[1]] # lower boundary
xx[which(cpxx >= 0.975)[1]-1] # upper boundary
```

*2b) Find highest density region numerically*

```
const <- sum(pxx)
spxx <- sort(pxx, decreasing = TRUE) / const
crit <- spxx[which(cumsum(spxx) >= 0.95)[1]] * const
```

As you can see on the plots below, in case of unimodal, symmetric distribution both methods return the same interval.

Of course, you could also try to find 100α% interval around some central value such that Pr and use some kind of optimization to find appropriate \zeta, but the two approaches described above seem to be used more commonly and are more intuitive.

**Attribution***Source : Link , Question Author : user19758 , Answer Author : Community*