This is in part motivated by the following question and the discussion following it.

Suppose an iid sample is generated, $X_i\sim F(x,\theta)$, and the goal is to estimate $\theta$. However, the original sample is not available. What we have instead are some statistics of the sample, $T_1,\dots,T_k$. Suppose $k$ is fixed. How do we estimate $\theta$? What would the maximum likelihood estimator be in this case?

**Answer**

In this case, you can consider an approximate Bayesian computation (ABC) approximation of the likelihood (and consequently of the MLE) under the following assumption/restriction:

**Assumption.** The original sample size $n$ is known.

This is not a wild assumption: the quality, in terms of convergence, of frequentist estimators depends on the sample size, so one cannot obtain arbitrarily good estimators without knowing the original sample size.

The idea is to generate a sample from the posterior distribution of $\theta$. Then, *in order to produce an approximation of the MLE*, you can either use an importance sampling technique as in [1] or consider a uniform prior on $\theta$ with support on a suitable set as in [2].

I am going to describe the method in [2]. First of all, let me describe the ABC sampler.

**ABC Sampler**

Let $f(\cdot\vert\theta)$ be the model that generates the sample, where $\theta \in \Theta$ is the parameter to be estimated. Let $T$ be a statistic (a function of the sample) and $T_0$ its observed value; in the ABC jargon, $T$ is called a *summary statistic*. Finally, let $\rho$ be a metric, $\pi(\theta)$ a prior distribution on $\theta$, and $\epsilon>0$ a tolerance. Then the ABC rejection sampler can be implemented as follows.

- Sample $\theta^*$ from $\pi(\cdot)$.
- Generate a sample ${\bf x}$ of size $n$ from the model $f(\cdot\vert\theta^*)$.
- Compute $T^*=T({\bf x})$.
- If $\rho(T^*,T_0)<\epsilon$, accept $\theta^*$ as a simulation from the posterior of $\theta$.

This algorithm generates an approximate sample from the posterior distribution of $\theta$ given $T({\bf x})=T_0$; the approximation comes from accepting draws with $\rho(T^*,T_0)<\epsilon$ rather than requiring an exact match. The best scenario is therefore when the statistic $T$ is sufficient, but other statistics can be used. For a more detailed description of this see this paper.
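The four steps above can be wrapped in a small generic function. The following is only a sketch: the function name `abc_reject` and all of its arguments are hypothetical names introduced here for illustration, and the exponential model in the usage part is an assumed example, not one taken from the references.

```r
# Generic ABC rejection sampler (illustrative sketch; all names are hypothetical).
#   r_prior():      draws one theta* from the prior pi(.)
#   r_model(n, th): draws a sample of size n from f(. | th)
#   stat(x):        computes the summary statistic T(x)
#   dist(a, b):     the metric rho (Euclidean by default)
abc_reject <- function(n_accept, n, T0, eps, r_prior, r_model, stat,
                       dist = function(a, b) sqrt(sum((a - b)^2))) {
  accepted <- vector("list", n_accept)
  k <- 0
  while (k < n_accept) {
    theta <- r_prior()                 # step 1: sample theta* from the prior
    x <- r_model(n, theta)             # step 2: simulate a sample of size n
    Tstar <- stat(x)                   # step 3: compute T* = T(x)
    if (dist(Tstar, T0) < eps) {       # step 4: accept if rho(T*, T0) < eps
      k <- k + 1
      accepted[[k]] <- theta
    }
  }
  do.call(rbind, accepted)             # one accepted draw per row
}

# Usage (assumed example): exponential model with rate theta,
# summary statistic = sample mean, uniform prior on (0.5, 5).
set.seed(42)
n <- 50
T0 <- mean(rexp(n, rate = 2))
post <- abc_reject(n_accept = 200, n = n, T0 = T0, eps = 0.01,
                   r_prior = function() runif(1, 0.5, 5),
                   r_model = function(n, th) rexp(n, rate = th),
                   stat = mean)
summary(post[, 1])
```

Passing the prior, the model and the statistic as functions keeps the sampler itself independent of the particular problem; the price of plain rejection sampling is that the acceptance rate drops quickly as $\epsilon$ decreases or as the dimension of the statistic grows.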

Now, in a general framework, if one uses a uniform prior that contains the MLE in its support, then the maximum *a posteriori* (MAP) estimator coincides with the maximum likelihood estimator (MLE). Therefore, if you consider an appropriate uniform prior in the ABC sampler, you can generate an approximate sample from a posterior distribution whose MAP coincides with the MLE. The remaining step consists of estimating this mode. This problem has been discussed on Cross Validated, for instance in “Computationally efficient estimation of multivariate mode”.
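To spell out why the two estimators coincide, here is a short derivation. The notation is introduced only for this sketch: $L(\theta;T_0)$ denotes the likelihood of the observed statistic and $A\subset\Theta$ the support of the uniform prior.

$$
\pi(\theta\mid T_0)\;\propto\;L(\theta;T_0)\,\pi(\theta)\;\propto\;L(\theta;T_0)\,\mathbb{1}_A(\theta)
\quad\Longrightarrow\quad
\arg\max_{\theta}\,\pi(\theta\mid T_0)\;=\;\arg\max_{\theta\in A}\,L(\theta;T_0).
$$

If the MLE lies in $A$, the right-hand side is the MLE itself. If it does not, the posterior mode is only the maximizer of the likelihood restricted to $A$, which is why the support has to be chosen on a suitable set.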

**A toy example**

Let $(x_1,\dots,x_n)$ be a sample from a $N(\mu,1)$ distribution and suppose that the only information available from this sample is $\bar{x}=\dfrac{1}{n}\sum_{j=1}^n x_j$. Let $\rho$ be the Euclidean metric in ${\mathbb R}$ and $\epsilon=0.001$. The following R code shows how to obtain an approximate MLE with the methods described above. It uses a simulated sample with $n=100$ and $\mu=0$, a posterior sample of size $1000$, a uniform prior for $\mu$ on $(-0.3,0.3)$, and a kernel density estimator to estimate the mode of the posterior sample (MAP=MLE).

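```
# Simulated data
set.seed(1)
n <- 100
x <- rnorm(n)

# Observed statistic (sample mean)
T0 <- mean(x)

# ABC rejection sampler using a uniform prior on (-0.3, 0.3)
N <- 1000      # number of accepted draws
eps <- 0.001   # tolerance
ABCsamp <- numeric(N)
i <- 1
while (i <= N) {
  u <- runif(1, -0.3, 0.3)      # draw mu* from the prior
  t.samp <- rnorm(n, u, 1)      # simulate a sample of size n
  Ts <- mean(t.samp)            # simulated statistic
  if (abs(Ts - T0) < eps) {     # accept if close to the observed statistic
    ABCsamp[i] <- u
    i <- i + 1
  }
}

# Approximation of the MLE: mode of a kernel density estimate
kd <- density(ABCsamp)
kd$x[which.max(kd$y)]
```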

As you can see, using a small tolerance we get a very good approximation of the MLE (which in this trivial example can be calculated directly from the statistic, given that it is sufficient). Note that the choice of the summary statistic is crucial. Quantiles are typically a good choice, but not every choice produces a good approximation. If the summary statistic is not very informative, the quality of the approximation might be poor, which is well known in the ABC community.
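For this toy example the target of the ABC sampler can be written in closed form, which gives a simple way to check the output. Since $\bar{x}\mid\mu\sim N(\mu,1/n)$, the posterior under the uniform prior on $(-0.3,0.3)$ is a truncated normal:

$$
\pi(\mu\mid\bar{x})\;\propto\;\exp\left\{-\frac{n}{2}\,(\mu-\bar{x})^2\right\}\mathbb{1}_{(-0.3,\,0.3)}(\mu),
$$

whose mode is $\bar{x}$ whenever $\vert\bar{x}\vert<0.3$. The estimated mode should therefore be close to `T0`. Any remaining discrepancy comes from the tolerance $\epsilon$, the finite number of posterior draws and the bandwidth of the kernel density estimator.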

**Update:** A similar approach was recently published in Fan et al. (2012). See this entry for a discussion on the paper.

**Attribution** *Source: Link, Question Author: mpiktas, Answer Author: whuber*