I am trying to understand how influence functions work. Could someone explain them in the context of a simple OLS regression

$$y_i = \alpha + \beta \cdot x_i + \varepsilon_i,$$

where I want the influence function for $\beta$.

**Answer**

Influence functions are basically an analytical tool that can be used to assess the effect (or "influence") of removing an observation on the value of a statistic *without having to re-calculate that statistic*. They can also be used to construct asymptotic variance estimates: if the influence function equals $I$, then the asymptotic variance is $\frac{I^2}{n}$.

The way I understand influence functions is as follows. You have some sort of theoretical CDF, denoted by $F_i(y)=\Pr(Y_i<y_i)$. For simple OLS, you have

$$\Pr(Y_i<y_i)=\Pr(\alpha+\beta x_i+\varepsilon_i<y_i)=\Phi\left(\frac{y_i-(\alpha+\beta x_i)}{\sigma}\right),$$

where $\Phi(z)$ is the standard normal CDF and $\sigma^2$ is the error variance. Now you can show that any statistic will be a function of this CDF, hence the notation $S(F)$ (i.e. some function of $F$). Now suppose we change the function $F$ by a "little bit", to
$$F^{(i)}(z)=(1+\zeta)F(z)-\zeta\,\delta^{(i)}(z),$$
where $\delta^{(i)}(z)=I(y_i<z)$ and $\zeta=\frac{1}{n-1}$. Thus $F^{(i)}$ represents the CDF of the data with the $i$th data point removed. We can do a Taylor series of $S[F^{(i)}(z,\zeta)]$ about $\zeta=0$. This gives:

$$S[F^{(i)}(z,\zeta)]\approx S[F^{(i)}(z,0)]+\zeta\left[\left.\frac{\partial S[F^{(i)}(z,\zeta)]}{\partial\zeta}\right|_{\zeta=0}\right]$$

Note that $F^{(i)}(z,0)=F(z)$, so we get:

$$S[F^{(i)}(z,\zeta)]\approx S[F(z)]+\zeta\left[\left.\frac{\partial S[F^{(i)}(z,\zeta)]}{\partial\zeta}\right|_{\zeta=0}\right]$$
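As a quick numerical sanity check, the claim that $F^{(i)}=(1+\zeta)F-\zeta\,\delta^{(i)}$ is the CDF with the $i$th point removed holds exactly for the empirical CDF. A minimal NumPy sketch (the simulated data and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=10)          # a small sample
n, i = len(y), 3                 # delete observation i
zeta = 1.0 / (n - 1)

def ecdf(sample, z):
    """Empirical CDF of `sample` evaluated at each point of the grid z."""
    return np.mean(sample[:, None] < z[None, :], axis=0)

z = np.linspace(-3, 3, 201)                  # evaluation grid
F = ecdf(y, z)                               # full-sample empirical CDF
F_loo = ecdf(np.delete(y, i), z)             # empirical CDF without point i
delta_i = (y[i] < z).astype(float)           # point mass at y_i

# the identity F^(i) = (1 + zeta) F - zeta * delta_i holds exactly
print(np.allclose(F_loo, (1 + zeta) * F - zeta * delta_i))  # True
```

For the empirical CDF the identity is exact, not just approximate; the Taylor expansion is only needed because the statistic $S$ is generally a nonlinear function of $F$.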

This partial derivative $\left.\frac{\partial S}{\partial\zeta}\right|_{\zeta=0}$ is called the influence function. So it represents an approximate first-order correction to a statistic due to deleting the $i$th observation. Note that in regression the remainder does not go to zero asymptotically, so this is an approximation to the changes you may actually get. Now write $\beta$ as:

$$\beta=\frac{\frac{1}{n}\sum_{j=1}^n (y_j-\overline{y})(x_j-\overline{x})}{\frac{1}{n}\sum_{j=1}^n (x_j-\overline{x})^2}$$

Thus $\beta$ is a function of two statistics: the variance of $X$ and the covariance between $X$ and $Y$. These two statistics have representations in terms of the CDF as:

$$\mathrm{cov}(X,Y)=\int (X-\mu_x(F))(Y-\mu_y(F))\,dF$$

and

$$\mathrm{var}(X)=\int (X-\mu_x(F))^2\,dF,$$

where

$$\mu_x(F)=\int x\,dF.$$
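Before perturbing anything, it is worth confirming numerically that this ratio of functionals really is the OLS slope. A minimal sketch (simulated data; the true intercept and slope of 2.0 and 0.5 are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=50)

# slope as a ratio of the two statistics (1/n normalisation, as in the text)
beta_ratio = np.mean((x - x.mean()) * (y - y.mean())) / np.mean((x - x.mean()) ** 2)

# slope from an ordinary least-squares fit (polyfit returns highest degree first)
beta_ols = np.polyfit(x, y, 1)[0]

print(np.isclose(beta_ratio, beta_ols))  # True
```

Note that any common normalisation ($\frac{1}{n}$ or $\frac{1}{n-1}$) cancels in the ratio, so the slope is the same either way.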

To remove the $i$th observation we replace $F\to F^{(i)}=(1+\zeta)F-\zeta\,\delta^{(i)}$ in both integrals to give:

$$\mu_x^{(i)}=\int x\,d\!\left[(1+\zeta)F-\zeta\,\delta^{(i)}\right]=\mu_x-\zeta(x_i-\mu_x)$$

$$\mathrm{Var}(X)^{(i)}=\int \left(X-\mu_x^{(i)}\right)^2 dF^{(i)}=\int \left(X-\mu_x+\zeta(x_i-\mu_x)\right)^2 d\!\left[(1+\zeta)F-\zeta\,\delta^{(i)}\right]$$

Ignoring terms of order $\zeta^2$ and simplifying, we get:

$$\mathrm{Var}(X)^{(i)}\approx \mathrm{Var}(X)-\zeta\left[(x_i-\mu_x)^2-\mathrm{Var}(X)\right]$$

Similarly, for the covariance:

$$\mathrm{Cov}(X,Y)^{(i)}\approx \mathrm{Cov}(X,Y)-\zeta\left[(x_i-\mu_x)(y_i-\mu_y)-\mathrm{Cov}(X,Y)\right]$$
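These two first-order approximations can be checked against a brute-force recomputation with the point actually deleted. A sketch (simulated data and variable names are mine; the leftover error should be of order $\zeta^2$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
i, zeta = 0, 1.0 / (n - 1)

mux, muy = x.mean(), y.mean()
var = np.mean((x - mux) ** 2)                 # 1/n normalisation, as in the text
cov = np.mean((x - mux) * (y - muy))

# exact leave-one-out statistics, recomputed from scratch
xl, yl = np.delete(x, i), np.delete(y, i)
var_loo = np.mean((xl - xl.mean()) ** 2)
cov_loo = np.mean((xl - xl.mean()) * (yl - yl.mean()))

# first-order influence-function approximations from the text
var_approx = var - zeta * ((x[i] - mux) ** 2 - var)
cov_approx = cov - zeta * ((x[i] - mux) * (y[i] - muy) - cov)

print(abs(var_loo - var_approx), abs(cov_loo - cov_approx))  # both O(zeta^2)
```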

So we can now express $\beta^{(i)}$ as a function of $\zeta$. This is:

$$\beta^{(i)}(\zeta)\approx \frac{\mathrm{Cov}(X,Y)-\zeta\left[(x_i-\mu_x)(y_i-\mu_y)-\mathrm{Cov}(X,Y)\right]}{\mathrm{Var}(X)-\zeta\left[(x_i-\mu_x)^2-\mathrm{Var}(X)\right]}$$

We can now use the Taylor series:

$$\beta^{(i)}(\zeta)\approx \beta^{(i)}(0)+\zeta\left[\frac{\partial \beta^{(i)}(\zeta)}{\partial\zeta}\right]_{\zeta=0}$$

Simplifying this gives:

$$\beta^{(i)}(\zeta)\approx \beta-\zeta\left[\frac{(x_i-\mu_x)(y_i-\mu_y)}{\mathrm{Var}(X)}-\frac{\beta(x_i-\mu_x)^2}{\mathrm{Var}(X)}\right]$$

And plugging in the values of the statistics $\mu_y$, $\mu_x$, $\mathrm{Var}(X)$, and $\zeta=\frac{1}{n-1}$, we get:

$$\beta^{(i)}\approx \beta-\frac{x_i-\overline{x}}{n-1}\left[\frac{y_i-\overline{y}}{\frac{1}{n}\sum_{j=1}^n (x_j-\overline{x})^2}-\frac{\beta(x_i-\overline{x})}{\frac{1}{n}\sum_{j=1}^n (x_j-\overline{x})^2}\right]$$

And you can see how the effect of removing a single observation can be approximated without having to re-fit the model. You can also see how an $x_i$ equal to the average has *no influence on the slope of the line*. Think about this and you will see how it makes sense. You can also write this more succinctly in terms of the standardised values $\tilde{x}=\frac{x-\overline{x}}{s_x}$ (similarly for $y$):

$$\beta^{(i)}\approx \beta-\frac{\tilde{x}_i}{n-1}\left[\tilde{y}_i\frac{s_y}{s_x}-\tilde{x}_i\beta\right]$$
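The whole point of the formula is that it avoids refitting, so the natural check is to compare it with an actual refit. A minimal NumPy sketch (simulated data and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

beta = np.polyfit(x, y, 1)[0]          # full-sample OLS slope
sxx = np.mean((x - x.mean()) ** 2)     # 1/n-normalised variance of x

for i in range(3):                     # check the first few deletions
    # influence-function approximation (the formula derived above)
    dx, dy = x[i] - x.mean(), y[i] - y.mean()
    beta_approx = beta - dx / (n - 1) * (dy / sxx - beta * dx / sxx)

    # exact slope after actually refitting without observation i
    beta_exact = np.polyfit(np.delete(x, i), np.delete(y, i), 1)[0]

    print(f"exact {beta_exact:.5f}  approx {beta_approx:.5f}")
```

The two agree to several decimal places here; the discrepancy is the second-order remainder dropped in the Taylor expansion.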

**Attribution**
*Source: Link, Question Author: stevejb, Answer Author: probabilityislogic*