How to test for simultaneous equality of choosen coefficients in logit or probit model ? What is the standard approach and what is the state of art approach ?

**Answer**

## Wald test

One standard approach is the Wald test. This is what the Stata command `test`

does after a logit or probit regression. Let’s see how this works in R by looking at an example:

```
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv") # Load dataset from the web
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial") # calculate the logistic regression
summary(mylogit)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979 1.139951 -3.500 0.000465 ***
gre 0.002264 0.001094 2.070 0.038465 *
gpa 0.804038 0.331819 2.423 0.015388 *
rank2 -0.675443 0.316490 -2.134 0.032829 *
rank3 -1.340204 0.345306 -3.881 0.000104 ***
rank4 -1.551464 0.417832 -3.713 0.000205 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Say, you want to test the hypothesis \beta_{gre}=\beta_{gpa} vs. \beta_{gre}\neq \beta_{gpa}. This is equivalent of testing \beta_{gre} – \beta_{gpa} = 0. The Wald test statistic is:

W=\frac{(\hat{\beta}-\beta_{0})}{\widehat{\operatorname{se}}(\hat{\beta})}\sim \mathcal{N}(0,1)

or

W^2 = \frac{(\hat{\theta}-\theta_{0})^2}{\operatorname{Var}(\hat{\theta})}\sim \chi_{1}^2

Our \widehat{\theta} here is \beta_{gre} – \beta_{gpa} and \theta_{0}=0. So all we need is the standard error of \beta_{gre} – \beta_{gpa}. We can calculate the standard error with the Delta method:

\hat{se}(\beta_{gre} – \beta_{gpa})\approx \sqrt{\operatorname{Var}(\beta_{gre}) + \operatorname{Var}(\beta_{gpa}) – 2\cdot \operatorname{Cov}(\beta_{gre},\beta_{gpa})}

So we also need the covariance of \beta_{gre} and \beta_{gpa}. The variance-covariance matrix can be extracted with the `vcov`

command after running the logistic regression:

```
var.mat <- vcov(mylogit)[c("gre", "gpa"),c("gre", "gpa")]
colnames(var.mat) <- rownames(var.mat) <- c("gre", "gpa")
gre gpa
gre 1.196831e-06 -0.0001241775
gpa -1.241775e-04 0.1101040465
```

Finally, we can calculate the standard error:

```
se <- sqrt(1.196831e-06 + 0.1101040465 -2*-0.0001241775)
se
[1] 0.3321951
```

So your Wald z-value is

```
wald.z <- (gre-gpa)/se
wald.z
[1] -2.413564
```

To get a p-value, just use the standard normal distribution:

```
2*pnorm(-2.413564)
[1] 0.01579735
```

In this case we have evidence that the coefficients are different from each other. This approach can be extended to more than two coefficients.

**Using multcomp**

This rather tedious calculations can be conveniently done in `R`

using the `multcomp`

package. Here is the same example as above but done with `multcomp`

:

```
library(multcomp)
glht.mod <- glht(mylogit, linfct = c("gre - gpa = 0"))
summary(glht.mod)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
gre - gpa == 0 -0.8018 0.3322 -2.414 0.0158 *
confint(glht.mod)
```

A confidence interval for the difference of the coefficients can also be calculated:

```
Quantile = 1.96
95% family-wise confidence level
Linear Hypotheses:
Estimate lwr upr
gre - gpa == 0 -0.8018 -1.4529 -0.1507
```

For additional examples of `multcomp`

, see here or here.

## Likelihood ratio test (LRT)

The coefficients of a logistic regression are found by maximum likelihood. But because the likelihood function involves a lot of products, the log-likelihood is maximized which turns the products into sums. The model that fits better has a higher log-likelihood. The model involving more variables has at least the same likelihood as the null model. Denote the log-likelihood of the alternative model (model containing more variables) with LL_{a} and the log-likelihood of the null model with LL_{0}, the likelihood ratio test statistic is:

D=2\cdot (LL_{a} – LL_{0})\sim \chi_{df1-df2}^{2}

The likelihood ratio test statistic follows a \chi^{2}-distribution with the degrees of freedom being the difference in number of variables. In our case, this is 2.

To perform the likelihood ratio test, we also need to fit the model with the constraint \beta_{gre}=\beta_{gpa} to be able to compare the two likelihoods. The full model has the form \log\left(\frac{p_{i}}{1-p_{i}}\right)=\beta_{0}+\beta_{1}\cdot \mathrm{gre} + \beta_{2}\cdot \mathrm{gpa}+\beta_{3}\cdot \mathrm{rank_{2}} + \beta_{4}\cdot \mathrm{rank_{3}}+\beta_{5}\cdot \mathrm{rank_{4}}. Our constraint model has the form: \log\left(\frac{p_{i}}{1-p_{i}}\right)=\beta_{0}+\beta_{1}\cdot (\mathrm{gre} + \mathrm{gpa})+\beta_{2}\cdot \mathrm{rank_{2}} + \beta_{3}\cdot \mathrm{rank_{3}}+\beta_{4}\cdot \mathrm{rank_{4}}.

```
mylogit2 <- glm(admit ~ I(gre + gpa) + rank, data = mydata, family = "binomial")
```

In our case, we can use `logLik`

to extract the log-likelihood of the two models after a logistic regression:

```
L1 <- logLik(mylogit)
L1
'log Lik.' -229.2587 (df=6)
L2 <- logLik(mylogit2)
L2
'log Lik.' -232.2416 (df=5)
```

The model containing the constraint on `gre`

and `gpa`

has a slightly higher log-likelihood (-232.24) compared to the full model (-229.26). Our likelihood ratio test statistic is:

```
D <- 2*(L1 - L2)
D
[1] 16.44923
```

We can now use the CDF of the \chi^{2}_{2} to calculate the p-value:

```
1-pchisq(D, df=1)
[1] 0.01458625
```

The p-value is very small indicating that the coefficients are different.

R has the likelihood ratio test built in; we can use the `anova`

function to calculate the likelhood ratio test:

```
anova(mylogit2, mylogit, test="LRT")
Analysis of Deviance Table
Model 1: admit ~ I(gre + gpa) + rank
Model 2: admit ~ gre + gpa + rank
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 395 464.48
2 394 458.52 1 5.9658 0.01459 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Again, we have strong evidence that the coefficients of `gre`

and `gpa`

are significantly different from each other.

## Score test (aka Rao’s Score test aka Lagrange multiplier test)

The Score function U(\theta) is the derivative of the log-likelihood function (\text{log} L(\theta|x)) where \theta are the parameters and x the data (the univariate case is shown here for illustration purposes):

U(\theta) = \frac{\partial \text{log} L(\theta|x)}{\partial \theta}

This is basically the slope of the log-likelihood function. Further, let I(\theta) be the Fisher information matrix which is the negative expectation of the second derivative of the log-likelihood function with respect to \theta. The score test statistics is:

S(\theta_{0})=\frac{U(\theta_{0}^{2})}{I(\theta_{0})}\sim\chi^{2}_{1}

The score test can also be calculated using `anova`

(the score test statistics is called “Rao”):

```
anova(mylogit2, mylogit, test="Rao")
Analysis of Deviance Table
Model 1: admit ~ I(gre + gpa) + rank
Model 2: admit ~ gre + gpa + rank
Resid. Df Resid. Dev Df Deviance Rao Pr(>Chi)
1 395 464.48
2 394 458.52 1 5.9658 5.9144 0.01502 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The conclusion is the same as before.

**Note**

An interesting relationship between the different test statistics when the model is linear is (Johnston and DiNardo (1997): Econometric Methods): Wald \geq LR \geq Score.

**Attribution***Source : Link , Question Author : Qbik , Answer Author : Community*