In a multiple linear regression it is possible to find out the coeffient with the following formula.

$b = (X’X)^{-1}(X’)Y$

`beta = solve(t(X) %*% X) %*% (t(X) %*% Y) ; beta`

For instance:

`> y <- c(9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6, 7.6, 6.1) > x0 <- c(1,1,1,1,1,1,1,1,1,1) > x1 <- c(100,50,100,100,50,80,75,65,90,90) > x2 <- c(4,3,4,2,2,2,3,4,3,2) > Y <- as.matrix(y) > X <- as.matrix(cbind(x0,x1,x2)) > beta = solve(t(X) %*% X) %*% (t(X) %*% Y);beta [,1] x0 -0.8687015 x1 0.0611346 x2 0.9234254 > model <- lm(y~+x1+x2) ; model$coefficients (Intercept) x1 x2 -0.8687015 0.0611346 0.9234254`

I would like how to calculate in the same “manual” way the beta for a logistic regression. Where of course the y would be 1 or 0. Assuming I’m using the binomial family with a logit link.

**Answer**

The OLS estimator in the linear regression model is quite rare in

having the property that it can be represented in closed form, that is

without needing to be expressed as the optimizer of a function. It is, however, an optimizer of a function — the residual sum of squares

function — and can be computed as such.

The MLE in the logistic regression model is also the optimizer of a

suitably defined log-likelihood function, but since it is not available

in a closed form expression, it must be computed as an optimizer.

Most statistical estimators are only expressible as optimizers of

appropriately constructed functions of the data called criterion functions.

Such optimizers require the use of appropriate numerical optimization

algorithms.

Optimizers of functions can be computed in R using the `optim()`

function

that provides some general purpose optimization algorithms, or one of the more

specialized packages such as `optimx`

. Knowing which

optimization algorithm to use for different types of models and statistical criterion

functions is key.

## Linear regression residual sum of squares

The OLS estimator is defined as the optimizer of the well-known residual sum of

squares function:

$$

\begin{align}

\hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}}\left(\boldsymbol{Y} – \mathbf{X}\boldsymbol{\beta}\right)’\left(\boldsymbol{Y} – \mathbf{X}\boldsymbol{\beta}\right) \\

&= (\mathbf{X}’\mathbf{X})^{-1}\mathbf{X}’\boldsymbol{Y}

\end{align}

$$

In the case of a twice differentiable, convex function like the residual sum of squares,

most gradient-based optimizers do good job. In this case, I will be using the BFGS

algorithm.

```
#================================================
# reading in the data & pre-processing
#================================================
urlSheatherData = "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"
dfSheather = as.data.frame(read.csv(urlSheatherData, header = TRUE))
# create the design matrices
vY = as.matrix(dfSheather['InMichelin'])
mX = as.matrix(dfSheather[c('Service','Decor', 'Food', 'Price')])
# add an intercept to the predictor variables
mX = cbind(1, mX)
# the number of variables and observations
iK = ncol(mX)
iN = nrow(mX)
#================================================
# compute the linear regression parameters as
# an optimal value
#================================================
# the residual sum of squares criterion function
fnRSS = function(vBeta, vY, mX) {
return(sum((vY - mX %*% vBeta)^2))
}
# arbitrary starting values
vBeta0 = rep(0, ncol(mX))
# minimise the RSS function to get the parameter estimates
optimLinReg = optim(vBeta0, fnRSS,
mX = mX, vY = vY, method = 'BFGS',
hessian=TRUE)
#================================================
# compare to the LM function
#================================================
linregSheather = lm(InMichelin ~ Service + Decor + Food + Price,
data = dfSheather)
```

This yields:

```
> print(cbind(coef(linregSheather), optimLinReg$par))
[,1] [,2]
(Intercept) -1.492092490 -1.492093965
Service -0.011176619 -0.011176583
Decor 0.044193000 0.044193023
Food 0.057733737 0.057733770
Price 0.001797941 0.001797934
```

## Logistic regression log-likelihood

The criterion function corresponding to the MLE in the logistic regression model is

the log-likelihood function.

$$

\begin{align}

\log L_n(\boldsymbol{\beta}) &= \sum_{i=1}^n \left(Y_i \log \Lambda(\boldsymbol{X}_i’\boldsymbol{\beta}) +

(1-Y_i)\log(1 – \Lambda(\boldsymbol{X}_i’\boldsymbol{\beta}))\right)

\end{align}

$$

where $\Lambda(k) = 1/(1+ \exp(-k))$ is the logistic function. The parameter estimates are the optimizers of this function

$$

\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}}\log L_n(\boldsymbol{\beta})

$$

I show how to construct and optimize the criterion function using the `optim()`

function

once again employing the BFGS algorithm.

```
#================================================
# compute the logistic regression parameters as
# an optimal value
#================================================
# define the logistic transformation
logit = function(mX, vBeta) {
return(exp(mX %*% vBeta)/(1+ exp(mX %*% vBeta)) )
}
# stable parametrisation of the log-likelihood function
# Note: The negative of the log-likelihood is being returned, since we will be
# /minimising/ the function.
logLikelihoodLogitStable = function(vBeta, mX, vY) {
return(-sum(
vY*(mX %*% vBeta - log(1+exp(mX %*% vBeta)))
+ (1-vY)*(-log(1 + exp(mX %*% vBeta)))
)
)
}
# initial set of parameters
vBeta0 = c(10, -0.1, -0.3, 0.001, 0.01) # arbitrary starting parameters
# minimise the (negative) log-likelihood to get the logit fit
optimLogit = optim(vBeta0, logLikelihoodLogitStable,
mX = mX, vY = vY, method = 'BFGS',
hessian=TRUE)
#================================================
# test against the implementation in R
# NOTE glm uses IRWLS:
# http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
# rather than the BFGS algorithm that we have reported
#================================================
logitSheather = glm(InMichelin ~ Service + Decor + Food + Price,
data = dfSheather,
family = binomial, x = TRUE)
```

This yields

```
> print(cbind(coef(logitSheather), optimLogit$par))
[,1] [,2]
(Intercept) -11.19745057 -11.19661798
Service -0.19242411 -0.19249119
Decor 0.09997273 0.09992445
Food 0.40484706 0.40483753
Price 0.09171953 0.09175369
```

As a caveat, note that numerical optimization algorithms require careful use or you can end up with

all sorts of pathological solutions. Until you understand them well, it is best to

use the available packaged options that allow you to concentrate on specifying the model

rather than worrying about how to numerically compute the estimates.

**Attribution***Source : Link , Question Author : S12000 , Answer Author : tchakravarty*