# Do we actually take a random line in the first step of linear regression?

This is a screenshot I took from a video on linear regression by Luis Serrano. He explains linear regression step by step (a from-scratch version). The first step was to start with a random line.

The question is: do we actually draw a random line, or do we instead perform some calculation, such as taking the average of the y values, to draw the initial line? Because if we take any random line, it might not fall near any of the points at all. It might, for example, fall in the 3rd quadrant of the coordinate system, where there are no points in this case.

No.

What we want to find are the parameters that result in the least amount of error, and OLS defines error as the sum of squared differences between observed values $$y_i$$ and predicted values $$\hat y_i$$. The error is often denoted by $$L$$ for “loss”.

$$L(y, \hat y) = \sum_{i = 1}^N \big(y_i - \hat y_i\big)^2$$

We have our regression model, $$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$$, so $$\hat y$$ is a function of $$\hat\beta_0$$ and $$\hat\beta_1$$.

$$L(y, \hat\beta_0, \hat\beta_1) = \sum_{i = 1}^N \big(y_i - (\hat\beta_0 + \hat\beta_1 x_i)\big)^2$$
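To make the loss concrete, here is a minimal sketch of that sum of squared errors in plain Python; the data points are made up for illustration.

```python
# Made-up example data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

def loss(b0, b1, xs, ys):
    """Sum of squared differences between observed y and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# A line close to the data gives a small loss; a line far away gives a large one.
print(loss(0.0, 2.0, xs, ys))
print(loss(5.0, -1.0, xs, ys))
```

A random line far from the cloud of points (like the one in the question, sitting in an empty quadrant) simply produces a large value of this function.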

We want to find the $$\hat\beta_0$$ and $$\hat\beta_1$$ that minimize $$L$$.

What the video does is evaluate pieces of this loss function. For $$\hat\beta_0 = 1$$ and $$\hat\beta_1 = 7$$, you get a certain loss value. For $$\hat\beta_0 = 1$$ and $$\hat\beta_1 = 8$$, you get another loss value. One approach to finding the minimum is to pick random values until you find one that results in a loss value that seems low enough (or until you’re tired of waiting). Much of deep learning uses variations of this idea, with tricks like stochastic gradient descent to get (close to) the right answer in a short amount of time.
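The “pick random values and keep the best” approach can be sketched as follows. This is pure random search, not what the video’s gradient-descent-style procedure does, and the data and search range are made-up assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Made-up example data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

def loss(b0, b1):
    """Sum of squared errors for the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Try many random (b0, b1) pairs and remember the best one seen so far.
best = (0.0, 0.0)
best_loss = loss(*best)
for _ in range(10_000):
    b0 = random.uniform(-10, 10)
    b1 = random.uniform(-10, 10)
    current = loss(b0, b1)
    if current < best_loss:
        best, best_loss = (b0, b1), current

print(best, best_loss)
```

With enough guesses this stumbles close to the minimum, but it is wasteful; gradient-based methods use the slope of the loss surface to move toward the minimum instead of guessing blindly.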

In OLS linear regression, however, calculus gives us a solution to the minimization problem, and we do not have to play such games.

$$\hat\beta_1 = \frac{cov(x,y)}{var(x)} \\ \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
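The closed-form estimates translate directly into code. A minimal sketch, again on made-up data (note that whether you divide covariance and variance by $$n$$ or $$n-1$$ doesn’t matter here, since the factor cancels in the ratio):

```python
# Made-up example data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# cov(x, y) and var(x); the 1/n factor cancels when we take the ratio.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n

b1 = cov_xy / var_x        # slope:     cov(x, y) / var(x)
b0 = y_bar - b1 * x_bar    # intercept: y_bar - b1 * x_bar

print(b0, b1)
```

No random starting line is needed: the formulas jump straight to the minimizer of the loss.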