This is a screenshot I took from a video on linear regression by Luis Serrano. He explained linear regression step by step, building it from scratch. The first step was to start with a random line.

My question is: do we actually draw a random line, or do we instead perform some calculation, such as taking the average of the y values, to draw the initial line? If we take just any random line, it might not fall near any of the points at all. For example, it might lie in the third quadrant of the coordinate system, where in this case there are no points.

**Answer**

**NO**

What we want to find are the parameters that result in the least amount of error, and OLS defines error as the sum of squared differences between the observed values $y_i$ and the predicted values $\hat{y}_i$. This error is often denoted $L$, for “loss”.

$$L(y, \hat{y}) = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$
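As a minimal sketch of this loss, assuming the observed and predicted values arrive as plain Python lists (the function name `sse_loss` is hypothetical, not from any library):

```python
def sse_loss(y, y_hat):
    """Sum of squared differences between observed y and predicted y_hat."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))

# Residuals are 1, -1 and 0, so the loss is 1 + 1 + 0 = 2.
print(sse_loss([3.0, 5.0, 7.0], [2.0, 6.0, 7.0]))  # 2.0
```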

We have our regression model, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, so $\hat{y}$ is a function of $\hat{\beta}_0$ and $\hat{\beta}_1$. Substituting it into the loss gives:

$$L(y, \hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^{N} \left(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right)^2$$
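To make the dependence on the parameters concrete, here is a small sketch that evaluates $L$ for two candidate lines. The data points and the function name `loss` are hypothetical, chosen so the points lie exactly on $y = 1 + 2x$. A line far below every point, like the one the question worries about, is still a valid input: it simply gets a much larger loss.

```python
def loss(b0, b1, x, y):
    """L(y, b0, b1): sum of squared errors for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

print(loss(1.0, 2.0, x, y))    # 0.0, the line through every point
print(loss(-10.0, 0.0, x, y))  # 1044.0, a horizontal line far below the data
```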

We want to find the $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize $L$.

What the video does is compute pieces of the entire “loss function”. For $\hat{\beta}_0 = 1$ and $\hat{\beta}_1 = 7$, you get a certain loss value; for $\hat{\beta}_0 = 1$ and $\hat{\beta}_1 = 8$, you get another. One approach to finding the minimum is to pick random values until you find one whose loss seems low enough (or until you are tired of waiting). Much of deep learning uses more refined variations of this idea, with techniques such as stochastic gradient descent that get the algorithm (close) to the right answer in a short amount of time.
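The two search strategies described above can be sketched in plain Python. Everything here is an illustrative assumption rather than the video's actual procedure: the hypothetical data, the search range for random search, and the learning rate and step count for gradient descent. The gradient descent shown is ordinary full-batch gradient descent, not the stochastic variant. It deliberately starts from a poor line far below the data, which illustrates why the starting line matters little here: when the $x_i$ are not all equal, the OLS loss is a convex quadratic in $\hat{\beta}_0$ and $\hat{\beta}_1$, so with a small enough learning rate gradient descent reaches the same minimum from any starting point.

```python
import random

def loss(b0, b1, x, y):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def random_search(x, y, trials=10_000, low=-20.0, high=20.0, seed=0):
    """Try random (b0, b1) pairs in an assumed range; keep the lowest-loss pair."""
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_loss = loss(best[0], best[1], x, y)
    for _ in range(trials):
        b0, b1 = rng.uniform(low, high), rng.uniform(low, high)
        current = loss(b0, b1, x, y)
        if current < best_loss:
            best, best_loss = (b0, b1), current
    return best, best_loss

def gradient_descent(x, y, b0=-10.0, b1=0.0, lr=0.01, steps=5000):
    """Full-batch gradient descent on the sum of squared errors,
    starting from a deliberately poor line (b0, b1)."""
    for _ in range(steps):
        residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Partial derivatives of L with respect to b0 and b1.
        grad_b0 = -2.0 * sum(residuals)
        grad_b1 = -2.0 * sum(r * xi for r, xi in zip(residuals, x))
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

print(random_search(x, y))     # a pair with small loss; how close depends on the seed
print(gradient_descent(x, y))  # very close to (1.0, 2.0)
```

Random search only gets "close enough" by luck and effort, while gradient descent follows the slope of $L$ downhill, which is why the latter family of methods is the one used in practice for large models.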

In OLS linear regression, however, calculus gives us a closed-form solution to the minimization problem, so we do not have to play such games.
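One way to see where this solution comes from, assuming the $x_i$ are not all equal, is to set both partial derivatives of $L$ to zero. The first condition gives $\hat{\beta}_0$ directly; substituting it into the second and using the fact that deviations from a mean sum to zero gives $\hat{\beta}_1$:

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{\beta}_0} &= -2\sum_{i=1}^{N}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0
&&\Longrightarrow\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\frac{\partial L}{\partial \hat{\beta}_1} &= -2\sum_{i=1}^{N} x_i\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0
&&\Longrightarrow\quad \hat{\beta}_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}
\end{aligned}
$$

Dividing the numerator and denominator by $N$ (or $N - 1$) turns them into the sample covariance of $x$ and $y$ and the sample variance of $x$; the factor cancels, so either convention gives the same $\hat{\beta}_1$.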

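$$\hat{\beta}_1 = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$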
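A direct translation of these two formulas into Python, as a sketch with hypothetical data and a hypothetical function name `ols_closed_form`, needs no starting line and no iteration:

```python
def ols_closed_form(x, y):
    """Closed-form OLS estimates for y_hat = b0 + b1 * x.

    b1 = cov(x, y) / var(x) and b0 = mean(y) - b1 * mean(x).
    The 1/N (or 1/(N-1)) factors in cov and var cancel, so plain sums are used.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("var(x) is zero: all x values are equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data that do not lie exactly on a line.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]
print(ols_closed_form(x, y))  # approximately (0.5, 1.4)
```

If NumPy is available, `numpy.polyfit(x, y, 1)` should return the same two coefficients, highest degree first (slope, then intercept), which makes a convenient cross-check.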

**Attribution** *Source: Link, Question Author: F.C. Akhi, Answer Author: Dave*