# What is the difference between ‘regular’ linear regression and deep learning linear regression?

I want to know the difference between linear regression in a regular machine learning analysis and linear regression in a “deep learning” setting. What algorithms are used for linear regression in a deep learning setting?

Assuming that by deep learning you more precisely mean neural networks: a vanilla fully connected feedforward neural network with only linear activation functions performs linear regression, regardless of how many layers it has, because a composition of linear (affine) maps is itself an affine map. One practical difference is that with a neural network one typically uses gradient descent, whereas with “normal” linear regression one uses the normal equation if possible (when the number of features isn't too large).
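A small NumPy sketch of both points above, on made-up synthetic data (the feature count, true weights, learning rate and iteration count are my own assumptions): two stacked linear layers with no activation collapse into a single affine map, and full-batch gradient descent on the mean squared error converges to the same parameters as the normal equation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 200 samples, 3 features, known linear target plus noise
n, d = 200, 3
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 3.0
y = X @ true_w + true_b + 0.1 * rng.normal(size=n)

# 1) Two stacked linear layers (no nonlinearity) equal one affine map:
#    W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W1, b1 = rng.normal(size=(4, d)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
deep = (X @ W1.T + b1) @ W2.T + b2
collapsed = X @ (W2 @ W1).T + (W2 @ b1 + b2)

# 2) Normal equation: append a bias column, then solve (A^T A) theta = A^T y
A = np.hstack([X, np.ones((n, 1))])
theta_closed = np.linalg.solve(A.T @ A, A.T @ y)

# 3) Full-batch gradient descent on the mean squared error (1/n)||A theta - y||^2
theta_gd = np.zeros(d + 1)
lr = 0.1
for _ in range(2000):
    grad = 2.0 / n * A.T @ (A @ theta_gd - y)
    theta_gd -= lr * grad

print("layers collapse :", np.allclose(deep, collapsed))
print("normal equation :", theta_closed)
print("gradient descent:", theta_gd)
```

The normal equation needs one linear solve of size $(d+1)\times(d+1)$, which becomes expensive when the number of features is huge; gradient descent only needs matrix-vector products, which is why neural network libraries default to it.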

[Figure: Example of a fully connected feedforward neural network with no hidden layer and a linear activation function (namely the identity activation function).]
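Written out (notation is my own: $d$ inputs $x_1,\dots,x_d$, one output unit, $N$ training pairs $(\mathbf{x}^{(i)}, y^{(i)})$), such a network and its squared-error training objective are exactly ordinary least squares:

$$
\hat{y} = \sum_{j=1}^{d} w_j x_j + b,
\qquad
L(\mathbf{w}, b) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}^{(i)} - y^{(i)} \right)^2
$$

$$
\frac{\partial L}{\partial w_j} = \frac{2}{N} \sum_{i=1}^{N} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)},
\qquad
\frac{\partial L}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} \left( \hat{y}^{(i)} - y^{(i)} \right)
$$

Gradient descent repeats $w_j \leftarrow w_j - \eta\, \partial L / \partial w_j$ and $b \leftarrow b - \eta\, \partial L / \partial b$ with step size $\eta$. Setting both gradients to zero instead gives the normal equation $\boldsymbol{\theta} = (A^\top A)^{-1} A^\top \mathbf{y}$, where $A$ is the $N \times (d+1)$ matrix of inputs with an appended column of ones and $\boldsymbol{\theta} = (w_1, \dots, w_d, b)$, assuming $A^\top A$ is invertible.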

If you replace the activation function of the output layer with a sigmoid function (and use the cross-entropy loss), then the neural network performs logistic regression. If you replace the activation function of the output layer with a softmax function and add a few output units, then the neural network performs multiclass logistic regression (see also the related question "Difference between logistic regression and neural networks"). If you replace the cost function with the hinge loss, then the neural network is a linear SVM optimized in its primal form: http://cs231n.github.io/linear-classify/.
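The following NumPy sketch keeps the same single linear layer and swaps only the output activation and the loss: sigmoid with binary cross-entropy, softmax over two output units with cross-entropy, and the hinge loss with an L2 penalty (the primal linear SVM objective, trained here by subgradient descent). The two-blob toy data, learning rate, epoch count and regularization strength `lam` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: two Gaussian blobs in 2-D, labels 0 and 1
n = 200
X = np.vstack([rng.normal(-2.0, 1.0, size=(n // 2, 2)),
               rng.normal(+2.0, 1.0, size=(n // 2, 2))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
A = np.hstack([X, np.ones((n, 1))])  # append a bias column

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

lr, epochs = 0.1, 500

# 1) Sigmoid output + binary cross-entropy = logistic regression
w_log = np.zeros(3)
for _ in range(epochs):
    p = sigmoid(A @ w_log)
    w_log -= lr * A.T @ (p - y) / n  # gradient of the mean cross-entropy

# 2) Softmax over K output units + cross-entropy = multiclass logistic regression
K = 2
Y1 = np.eye(K)[y.astype(int)]  # one-hot targets
W_soft = np.zeros((3, K))
for _ in range(epochs):
    P = softmax(A @ W_soft)
    W_soft -= lr * A.T @ (P - Y1) / n

# 3) Hinge loss (labels in {-1, +1}) + L2 penalty on the non-bias weights
#    = linear SVM in primal form, trained with full-batch subgradient descent
s = 2 * y - 1
w_svm = np.zeros(3)
lam = 0.01
for _ in range(epochs):
    margins = s * (A @ w_svm)
    active = margins < 1  # samples inside the margin or misclassified
    grad = -(A[active].T @ s[active]) / n + lam * np.r_[w_svm[:2], 0.0]
    w_svm -= lr * grad

acc_log = np.mean((sigmoid(A @ w_log) > 0.5) == y)
acc_soft = np.mean(np.argmax(A @ W_soft, axis=1) == y)
acc_svm = np.mean((A @ w_svm > 0) == y)
print("logistic:", acc_log, "softmax:", acc_soft, "svm:", acc_svm)
```

Only the last layer's activation and the loss change between the three models; the forward pass `A @ w` and the gradient-based training loop stay the same.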

Here is the example shown in the picture above, programmed with TFLearn (a high-level library built on top of TensorFlow):

```python
""" Linear Regression Example """
# https://github.com/tflearn/tflearn/blob/master/examples/basics/linear_regression.py

from __future__ import absolute_import, division, print_function

import tflearn

# Regression data
X = [3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,7.042,10.791,5.313,7.997,5.654,9.27,3.1]
Y = [1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,2.827,3.465,1.65,2.904,2.42,2.94,1.3]

# Linear Regression graph
input_ = tflearn.input_data(shape=[None])
linear = tflearn.single_unit(input_)
regression = tflearn.regression(linear, optimizer='sgd', loss='mean_square',
                                metric='R2', learning_rate=0.01)
m = tflearn.DNN(regression)
m.fit(X, Y, n_epoch=1000, show_metric=True, snapshot_epoch=False)

print("\nRegression result:")
print("Y = " + str(m.get_weights(linear.W)) +
      "*X + " + str(m.get_weights(linear.b)))

print("\nTest prediction for x = 3.2, 3.3, 3.4:")
print(m.predict([3.2, 3.3, 3.4]))
# should output (close, not exact) y = [1.5315033197402954, 1.5585315227508545, 1.5855598449707031]
```

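For comparison, the least-squares line that SGD approximates in the TFLearn example can be computed in closed form. A sketch with NumPy on the same data; it uses `np.linalg.lstsq` rather than inverting $A^\top A$ explicitly, a choice made for numerical stability. Because SGD runs for a fixed number of epochs, its weights may differ slightly from this result.

```python
import numpy as np

# Same regression data as in the TFLearn example
X = np.array([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
              7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
Y = np.array([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
              2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])

# Design matrix: one column for the feature, one column of ones for the bias
A = np.column_stack([X, np.ones_like(X)])

# Least-squares solution of A @ [w, b] ~= Y
(w, b), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(f"Y = {w:.4f}*X + {b:.4f}")

# Closed-form predictions for the same test inputs used above
print(w * np.array([3.2, 3.3, 3.4]) + b)
```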

Here is a code snippet that does not use any neural network library; it trains the same single linear unit with plain NumPy:

```python
# From http://briandolhansky.com/blog/artificial-neural-networks-linear-regression-part-1
import matplotlib.pyplot as plt
import numpy as np

# Load the data. Assumption: mpg.csv is comma-separated with one header row,
# MPG in column 0 and car weight in column 4 (as the indexing below implies).
X_file = np.genfromtxt('mpg.csv', delimiter=',', skip_header=1)

# Create the data matrices X and Y.
# X has a column of ones (bias) and a column of car weights.
# The target vector Y is a column of MPG values for each car.
N = np.shape(X_file)[0]
X = np.hstack((np.ones(N).reshape(N, 1), X_file[:, 4].reshape(N, 1)))
Y = X_file[:, 0]

# Standardize the input feature (zero mean, unit variance)
X[:, 1] = (X[:, 1] - np.mean(X[:, 1])) / np.std(X[:, 1])

# There are two weights, the bias weight and the feature weight (float dtype)
w = np.zeros(2)

# Run stochastic gradient descent (one weight update per data point) for
# max_iter epochs with step size eta
max_iter = 100
eta = 1E-3
for t in range(0, max_iter):
    # We need to iterate over each data point for one epoch
    for i in range(0, N):
        x_i = X[i, :]
        y_i = Y[i]
        # Dot product computes h(x_i, w); subtracting y_i gives the residual
        h = np.dot(w, x_i) - y_i

        # Update the weights: the gradient of 0.5 * h**2 w.r.t. w is h * x_i
        w = w - eta * h * x_i
print("Weights found:", w)

# Plot the data and best fit line
tt = np.linspace(np.min(X[:, 1]), np.max(X[:, 1]), 10)
bf_line = w[0] + w[1] * tt

plt.plot(X[:, 1], Y, 'kx', tt, bf_line, 'r-')
plt.xlabel('Weight (Normalized)')
plt.ylabel('MPG')
plt.title('ANN Regression on 1D MPG Data')

plt.savefig('mpg.png')

plt.show()
```

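Since the snippet depends on mpg.csv, here is a self-contained sketch that runs the same per-sample update, $\mathbf{w} \leftarrow \mathbf{w} - \eta\,(\mathbf{w}\cdot\mathbf{x}_i - y_i)\,\mathbf{x}_i$, on synthetic stand-in data and compares the result with the normal-equation solution. The weight range and the linear MPG relationship below are hypothetical, chosen only so the example runs.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for mpg.csv: car weights (lbs) and noisy MPG values
N = 300
car_weight = rng.uniform(1500, 5000, size=N)
mpg = 45.0 - 0.007 * car_weight + rng.normal(0.0, 3.0, size=N)

# Bias column plus the standardized feature, as in the snippet above
X = np.column_stack([np.ones(N),
                     (car_weight - car_weight.mean()) / car_weight.std()])
Y = mpg

# Stochastic gradient descent: one update per data point, max_iter epochs
w = np.zeros(2)
eta, max_iter = 1e-3, 100
for t in range(max_iter):
    for i in range(N):
        h = X[i] @ w - Y[i]        # residual for one data point
        w = w - eta * h * X[i]     # step along the negative gradient of 0.5 * h**2

# Closed-form least squares (normal equation) for comparison
w_closed = np.linalg.solve(X.T @ X, X.T @ Y)
print("SGD            :", w)
print("normal equation:", w_closed)
```

With a constant step size, SGD keeps jittering slightly around the least-squares solution instead of landing on it exactly; because the feature is standardized, the closed-form bias weight equals the mean MPG.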

The code reads its data from mpg.csv (the original answer included this file, ~50% abridged due to the Stack Exchange answer size limitation).

