Linear Regression

Assume the model is:

$$y = X\beta + \epsilon$$

where $y$ has shape $(n, 1)$, $X$ has shape $(n, p)$, $\beta$ has shape $(p, 1)$, and $\epsilon$ is a noise term. We can write a base class using this formula for our linear regression implementation below,

class LinearRegressionBase:
    def __init__(self):
        # beta: shape (p, 1)
        self.beta = None

    def fit(self, X, y):
        # X: shape (n, p), y: shape (n, 1)
        pass

    def predict(self, X):
        # X: shape (n, p)
        return X @ self.beta

And we can generate some data for testing,

import numpy as np

n = 100
p = 5
noise = np.random.randn(n) / 10
X = np.random.randn(n, p) * 10
weights = np.ones(p)
weights[[1, 2, 3]] = [2, 5, 3]
y = (X @ weights + noise).reshape(-1, 1)
X.shape, y.shape, X.mean(), y.mean()

The ground truth coefficients produced by the code above are $\beta = (1, 2, 5, 3, 1)^T$.

Ordinary Least Squares: Explicit Solution

Using Mean Squared Error as the loss function (dropping the constant $\frac{1}{n}$ factor, which does not affect the minimizer),

$$L(\beta) = \lVert y - X\beta \rVert^2 = (y - X\beta)^T (y - X\beta)$$

Taking the derivative with respect to $\beta$,

$$\frac{\partial L}{\partial \beta} = -2X^T(y - X\beta)$$

Setting the derivative to zero gives the normal equations $X^TX\beta = X^Ty$, so the least-squares solution is

$$\hat{\beta} = (X^TX)^{-1}X^Ty,$$

assuming $X^TX$ is invertible.

Using Python to implement that,

class LinearRegression(LinearRegressionBase):
    def fit(self, X, y):
        super().fit(X, y)
        self.beta = (np.linalg.inv(X.T @ X) @ X.T) @ y

And test it,

lr = LinearRegression()
lr.fit(X, y)
pred = lr.predict(X)
print(f'beta {lr.beta.flatten()}')
print(f'error {np.linalg.norm(pred-y)}')

# beta [0.99903924 1.99955203 5.00101573 3.00037858 1.00036293]
# error 0.9101951355773127
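
As a quick sanity check, the closed-form solution can be compared with NumPy's built-in least-squares solver. This is only a minimal sketch, assuming the lr, X, and y objects defined above:

# Hypothetical check: the closed-form beta and np.linalg.lstsq should agree
# up to numerical precision.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(lr.beta, beta_lstsq))  # expected: True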

Ridge Regression

Same as the last part, but we add an L2 regularization term to the loss function:

$$L(\beta) = \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2$$

Very similar to the last part, taking the derivative with respect to $\beta$, we get

$$\frac{\partial L}{\partial \beta} = -2X^T(y - X\beta) + 2\lambda\beta$$

Setting the derivative to 0, we get

$$\hat{\beta} = (X^TX + \lambda I)^{-1}X^Ty$$

We don't need to assume invertibility here, since $(X^TX+\lambda I)$ is always invertible for $\lambda > 0$.

Simple proof:

Any vector is an eigenvector of $\lambda I$, since $\lambda I v = \lambda v$, and every eigenvalue of $\lambda I$ equals $\lambda$.

$X^TX$ is a symmetric positive semi-definite matrix, so all of its eigenvalues are greater than or equal to 0.

So for any eigenvalue $a_i$ and eigenvector $v_i$ of $X^TX$,

$$(X^TX + \lambda I)v_i = a_i v_i + \lambda v_i = (a_i + \lambda)v_i.$$

So all the eigenvalues of $(X^TX+\lambda I)$ are greater than or equal to $\lambda > 0$.

This means $(X^TX+\lambda I)$ is always invertible.
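
To make the proof concrete, a small numeric check (a sketch, assuming the X generated above and an arbitrary lam value) can verify that the smallest eigenvalue of $X^TX + \lambda I$ is at least $\lambda$:

# Hypothetical numeric check: the smallest eigenvalue of X^T X + lam * I
# should be at least lam (up to floating-point error).
lam = 1.0  # an assumed value of lambda, just for this check
eigvals = np.linalg.eigvalsh(X.T @ X + lam * np.identity(X.shape[1]))
print(eigvals.min() >= lam - 1e-8)  # expected: True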

Using Python to implement the ridge solution,

class RidgeRegression(LinearRegressionBase):
    def fit(self, X, y, l):
        # l is the \lambda in the formula
        super().fit(X, y)
        self.beta = (np.linalg.inv(X.T @ X + l * np.identity(X.shape[1])) @ X.T) @ y

And test it, using different $\lambda$,
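
A minimal sketch of such a test loop (assuming the RidgeRegression class and the X, y generated above) could look like this:

# Sweep over a few lambda values and report the fitted coefficients and error.
for l in [10, 1, 0.1]:
    ridge = RidgeRegression()
    ridge.fit(X, y, l)
    pred = ridge.predict(X)
    print(f'l = {l}')
    print(f'beta {ridge.beta.flatten()}')
    print(f'error {np.linalg.norm(pred - y)}')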

l = 10
beta [0.99547771 1.99660303 4.99462785 2.99652915 0.99752248]
error 1.1770827499055974

l = 1
beta [0.99868234 1.99925674 5.00037611 2.99999319 1.00007827]
error 0.9132585574988228

l = 0.1
beta [0.99900354 1.9995225 5.00095176 3.00034004 1.00033446]
error 0.9102258292688261
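
One note on the implementation: both fit methods above form the matrix inverse explicitly with np.linalg.inv, while solving the linear system directly is usually more numerically stable. A sketch of that variant for ridge (the class name here is only illustrative):

class RidgeRegressionSolve(LinearRegressionBase):
    def fit(self, X, y, l):
        # Solve (X^T X + l*I) beta = X^T y instead of forming the inverse.
        A = X.T @ X + l * np.identity(X.shape[1])
        self.beta = np.linalg.solve(A, X.T @ y)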

Lasso
