How can the least squares estimator for multiple linear regression be derived?


In the simple linear regression case $y=\beta_0+\beta_1x$, you can derive the least squares estimator $\hat\beta_1=\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}$, so you don't need to know $\hat\beta_0$ to estimate $\hat\beta_1$.

Suppose I have $y=\beta_1x_1+\beta_2x_2$. How can I derive $\hat\beta_1$ without estimating $\hat\beta_2$? Or is this not possible?

— Sabre CN


You can omit one of the variables and still obtain an unbiased estimate of the other if they are independent.

— David25272


The derivation in matrix notation

Starting from $y= Xb +\epsilon$, which is really just the same as

$\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \ddots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NK} \end{bmatrix} * \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{K} \end{bmatrix} + \begin{bmatrix} \epsilon_{1} \\ \epsilon_{2} \\ \vdots \\ \epsilon_{N} \end{bmatrix}$

Everything boils down to minimizing $e'e$:

$e'e = \begin{bmatrix} e_{1} & e_{2} & \cdots & e_{N} \end{bmatrix} \begin{bmatrix} e_{1} \\ e_{2} \\ \vdots \\ e_{N} \end{bmatrix} = \sum_{i=1}^{N}e_{i}^{2}$

Minimizing $e'e$ therefore gives us:

$\min_{b} \; e'e = (y-Xb)'(y-Xb)$

$\min_{b} \; e'e = y'y - 2b'X'y + b'X'Xb$

$\frac{\partial(e'e)}{\partial b} = -2X'y + 2X'Xb \stackrel{!}{=} 0$

$X'Xb=X'y$

$b=(X'X)^{-1}X'y$

One last mathematical point: the second-order condition for a minimum requires the matrix $X'X$ to be positive definite. This requirement is satisfied if $X$ has full (column) rank.
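As a quick numerical check of this derivation, here is a minimal NumPy sketch on simulated data (the sample size, seed and "true" coefficients are arbitrary illustration choices). It solves $X'Xb = X'y$ with `np.linalg.solve` instead of forming $(X'X)^{-1}$, compares the result with `np.linalg.lstsq`, and checks the second-order condition by confirming that every eigenvalue of $X'X$ is positive.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))            # full column rank with probability 1
b_true = np.array([1.5, -2.0, 0.5])    # arbitrary "true" coefficients for the simulation
y = X @ b_true + rng.normal(scale=0.1, size=N)

# Solve the normal equations X'X b = X'y without forming an explicit inverse.
XtX = X.T @ X
b_hat = np.linalg.solve(XtX, X.T @ y)

# Reference answer from NumPy's least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Second-order condition: the symmetric matrix X'X is positive definite iff all eigenvalues > 0.
eigenvalues = np.linalg.eigvalsh(XtX)

print(b_hat)
print(np.allclose(b_hat, b_lstsq), eigenvalues.min() > 0)
```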

A more detailed derivation that goes through every step in greater depth can be found at http://economictheoryblog.com/2015/02/19/ols_estimator/

— Andreas Dibiasi


This derivation is exactly what I was looking for. No skipped steps. Surprising how hard it is to find something like it.

— Javadba


In the matrix equation, shouldn't the second $*$ be a $+$? And shouldn't it be $b_K$ instead of $b_N$, so that the dimensions match?

— Alexis Olson

Alexis Olson, you are right! I have edited my answer.

— Andreas Dibiasi


It is possible to estimate just one coefficient in a multiple regression without estimating the others.

The estimate of $\beta_1$ is obtained by removing the effects of $x_2$ from the other variables and then regressing the residuals of $y$ against the residuals of $x_1$. This is explained and illustrated in "How exactly does one control for other variables?" and "How to normalize (a) regression coefficient?". The beauty of this approach is that it requires no calculus and no linear algebra, can be visualized with just two-dimensional geometry, is numerically stable, and exploits just one fundamental idea of multiple regression: that of taking out (or "controlling for") the effects of a single variable.

In the present case, the multiple regression can be done using three ordinary regression steps:

(1) Regress $y$ on $x_2$ (without a constant term!). Let the fit be $y = \alpha_{y,2}x_2 + \delta$. The estimate is $\alpha_{y,2} = \frac{\sum_i y_i x_{2i}}{\sum_i x_{2i}^2}$. Therefore the residuals are $\delta = y - \alpha_{y,2}x_2$. Geometrically, $\delta$ is what is left of $y$ after its projection onto $x_2$ is subtracted.

(2) Regress $x_1$ on $x_2$ (without a constant term). Let the fit be $x_1 = \alpha_{1,2}x_2 + \gamma$. The estimate is $\alpha_{1,2} = \frac{\sum_i x_{1i} x_{2i}}{\sum_i x_{2i}^2}$. The residuals are $\gamma = x_1 - \alpha_{1,2}x_2$. Geometrically, $\gamma$ is what is left of $x_1$ after its projection onto $x_2$ is subtracted.

(3) Regress $\delta$ on $\gamma$ (without a constant term). The estimate is $\hat\beta_1 = \frac{\sum_i \delta_i \gamma_i}{\sum_i \gamma_i^2}$. The fit will be $\delta = \hat\beta_1 \gamma + \varepsilon$. Geometrically, $\hat\beta_1$ is the component of $\delta$ (which represents $y$ with $x_2$ taken out) in the $\gamma$ direction (which represents $x_1$ with $x_2$ taken out).
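The three steps translate almost line by line into code. The following is a minimal NumPy sketch on simulated data; the helper `through_origin_slope` and all numbers are illustrative assumptions, not part of the answer. It compares the three-step $\hat\beta_1$ with the coefficient on $x_1$ from a full no-intercept fit of $y$ on $x_1$ and $x_2$.

```python
import numpy as np

def through_origin_slope(v, w):
    """Slope of the no-intercept regression of v on w: sum(v*w) / sum(w*w)."""
    return np.dot(v, w) / np.dot(w, w)

rng = np.random.default_rng(1)
n = 100
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)     # x1 and x2 deliberately correlated
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# (1) regress y on x2 and keep the residuals delta
alpha_y2 = through_origin_slope(y, x2)
delta = y - alpha_y2 * x2
# (2) regress x1 on x2 and keep the residuals gamma
alpha_12 = through_origin_slope(x1, x2)
gamma = x1 - alpha_12 * x2
# (3) regress delta on gamma: the slope is beta_1 hat
beta1_hat = through_origin_slope(delta, gamma)

# Full two-variable fit (no intercept) for comparison; this one also estimates beta_2.
full, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
print(beta1_hat, full[0])
```

The two printed numbers should agree up to floating-point error, even though $x_1$ and $x_2$ are correlated.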

Note that $\beta_2$ has not been estimated. It can easily be recovered from what has been obtained so far (just as $\hat\beta_0$ in the ordinary regression case is easily obtained from the slope estimate $\hat\beta_1$). The $\varepsilon$ are the residuals for the bivariate regression of $y$ on $x_1$ and $x_2$.
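The remark about $\beta_2$ can be made concrete. Substituting the definitions of $\delta$ and $\gamma$ into $\delta = \hat\beta_1\gamma + \varepsilon$ gives $y = \hat\beta_1 x_1 + (\alpha_{y,2} - \hat\beta_1\alpha_{1,2})x_2 + \varepsilon$, so $\hat\beta_2 = \alpha_{y,2} - \hat\beta_1\alpha_{1,2}$. This rearrangement is added here rather than stated in the answer; the sketch below checks it numerically on simulated data (all values arbitrary, helper name illustrative).

```python
import numpy as np

def through_origin_slope(v, w):
    # slope of the no-intercept regression of v on w
    return np.dot(v, w) / np.dot(w, w)

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

alpha_y2 = through_origin_slope(y, x2)
alpha_12 = through_origin_slope(x1, x2)
delta = y - alpha_y2 * x2
gamma = x1 - alpha_12 * x2
beta1_hat = through_origin_slope(delta, gamma)

# Rearranging delta = beta1*gamma + eps gives y = beta1*x1 + (alpha_y2 - beta1*alpha_12)*x2 + eps
beta2_hat = alpha_y2 - beta1_hat * alpha_12
eps = delta - beta1_hat * gamma   # residuals of the regression of y on x1 and x2

Z = np.column_stack([x1, x2])
full, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta2_hat, full[1])
```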

The parallel with ordinary regression is strong: steps (1) and (2) are analogs of subtracting the means in the usual formula. If you let $x_2$ be a vector of ones, you will in fact recover the usual formula.
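The claim about a vector of ones can be checked directly: regressing on $x_2 = (1, \dots, 1)$ without a constant just computes the mean, so steps (1) and (2) center $y$ and $x_1$. A small NumPy sketch with simulated data (arbitrary values; the helper name is illustrative):

```python
import numpy as np

def through_origin_slope(v, w):
    # slope of the no-intercept regression of v on w
    return np.dot(v, w) / np.dot(w, w)

rng = np.random.default_rng(3)
n = 50
x = rng.normal(loc=5.0, size=n)
y = 3.0 + 0.8 * x + rng.normal(scale=0.3, size=n)
ones = np.ones(n)                       # plays the role of x2

# Regressing on a vector of ones returns the mean, so steps (1) and (2) just center y and x.
delta = y - through_origin_slope(y, ones) * ones     # y - ybar
gamma = x - through_origin_slope(x, ones) * ones     # x - xbar
beta1_three_step = through_origin_slope(delta, gamma)

# The textbook simple-regression slope from the question.
beta1_textbook = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(beta1_three_step, beta1_textbook)
```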

This generalizes in the obvious way to regression with more than two variables: to estimate $\hat\beta_1$, regress $y$ and $x_1$ separately against all the other variables, then regress their residuals against each other. At that point, none of the other coefficients in the multiple regression of $y$ has yet been estimated.
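For more than two variables, the same recipe (a result usually called the Frisch-Waugh-Lovell theorem) can be sketched as below. The helper `residuals` is a hypothetical name introduced here; it uses `np.linalg.lstsq` to regress a vector on a set of columns. The simulated data, including an intercept column among the "other" variables, are arbitrary.

```python
import numpy as np

def residuals(v, Z):
    """Residuals from the least-squares regression of v on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

rng = np.random.default_rng(4)
n = 300
others = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])      # intercept + 3 regressors
x1 = others @ np.array([0.5, 1.0, -0.5, 0.2]) + rng.normal(size=n)   # correlated with the others
y = 1.7 * x1 + others @ np.array([2.0, -1.0, 0.3, 0.8]) + rng.normal(size=n)

delta = residuals(y, others)     # y with all other variables taken out
gamma = residuals(x1, others)    # x1 with all other variables taken out
beta1_hat = np.dot(delta, gamma) / np.dot(gamma, gamma)

# Full multiple regression for comparison (estimates every coefficient).
full, *_ = np.linalg.lstsq(np.column_stack([x1, others]), y, rcond=None)
print(beta1_hat, full[0])
```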

— whuber


— Good


The ordinary least squares estimate of $\beta$ is a linear function of the response variable. Simply put, the OLS estimate of the coefficients, the $\beta$'s, can be written using only the dependent variable (the $Y_i$'s) and the independent variables (the $X_{ki}$'s).

To explain this fact for a general regression model, you need to understand a little linear algebra. Suppose you would like to estimate the coefficients $(\beta_0, \beta_1, ...,\beta_k)$ in a multiple regression model,

$Y_i = \beta_0+\beta_1X_{1i}+...+\beta_kX_{ki}+\epsilon_i$

where $\epsilon_i \overset{iid}{\sim} N(0,\sigma^2)$ for $i=1,...,n$, and $\mathbf{X}$ is the $n\times(k+1)$ design matrix whose first column is all ones (for $\beta_0$) and whose remaining columns each contain the $n$ observations of one independent variable $X_1, \dots, X_k$. You can find many explanations and derivations here of the formula used to calculate the estimated coefficients $\boldsymbol{\hat{\beta}}=(\hat{\beta}_0, \hat{\beta}_1, ..., \hat{\beta}_k)$, which is

$\boldsymbol{\hat{\beta}}=(\mathbf{X}^\prime \mathbf{X})^{-1}\mathbf{X}^\prime \mathbf{Y}$

assuming that the inverse $(\mathbf{X}^\prime \mathbf{X})^{-1}$ exists. The estimated coefficients are functions of the data, not of the other estimated coefficients.
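To make concrete that $\boldsymbol{\hat\beta}$ depends only on the data and is linear in $\mathbf{Y}$, here is a short NumPy sketch (simulated design matrix with an intercept column; the matrix name `A` and the function `beta_hat` are illustrative). The matrix $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is computed once from $\mathbf{X}$, and the estimate for a linear combination of two response vectors equals the same combination of their estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1), first column = intercept

# A = (X'X)^{-1} X' depends on X only (solve() applied to the k+1 by n right-hand side X').
A = np.linalg.solve(X.T @ X, X.T)

def beta_hat(Y):
    # OLS estimate for response vector Y: a fixed matrix times Y
    return A @ Y

Y1 = rng.normal(size=n)
Y2 = rng.normal(size=n)
# Linearity: the estimate for 2*Y1 - 3*Y2 equals 2*beta_hat(Y1) - 3*beta_hat(Y2)
lhs = beta_hat(2.0 * Y1 - 3.0 * Y2)
rhs = 2.0 * beta_hat(Y1) - 3.0 * beta_hat(Y2)
print(np.allclose(lhs, rhs))
```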

— caburke

I have a follow-up question. In the simple regression case, you write $y_i=\beta_0+\beta_1\bar x+\beta_1(x_i-\bar x)+e_i$; then $X$ becomes a matrix with columns $(1,...,1)$ and $(x_1-\bar x,...,x_n-\bar x)$, and you follow through with $\hat\beta=(X'X)^{-1}X'Y$. How should I rewrite the equation in my case?

— Saber CN

One more question: does this apply to cases where $x_1$ and $x_2$ are not linear, but the model is still linear? For example, for the decay curve $y=\beta_1 e^{x_1t}+\beta_2 e^{x_2t}$, can I substitute $x_1'$ and $x_2'$ for the exponentials so that it becomes my original question?

— Saber CN

Regarding your first comment: you can center the variable (subtract its mean from it) and use that as your independent variable. Search for "standardized regression". The formula you wrote in terms of matrices is not correct. For your second question: yes, you may do that. A linear model is one that is linear in $\beta$, so as long as $y$ is equal to a linear combination of the $\beta$'s you are fine.

— caburke
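A sketch of the substitution from the second question, under the assumption (not stated in the comment) that the rates $x_1$ and $x_2$ are known constants and $t$ is the observed variable, so that only $\beta_1$ and $\beta_2$ are estimated. If the rates were unknown as well, the model would no longer be linear in its parameters and this trick would not apply. All numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical setup: known rates x1, x2; unknown coefficients beta1 = 4.0, beta2 = 1.5
x1, x2 = -0.5, -2.0
t = np.linspace(0.0, 5.0, 60)
rng = np.random.default_rng(6)
y = 4.0 * np.exp(x1 * t) + 1.5 * np.exp(x2 * t) + rng.normal(scale=0.01, size=t.size)

# Substitute x1' = exp(x1*t) and x2' = exp(x2*t): the model is now linear in beta.
Z = np.column_stack([np.exp(x1 * t), np.exp(x2 * t)])
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)
```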


(+1). But shouldn't it be "$n \times k$ matrix" instead of $k \times n$?

— miura


One small note on theory vs. practice. Mathematically, $\beta_0, \beta_1, \beta_2, \dots, \beta_n$ can be estimated with the following formula:

$\hat{\beta} = (X'X)^{-1} X'Y$

where $X$ is the matrix of input data (with a leading column of ones for $\beta_0$) and $Y$ is the variable that we want to predict. This follows from minimizing the squared error; I will prove this before making a small practical point.

Let $e_i$ be the error the linear regression makes at point $i$ . Then:

$e_i = y_i - \hat{y_i}$

The total squared error we make is now:

$\sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y_i})^2$

Because we have a linear model we know that:

$\hat{y_i} = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + ... + \beta_n x_{n,i}$

Which can be rewritten in matrix notation as:

$\hat{Y} = X\beta$

Writing the errors as a vector $E = (e_1, \dots, e_n)'$, we have

$\sum_{i=1}^n e_i^2 = E'E$

We want to minimize the total squared error, so the following expression should be as small as possible:

$E'E = (Y-\hat{Y})' (Y-\hat{Y})$

This is equal to:

$E'E = (Y-X\beta)' (Y-X\beta)$

This step simply substitutes $\hat{Y} = X\beta$. The rewriting might seem confusing, but it follows from linear algebra: in some respects, matrices behave like ordinary variables when we multiply them.

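We want to find the value of $\beta$ that makes this expression as small as possible, so we differentiate with respect to $\beta$ and set the derivative equal to zero. Expanding first (or, equivalently, applying the chain rule), $E'E = Y'Y - 2\beta'X'Y + \beta'X'X\beta$, using $Y'X\beta = \beta'X'Y$ since both are scalars. Differentiating term by term gives: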

$\frac{dE'E}{d\beta} = - 2 X'Y + 2 X'X\beta = 0$

This gives:

$X'X\beta = X'Y$

So, finally:

$\beta = (X'X)^{-1} X'Y$

So mathematically we seem to have found a solution. There is one problem, though: computing $(X'X)^{-1}$ explicitly can be expensive when $X$ is very large (especially when it has many columns), and it can cause numerical accuracy issues when $X'X$ is ill-conditioned. Another way to find the optimal values for $\beta$ in this situation is to use a gradient-descent type of method. The function that we want to optimize is convex and the problem is unconstrained, so a gradient method can be used in practice if need be.
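A minimal gradient-descent sketch for the same objective, using the gradient $-2X'Y + 2X'X\beta$ derived above. The fixed step size $1/L$, with $L$ equal to twice the largest eigenvalue of $X'X$, is a choice made here because it is a safe step for this convex quadratic; for very large problems one would typically estimate $L$ more cheaply or use a line search or a stochastic variant instead. The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 500, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -3.0, 0.5]) + rng.normal(scale=0.2, size=n)

# Gradient of E'E = (y - X beta)'(y - X beta) with respect to beta: -2X'y + 2X'X beta
def gradient(beta):
    return -2 * X.T @ y + 2 * X.T @ (X @ beta)

# L = 2 * largest eigenvalue of X'X is the Lipschitz constant of the gradient,
# so a step of 1/L decreases the loss at every iteration.
L = 2 * np.linalg.eigvalsh(X.T @ X).max()
beta = np.zeros(p)
for _ in range(2000):
    beta = beta - gradient(beta) / L

beta_exact = np.linalg.solve(X.T @ X, X.T @ y)
print(beta, beta_exact)
```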

— Vincent Warmerdam

Except that you don't actually need to compute $(X'X)^{-1}$...

— user603
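One standard way to avoid the inverse (the comment does not say which method it has in mind) is a QR decomposition of $X$: substituting $X = QR$ into $X'X\beta = X'Y$ reduces it to the triangular system $R\beta = Q'Y$. A minimal NumPy sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.1, size=n)

# Reduced QR: X = QR, Q (n x p) with orthonormal columns, R (p x p) upper triangular.
Q, R = np.linalg.qr(X)
# X'X b = X'y becomes R'R b = R'Q'y, i.e. R b = Q'y (R is invertible if X has full column rank).
beta_qr = np.linalg.solve(R, Q.T @ y)

beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_qr, beta_lstsq))
```

`np.linalg.solve` does not exploit the triangular structure of `R`; `scipy.linalg.solve_triangular` does, if SciPy is available.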

Valid point. One could also use the Gram-Schmidt process, but I just wanted to remark that the optimal $\beta$ vector can also be found numerically because of the convexity.

— Vincent Warmerdam


A simple derivation uses just the geometric interpretation of linear regression (LR).

Linear regression can be interpreted as the projection of $Y$ onto the column space of $X$. Thus, the error $\hat{\epsilon}$ is orthogonal to the column space of $X$.

Therefore, the inner product between each column of $X$ and the error must be 0, i.e.,

$X'(y-X\hat{\beta}) = 0$

$X'y - X'X\hat{\beta} = 0$

$X'y = X'X\hat{\beta}$

Which implies that,

$(X'X)^{-1}X'y = \hat{\beta}$ .
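The orthogonality condition can be checked numerically. This NumPy sketch (simulated data with an intercept column; the name `P` for the projection matrix $X(X'X)^{-1}X'$ is introduced here) confirms that $X'(y - X\hat\beta)$ vanishes, and that $P$ maps $y$ to the fitted values and is idempotent, as a projection should be.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -0.7, 2.2]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residual = y - X @ beta_hat

# The residual is orthogonal to every column of X: X'(y - X beta_hat) = 0
print(X.T @ residual)          # all entries ~0 up to floating-point error

# Projection (hat) matrix onto the column space of X: y_hat = P y, and P @ P = P
P = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(P @ y, X @ beta_hat), np.allclose(P @ P, P))
```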

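The same projection argument also gives $\hat{\beta}_1$ on its own (writing the model as $y = X_1\beta_1 + X_2\beta_2 + \epsilon$), by: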

(1) Projecting $Y$ onto $X_2$ (error $\delta = Y-X_2 \hat{D}$ ), $\hat{D} = (X_2'X_2)^{-1}X_2'y$ ,

(2) Projecting $X_1$ onto $X_2$ (error $\gamma = X_1 - X_2 \hat{G}$), $\hat{G} = (X_2'X_2)^{-1}X_2'X_1$,

and finally,

(3) Projecting $\delta$ onto $\gamma$, which gives $\hat{\beta}_1 = (\gamma'\gamma)^{-1}\gamma'\delta$.
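The three projections can be written with the same matrix formulas. In this NumPy sketch (simulated data; giving $X_1$ two columns is an assumption made here to show that the formulas also work when $X_1$ is a block of variables), $\hat\beta_1 = (\gamma'\gamma)^{-1}\gamma'\delta$ is compared with the corresponding entries of the full OLS solution.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 120
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])     # n x 3
X1 = X2 @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))     # n x 2, correlated with X2
y = X1 @ np.array([1.2, -0.8]) + X2 @ np.array([0.5, -2.0, 1.0]) + rng.normal(size=n)

# (1) project y onto X2: D_hat = (X2'X2)^{-1} X2'y, residual delta
D_hat = np.linalg.solve(X2.T @ X2, X2.T @ y)
delta = y - X2 @ D_hat
# (2) project X1 onto X2: G_hat = (X2'X2)^{-1} X2'X1, residual gamma (n x 2)
G_hat = np.linalg.solve(X2.T @ X2, X2.T @ X1)
gamma = X1 - X2 @ G_hat
# (3) project delta onto gamma: beta1_hat = (gamma'gamma)^{-1} gamma'delta
beta1_hat = np.linalg.solve(gamma.T @ gamma, gamma.T @ delta)

# Full OLS on [X1, X2]; its first two entries are the coefficients of X1.
X = np.column_stack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
print(beta1_hat, beta_full[:2])
```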

— Dnaiel