Why is a t-distribution used for testing a linear regression coefficient?

16

In practice, a standard t-test is commonly used to check the significance of a linear regression coefficient. The mechanics of the calculation make sense to me.

Why can the t-distribution be used to model the standard test statistic used when testing linear regression hypotheses? The standard test statistic I am referring to here is:

$T_{0} = \frac{\widehat{\beta} - \beta_{0}}{SE(\widehat{\beta})}$

— Nate Parke

A full and complete answer to this question will be quite long, I'm sure. So while you wait for someone to tackle this, you can get a pretty good idea of why this is the case by looking at some notes I found online here: onlinecourses.science.psu.edu/stat501/node/297. Note specifically that

$t^2_{(n-p)} = F_{(1,n-p)}$.

— StatsStudent

1

I cannot believe this is not a duplicate, and yet all the upvotes (both on the question and the answers)... What about this? Or perhaps it is not a duplicate, which means there are (or there was until today) super-basic topics still that have not been covered over the nearly seven years of existence of Cross Validated... Wow...

— Richard Hardy

@RichardHardy Hmm, that sounds like a duplicate. While it's more verbose, the question is specifically: "How can I prove that for $\hat\beta_i$ , $\frac{\hat{\beta}_i - \beta_i} {s_{\hat{\beta}_i}} \sim t_{n-k}$ "

— Firebug

25

To understand why we use the t-distribution, you need to know the underlying distributions of $\widehat{\beta}$ and of the residual sum of squares ($RSS$), because these two together give you the t-distribution.

The easier part is the distribution of $\widehat{\beta}$, which is normal. To see this, note that $\widehat{\beta} = (X^{T}X)^{-1}X^{T}Y$, so it is a linear function of $Y$, where $Y\sim N(X\beta, \sigma^{2}I_{n})$. As a result, it is also normally distributed: $\widehat{\beta} \sim N(\beta, \sigma^{2}(X^{T}X)^{-1})$. Let me know if you need help deriving the distribution of $\widehat{\beta}$.
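For readers who want the intermediate steps, the mean and covariance of $\widehat{\beta}$ follow from the usual rules for a linear transformation of a Gaussian vector. This sketch assumes that $X$ is fixed and has full column rank, which the argument above uses implicitly:

```latex
\begin{align*}
\widehat{\beta} &= (X^{T}X)^{-1}X^{T}Y, \qquad Y \sim N(X\beta, \sigma^{2}I_{n}) \\
E[\widehat{\beta}] &= (X^{T}X)^{-1}X^{T}X\beta = \beta \\
\operatorname{Var}(\widehat{\beta}) &= (X^{T}X)^{-1}X^{T}\,(\sigma^{2}I_{n})\,X(X^{T}X)^{-1} = \sigma^{2}(X^{T}X)^{-1}
\end{align*}
```

Since a linear function of a normal vector is again normal, these two moments pin down the whole distribution.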

Additionally, $RSS \sim \sigma^{2}\chi^{2}_{n-p}$, where $n$ is the number of observations and $p$ is the number of parameters in your regression. The proof of this is a bit more involved, but still straightforward (see the proof in "Why is RSS distributed chi square times n-p?").
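A quick way to sanity-check the chi-squared claim without going through the proof is simulation. The sketch below is only an illustration: it assumes a simple regression with an intercept and one slope (so $p = 2$), a fixed design `x`, and arbitrary values for the true coefficients and $\sigma$. The helper names are hypothetical. It checks that $RSS/\sigma^{2}$ has mean close to $n-p$ and variance close to $2(n-p)$, as a $\chi^{2}_{n-p}$ variable should.

```python
import random
import statistics

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def simulate_scaled_rss(x, beta0, beta1, sigma, reps, seed=0):
    """Return RSS / sigma^2 for `reps` simulated data sets with a fixed design x."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        # Generate y from the assumed true model with Gaussian errors.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1 = fit_simple_ols(x, y)
        rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        out.append(rss / sigma ** 2)
    return out

# Illustrative values (assumptions, not from the answer): n = 8, p = 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
scaled = simulate_scaled_rss(x, beta0=1.0, beta1=0.5, sigma=2.0, reps=20000)
df = len(x) - 2

print("mean of RSS/sigma^2:    ", statistics.fmean(scaled), "(chi-squared mean is", df, ")")
print("variance of RSS/sigma^2:", statistics.pvariance(scaled), "(chi-squared variance is", 2 * df, ")")
```

Matching two moments is of course not a proof of the full distribution, only a plausibility check.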

Up until this point I have considered everything in matrix/vector notation, but for simplicity let's work with a single coefficient $\widehat{\beta}_{i}$. Its normal distribution gives us:

$\begin{equation} \frac{\widehat{\beta}_{i}-\beta_{i}}{\sigma\sqrt{(X^{T}X)^{-1}_{ii}}} \sim N(0,1) \end{equation}$

Additionally, from the chi-squared distribution of $RSS$ we have that:

$\begin{equation} \frac{(n-p)s^{2}}{\sigma^{2}} \sim \chi^{2}_{n-p} \end{equation}$

This is simply a rearrangement of the chi-squared expression for $RSS$, where we define $s^{2}=\frac{RSS}{n-p}$, an unbiased estimator of $\sigma^{2}$. This quantity is independent of the $N(0,1)$ variable above. By the definition of the $t_{n-p}$ distribution, dividing a standard normal variable by the square root of an independent chi-squared variable over its degrees of freedom gives a t-distribution (for the proof see: "A normal divided by the $\sqrt{\chi^2(s)/s}$ gives you a t-distribution -- proof"), so you get that:

$\begin{equation} \frac{\widehat{\beta}_{i}-\beta_{i}}{s\sqrt{(X^{T}X)^{-1}_{ii}}} \sim t_{n-p} \end{equation}$

where $s\sqrt{(X^{T}X)^{-1}_{ii}}=SE(\widehat{\beta}_{i})$. The unknown $\sigma$ cancels in the ratio, so the statistic can be computed from the data alone.
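The result can also be checked numerically. The following is a minimal sketch under stated assumptions, not a definitive implementation: it uses a simple regression with intercept and slope ($p = 2$), a deliberately tiny sample ($n = 5$, so $n - p = 3$) to make the difference from the normal visible, and arbitrary true parameter values. All function names are hypothetical. For this model the slope entry of $(X^{T}X)^{-1}$ equals $1/\sum_j (x_j - \bar{x})^2$. The t tail probability is obtained by integrating the t density with Simpson's rule, to keep the example within the standard library.

```python
import math
import random

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_tail(c, df, steps=2000):
    """P(|T| > c) for T ~ t_df, via Simpson's rule on [0, c] (steps must be even)."""
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)

def slope_t_statistic(x, y, beta1_null):
    """(b1 - beta1_null) / SE(b1) for the model y = b0 + b1*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))                      # s^2 = RSS / (n - p), p = 2
    return (b1 - beta1_null) / (s / math.sqrt(sxx))   # SE(b1) = s * sqrt(1 / sxx)

# Illustrative values (assumptions): n = 5 observations, so n - p = 3.
rng = random.Random(1)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
true_b0, true_b1, sigma, reps, cutoff = 1.0, 0.5, 2.0, 20000, 1.96

exceed = 0
for _ in range(reps):
    y = [true_b0 + true_b1 * xi + rng.gauss(0.0, sigma) for xi in x]
    # Test the true slope, so the null hypothesis holds in every replication.
    if abs(slope_t_statistic(x, y, true_b1)) > cutoff:
        exceed += 1
empirical = exceed / reps

print("empirical P(|T| > 1.96):", empirical)
print("t_3 prediction:         ", t_two_sided_tail(cutoff, 3))
print("N(0,1) prediction:       about 0.05")
```

With only three degrees of freedom the simulated exceedance rate should sit near the $t_{3}$ tail probability and well above the 5% that a standard normal would give, which is the practical reason the t reference distribution matters in small samples.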

Let me know if it makes sense.

— francium87d

what a great answer! could you please explain why

$\begin{equation} \frac{\widehat{\beta}_{i}-\beta_{i}}{\sigma\sqrt{(X^{T}X)^{-1}_{ii}}} \sim N(0,1) \end{equation}$ ?

— KingDingeling

4

The answer is actually very simple: you use the t-distribution because it was pretty much designed specifically for this purpose.

Ok, the nuance here is that it wasn't designed specifically for linear regression. Gosset worked out the distribution of a statistic computed from a sample drawn from a population. For instance, you draw a sample $x_1,x_2,\dots,x_n$ and calculate its mean $\bar x=\sum_{i=1}^n x_i/n$. What is the distribution of the sample mean $\bar x$?

If you knew the true (population) standard deviation $\sigma$, then, assuming the population is normal, you'd say that the variable $\xi=(\bar x-\mu)\sqrt n/\sigma$ follows the standard normal distribution $\mathcal N(0,1)$. The trouble is that you usually do not know $\sigma$ and can only estimate it by $\hat\sigma$. So Gosset figured out the distribution you get when you substitute $\hat\sigma$ for $\sigma$ in the denominator, and that distribution is now named after his pseudonym, "Student's t".

The technicalities of linear regression lead to the same situation: we can estimate the standard error $\hat\sigma_\beta$ of the coefficient estimate $\hat\beta$, but we do not know the true $\sigma$, so the Student t distribution applies here too.
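The effect of swapping $\sigma$ for $\hat\sigma$ is easy to see in a small simulation. This is a rough sketch with made-up values (a normal population with $\mu = 10$, $\sigma = 3$, samples of size $n = 5$) and a hypothetical helper name. It counts how often the standardized sample mean exceeds the normal cutoff 1.96 when the denominator uses the true $\sigma$ versus the sample standard deviation.

```python
import math
import random
import statistics

def rejection_rate(n, reps, use_known_sigma, mu=10.0, sigma=3.0, cutoff=1.96, seed=2):
    """Share of samples whose standardized mean exceeds `cutoff` in absolute value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        # statistics.stdev is the sample standard deviation (divides by n - 1).
        scale = sigma if use_known_sigma else statistics.stdev(sample)
        if abs((x_bar - mu) * math.sqrt(n) / scale) > cutoff:
            hits += 1
    return hits / reps

known = rejection_rate(n=5, reps=20000, use_known_sigma=True)
estimated = rejection_rate(n=5, reps=20000, use_known_sigma=False)

print("sigma known:    ", known)      # should be close to the nominal 0.05
print("sigma estimated:", estimated)  # noticeably larger: heavier tails, t with n - 1 df
```

The second rate is larger because the estimated denominator adds its own variability, which is exactly the extra spread that the t-distribution with $n - 1$ degrees of freedom accounts for.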

— Aksakal