At what level is a $\chi^2$ test mathematically identical to a $z$-test of proportions?

BACKGROUND: Safe to skip. It is here for reference and to justify the question.

The opening of this paper reads:

"Karl Pearson's famous chi-square contingency test is derived from another statistic, called the z statistic, based on the normal distribution. The simplest versions of $\chi^2$ can be shown to be mathematically identical to equivalent z tests. The tests produce the same result in all circumstances. For all intents and purposes "chi-squared" could be called "z-squared". The critical values of $\chi^2$ for one degree of freedom are the square of the corresponding critical values of z."

This has been asserted multiple times on CV (here, here, here and others).

And indeed we can prove that $\chi^2_{1\,df}$ is equivalent to $X^2$ with $X\sim N(0,1)$:

Let's say that $X \sim N(0,1)$ and that $Y=X^2$, and find the density of $Y$ using the $cdf$ method:

$p(Y \leq y) = p(X^2 \leq y)= p(-\sqrt{y} \leq X \leq \sqrt{y})$. The problem is that we cannot integrate the density of the normal distribution in closed form. But we can express it in terms of the normal $cdf$ $F_X$:

$F_Y(y) = F_X(\sqrt{y})- F_X(-\sqrt{y}).$

Taking the derivative:

$f_Y(y)= F_X'(\sqrt{y})\,\frac{1}{2\sqrt{y}}+ F_X'(-\sqrt{y})\,\frac{1}{2\sqrt{y}}.$

Since the values of the normal $pdf$ are symmetric, $F_X'(-\sqrt{y}) = F_X'(\sqrt{y})$, and therefore $f_Y(y)= F_X'(\sqrt{y})\,\frac{1}{\sqrt{y}}$. Equating $F_X'$ to the $pdf$ of the normal (the $x$ in the $pdf$ now becomes $\sqrt{y}$, plugged into the $e^{-\frac{x^2}{2}}$ part of the normal $pdf$), and remembering to include the $\frac{1}{\sqrt{y}}$ at the end:

$f_Y(y)= F_X'(\sqrt{y})\,\frac{1}{\sqrt{y}}= \frac{1}{\sqrt{2\pi}}\,e^{-\frac{y}{2}}\, \frac{1}{\sqrt{y}}=\frac{1}{\sqrt{2\pi}}\,e^{-\frac{y}{2}}\, y^{\frac{1}{2}- 1}$

Compare to the $pdf$ of the chi-square:

$f_X(x)= \frac{1}{2^{\nu/2}\Gamma(\frac{\nu}{2})}e^{\frac{-x}{2}}x^{\frac{\nu}{2}-1}$

Since $\Gamma(1/2)=\sqrt{\pi}$, the constant for $\nu = 1$ is $2^{1/2}\,\Gamma(1/2)=\sqrt{2\pi}$, so for $1$ df we have derived exactly the $pdf$ of the chi-square.
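The match between the derived density and the chi-square density with one degree of freedom can also be checked numerically. The following is only a sketch using the Python standard library; the function names and the evaluation points are chosen here for illustration and are not part of the original post.

```python
import math

def derived_density(y: float) -> float:
    """Density of Y = X^2 from the cdf method: phi(sqrt(y)) / sqrt(y)."""
    return math.exp(-y / 2.0) / (math.sqrt(2.0 * math.pi) * math.sqrt(y))

def chi2_pdf(x: float, nu: float) -> float:
    """Chi-square density with nu degrees of freedom, as written above."""
    norm_const = 2.0 ** (nu / 2.0) * math.gamma(nu / 2.0)
    return math.exp(-x / 2.0) * x ** (nu / 2.0 - 1.0) / norm_const

# Arbitrary positive evaluation points (illustrative choice).
points = [0.1, 0.5, 1.0, 3.841, 10.0]

# Largest absolute difference between the two densities over those points.
max_gap = max(abs(derived_density(y) - chi2_pdf(y, 1.0)) for y in points)
print(max_gap)
```

The gap should be at floating-point noise level, because the two expressions differ only in how the constant $\sqrt{2\pi}$ is written.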

When we call the function prop.test() in R, we are invoking the same $\chi^2$ test as if we had chosen chisq.test().

THE QUESTION:

So I get all these points, yet I still don't know how they apply to the actual implementation of these two tests, for two reasons:

A z-test is not squared.
The actual test statistics are completely different:

The value of the test statistic for a $\chi^2$ is:

$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = N \sum_{i=1}^n p_i \left(\frac{O_i/N - p_i}{p_i}\right)^2$

where $\chi^2$ = Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution; $O_i$ = the number of observations of type $i$; $N$ = the total number of observations; $E_i$ = $N p_i$ = the expected (theoretical) frequency of type $i$, asserted by the null hypothesis that the fraction of type $i$ in the population is $p_i$; $n$ = the number of cells in the table.

On the other hand, the test statistic for a $z$-test is:

$\displaystyle Z = \frac{\frac{x_1}{n_1}-\frac{x_2}{n_2}}{\sqrt{p\,(1-p)(1/n_1+1/n_2)}}$ with $\displaystyle p = \frac{x_1\,+\,x_2}{n_1\,+\,n_2}$, where $x_1$ and $x_2$ are the numbers of "successes" out of the numbers of subjects in each of the levels of the categorical variable, i.e. $n_1$ and $n_2$.

This formula seems to rely on the binomial distribution.

These two test statistics are clearly different, and they give different values for the actual test statistic as well as for the p-values: 5.8481 for the $\chi^2$ and 2.4183 for the z-test, where $\small 2.4183^2=5.84817$ (thank you, @mark999). The p-value for the $\chi^2$ test is 0.01559, while for the z-test it is 0.0077. The difference is explained by two-tailed versus one-tailed: $\small 0.01559/2=0.007795$ (thank you @amoeba).
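The relation between these numbers can be reproduced with a few lines of standard-library Python. This is a sketch only: it takes the two statistics quoted above as given (the underlying data are not shown here), and the helper names are made up for illustration. The chi-square tail with one degree of freedom is computed from the normal tail, using the background result that $X^2$ with $X \sim N(0,1)$ is $\chi^2_{1\,df}$.

```python
import math

z = 2.4183      # z statistic quoted in the question
chi2 = 5.8481   # chi-square statistic quoted in the question (1 df)

def normal_upper_tail(z_value: float) -> float:
    """P(X > z_value) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(z_value / math.sqrt(2.0))

def chi2_1df_upper_tail(x: float) -> float:
    """P(X^2 > x) = P(|X| > sqrt(x)) for X ~ N(0, 1)."""
    return math.erfc(math.sqrt(x / 2.0))

p_one_sided = normal_upper_tail(z)    # one-tailed z-test
p_two_sided = 2.0 * p_one_sided       # two-tailed z-test
p_chi2 = chi2_1df_upper_tail(chi2)    # chi-square test, 1 df

print(round(z ** 2, 4), round(p_one_sided, 5), round(p_two_sided, 5), round(p_chi2, 5))
```

The chi-square p-value lines up with the two-tailed z p-value, and halving it gives the one-tailed value.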

So at what level do we say that they are one and the same?

chi-squared proportion z-test

— Antoni Parellada

But these are two identical tests. Z squared is the chi-square statistic. Suppose you have a 2x2 frequency table where the columns are the two groups and the rows are "success" and "failure". Then the so-called expected frequencies of the chi-square test in a given column are the weighted (by the groups' N) average column (group) profile multiplied by that group's N. Thus the chi-square tests the deviation of each of the two groups' profiles from this average group profile, which is equivalent to testing the difference between the groups' profiles, i.e. the z-test of proportions.

— ttnphns

In the example on the last hyperlink the $\chi^2$ is almost the square of the z-test statistic, but not quite, and the p-values are different. Also, when you look at the formulas for the test statistics above, is it truly immediate that they are identical? Or even one the square of the other?

— Antoni Parellada

In chisq.test(), have you tried using correct=FALSE?

— mark999

Indeed, Antoni. Both tests exist with or without the Yates correction. Could it be that you computed one with it but the other without it?

— ttnphns

Thank you! You were (predictably) correct. With the Yates correction off, one is just the square of the other. I edited the question accordingly, although a bit fast. I still would like to prove algebraically that both test statistics are the same (or one the square of the other), and understand why the p-values are different.

— Antoni Parellada

Let us have a 2x2 frequency table where columns are two groups of respondents and rows are the two responses "Yes" and "No". And we've turned the frequencies into the proportions within group, i.e. into the vertical profiles:

      Gr1   Gr2  Total
Yes   p1    p2     p
No    q1    q2     q
      --------------
     100%  100%   100%
      n1    n2     N

The usual (not Yates corrected) $\chi^2$ of this table, after you substitute proportions instead of frequencies in its formula, looks like this:

$n_1\left[\frac{(p_1-p)^2}{p}+\frac{(q_1-q)^2}{q}\right]+n_2\left[\frac{(p_2-p)^2}{p}+\frac{(q_2-q)^2}{q}\right]= \frac{n_1(p_1-p)^2+n_2(p_2-p)^2}{pq}.$

Remember that $p= \frac{n_1p_1+n_2p_2}{n_1+n_2}$ , the element of the weighted average profile of the two profiles (p1,q1) and (p2,q2), and plug it in the formula, to obtain

$\ldots= \frac{(p_1-p_2)^2(n_1^2n_2+n_1n_2^2)}{pqN^2}$
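For readers who want the substitution spelled out, one way to see it (writing $N=n_1+n_2$, as in the table) is:

$p_1-p=\frac{(n_1+n_2)\,p_1-n_1p_1-n_2p_2}{N}=\frac{n_2(p_1-p_2)}{N}, \qquad p_2-p=\frac{n_1(p_2-p_1)}{N}$

$n_1(p_1-p)^2+n_2(p_2-p)^2=\frac{n_1n_2^2\,(p_1-p_2)^2+n_1^2n_2\,(p_1-p_2)^2}{N^2}=\frac{(p_1-p_2)^2(n_1^2n_2+n_1n_2^2)}{N^2}$

Dividing by $pq$ gives the expression above.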

Divide both numerator and denominator by $(n_1^2n_2+n_1n_2^2)$, which equals $n_1n_2N$ so that the denominator becomes $pq\,\frac{N}{n_1n_2}=pq\,(1/n_1+1/n_2)$, and get

$\frac{(p_1-p_2)^2}{pq(1/n_1+1/n_2)}=Z^2,$

the squared z-statistic of the z-test of proportions for "Yes" response.

Thus, the 2x2 homogeneity chi-square statistic (and test) is equivalent to the z-test of two proportions. The so-called expected frequencies computed in the chi-square test in a given column are the weighted (by the group n) average vertical profile (i.e. the profile of the "average group") multiplied by that group's n. Thus the chi-square tests the deviation of each of the two groups' profiles from this average group profile, which is equivalent to testing the difference between the groups' profiles, which is the z-test of proportions.
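The identity can be checked numerically on any 2x2 table. The sketch below uses made-up counts (they are not the data from the question) and only the Python standard library. The Yates-corrected variant is written in its textbook form, subtracting 0.5 from each $|O-E|$, and is included only to show why the corrected statistic discussed in the comments no longer equals $Z^2$.

```python
import math

# Hypothetical counts, for illustration only: successes and group sizes.
x1, n1 = 30, 100
x2, n2 = 45, 120

# Pooled z statistic for the difference of two proportions.
p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Observed 2x2 table: rows are Yes/No, columns are the two groups.
observed = [[x1, x2], [n1 - x1, n2 - x2]]
row_totals = [sum(row) for row in observed]
col_totals = [n1, n2]
total = n1 + n2

def expected(i: int, j: int) -> float:
    """Expected count of cell (i, j) under homogeneity."""
    return row_totals[i] * col_totals[j] / total

cells = [(i, j) for i in range(2) for j in range(2)]

# Pearson chi-square without continuity correction.
chi2 = sum((observed[i][j] - expected(i, j)) ** 2 / expected(i, j) for i, j in cells)

# Textbook Yates-corrected variant (assumes every |O - E| exceeds 0.5).
chi2_yates = sum(
    (abs(observed[i][j] - expected(i, j)) - 0.5) ** 2 / expected(i, j)
    for i, j in cells
)

print(z ** 2, chi2, chi2_yates)
```

With the correction off, the first two printed values agree up to rounding error; the Yates value is smaller, which is the kind of mismatch the comments trace back to chisq.test() being run without correct=FALSE.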

This is one demonstration of a link between a variables association measure (chi-square) and a group difference measure (z-test statistic). Attribute associations and group differences are (often) the two facets of the same thing.

(Showing the expansion in the first line above, by @Antoni's request):

$\begin{aligned}
& n_1\left[\frac{(p_1-p)^2}{p}+\frac{(q_1-q)^2}{q}\right]+n_2\left[\frac{(p_2-p)^2}{p}+\frac{(q_2-q)^2}{q}\right] \\
&= \frac{n_1(p_1-p)^2q}{pq}+\frac{n_1(q_1-q)^2p}{pq}+\frac{n_2(p_2-p)^2q}{pq}+\frac{n_2(q_2-q)^2p}{pq} \\
&= \frac{n_1(p_1-p)^2(1-p)+n_1(1-p_1-1+p)^2p+n_2(p_2-p)^2(1-p)+n_2(1-p_2-1+p)^2p}{pq} \\
&= \frac{n_1(p_1-p)^2(1-p)+n_1(p-p_1)^2p+n_2(p_2-p)^2(1-p)+n_2(p-p_2)^2p}{pq} \\
&= \frac{[n_1(p_1-p)^2][(1-p)+p]+[n_2(p_2-p)^2][(1-p)+p]}{pq} \\
&= \frac{n_1(p_1-p)^2+n_2(p_2-p)^2}{pq}.
\end{aligned}$

— ttnphns

@ttnphs This is great! Any chance you could clarify the intermediate step in the first equation ($\chi^2$) formula - I don't see how the $q$'s go away after the equal sign.

— Antoni Parellada

@ttnphs When I expand it I get

$n_1\left[\frac{(p_1-p)^2}{p}+\frac{(q_1-q)^2}{q}\right]+n_2\left[\frac{(p_2-p)^2}{p}+\frac{(q_2-q)^2}{q}\right]=n_1\left(\frac{q(p^2+p(-2p_1-2q_1)+p_1^2)+p(q^2+q_1^2)}{pq}\right)+n_2\left(\frac{q(p^2+p(-2p_2-2q_2)+p_2^2)+p(q^2+q_2^2)}{pq}\right)$

— Antoni Parellada

@ttnphs ... Or some reference so it's less work to type the latex... And I'll promptly and happily 'accept' the answer...

— Antoni Parellada

@Antoni, expansion inserted.

— ttnphns

@ttnphns Awesome!

— Antoni Parellada