Why is the asymptotic relative efficiency of the Wilcoxon test $3/\pi$ compared with Student's t-test for normally distributed data?

It is well known that the asymptotic relative efficiency (ARE) of the Wilcoxon signed-rank test compared with Student's t-test is $\frac{3}{\pi} \approx 0.955$ when the data are drawn from a normally distributed population. This holds both for the basic one-sample test and for the variant for two independent samples (Wilcoxon-Mann-Whitney U). It is also the ARE of a Kruskal-Wallis test compared with an ANOVA F-test for normal data.

Does this remarkable (for me, one of the "most unexpected appearances of $\pi$") and remarkably simple result have an insightful, remarkable or simple proof?

— Silverfish

Given the appearance of $\pi$ in the normal CDF, the appearance of $\pi$ in the ARE shouldn't really be all that surprising. I'll hazard an answer, but it will take a while to produce a good one.

— Glen_b -Reinstate Monica

@Glen_b Indeed - I have seen a discussion of "Why does $\pi$ come up so often in statistics?", but $3/\pi$ is still pleasantly surprising when you first see it. For comparison, the ARE of Mann-Whitney versus the two-sample t-test is 3 for exponential data, 1.5 for double exponential data and 1 for uniform data - much rounder!

— Silverfish
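These three values can be checked numerically, assuming the shift-alternative formula $\mathrm{ARE}=12\sigma^2\left(\int f^2\right)^2$ for WMW versus the two-sample $t$-test (Hodges and Lehmann, 1956, quoted later in this thread). This is only a sketch using a plain midpoint rule. `midpoint` and `are_wmw_vs_t` are hypothetical helper names, and the exponential and double-exponential supports are truncated at 60.

```python
import math


def midpoint(g, a, b, steps=200_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    width = (b - a) / steps
    return width * sum(g(a + (k + 0.5) * width) for k in range(steps))


def are_wmw_vs_t(pdf, a, b):
    """12 * sigma^2 * (int f^2)^2 for a density f supported (up to a negligible
    tail) on [a, b]: the ARE of Wilcoxon-Mann-Whitney vs the two-sample t-test
    under a shift alternative."""
    mean = midpoint(lambda x: x * pdf(x), a, b)
    var = midpoint(lambda x: (x - mean) ** 2 * pdf(x), a, b)
    int_f2 = midpoint(lambda x: pdf(x) ** 2, a, b)
    return 12 * var * int_f2 ** 2


# Tail mass beyond |x| = 60 is of order e^{-60}, so truncating there is harmless.
exponential = are_wmw_vs_t(lambda x: math.exp(-x), 0.0, 60.0)
double_exponential = are_wmw_vs_t(lambda x: 0.5 * math.exp(-abs(x)), -60.0, 60.0)
uniform = are_wmw_vs_t(lambda x: 1.0, 0.0, 1.0)
print(exponential, double_exponential, uniform)  # close to 3, 1.5 and 1
```

Location does not matter for a shift alternative, so the exponential and uniform densities are simply placed on $[0,\infty)$ and $[0,1]$.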

@Silverfish I have linked page 197 of van der Vaart, "Asymptotic Statistics". In the one-sample case, the sign test has ARE $2/\pi$ relative to the t-test.

— Khashaa

@Silverfish ... and at the logistic it is $(\pi/3)^2$. Quite a few of the well-known AREs (in one or two example cases) involve $\pi$, and quite a few are simple ratios of integers.

— Glen_b
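The same formula, $12\sigma^2\left(\int f^2\right)^2$, evaluated numerically at the standard logistic density reproduces this value. Here $\sigma^2=\pi^2/3$ and $\int f^2=1/6$, so the ARE is $\pi^2/9$. This is again a midpoint-rule sketch with hypothetical helper names, truncated at $\pm 80$.

```python
import math


def midpoint(g, a, b, steps=200_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    width = (b - a) / steps
    return width * sum(g(a + (k + 0.5) * width) for k in range(steps))


def logistic_pdf(x):
    """Standard logistic density, written in terms of |x| so exp() cannot overflow."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2


# The density is symmetric about 0, so the variance is just E[X^2].
var = midpoint(lambda x: x * x * logistic_pdf(x), -80.0, 80.0)
int_f2 = midpoint(lambda x: logistic_pdf(x) ** 2, -80.0, 80.0)
are_logistic = 12 * var * int_f2 ** 2
print(are_logistic, (math.pi / 3) ** 2)
```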

For a one-sample signed-rank test it appears to be $3/\pi$. For a one-sample sign test it is $2/\pi$. So we have clarified where we stand. I take that as a good sign.

— Khashaa

Answers:

Brief sketch of the ARE for the one-sample $t$-test, the sign test and the signed-rank test

I expect the long version of @Glen_b's answer will contain a detailed analysis of the two-sample rank test and an intuitive explanation of the ARE, so I will skip most of the derivation. (Example case; the missing details are in Lehmann, TSH.)

Testing problem: Let $X_1,\ldots,X_n$ be a random sample from the location model $f(x-\theta)$, with $f$ symmetric about zero. We want to compute the ARE of the sign test and the signed-rank test for the hypothesis $H_0: \theta=0$ relative to the t-test.

To assess the relative efficiency of tests, only local alternatives are considered, because the power of a consistent test against a fixed alternative tends to 1. Local alternatives that lead to nontrivial asymptotic power are often of the form $\theta_n=h/\sqrt{n}$ for fixed $h$, which in some of the literature is called Pitman drift.
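The following sketch makes the contrast between fixed and local alternatives concrete. For simplicity, it assumes the one-sided $z$-test with known $\sigma$ as a large-sample stand-in for the $t$-test. `power`, `Z_ALPHA` and the chosen values $\theta=0.3$ and $h=2$ are illustrative, not from the answer.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
Z_ALPHA = STD_NORMAL.inv_cdf(1 - ALPHA)  # one-sided critical value, about 1.645


def power(theta, n, sigma=1.0):
    """Power of the one-sided test rejecting when sqrt(n) * mean / sigma > z_alpha,
    for N(theta, sigma^2) data with sigma treated as known (a large-n stand-in
    for the t-test)."""
    return 1 - STD_NORMAL.cdf(Z_ALPHA - n ** 0.5 * theta / sigma)


h = 2.0
for n in (25, 100, 400, 1600):
    fixed = power(0.3, n)           # fixed alternative theta = 0.3
    local = power(h / n ** 0.5, n)  # Pitman drift theta_n = h / sqrt(n)
    print(n, round(fixed, 4), round(local, 4))
```

The fixed-alternative column climbs towards 1, so it cannot separate competing tests. The local column stays at $1-\Phi(z_\alpha-h/\sigma)$ for every $n$, which is the kind of limit compared below.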

Our task is to:

- find the limiting distribution of each test statistic under the null,
- find the limiting distribution of each test statistic under the alternative,
- compute the local asymptotic power of each test.

Test statistics and asymptotics

1. t-test (assuming the variance $\sigma^2$ exists):
$$t_n=\sqrt{n}\,\frac{\bar{X}}{\hat{\sigma}}\to_d N(0,1)\quad\text{under the null}$$
$$t_n\to_d N\left(\frac{h}{\sigma},1\right)\quad\text{under the alternative }\theta=h/\sqrt{n}$$
so the test that rejects if $t_n>z_\alpha$ has asymptotic power function $1-\Phi\left(z_\alpha-h\frac{1}{\sigma}\right)$.

2. Sign test, with $S_n=\frac{1}{n}\sum_{i=1}^{n}1\{X_i>0\}$:
$$\sqrt{n}\left(S_n-\frac{1}{2}\right)\to_d N\left(0,\frac{1}{4}\right)\quad\text{under the null}$$
$$\sqrt{n}\left(S_n-\frac{1}{2}\right)\to_d N\left(hf(0),\frac{1}{4}\right)\quad\text{under the alternative}$$
so it has local asymptotic power $1-\Phi\left(z_\alpha-2hf(0)\right)$.

3. Signed-rank test, with $R_i$ the rank of $|X_i|$ among $|X_1|,\ldots,|X_n|$:
$$W_n=2n^{-3/2}\left(\sum_{i=1}^{n}R_i1\{X_i>0\}-\frac{n(n+1)}{4}\right)\to_d N\left(0,\frac{1}{3}\right)\quad\text{under the null}$$
$$W_n\to_d N\left(2h\int f^2,\frac{1}{3}\right)\quad\text{under the alternative}$$
so it has local asymptotic power $1-\Phi\left(z_\alpha-\sqrt{12}\,h\int f^2\right)$.
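A small Monte Carlo sketch can check the sign and signed-rank limits at the standard normal. Under $\theta_n=h/\sqrt{n}$ the sample means should be near $hf(0)=h/\sqrt{2\pi}$ and $2h\int f^2=h/\sqrt{\pi}$, and the variances near $1/4$ and $1/3$. The signed-rank statistic is coded as $n^{-3/2}\sum_i R_i\,\mathrm{sign}(X_i)$, the centred and scaled signed-rank sum with those limits. The choices $n=400$, 2000 replications and the seed are arbitrary, and the function names are hypothetical.

```python
import math
import random


def sign_stat(xs):
    """sqrt(n) * (S_n - 1/2), where S_n is the fraction of positive observations."""
    n = len(xs)
    return math.sqrt(n) * (sum(x > 0 for x in xs) / n - 0.5)


def signed_rank_stat(xs):
    """W_n = n^(-3/2) * sum_i R_i * sign(X_i), with R_i the rank of |X_i|.
    This equals 2 n^(-3/2) * (sum_i R_i 1{X_i > 0} - n(n+1)/4)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: abs(xs[i]))
    total = 0
    for rank, i in enumerate(order, start=1):
        total += rank if xs[i] > 0 else -rank
    return total / n ** 1.5


def mean_and_var(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def simulate(h, n=400, reps=2000, seed=1):
    """Monte Carlo moments of both statistics for N(h / sqrt(n), 1) samples."""
    rng = random.Random(seed)
    theta = h / math.sqrt(n)  # local alternative
    s_vals, w_vals = [], []
    for _ in range(reps):
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        s_vals.append(sign_stat(xs))
        w_vals.append(signed_rank_stat(xs))
    return mean_and_var(s_vals), mean_and_var(w_vals)


h = 1.0
(s_mean, s_var), (w_mean, w_var) = simulate(h)
phi0 = 1 / math.sqrt(2 * math.pi)      # f(0) for the standard normal
int_f2 = 1 / (2 * math.sqrt(math.pi))  # integral of f^2 for the standard normal
print(s_mean, h * phi0, s_var, 0.25)
print(w_mean, 2 * h * int_f2, w_var, 1 / 3)
```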

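Therefore: each local power has the form $1-\Phi(z_\alpha-c\,h)$, and a test with slope $c_1$ needs $(c_2/c_1)^2$ times as many observations as a test with slope $c_2$ to reach the same power. The ARE relative to the t-test (slope $1/\sigma$) is therefore the squared ratio of the slopes:

$$ARE(S_n)=(2f(0)\sigma)^2$$

$$ARE(W_n)=\left(\sqrt{12}\,\sigma\int f^2\right)^2$$

If $f$ is the standard normal density, $ARE(S_n)=2/\pi$ and $ARE(W_n)=3/\pi$.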

If $f$ is uniform on $[-1,1]$, $ARE(S_n)=1/3$ and $ARE(W_n)=1$.
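Plugging numerically integrated values of $f(0)$, $\sigma$ and $\int f^2$ into the two ARE formulas reproduces these numbers. Note that the signed-rank value at the uniform comes out as 1. This is a sketch with a plain midpoint rule; `midpoint` and `ares` are hypothetical helper names, and the normal support is truncated at $\pm 12$.

```python
import math


def midpoint(g, a, b, steps=200_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    width = (b - a) / steps
    return width * sum(g(a + (k + 0.5) * width) for k in range(steps))


def ares(pdf, a, b):
    """Return (ARE(S_n), ARE(W_n)) relative to the t-test for a density that is
    symmetric about 0 and supported (up to negligible tails) on [a, b]."""
    var = midpoint(lambda x: x * x * pdf(x), a, b)  # mean is 0 by symmetry
    int_f2 = midpoint(lambda x: pdf(x) ** 2, a, b)
    sigma = math.sqrt(var)
    are_sign = (2 * pdf(0.0) * sigma) ** 2
    are_signed_rank = (math.sqrt(12) * int_f2 * sigma) ** 2
    return are_sign, are_signed_rank


normal_ares = ares(lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi), -12.0, 12.0)
uniform_ares = ares(lambda x: 0.5 if abs(x) <= 1 else 0.0, -1.0, 1.0)
print(normal_ares, (2 / math.pi, 3 / math.pi))
print(uniform_ares)
```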

Remark on the derivation of the distribution under the alternative

There are, of course, many ways to derive the limiting distribution under the alternative. One general approach is to use Le Cam's third lemma. A simplified version of it states:

Let $\Delta_n$ be the log of the likelihood ratio. If, for some statistic $W_n$,
$$(W_n,\Delta_n)\to_d N\left[\left(\begin{array}{c} \mu\\ -\sigma^2/2 \end{array}\right),\left(\begin{array}{cc} \sigma^2_W & \tau \\ \tau & \sigma^2 \end{array}\right)\right]$$
under the null, then
$$W_n\to_d N\left(\mu+\tau,\sigma^2_W\right)\quad\text{under the alternative.}$$

For densities that are differentiable in quadratic mean, local asymptotic normality and contiguity hold automatically, which in turn makes Le Cam's lemma applicable. Using this lemma, we only need to compute $\mathrm{cov}(W_n,\Delta_n)$ under the null. $\Delta_n$ satisfies LAN:

$$\Delta_n\approx \frac{h}{\sqrt{n}}\sum_{i=1}^{n}l(X_i)-\frac{1}{2}h^2I_0$$

where $l$ is the score function (here $l=-f'/f$) and $I_0$ is the Fisher information. Then, for instance, for the sign test $S_n$:

$$\mathrm{cov}\left(\sqrt{n}(S_n-1/2),\Delta_n\right)=-h\,\mathrm{cov}\left(1\{X_i>0\},\frac{f'}{f}(X_i)\right)=-h\int_0^\infty f'(x)\,dx=hf(0)$$
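The middle step uses $E[(f'/f)(X)]=\int f'=0$, so the covariance reduces to $-h\int_0^\infty f'(x)\,dx = hf(0)$. The sketch below checks that integral numerically for the standard normal and the standard logistic (for which $f(0)=1/4$), assuming a midpoint rule truncated at 60. `tau_sign` and the derivative helpers are hypothetical names.

```python
import math


def midpoint(g, a, b, steps=200_000):
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    width = (b - a) / steps
    return width * sum(g(a + (k + 0.5) * width) for k in range(steps))


def tau_sign(h, dpdf, upper=60.0):
    """tau = -h * int_0^inf f'(x) dx, the limiting cov(sqrt(n)(S_n - 1/2), Delta_n)."""
    return -h * midpoint(dpdf, 0.0, upper)


def normal_dpdf(x):
    """Derivative of the standard normal density: -x * phi(x)."""
    return -x * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def logistic_dpdf(x):
    """Derivative of the standard logistic density for x >= 0: f(x) * (1 - 2F(x))."""
    e = math.exp(-x)
    f = e / (1 + e) ** 2
    big_f = 1 / (1 + e)
    return f * (1 - 2 * big_f)


h = 1.5
print(tau_sign(h, normal_dpdf), h / math.sqrt(2 * math.pi))  # both approx h f(0)
print(tau_sign(h, logistic_dpdf), h / 4)
```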

— Khashaa

+1 I wasn't going to go into quite this much detail (indeed, with your answer already covering things quite nicely, I probably won't add anything to what I have now), so if you want to add more detail, don't hold back on my account. It would have taken me several more days (and I would still have ended up with less than you already have), so it's a good thing you came in.

— Glen_b -Reinstate Monica

This is a nice answer particularly for adding in Le Cam's lemma (+1). It seems to me there is quite a big jump between establishing the asymptotics in 1, 2, and 3, and the "therefore" bit where you write the AREs. I think if I were writing this up, I'd define asymptotic efficiency at this point (or maybe earlier, so the upshot of points 1, 2 and 3 would be the AEs not just local asymptotic powers in each case) and then the step to the AREs would be much easier for future readers to follow.

— Silverfish

Perhaps it is worth specifying your $H_1$? The one-sided and two-sided cases have different-looking asymptotic powers (though they lead to the same AREs).

— Silverfish

Feel free to edit my answer or append it to the OP.

— Khashaa

@Khashaa Thanks. I shall edit your post when I have the right stuff in front of me. Would you mind clarifying the meaning of the $*$ in the final equation?

— Silverfish

This has nothing to do with explaining why $\pi$ appears (which others have explained nicely), but it may help intuitively. The Wilcoxon test is a $t$-test on the ranks of $Y$, whereas the parametric test is computed on the raw data. The efficiency of the Wilcoxon test with respect to the $t$-test is the square of the correlation between the scores used by the two tests. As $n\rightarrow \infty$, the squared correlation converges to $\frac{3}{\pi}$. You can easily see this empirically using R:

n <- 1000000; x <- qnorm((1:n)/(n+1)); cor(1:n, x)^2; 3/pi
[1] 0.9549402
[1] 0.9549297
n <- 100000000; x <- qnorm((1:n)/(n+1)); cor(1:n, x)^2; 3/pi
[1] 0.9549298
[1] 0.9549297

— Frank Harrell

This is indeed a very helpful comment. Is it slightly conceptually closer to do n <- 1e6; x <- rnorm(n); cor(x, rank(x))^2 (which obviously produces the same result)?

— Silverfish
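Below is a Python analogue of Silverfish's suggestion (draw normal data, then correlate it with its ranks), written with only the standard library. The seed, the sample size of 200,000 and the helper names are arbitrary choices for this sketch. The result is random, so it only lands near $3/\pi$.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(xs):
    """Ranks 1..n of xs (ties occur with probability zero for continuous data)."""
    r = [0] * len(xs)
    for rank, i in enumerate(sorted(range(len(xs)), key=xs.__getitem__), start=1):
        r[i] = rank
    return r


rng = random.Random(2024)
x = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
r2 = pearson_r(x, ranks(x)) ** 2
print(r2, 3 / math.pi)
```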

(People intrigued by Frank's comment may want to look at this question about the equivalence of Wilcoxon-Mann-Whitney U and a t-test on the ranks.)

— Silverfish

Something I don't understand about this answer is that the correlation is higher for lower values of $n$ (I think the proximate reason is that we don't see the tails very well for smaller $n$). Naively, that implies that the relative efficiency of the Wilcoxon test is higher for small $n$, which surprises me ... ?? (I might do some simulations, but (a) is there an easy answer ... and (b) am I missing a conceptual point somewhere?)

— Ben Bolker
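The pattern Ben Bolker describes can be reproduced with the deterministic scores $\Phi^{-1}(i/(n+1))$ from Frank Harrell's R one-liner. The squared correlation falls towards $3/\pi$ from above as $n$ grows, which is consistent with Ben's guess about the thinly represented tails at small $n$. This sketch only shows the correlation pattern and says nothing about small-sample power. `rank_score_r2` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def rank_score_r2(n):
    """Squared correlation between the ranks 1..n and the normal scores
    Phi^{-1}(i / (n + 1))."""
    inv_cdf = NormalDist().inv_cdf
    scores = [inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    rank_mean = (n + 1) / 2
    score_mean = sum(scores) / n  # 0 up to rounding, by symmetry
    dr = [i - rank_mean for i in range(1, n + 1)]
    ds = [s - score_mean for s in scores]
    sxy = sum(a * b for a, b in zip(dr, ds))
    return sxy ** 2 / (sum(a * a for a in dr) * sum(b * b for b in ds))


sizes = (10, 100, 1_000, 10_000)
values = [rank_score_r2(n) for n in sizes]
for n, v in zip(sizes, values):
    print(n, v)
print("limit", 3 / math.pi)
```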

To my recollection the small sample efficiency of both the Wilcoxon signed rank test and the W-M-W are a bit lower than the asymptotic value on shift alternatives at the normal distribution.

— Glen_b -Reinstate Monica

Short version: The basic reason with the Wilcoxon-Mann-Whitney under a shift alternative is that finding the asymptotic relative efficiency (WMW/t) corresponds to evaluating $12\sigma^2[\int f^2(x)\,dx]^2$, where $f$ is the common density under the null and $\sigma^2$ is the common variance.

So at the normal, $f^2$ is effectively a scaled version of $f$; its integral will have a $\frac{1}{\sqrt{\pi}}$ term, and when squared, that's the source of the $\pi$ in the denominator.

The same term - with the same integral - is involved in the ARE for the signed rank test, so it takes the same value.

For the sign test relative to t, the ARE is $4\sigma^2f(0)^2$ ... and $f(0)^2$ again has a $\pi$ in the denominator.
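The integrals can be spelled out for a normal density $\phi_\sigma$ with standard deviation $\sigma$, using only the Gaussian integral $\int e^{-x^2/\sigma^2}\,dx=\sqrt{\pi}\,\sigma$. This shows where the $\pi$ comes from and why $\sigma$ cancels:

```latex
\int \phi_\sigma(x)^2\,dx
  = \frac{1}{2\pi\sigma^2}\int e^{-x^2/\sigma^2}\,dx
  = \frac{\sqrt{\pi}\,\sigma}{2\pi\sigma^2}
  = \frac{1}{2\sqrt{\pi}\,\sigma},
\qquad
12\sigma^2\left[\int \phi_\sigma^2\right]^2
  = \frac{12\sigma^2}{4\pi\sigma^2} = \frac{3}{\pi},
\qquad
4\sigma^2\phi_\sigma(0)^2
  = \frac{4\sigma^2}{2\pi\sigma^2} = \frac{2}{\pi}.
```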

So essentially it's as I said in comments; $\pi$ is in the ARE for the Wilcoxon-Mann-Whitney vs the two-sample t test, for the Wilcoxon signed rank test vs the one-sample t and the sign test vs the one-sample t test (in each case at the normal) quite literally because it appears in the normal density.

Reference:

J. L. Hodges and E. L. Lehmann (1956),
"The Efficiency of Some Nonparametric Competitors of the t-Test",
Ann. Math. Statist., 27:2, 324-335.

— Glen_b -Reinstate Monica

I like the explanation of the intuition for the appearance of $\pi$ in the denominator; is it essentially a coincidence that the Rényi entropy turns up in the WMW/Wilcoxon integrals?

— Silverfish

@Silverfish That $\int f^2\,dx$ turns up is certainly not a coincidence. However, that's not because it's connected to Rényi entropy, or at least I don't see any direct connection. We're getting into stuff I don't really know about now, though.

— Glen_b -Reinstate Monica

@Silverfish It's only a Rényi entropy for $\alpha=2$. Otherwise, it is just a plain old square that can come up in a million different ways.

— abalter