Analyse
Da dies eine konzeptionelle Frage ist, wollen wir der Einfachheit halber die Situation betrachten, in der ein Konfidenzintervall [ ˉ x ( 1 ) + Z α / 2 s ( 1 ) / √1−αwird für einen Mittelwertμ unterVerwendung einer Zufallsstichprobex(1)der Größenkonstruiertund eine zweite Zufallsstichprobex(2)der Größemwird aus derselben Normalverteilung(μ,σ2)entnommen. (Wenn Sie möchten, können Sie dieZs durch Werte aus der Studentt-Verteilung vonn-1Freiheitsgradenersetzen. Die folgende Analyse wird sich nicht ändern.)
[ x¯( 1 )+ Zα / 2s( 1 )/ n--√, x¯( 1 )+ Z1 - α / 2s( 1 )/ n--√]
μx( 1 )nx( 2 )m( μ , σ2)Ztn - 1
Die Chance, dass der Mittelwert der zweiten Stichprobe innerhalb des vom ersten bestimmten CI liegt, ist
Pr ( x¯( 1 )+ Zα / 2n--√s( 1 )≤x¯(2)≤x¯(1)+Z1−α/2n−−√s(1))=Pr(Zα/2n−−√s(1)≤x¯(2)−x¯(1)≤Z1−α/2n−−√s(1)).
Da der Mittelwert der ersten Stichprobe unabhängig von der Standardabweichung der ersten Stichprobe s ( 1 ) ist (dies erfordert Normalität) und die zweite Stichprobe unabhängig von der ersten Stichprobe ist, bedeutet die Differenz der Stichproben U = ˉ x ( 2 ) - ˉ x ( 1 ) ist unabhängig von s ( 1 ) . Außerdem ist für dieses symmetrische Intervall Z α / 2 = - Z 1 - α / 2x¯(1)s(1)U=x¯(2)−x¯(1)s(1)Zα/2=−Z1−α/2. Wenn man also für die Zufallsvariable s ( 1 ) schreibt und beide Ungleichungen quadriert, ist die fragliche Wahrscheinlichkeit die gleiche wieSs(1)
Pr(U2≤(Z1−α/2n−−√)2S2)=Pr(U2S2≤(Z1−α/2n−−√)2).
Die Erwartungsgesetze implizieren, dass einen Mittelwert von 0 und eine Varianz von hatU0
Var(U)=Var(x¯(2)−x¯(1))=σ2(1m+1n).
Da eine lineare Kombination von Normalvariablen ist, hat es auch eine Normalverteilung. Daher U 2 ist , σ 2 ( 1UU2σ2(1n+1m) times a χ2(1) variable. We already knew that S2 is σ2/n times a χ2(n−1) variable. Consequently, U2/S2 is 1/n+1/m times a variable with an F(1,n−1) distribution. The required probability is given by the F distribution as
F1,n−1(Z21−α/21+n/m).(1)
Discussion
An interesting case is when the second sample is the same size as the first, so that n/m=1 and only n and α determine the probability. Here are the values of (1) plotted against α for n=2,5,20,50.
The graphs rise to a limiting value at each α as n increases. The traditional test size α=0.05 is marked by a vertical gray line. For largish values of n=m, the limiting chance for α=0.05 is around 85%.
By understanding this limit, we will peer past the details of small sample sizes and better understand the crux of the matter. As n=m grows large, the F distribution approaches a χ2(1) distribution. In terms of the standard Normal distribution Φ, the probability (1) then approximates
Φ(Z1−α/22–√)−Φ(Zα/22–√)=1−2Φ(Zα/22–√).
For instance, with α=0.05, Zα/2/2–√≈−1.96/1.41≈−1.386 and Φ(−1.386)≈0.083. Consequently the limiting value attained by the curves at α=0.05 as n increases will be 1−2(0.083)=1−0.166=0.834. You can see it has almost been reached for n=50 (where the chance is 0.8383….)
For small α, the relationship between α and the complementary probability--the risk that the CI does not cover the second mean--is almost perfectly a power law. Another way to express this is that the log complementary probability is almost a linear function of logα. The limiting relationship is approximately
log(2Φ(Zα/22–√))≈−1.79712+0.557203log(20α)+0.00657704(log(20α))2+⋯
In other words, for large n=m and α anywhere near the traditional value of 0.05, (1) will be close to
1−0.166(20α)0.557.
(This reminds me very much of the analysis of overlapping confidence intervals I posted at /stats//a/18259/919. Indeed, the magic power there, 1.91, is very nearly the reciprocal of the magic power here, 0.557. At this point you should be able to re-interpret that analysis in terms of reproducibility of experiments.)
Experimental results
These results are confirmed with a straightforwward simulation. The following R
code returns the frequency of coverage, the chance as computed with (1), and a Z-score to assess how much they differ. The Z-scores are typically less than 2 in size, regardless of n,m,μ,σ,α (or even whether a Z or t CI is computed), indicating the correctness of formula (1).
n <- 3 # First sample size
m <- 2 # Second sample size
sigma <- 2
mu <- -4
alpha <- 0.05
n.sim <- 1e4
#
# Compute the multiplier.
#
Z <- qnorm(alpha/2)
#Z <- qt(alpha/2, df=n-1) # Use this for a Student t C.I. instead.
#
# Draw the first sample and compute the CI as [l.1, u.1].
#
x.1 <- matrix(rnorm(n*n.sim, mu, sigma), nrow=n)
x.1.bar <- colMeans(x.1)
s.1 <- apply(x.1, 2, sd)
l.1 <- x.1.bar + Z * s.1 / sqrt(n)
u.1 <- x.1.bar - Z * s.1 / sqrt(n)
#
# Draw the second sample and compute the mean as x.2.
#
x.2 <- colMeans(matrix(rnorm(m*n.sim, mu, sigma), nrow=m))
#
# Compare the second sample means to the CIs.
#
covers <- l.1 <= x.2 & x.2 <= u.1
#
# Compute the theoretical chance and compare it to the simulated frequency.
#
f <- pf(Z^2 / ((n * (1/n + 1/m))), 1, n-1)
m.covers <- mean(covers)
(c(Simulated=m.covers, Theoretical=f, Z=(m.covers - f)/sd(covers) * sqrt(length(covers))))