Show that 100 measurements for 5 subjects provide less information than 5 measurements for 100 subjects

At a conference I overheard the following statement:

100 measurements for 5 subjects provide much less information than 5 measurements for 100 subjects.

It's kind of obvious that this is true, but I was wondering how one could prove it mathematically... I think a linear mixed model could be used. However, I don't know much about the math used to estimate them (I just run lme4 for LMMs and brms for GLMMs :) Could you show me an example where this is true? I would prefer an answer with some formulas rather than just some code in R. Feel free to assume a simple setting, such as a linear mixed model with normally distributed random intercepts and slopes.

PS A mathematical answer that doesn't involve LMMs would be fine too. I thought of LMMs because they seemed to me the natural tool to explain why fewer measurements from more subjects are better than more measurements from fewer subjects, but I may well be wrong.

— DeltaIV

+1. I think the simplest setting would be to consider the task of estimating the population mean $\mu$, where each subject has their own mean $a \sim \mathcal N(\mu, \sigma_a^2)$ and each measurement of that subject is distributed as $x \sim \mathcal N(a, \sigma^2)$. If we take $n$ measurements from each of $m$ subjects, what is the optimal way to set $n$ and $m$ given a constant product $nm=N$?

— amoeba says Reinstate Monica

"Optimal" in the sense of minimizing the variance of the sample mean of the $N$ collected data points.

— amoeba says Reinstate Monica

Yes. But for your question we don't need to worry about how the variances are estimated. Your question (i.e. the quote in your question) is, I believe, only about estimating the global mean $\mu$, and it seems obvious that the best estimator is given by the grand mean $\bar x$ of all $N=nm$ points in the sample. The question then is: given $\mu$, $\sigma^2$, $\sigma^2_a$, $n$, and $m$, what is the variance of $\bar x$? Once we know that, we can minimize it with respect to $n$ under the constraint $nm=N$.

— amoeba says Reinstate Monica

I don't know how to derive any of this, but I agree that it seems obvious: to estimate the error variance, it is best to have all $N$ measurements from a single subject; and to estimate the subject variance, it would (probably?) be best to have $N$ different subjects with 1 measurement each. It's less clear for the mean, but my intuition tells me that it would also be best to have $N$ subjects with 1 measurement each. I wonder whether that's true...

— amoeba says Reinstate Monica

Maybe something like this: the variance of the per-subject sample means should be $\sigma^2_a + \sigma^2/n$, where the first term is the subject variance and the second is the variance of the estimate of each subject's mean. Then the variance of the mean over subjects (i.e. the grand mean) is

$(\sigma^2_a + \sigma^2/n)/m = \sigma^2_a/m + \sigma^2/(nm) = \sigma^2_a/m + \sigma^2/N = \sigma^2_a/m + \mathrm{const},$

which is minimized when $m=N$.

— amoeba says Reinstate Monica
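To put numbers on the claim in the title, here is amoeba's formula $\sigma^2_a/m + \sigma^2/N$ (in this comment's notation: $m$ subjects with $n$ measurements each) evaluated for $N = 500$. The values $\sigma^2_a = \sigma^2 = 1$ are purely illustrative assumptions, not data from the thread:

```latex
\begin{aligned}
m = 5,\ n = 100:&\quad \frac{\sigma^2_a}{m} + \frac{\sigma^2}{N} = \frac{1}{5} + \frac{1}{500} = 0.202\\
m = 100,\ n = 5:&\quad \frac{\sigma^2_a}{m} + \frac{\sigma^2}{N} = \frac{1}{100} + \frac{1}{500} = 0.012
\end{aligned}
```

Under these assumed variances, 100 subjects with 5 measurements each give a variance of the grand mean roughly 17 times smaller than 5 subjects with 100 measurements each. The $\sigma^2/N$ term is identical in both designs, so the entire difference comes from $\sigma^2_a/m$.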

The short answer is that your conjecture is true if and only if there is a positive intra-class correlation in the data. Empirically speaking, most clustered datasets most of the time show a positive intra-class correlation, which means that in practice your conjecture is usually true. But if the intra-class correlation is 0, then the two cases you mentioned are equally informative. And if the intra-class correlation is negative, then it's actually less informative to take fewer measurements on more subjects; we would actually prefer (as far as reducing the variance of the parameter estimate is concerned) to take all our measurements on a single subject.

Statistically there are two perspectives from which we can think about this: a random-effects (or mixed) model, which you mention in your question, or a marginal model, which ends up being a bit more informative here.

Random-effects (mixed) model

Say we have a group of $n$ subjects, from each of whom we've taken $m$ measurements. Then a simple random-effects model of the $j$th measurement from the $i$th subject might be

$y_{ij} = \beta + u_i + e_{ij},$

where $\beta$ is the fixed intercept, $u_i$ is the random subject effect (with variance $\sigma^2_u$), $e_{ij}$ is the observation-level error term (with variance $\sigma^2_e$), and the latter two random terms are independent.

In this model $\beta$ represents the population mean, and with a balanced dataset (i.e., an equal number of measurements from each subject), our best estimate is simply the sample mean. So if we take "more information" to mean a smaller variance for this estimate, then basically we want to know how the variance of the sample mean depends on $n$ and $m$. With a bit of algebra we can work out that

$\begin{aligned} \text{var}\Big(\frac{1}{nm}\sum_i\sum_jy_{ij}\Big) &= \text{var}\Big(\frac{1}{nm}\sum_i\sum_j(\beta + u_i + e_{ij})\Big) \\ &= \frac{1}{n^2m^2}\text{var}\Big(\sum_i\sum_ju_i + \sum_i\sum_je_{ij}\Big) \\ &= \frac{1}{n^2m^2}\Big(m^2\sum_i\text{var}(u_i) + \sum_i\sum_j\text{var}(e_{ij})\Big) \\ &= \frac{1}{n^2m^2}(nm^2\sigma^2_u + nm\sigma^2_e) \\ &= \frac{\sigma^2_u}{n} + \frac{\sigma^2_e}{nm}. \end{aligned}$
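A quick Monte Carlo check of this formula can be sketched in Python with NumPy. It assumes normally distributed $u_i$ and $e_{ij}$ (the derivation itself only needs the variances and independence); the function names and the choice $\sigma_u = \sigma_e = 1$ are illustrative assumptions. In this section's notation, `n_subjects` is $n$ and `m_per_subject` is $m$.

```python
import numpy as np


def simulate_mean_variance(n_subjects, m_per_subject, sigma_u, sigma_e,
                           beta=0.0, reps=5000, seed=0):
    """Monte Carlo variance of the grand sample mean under
    y_ij = beta + u_i + e_ij, u_i ~ N(0, sigma_u^2), e_ij ~ N(0, sigma_e^2)."""
    rng = np.random.default_rng(seed)
    # One random effect per subject and replicate: shape (reps, n)
    u = rng.normal(0.0, sigma_u, size=(reps, n_subjects))
    # Observation-level errors: shape (reps, n, m)
    e = rng.normal(0.0, sigma_e, size=(reps, n_subjects, m_per_subject))
    # Broadcast each subject's effect over its m measurements
    y = beta + u[:, :, None] + e
    grand_means = y.mean(axis=(1, 2))
    return grand_means.var()


def theoretical_mean_variance(n_subjects, m_per_subject, sigma_u, sigma_e):
    """Closed form from the derivation: sigma_u^2 / n + sigma_e^2 / (n m)."""
    return sigma_u**2 / n_subjects + sigma_e**2 / (n_subjects * m_per_subject)
```

For example, `theoretical_mean_variance(5, 100, 1.0, 1.0)` is $1/5 + 1/500 = 0.202$, while `theoretical_mean_variance(100, 5, 1.0, 1.0)` is $1/100 + 1/500 = 0.012$; the simulated values should land close to these, up to Monte Carlo error.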

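Whenever there is any subject variance (i.e., $\sigma^2_u>0$), increasing the number of subjects $n$ makes both terms of $\sigma^2_u/n + \sigma^2_e/(nm)$ smaller, while increasing the number of measurements per subject $m$ makes only the second term smaller. Now, if we hold the total number of observations $nm$ constant and only trade $m$ against $n$, the whole variance expression just looks like

$\frac{\sigma^2_u}{n} + \text{constant},$

which is as small as possible when $n$ is as large as possible (up to a maximum of $n=nm$, in which case $m=1$, meaning we take a single measurement from each subject).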

My short answer referred to the intra-class correlation, so where does that fit in? In this simple random-effects model the intra-class correlation is

$\rho = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}$ (sketch of a derivation here). So we can write the variance equation above as

$\text{var}\Big(\frac{1}{nm}\sum_i\sum_jy_{ij}\Big) = \frac{\sigma^2_u}{n} + \frac{\sigma^2_e}{nm} = \Big(\frac{\rho}{n} + \frac{1-\rho}{nm}\Big)(\sigma^2_u+\sigma^2_e).$

This doesn't really add any insight to what we already saw above, but it does make us wonder: since the intra-class correlation is a bona fide correlation coefficient, and correlation coefficients can be negative, what would happen (and what would it mean) if the intra-class correlation were negative?

In the context of the random-effects model, a negative intra-class correlation doesn't really make sense, because it implies that the subject variance $\sigma^2_u$ is somehow negative (as we can see from the $\rho$ equation above, and as explained here and here)... but variances can't be negative! But this doesn't mean that the concept of a negative intra-class correlation doesn't make sense; it just means that the random-effects model doesn't have any way to express this concept, which is a failure of the model, not of the concept. To express this concept adequately we need to consider the marginal model.

Marginal model

For this same dataset we could consider a so-called marginal model of $y_{ij}$,

$y_{ij} = \beta + e^*_{ij},$

where basically we've pushed the random subject effect $u_i$ from before into the error term $e_{ij}$, so that we have $e^*_{ij} = u_i + e_{ij}$. In the random-effects model we considered the two random terms $u_i$ and $e_{ij}$ to be independent, but in the marginal model we instead consider $e^*_{ij}$ to follow a block-diagonal covariance matrix $\textbf{C}$ like

$\textbf{C}= \sigma^2\begin{bmatrix} \textbf{R} & 0& \cdots & 0\\ 0& \textbf{R} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots &\textbf{R}\\ \end{bmatrix}, \textbf{R}= \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots &1\\ \end{bmatrix},$

with one $m\times m$ block $\textbf{R}$ per subject. In words, this means that under the marginal model we simply consider $\rho$ to be the expected correlation between two $e^*$s from the same subject (we assume the correlation across subjects is 0). When $\rho$ is positive, two observations drawn from the same subject tend to be more similar (closer together), on average, than two observations drawn randomly from the dataset while ignoring the clustering due to subjects. When $\rho$ is negative, two observations drawn from the same subject tend to be less similar (further apart), on average, than two observations drawn completely at random. (More information about this interpretation in the question/answers here.)

So now when we look at the equation for the variance of the sample mean under the marginal model, we have

$\begin{aligned} \text{var}\Big(\frac{1}{nm}\sum_i\sum_jy_{ij}\Big) &= \text{var}\Big(\frac{1}{nm}\sum_i\sum_j(\beta + e^*_{ij})\Big) \\ &= \frac{1}{n^2m^2}\text{var}\Big(\sum_i\sum_je^*_{ij}\Big) \\ &= \frac{1}{n^2m^2}\Big(n\big(m\sigma^2 + (m^2-m)\rho\sigma^2\big)\Big) \\ &= \frac{\sigma^2\big(1+(m-1)\rho\big)}{nm} \\ &= \Big(\frac{\rho}{n}+\frac{1-\rho}{nm}\Big)\sigma^2, \end{aligned}$

which is the same variance expression we derived above for the random-effects model, just with $\sigma^2_e+\sigma^2_u=\sigma^2$, which is consistent with our note above that $e^*_{ij} = u_i + e_{ij}$. The advantage of this (statistically equivalent) perspective is that here we can think about a negative intra-class correlation without needing to invoke any weird concepts like a negative subject variance. Negative intra-class correlations just fit naturally in this framework.
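The marginal-model derivation can be checked numerically by building $\textbf{C}$ explicitly and using the general identity $\text{var}(\mathbf{1}^\top y/N) = \mathbf{1}^\top \textbf{C}\,\mathbf{1}/N^2$. This is a minimal NumPy sketch: the helper names are hypothetical, observations are assumed to be ordered subject by subject (so that `np.kron` produces the block-diagonal layout), and $\rho$ must lie in the valid range discussed in the aside for $\textbf{C}$ to be a genuine covariance matrix.

```python
import numpy as np


def marginal_covariance(n_subjects, m_per_subject, sigma2, rho):
    """C = sigma^2 * blockdiag(R, ..., R), R = m x m compound-symmetry matrix."""
    m = m_per_subject
    # 1 on the diagonal, rho everywhere else
    R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
    # Kronecker product with the identity places one R block per subject
    return sigma2 * np.kron(np.eye(n_subjects), R)


def var_of_mean_from_C(C):
    """Variance of the grand mean of y when cov(y) = C: 1' C 1 / N^2."""
    N = C.shape[0]
    ones = np.ones(N)
    return ones @ C @ ones / N**2


def var_of_mean_closed_form(n_subjects, m_per_subject, sigma2, rho):
    """Closed form from the derivation: sigma^2 (1 + (m - 1) rho) / (n m)."""
    return sigma2 * (1 + (m_per_subject - 1) * rho) / (n_subjects * m_per_subject)
```

Because $\mathbf{1}^\top \textbf{C}\,\mathbf{1}$ just sums every entry of $\textbf{C}$, this reproduces the third line of the derivation: each of the $n$ blocks contributes $m$ diagonal entries $\sigma^2$ and $m^2-m$ off-diagonal entries $\rho\sigma^2$.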

(BTW, just a quick aside to point out that the second-to-last line of the derivation above implies that we must have $\rho \ge -1/(m-1)$ , or else the whole equation is negative, but variances can't be negative! So there is a lower bound on the intra-class correlation that depends on how many measurements we have per cluster. For $m=2$ (i.e., we measure each subject twice), the intra-class correlation can go all the way down to $\rho=-1$ ; for $m=3$ it can only go down to $\rho=-1/2$ ; and so on. Fun fact!)
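The same lower bound also follows from the eigenvalues of $\textbf{R}$: an $m\times m$ compound-symmetry matrix has eigenvalue $1+(m-1)\rho$ (for the all-ones eigenvector) and eigenvalue $1-\rho$ (with multiplicity $m-1$), so it is positive semidefinite exactly when $-1/(m-1) \le \rho \le 1$. A small NumPy sketch to check this numerically (the function names are hypothetical):

```python
import numpy as np


def compound_symmetry(m, rho):
    """m x m correlation matrix with 1 on the diagonal and rho elsewhere."""
    return (1 - rho) * np.eye(m) + rho * np.ones((m, m))


def rho_lower_bound(m):
    """Smallest admissible intra-class correlation for m >= 2 measurements per subject."""
    return -1.0 / (m - 1)


def is_valid_correlation(m, rho, tol=1e-12):
    """True if R is positive semidefinite, up to floating-point tolerance."""
    eigenvalues = np.linalg.eigvalsh(compound_symmetry(m, rho))
    return eigenvalues.min() >= -tol
```

For instance, `is_valid_correlation(3, -0.5)` holds (the smallest eigenvalue is 0), whereas `is_valid_correlation(3, -0.6)` fails because $1 + 2(-0.6) = -0.2 < 0$.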

So finally, once again considering the total number of observations $nm$ to be a constant, we see that the second-to-last line of the derivation above just looks like

$\big(1+(m-1)\rho\big) \times \text{positive constant}.$

So when $\rho>0$, having $m$ as small as possible (so that we take fewer measurements of more subjects; in the limit, 1 measurement of each subject) makes the variance of the estimate as small as possible. But when $\rho<0$, we actually want $m$ to be as large as possible (so that, in the limit, we take all $nm$ measurements from a single subject) in order to make the variance as small as possible. And when $\rho=0$, the variance of the estimate is just a constant, so our allocation of $m$ and $n$ doesn't matter.
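The allocation argument can be made concrete by enumerating every way to split a fixed total $N = nm$ into $n$ subjects with $m$ measurements each and evaluating $\sigma^2\big(1+(m-1)\rho\big)/N$. This Python sketch is illustrative only: the helper names are hypothetical, and it additionally assumes that $\rho$ is held fixed, so any $m$ violating the bound $\rho \ge -1/(m-1)$ from the aside is skipped as infeasible.

```python
def allocations(N):
    """All (n_subjects, m_per_subject) pairs with n_subjects * m_per_subject == N."""
    return [(N // m, m) for m in range(1, N + 1) if N % m == 0]


def variance_of_mean(m, rho, N, sigma2=1.0):
    """Marginal-model variance of the grand mean: sigma^2 (1 + (m - 1) rho) / N."""
    return sigma2 * (1 + (m - 1) * rho) / N


def best_allocation(N, rho, sigma2=1.0):
    """Split of N observations that minimizes the variance of the grand mean.

    Assumes rho is fixed, so designs with m >= 2 and rho < -1/(m - 1) are infeasible.
    """
    feasible = [(n, m) for n, m in allocations(N)
                if m == 1 or rho >= -1.0 / (m - 1)]
    return min(feasible, key=lambda nm: variance_of_mean(nm[1], rho, N, sigma2))
```

With $N=500$ and an assumed $\rho=0.2$, the best split is 500 subjects with 1 measurement each. With an assumed $\rho=-0.1$, the bound limits $m$ to at most 11, so among the divisors of 500 the best feasible split is 50 subjects with 10 measurements each rather than a single subject; this is the same tension amoeba raises in the comments below. With $\rho=0$ every split ties, and `min` simply returns the first one.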

— Jake Westfall

+1. Great answer. I have to admit that the second part, about $\rho<0$, is quite unintuitive: even with a huge (or infinite) total number $nm$ of observations, the best we can do is to allocate all observations to one single subject, meaning that the standard error of the mean will be $\sigma_u$ and it's not possible in principle to reduce it any further. This is just so weird! The true $\beta$ remains unknowable, whatever resources one puts into measuring it. Is this interpretation correct?

— amoeba says Reinstate Monica

Ah, no. The above is not correct, because as $m$ increases to infinity, $\rho$ cannot stay negative and has to approach zero (corresponding to zero subject variance). Hmm. This negative correlation is a funny thing: it's not really a parameter of the generative model, because it's constrained by the sample size (whereas one would normally expect a generative model to be able to generate any number of observations, whatever the parameters are). I am not quite sure what the proper way to think about it is.

— amoeba says Reinstate Monica
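This correction can be read directly off the bound from the aside in Jake's answer. For a fixed negative $\rho$ and $m \ge 2$, rearranging gives a cap on the number of measurements per subject:

```latex
\rho \ge -\frac{1}{m-1}
\;\iff\; \rho\,(m-1) \ge -1
\;\iff\; m - 1 \le -\frac{1}{\rho}
\;\iff\; m \le 1 - \frac{1}{\rho} \qquad (\rho < 0)
```

The third step divides by $\rho < 0$, which flips the inequality. For example, $\rho = -0.1$ permits at most $m = 11$. Equivalently, the admissible lower bound $-1/(m-1)$ tends to $0$ as $m \to \infty$, which is why a negative $\rho$ cannot persist as $m$ grows.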

@DeltaIV What is "the covariance matrix of the random effects" in this case? In the mixed model Jake wrote above there is only one random effect, so there is no real "covariance matrix", just a single number: $\sigma^2_u$. What $\Sigma$ are you referring to?

— amoeba says Reinstate Monica

@DeltaIV Well, the general principle is en.wikipedia.org/wiki/Inverse-variance_weighting, and the variance of each subject's sample mean is given by $\sigma^2_u + \sigma^2_e/m_i$ (that's why Jake wrote above that the weights have to depend on the estimate of the between-subject variance). The within-subject variance can be estimated from the variance of the pooled within-subject deviations, the between-subject variance from the variance of the subject means, and all of that can then be used to compute the weights. (I'm not 100% sure, though, that this is exactly equivalent to what the mixed model would do.)

— amoeba says Reinstate Monica
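The inverse-variance weighting described in this comment can be sketched as follows. This is a hypothetical helper, not what lme4 or any other package does internally, and it assumes the variance components $\sigma^2_u$ and $\sigma^2_e$ are known; in practice they would be replaced by estimates such as those the comment describes.

```python
import numpy as np


def inverse_variance_weighted_mean(subject_means, m_i, sigma2_u, sigma2_e):
    """Grand mean weighting each subject by 1 / (sigma_u^2 + sigma_e^2 / m_i).

    Assumes the variance components are known; returns (estimate, variance).
    """
    subject_means = np.asarray(subject_means, dtype=float)
    m_i = np.asarray(m_i, dtype=float)
    # Variance of each subject's sample mean, inverted to give its weight
    weights = 1.0 / (sigma2_u + sigma2_e / m_i)
    estimate = np.sum(weights * subject_means) / np.sum(weights)
    # Variance of the inverse-variance weighted estimate
    variance = 1.0 / np.sum(weights)
    return estimate, variance
```

With equal $m_i$ all weights are equal, so the estimate reduces to the plain mean of the subject means and the variance reduces to $\sigma^2_u/n + \sigma^2_e/(nm)$ from Jake's answer. With unequal $m_i$, subjects measured more often get more weight, but never more than $1/\sigma^2_u$.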

Jake, yes, it's exactly this hard-coding of $m$ that was bothering me. If this is "sample size", then it cannot be a parameter of the underlying system. My current thinking is that a negative $\rho$ should actually indicate that there is another within-subject factor that is ignored by or unknown to us. E.g. it could be pre and post of some intervention, and the difference between them is so large that the measurements are negatively correlated. But this would mean that $m$ is not really a sample size but the number of levels of this unknown factor, and that can certainly be hard-coded...

— amoeba says Reinstate Monica