Analogy of Pearson correlation for 3 variables

I am interested in whether a "correlation" of three variables is anything, and if so, what would it be?

Pearson product-moment correlation coefficient

$\frac{\mathrm{E}\{(X-\mu_X)(Y-\mu_Y)\}}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$

Now the question for 3 variables: is

$\frac{\mathrm{E}\{(X-\mu_X)(Y-\mu_Y)(Z-\mu_Z)\}}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)\mathrm{Var}(Z)}}$

anything?

In R it seems like something interpretable:

> a <- rnorm(100); b <- rnorm(100); c <- rnorm(100)
> mean((a-mean(a)) * (b-mean(b)) * (c-mean(c))) / (sd(a) * sd(b) * sd(c))
[1] -0.3476942

Normally we look at the correlation between 2 variables given the value of a fixed third variable. Could someone clarify?

correlation pearson-r

— PascalVKooten

1) In your bivariate Pearson formula, if "E" (mean in your code) implies division by n, then the standard deviations must also be based on n (not n-1). 2) Let all three variables be the same variable. In this case we would expect a correlation of 1 (as in the bivariate case), but alas...

— ttnphns 13.08.13
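A minimal sketch of both points, in Python rather than R; the helper names `centered`, `sd_n`, `sd_n1` and `product_moment_ratio` are illustrative, not from the thread. With divisor n in both numerator and denominator the two-variable version is exactly Pearson's r; using R-style n-1 standard deviations rescales it by (n-1)/n; and passing the same variable three times gives its sample skewness rather than 1.

```python
import random


def centered(v):
    """Subtract the mean (divisor n) from every entry."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def sd_n(v):
    """Standard deviation with divisor n, consistent with mean()."""
    return (sum(x * x for x in centered(v)) / len(v)) ** 0.5


def sd_n1(v):
    """Standard deviation with divisor n - 1, as R's sd() uses."""
    return (sum(x * x for x in centered(v)) / (len(v) - 1)) ** 0.5


def product_moment_ratio(vectors, sd):
    """Mean of the product of the centered vectors over the product of their sds."""
    n = len(vectors[0])
    cols = [centered(v) for v in vectors]
    numerator = 0.0
    for t in range(n):
        term = 1.0
        for c in cols:
            term *= c[t]
        numerator += term
    numerator /= n
    denominator = 1.0
    for v in vectors:
        denominator *= sd(v)
    return numerator / denominator


random.seed(1)
a = [random.gauss(0, 1) for _ in range(100)]

r_consistent = product_moment_ratio([a, a], sd_n)  # 1 up to rounding
r_mixed = product_moment_ratio([a, a], sd_n1)      # (n - 1) / n = 0.99
skew_a = product_moment_ratio([a, a, a], sd_n)     # sample skewness of a, not 1
print(r_consistent, r_mixed, skew_a)
```

For a symmetric vector such as `[-1.0, 0.0, 1.0]` the same-variable version is exactly 0, and for `[0.0, 0.0, 0.0, 3.0]` it is $2/\sqrt{3}$: it measures asymmetry, not association.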

For a trivariate normal distribution it is zero, regardless of the correlations.

— Ray Koopman
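This can be illustrated by simulation; the sketch below is an illustration under stated assumptions, not a proof. It builds three correlated variables as linear combinations of independent standard normal draws (so they are jointly trivariate normal; the coefficients are arbitrary choices), and the standardized mean of the triple product comes out close to zero even though the pairwise correlations are large. The exact result follows from symmetry: the centered normal vector has the same distribution as its negative, which flips the sign of every third moment.

```python
import random


def mean(v):
    return sum(v) / len(v)


def sd_n(v):
    m = mean(v)
    return (sum((t - m) ** 2 for t in v) / len(v)) ** 0.5


def corr(u, v):
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (sd_n(u) * sd_n(v))


def triple_ratio(x, y, z):
    """Sample E[(X - mu_X)(Y - mu_Y)(Z - mu_Z)] / (sd_X sd_Y sd_Z), divisor n."""
    mx, my, mz = mean(x), mean(y), mean(z)
    m3 = sum((a - mx) * (b - my) * (c - mz) for a, b, c in zip(x, y, z)) / len(x)
    return m3 / (sd_n(x) * sd_n(y) * sd_n(z))


random.seed(2024)
n = 100_000
e1 = [random.gauss(0, 1) for _ in range(n)]
e2 = [random.gauss(0, 1) for _ in range(n)]
e3 = [random.gauss(0, 1) for _ in range(n)]

# Linear combinations of independent normals are jointly normal.
x = e1
y = [0.8 * a + 0.6 * b for a, b in zip(e1, e2)]
z = [0.5 * a + 0.5 * b + 0.7 * c for a, b, c in zip(e1, e2, e3)]

print(corr(x, y), corr(x, z), corr(y, z))  # clearly nonzero correlations
print(triple_ratio(x, y, z))               # close to 0
```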

I really think the title would benefit from being changed to "Analogy of Pearson correlation for 3 variables" or similar - it would make the links here somewhat more informative

— Silverfish

@Silverfish I agree! I have updated the title, thanks.

— PascalVKooten

It is indeed something. To find out what, we need to examine what we know about correlation itself.

The correlation matrix of a vector-valued random variable $\mathbf{X}=(X_1,X_2,\ldots,X_p)$ is the variance-covariance matrix, or simply "variance," of the standardized version of $\mathbf{X}$. That is, each $X_i$ is replaced by its recentered, rescaled version $(X_i - E[X_i])/\sqrt{\operatorname{Var}(X_i)}$.
The covariance of $X_i$ and $X_j$ is the expectation of the product of their centered versions. That is, writing $X^\prime_i = X_i - E[X_i]$ and $X^\prime_j = X_j - E[X_j]$, we have

$\operatorname{Cov}(X_i,X_j) = E[X^\prime_i X^\prime_j].$
The variance of $\mathbf{X}$, which I will write $\operatorname{Var}(\mathbf{X})$, is not a single number. It is the array of values

$\operatorname{Var}(\mathbf{X})_{ij}=\operatorname{Cov}(X_i,X_j).$
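To make the array $\operatorname{Var}(\mathbf{X})$ concrete, here is a small sketch using the sample analogues (divisor $n$) on made-up data; the helper names are illustrative. It also checks the opening claim that the correlation matrix is the variance of the standardized variables.

```python
def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]


def cov(u, v):
    """Sample Cov(u, v): mean of the product of the centered versions."""
    return sum(a * b for a, b in zip(center(u), center(v))) / len(u)


def var_matrix(cols):
    """Var(X)_{ij} = Cov(X_i, X_j) for a list of p variables (columns)."""
    return [[cov(ci, cj) for cj in cols] for ci in cols]


def standardize(v):
    c = center(v)
    s = (sum(t * t for t in c) / len(v)) ** 0.5
    return [t / s for t in c]


# Hypothetical data: p = 3 variables observed n = 5 times.
X = [
    [1.0, 2.0, 4.0, 7.0, 6.0],
    [2.0, 1.0, 0.0, 3.0, 5.0],
    [5.0, 3.0, 3.0, 1.0, 0.0],
]
V = var_matrix(X)
R = var_matrix([standardize(c) for c in X])  # the correlation matrix

for i in range(3):
    print([round(R[i][j], 3) for j in range(3)])
```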
The way to think of the covariance for the intended generalization is to consider it a tensor. That means it is an entire collection of quantities $v_{ij}$, indexed by $i$ and $j$ ranging from $1$ through $p$, whose values change in a particularly simple, predictable way when $\mathbf{X}$ undergoes a linear transformation. Specifically, let $\mathbf{Y}=(Y_1,Y_2,\ldots,Y_q)$ be another vector-valued random variable defined by

$Y_i = \sum_{j=1}^p a_i^{\,j}X_j.$

The constants $a_i^{\,j}$ ($i$ and $j$ are indexes; $j$ is not a power) form a $q\times p$ array $\mathbb{A} = (a_i^{\,j})$, $j=1,\ldots, p$ and $i=1,\ldots, q$. The linearity of expectation implies

$\operatorname{Var}(\mathbf Y)_{ij} = \sum_{k,l} a_i^{\,k}a_j^{\,l}\operatorname{Var}(\mathbf X)_{kl}.$
In matrix notation,

$\operatorname{Var}(\mathbf Y) = \mathbb{A}\operatorname{Var}(\mathbf X) \mathbb{A}^\prime.$
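The rule $\operatorname{Var}(\mathbf Y) = \mathbb{A}\operatorname{Var}(\mathbf X)\mathbb{A}^\prime$ also holds exactly for sample covariances (divisor $n$), because centering commutes with linear maps. A sketch under that assumption, with an arbitrary $2\times 3$ matrix standing in for $\mathbb{A}$ (so $q=2$, $p=3$), made-up data, and illustrative helper names:

```python
def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]


def cov(u, v):
    return sum(a * b for a, b in zip(center(u), center(v))) / len(u)


def var_matrix(cols):
    return [[cov(ci, cj) for cj in cols] for ci in cols]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


# Hypothetical data: p = 3 variables (columns), n = 5 observations.
X = [
    [1.0, 2.0, 4.0, 7.0, 6.0],
    [2.0, 1.0, 0.0, 3.0, 5.0],
    [5.0, 3.0, 3.0, 1.0, 0.0],
]
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]  # entry A[i][j] plays the role of a_i^j

n = len(X[0])
# Y_i = sum_j a_i^j X_j, observation by observation
Y = [[sum(A[i][j] * X[j][t] for j in range(3)) for t in range(n)] for i in range(2)]

lhs = var_matrix(Y)
rhs = matmul(matmul(A, var_matrix(X)), transpose(A))
print(lhs)
print(rhs)
```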
All the components of $\operatorname{Var}(\mathbf{X})$ actually are univariate variances, due to the Polarization Identity

$4\operatorname{Cov}(X_i,X_j) = \operatorname{Var}(X_i+X_j) - \operatorname{Var}(X_i-X_j).$

This tells us that if you understand variances of univariate random variables, you already understand covariances of bivariate variables: they are "just" linear combinations of variances.
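Because the Polarization Identity is algebraic, it also holds for sample variances and covariances (divisor $n$). A minimal check with made-up numbers:

```python
def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]


def var(v):
    return sum(t * t for t in center(v)) / len(v)


def cov(u, v):
    return sum(a * b for a, b in zip(center(u), center(v))) / len(u)


xi = [1.0, 2.0, 4.0, 7.0, 6.0]
xj = [2.0, 1.0, 0.0, 3.0, 5.0]

lhs = 4 * cov(xi, xj)
rhs = var([a + b for a, b in zip(xi, xj)]) - var([a - b for a, b in zip(xi, xj)])
print(lhs, rhs)  # equal up to rounding
```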

The expression in the question is perfectly analogous: the variables $X_i$ have been standardized (recentered and rescaled). We can understand what it represents by considering what it means for any variable, standardized or not. We would replace each $X_i$ by its centered version $X^\prime_i = X_i - E[X_i]$ and form quantities having three indexes,

$\mu_3(\mathbf{X})_{ijk} = E[X_i^\prime X_j^\prime X_k^\prime].$

These are the central (multivariate) moments of degree $3$. Just like $\operatorname{Var}(\mathbf{X})$, they form a tensor: when $\mathbf{Y} = \mathbb{A}\mathbf{X}$, then

$\mu_3(\mathbf{Y})_{ijk} = \sum_{l,m,n} a_i^{\,l}a_j^{\,m}a_k^{\,n} \mu_3(\mathbf{X})_{lmn}.$

The indexes in this triple sum range over all combinations of integers from $1$ through $p$ .
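A sketch of the sample version of $\mu_3(\mathbf{X})_{ijk}$ and its tensor transformation law, assuming divisor $n$, made-up data, an arbitrary matrix for $\mathbb{A}$, and illustrative helper names. Since the centered version of $\mathbb{A}\mathbf{X}$ is $\mathbb{A}$ applied to the centered $\mathbf{X}$, the law holds exactly for sample moments.

```python
from itertools import product


def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]


def mu3_tensor(cols):
    """Sample mu_3(X)_{ijk}: mean of X'_i X'_j X'_k, keyed by (i, j, k)."""
    p, n = len(cols), len(cols[0])
    c = [center(v) for v in cols]
    return {
        (i, j, k): sum(c[i][t] * c[j][t] * c[k][t] for t in range(n)) / n
        for i, j, k in product(range(p), repeat=3)
    }


X = [
    [1.0, 2.0, 4.0, 7.0, 6.0],
    [2.0, 1.0, 0.0, 3.0, 5.0],
    [5.0, 3.0, 3.0, 1.0, 0.0],
]
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
p, q, n = 3, 2, len(X[0])

Y = [[sum(A[i][j] * X[j][t] for j in range(p)) for t in range(n)] for i in range(q)]

mx = mu3_tensor(X)
my = mu3_tensor(Y)
# Triple sum over all index combinations; the third summation index is
# called r here because n already denotes the sample size.
transformed = {
    (i, j, k): sum(A[i][l] * A[j][m] * A[k][r] * mx[(l, m, r)]
                   for l, m, r in product(range(p), repeat=3))
    for i, j, k in product(range(q), repeat=3)
}
print(my[(0, 0, 1)], transformed[(0, 0, 1)])
```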

The analog of the Polarization Identity is

$\begin{aligned}24\mu_3(\mathbf{X})_{ijk} ={}& \mu_3(X_i+X_j+X_k) - \mu_3(X_i-X_j+X_k)\\ &- \mu_3(X_i+X_j-X_k) + \mu_3(X_i-X_j-X_k).\end{aligned}$

On the right hand side, $\mu_3$ refers to the (univariate) central third moment: the expected value of the cube of the centered variable. When the variables are standardized, this moment is usually called the skewness. Accordingly, we may think of $\mu_3(\mathbf{X})$ as being the multivariate skewness of $\mathbf{X}$ . It is a tensor of rank three (that is, with three indices) whose values are linear combinations of the skewnesses of various sums and differences of the $X_i$ . If we were to seek interpretations, then, we would think of these components as measuring in $p$ dimensions whatever the skewness is measuring in one dimension. In many cases,

The first moments measure the location of a distribution;
The second moments (the variance-covariance matrix) measure its spread;
The standardized second moments (the correlations) indicate how the spread varies in $p$ -dimensional space; and
The standardized third and fourth moments are taken to measure the shape of a distribution relative to its spread.
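The claim that each component of $\mu_3(\mathbf{X})$ is a linear combination of univariate third central moments of sums and differences can be checked numerically with the third-order Polarization Identity. A sketch with made-up data and illustrative helper names; it also tries a repeated index, where the identity reduces to $24\mu_3(X_i)$.

```python
def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]


def mu3(v):
    """Univariate third central moment (divisor n)."""
    return sum(t ** 3 for t in center(v)) / len(v)


def mu3_ijk(xi, xj, xk):
    """Sample mu_3(X)_{ijk}: mean of X'_i X'_j X'_k."""
    return sum(a * b * c for a, b, c in zip(center(xi), center(xj), center(xk))) / len(xi)


def polarization_rhs(xi, xj, xk):
    """Signed third moments of X_i + s_j X_j + s_k X_k, as in the identity."""
    def combo(sj, sk):
        return [a + sj * b + sk * c for a, b, c in zip(xi, xj, xk)]
    return (mu3(combo(1, 1)) - mu3(combo(-1, 1))
            - mu3(combo(1, -1)) + mu3(combo(-1, -1)))


x1 = [1.0, 2.0, 4.0, 7.0, 6.0]
x2 = [2.0, 1.0, 0.0, 3.0, 5.0]
x3 = [5.0, 3.0, 3.0, 1.0, 0.0]

print(24 * mu3_ijk(x1, x2, x3), polarization_rhs(x1, x2, x3))
print(24 * mu3_ijk(x1, x1, x1), polarization_rhs(x1, x1, x1))
```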

To elaborate on what a multidimensional "shape" might mean, observe that we can understand PCA as a mechanism to reduce any multivariate distribution to a standard version located at the origin with equal spreads in all directions. After PCA is performed, then, $\mu_3$ would provide the simplest indicators of the multidimensional shape of the distribution. These ideas apply equally well to data as to random variables, because data can always be analyzed in terms of their empirical distribution.
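A sketch of that idea under stated assumptions: instead of a PCA rotation followed by rescaling, it whitens the data with a Cholesky factor of the sample covariance (a swapped-in choice that also yields zero mean and identity covariance, and differs from the PCA version only by an orthogonal transformation). The sum of squares of all components of $\mu_3$ of the whitened data does not depend on that orthogonal transformation; for a sample it equals $\frac{1}{n^2}\sum_{s,t}(\mathbf{z}_s\cdot\mathbf{z}_t)^3$, which is Mardia's multivariate skewness statistic $b_{1,p}$. The data generator and helper names are hypothetical.

```python
import math
import random
from itertools import product


def covariance_matrix(rows):
    """Centered rows and sample covariance (divisor n) of p-dimensional observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(p)]
    centered = [[r[i] - means[i] for i in range(p)] for r in rows]
    S = [[sum(c[i] * c[j] for c in centered) / n for j in range(p)] for i in range(p)]
    return centered, S


def cholesky(S):
    """Lower-triangular L with L L' = S (S assumed positive definite)."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def whiten(rows):
    centered, S = covariance_matrix(rows)
    L = cholesky(S)
    return [forward_solve(L, c) for c in centered]


def mu3_tensor(rows):
    """Third moments of mean-zero rows, keyed by (i, j, k)."""
    n, p = len(rows), len(rows[0])
    return {
        idx: sum(r[idx[0]] * r[idx[1]] * r[idx[2]] for r in rows) / n
        for idx in product(range(p), repeat=3)
    }


# Hypothetical skewed, correlated 3-D data.
random.seed(7)
rows = []
for _ in range(300):
    e = random.expovariate(1.0)
    g = random.gauss(0, 1)
    rows.append([e, e + 0.5 * g, random.gauss(0, 1) - e])

Z = whiten(rows)
tensor = mu3_tensor(Z)  # Z has mean zero, so these are central moments
shape_summary = sum(v * v for v in tensor.values())

n = len(Z)
mardia = sum(sum(a * b for a, b in zip(zs, zt)) ** 3 for zs in Z for zt in Z) / n ** 2
print(shape_summary, mardia)
```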

Reference

Alan Stuart & J. Keith Ord, Kendall's Advanced Theory of Statistics Fifth Edition, Volume 1: Distribution Theory; Chapter 3, Moments and Cumulants. Oxford University Press (1987).

Appendix: Proof of the Polarization Identity

Let $x_1,\ldots, x_n$ be algebraic variables. There are $2^n$ ways to add and subtract all $n$ of them. When we raise each of these sums-and-differences to the $n^\text{th}$ power, pick a suitable sign for each of those results, and add them up, we will get a multiple of $x_1x_2\cdots x_n$ .

More formally, let $S=\{1,-1\}^n$ be the set of all $n$ -tuples of $\pm 1$ , so that any element $s\in S$ is a vector $s=(s_1,s_2,\ldots,s_n)$ whose coefficients are all $\pm 1$ . The claim is

$2^n n!\, x_1x_2\cdots x_n = \sum_{s\in S} \color{red}{s_1s_2\cdots s_n}(s_1x_1+s_2x_2+\cdots+s_nx_n)^n. \qquad (1)$
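Identity $(1)$ can be spot-checked in exact integer arithmetic. A short sketch (the function names are illustrative) that sums over all $s\in S$ by brute force:

```python
from itertools import product
from math import factorial, prod


def polarization_sum(xs):
    """Right-hand side of (1): sum over s in {1,-1}^n of s_1...s_n (s . x)^n."""
    n = len(xs)
    return sum(
        prod(s) * sum(si * xi for si, xi in zip(s, xs)) ** n
        for s in product((1, -1), repeat=n)
    )


def left_side(xs):
    """Left-hand side of (1): 2^n n! x_1 x_2 ... x_n."""
    n = len(xs)
    return 2 ** n * factorial(n) * prod(xs)


for xs in [(3,), (2, 5), (1, -4, 7), (2, 3, 5, 7), (1, 2, -3, 4, 6)]:
    print(xs, polarization_sum(xs), left_side(xs))
```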

Indeed, the Multinomial Theorem states that the coefficient of the monomial $x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}$ (where the $i_j$ are nonnegative integers summing to $n$ ) in the expansion of any term on the right hand side is

$\binom{n}{i_1,i_2,\ldots,i_n}s_1^{i_1}s_2^{i_2}\cdots s_n^{i_n}.$

In the sum $(1)$ , the coefficients involving $x_1^{i_1}$ appear in pairs where one of each pair involves the case $s_1=1$ , with coefficient proportional to $\color{red}{s_1}$ times $s_1^{i_1}$ , equal to $1$ , and the other of each pair involves the case $s_1=-1$ , with coefficient proportional to $\color{red}{-1}$ times $(-1)^{i_1}$ , equal to $(-1)^{i_1+1}$ . They cancel in the sum whenever $i_1+1$ is odd. The same argument applies to $i_2, \ldots, i_n$ . Consequently, the only monomials that occur with nonzero coefficients must have odd powers of all the $x_i$ . The only such monomial is $x_1x_2\cdots x_n$ . It appears with coefficient $\binom{n}{1,1,\ldots,1}=n!$ in all $2^n$ terms of the sum. Consequently its coefficient is $2^nn!$ , QED.

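We need only take half of each pair associated with $x_1$: that is, we can restrict the right hand side of $(1)$ to the terms with $s_1=1$ and halve the coefficient on the left hand side to $2^{n-1}n!$. This works because replacing $s$ by $-s$ multiplies both $s_1s_2\cdots s_n$ and $(s_1x_1+\cdots+s_nx_n)^n$ by $(-1)^n$, leaving the term unchanged, and every $s$ with $s_1=-1$ is the negative of exactly one $s$ with $s_1=1$. That gives precisely the two versions of the Polarization Identity quoted in this answer for the cases $n=2$ and $n=3$: $2^{2-1}2! = 4$ and $2^{3-1}3!=24$.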
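The halving step and the constants $4$ and $24$ can be checked the same way. A sketch with illustrative names and integer inputs that keeps only the terms with $s_1=1$, and also writes out the $n=3$ case exactly as it appears in the identity for $\mu_3$:

```python
from itertools import product
from math import factorial, prod


def half_sum(xs):
    """Sum over s with s_1 = 1 only of s_1...s_n (s . x)^n."""
    n = len(xs)
    total = 0
    for rest in product((1, -1), repeat=n - 1):
        s = (1,) + rest
        total += prod(s) * sum(si * xi for si, xi in zip(s, xs)) ** n
    return total


def explicit_n3(x1, x2, x3):
    """The four signed cubes from the third-order Polarization Identity."""
    return ((x1 + x2 + x3) ** 3 - (x1 - x2 + x3) ** 3
            - (x1 + x2 - x3) ** 3 + (x1 - x2 - x3) ** 3)


print(half_sum((2, 5)), 4 * 2 * 5)          # n = 2: coefficient 2^1 * 2! = 4
print(half_sum((2, 3, 5)), 24 * 2 * 3 * 5)  # n = 3: coefficient 2^2 * 3! = 24
print(explicit_n3(2, 3, 5))
```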

Of course the Polarization Identity for algebraic variables immediately implies it for random variables: let each $x_i$ be a centered random variable $X_i^\prime$. Take expectations of both sides. The result follows by linearity of expectation.

— whuber

Well done on explaining so far! Multivariate skewness kind of makes sense. Could you perhaps add an example that would show the importance of this multivariate skewness? Either as an issue in statistical models, or perhaps more interesting, what real life case would be subject to multivariate skewness :)?

— PascalVKooten

Hmmm. If we run...

a <- rnorm(100);
b <- rnorm(100);
c <- rnorm(100)
mean((a-mean(a))*(b-mean(b))*(c-mean(c)))/
  (sd(a) * sd(b) * sd(c))

it does seem to center on 0 (I haven't done a real simulation), but as @ttnphns alludes, running this (all variables the same)

a <- rnorm(100)
mean((a-mean(a))*(a-mean(a))*(a-mean(a)))/
  (sd(a) * sd(a) * sd(a))

also seems to center on 0, which certainly makes me wonder what use this could be.
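A quick version of such a simulation, sketched in Python rather than R (divisor $n$ throughout; the replicate count and sample size are arbitrary choices). For three independent normal samples the statistic averages near 0. With all three variables the same it is the sample skewness, which also averages near 0 for normal data but is clearly positive for a right-skewed distribution such as the exponential, whose population skewness is 2. So the same-variable case measures asymmetry, not association.

```python
import random


def statistic(a, b, c):
    """Mean of the product of centered a, b, c over the product of their sds (divisor n)."""
    n = len(a)
    ma, mb, mc = sum(a) / n, sum(b) / n, sum(c) / n
    num = sum((x - ma) * (y - mb) * (z - mc) for x, y, z in zip(a, b, c)) / n
    sa = (sum((x - ma) ** 2 for x in a) / n) ** 0.5
    sb = (sum((y - mb) ** 2 for y in b) / n) ** 0.5
    sc = (sum((z - mc) ** 2 for z in c) / n) ** 0.5
    return num / (sa * sb * sc)


def average(values):
    return sum(values) / len(values)


random.seed(42)
reps, n = 400, 200


def normal_sample():
    return [random.gauss(0, 1) for _ in range(n)]


def expo_sample():
    return [random.expovariate(1.0) for _ in range(n)]


independent = average([statistic(normal_sample(), normal_sample(), normal_sample())
                       for _ in range(reps)])
same_normal = []
same_expo = []
for _ in range(reps):
    a = normal_sample()
    same_normal.append(statistic(a, a, a))
    e = expo_sample()
    same_expo.append(statistic(e, e, e))

print(round(independent, 3), round(average(same_normal), 3), round(average(same_expo), 3))
```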

— Peter Flom - Reinstate Monica

The nonsense apparently comes from the fact that sd or variance is a function of squaring, as is covariance. But with 3 variables, cubing occurs in the numerator while the denominator remains based on originally squared terms.

— ttnphns

Is that the root of it (pun intended)? Numerator and denominator have the same dimensions and units, which cancel, so that alone doesn't make the measure poorly formed.

— Nick Cox

@Nick That's right. This is simply one of the multivariate central third moments. It is one component of a rank-three tensor giving the full set of third moments (which is closely related to the order-3 component of the multivariate cumulant generating function). In conjunction with the other components it could be of some use in describing asymmetries (higher-dimensional "skewness") in the distribution. It's not what anyone would call a "correlation," though: almost by definition, a correlation is a second-order property of the standardized variable.

— whuber

If you need to calculate the "correlation" between three or more variables, you cannot use Pearson, since in this case it will be different for different orderings of the variables. If you are interested in linear dependencies, or in how well the variables are fitted by a 3D line, you can use PCA: obtain the explained variance for the first PC, permute your data, and find the probability that this value arises by chance. I have discussed something similar here (see Technical details below).

Matlab code

% Simulate our experimental data
x=normrnd(0,1,100,1);
y=2*x.*normrnd(1,0.1,100,1);
z=(-3*x+1.5*y).*normrnd(1,2,100,1);
% Perform PCA; the third output holds the variances of the principal components
[~,~,variance]=pca([x,y,z]);
% Observed explained variance for the first principal component
OEV1=variance(1)/sum(variance)
% Perform permutations: shuffle each column independently to break any dependence
nPermutations=1000;
permOEV1=zeros(nPermutations,1);
for iPermutation=1:nPermutations
    permX=datasample(x,numel(x),'Replace',false);
    permY=datasample(y,numel(y),'Replace',false);
    permZ=datasample(z,numel(z),'Replace',false);
    [~,~,permVariance]=pca([permX,permY,permZ]);
    permOEV1(iPermutation)=permVariance(1)/sum(permVariance);
end

% Calculate p-value, counting the observed statistic as one of the permutations
p_value=(sum(permOEV1>=OEV1)+1)/(nPermutations+1)

— Zlon