Information gain, mutual information and related measures

33

Andrew Moore defines information gain as:

$IG(Y|X) = H(Y) - H(Y|X)$

where $H(Y|X)$ is the conditional entropy. However, Wikipedia calls the above quantity mutual information.

Wikipedia, on the other hand, defines information gain as the Kullback-Leibler divergence (also known as information divergence or relative entropy) between two random variables:

$D_{KL}(P||Q) = H(P,Q) - H(P)$

where $H(P,Q)$ is defined as the cross-entropy.

These two definitions seem to be inconsistent with each other.

I have also seen other authors talk about two further related concepts, namely differential entropy and relative information gain.

What is the precise definition of, or relationship between, these quantities? Is there a good textbook that covers them all?

Information gain
Mutual information
Cross-entropy
Conditional entropy
Differential entropy
Relative information gain

information-theory

— Amelio Vazquez-Reina

2

Note that the notation used for the cross-entropy is also used for the joint entropy. I have used $H^x(P, Q)$ for the cross-entropy to avoid confusing myself, but that is for my own benefit and I have not seen that notation anywhere else.

— Michael McGowan

24

I think that calling the Kullback-Leibler divergence "information gain" is non-standard.

The first definition is standard.

EDIT: However, $H(Y)-H(Y|X)$ can also be called mutual information.
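To make the first definition concrete, here is a minimal sketch that evaluates $H(Y)-H(Y|X)$ directly from a joint probability table. The table values and the helper names `entropy` and `information_gain` are illustrative assumptions, not taken from any of the sources mentioned in this thread.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (iterable of probabilities)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(joint):
    """Return H(Y) - H(Y|X) for a joint table with joint[x][y] = p(x, y)."""
    p_x = [sum(row) for row in joint]          # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    # H(Y|X) = sum_x p(x) * H(Y | X = x), skipping values of x with zero mass
    h_y_given_x = sum(
        px * entropy([pxy / px for pxy in row])
        for px, row in zip(p_x, joint)
        if px > 0
    )
    return entropy(p_y) - h_y_given_x

# Hypothetical joint distribution: rows index x, columns index y.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
ig = information_gain(joint)
print(f"IG(Y|X) = {ig:.4f} bits")
```

For a table in which X and Y are independent (for example, all four entries equal to 0.25), the same function returns 0, since knowing X then tells you nothing about Y.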

Note that I don't think you will find any scientific discipline that really has a standardized, precise, and consistent naming scheme. So you will always have to look at the formulas, since they will generally give you a better idea.

Textbooks: see "Good introduction into different kinds of entropy".

Also: Cosma Shalizi: Methods and Techniques of Complex Systems Science: An Overview, Chapter 1 (pp. 33–114) in Thomas S. Deisboeck and J. Yasha Kresh (eds.), Complex Systems Science in Biomedicine http://arxiv.org/abs/nlin.AO/0307015

Robert M. Gray: Entropy and Information Theory http://ee.stanford.edu/~gray/it.html

David MacKay: Information Theory, Inference, and Learning Algorithms http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

See also "What is 'entropy and information gain'?"

— wolf.rauch

Thanks @wolf. I'm inclined to accept this answer. If the first definition is standard, how would you define mutual information?

— Amelio Vazquez-Reina

2

Sorry. The first quantity, $IG(Y|X)=H(Y)-H(Y|X)$, is also often called mutual information. That is a case of inconsistent naming. As I said, I don't think there is any universal, unambiguous one-to-one correspondence between concepts and names. For example, "mutual information" or "information gain" is a special case of the KL divergence, so that Wikipedia article is not that far off.

— wolf.rauch

4

The Kullback-Leibler divergence between $p(X,Y)$ and $P(X)P(Y)$ equals the mutual information:

$\begin{aligned} I(X; Y) &= H(Y) - H(Y \mid X)\\ &= - \sum_y p(y) \log p(y) + \sum_{x,y} p(x) p(y\mid x) \log p(y\mid x)\\ &= \sum_{x,y} p(x, y) \log{p(y\mid x)} - \sum_{y} \left(\sum_{x}p(x,y)\right) \log p(y)\\ &= \sum_{x,y} p(x, y) \log{p(y\mid x)} - \sum_{x,y}p(x, y) \log p(y)\\ &= \sum_{x,y} p(x, y) \log \frac{p(y\mid x)}{p(y)}\\ &= \sum_{x,y} p(x, y) \log \frac{p(y\mid x)p(x)}{p(y)p(x)}\\ &= \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(y)p(x)}\\ &= D_{KL} (P(X,Y)\mid\mid P(X)P(Y)) \end{aligned}$

where $p(y) = \sum_x p(x,y)$.
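The identity can also be checked numerically. The sketch below is only an illustration on a made-up joint table (the numbers and helper names are assumptions): it computes the mutual information once as a KL divergence between the joint distribution and the product of its marginals, and once as $H(Y) - H(Y \mid X)$.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over the same finite set."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical joint distribution: joint[x][y] = p(x, y).
joint = [[0.3, 0.2],
         [0.1, 0.4]]
p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

# Flatten both tables in the same (x, y) order so the entries line up.
flat_joint = [pxy for row in joint for pxy in row]
flat_product = [px * py for px in p_x for py in p_y]

# Route 1: KL divergence between the joint and the product of marginals.
mi_via_kl = kl_divergence(flat_joint, flat_product)

# Route 2: H(Y) - H(Y|X), with H(Y|X) = sum_x p(x) H(Y | X = x).
h_y_given_x = sum(
    px * entropy([pxy / px for pxy in row])
    for px, row in zip(p_x, joint)
)
mi_via_entropy = entropy(p_y) - h_y_given_x

print(mi_via_kl, mi_via_entropy)
```

Both routes should agree up to floating-point rounding, which is what the derivation above shows term by term.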

— chris elgoog

1

Mutual information can be defined using the Kullback-Leibler divergence as

$\begin{aligned} I(X;Y) = D_{KL}(p(x,y)||p(x)p(y)). \end{aligned}$

— yters

1

Extracting mutual information from textual datasets as a feature to train a machine learning model (the task was to predict the age, gender, and personality of bloggers):

— Krebto

1

Both definitions are correct, and consistent. I'm not sure what you find unclear as you point out multiple points that might need clarification.

Firstly, mutual information ($MI$), information gain ($IG$), and information ($I$) are all different names for the same thing: $MI \equiv IG \equiv I$. In different contexts one of these names may be preferable; from here on I will call it Information.

The second point is the relation between the Kullback–Leibler divergence, $D_{KL}$, and Information. The Kullback–Leibler divergence is simply a measure of dissimilarity between two distributions. Information can be defined in terms of this dissimilarity between distributions (see yters' response). So Information is a special case of $D_{KL}$, where $D_{KL}$ is applied to measure the difference between the actual joint distribution of two variables (which captures their dependence) and the hypothetical joint distribution of the same variables, were they to be independent. We call that quantity Information.
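As a small worked example of this view (binary variables and base-2 logarithms are assumptions made purely for illustration), compare a perfectly correlated pair with an independent pair:

```latex
% Perfectly correlated pair: p(0,0) = p(1,1) = 1/2, p(0,1) = p(1,0) = 0,
% so both marginals are uniform and p(x)p(y) = 1/4 for every (x, y).
\begin{aligned}
I(X;Y) &= \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)} \\
       &= \tfrac{1}{2} \log_2 \frac{1/2}{1/4} + \tfrac{1}{2} \log_2 \frac{1/2}{1/4} \\
       &= 1 \text{ bit}.
\end{aligned}

% Independent pair: p(x,y) = p(x)p(y) for every (x, y),
% so every logarithm is \log_2 1 = 0.
\begin{aligned}
I(X;Y) &= \sum_{x,y} p(x,y) \log_2 1 = 0.
\end{aligned}
```

The zero-probability cells contribute nothing, by the usual convention $0 \log 0 = 0$. In the first case the joint distribution is as far from the independent one as two fair bits allow; in the second case the two distributions coincide and the divergence vanishes.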

The third point to clarify is the inconsistent, though standard, notation in use: $\operatorname{H}(X,Y)$ denotes both the joint entropy and the cross-entropy.

So, for example, in the definition of Information:

$\begin{aligned}\operatorname {I} (X;Y)&{}\equiv \mathrm {H} (X)-\mathrm {H} (X|Y)\\&{}\equiv \mathrm {H} (Y)-\mathrm {H} (Y|X)\\&{}\equiv \mathrm {H} (X)+\mathrm {H} (Y)-\mathrm {H} (X,Y)\\&{}\equiv \mathrm {H} (X,Y)-\mathrm {H} (X|Y)-\mathrm {H} (Y|X)\end{aligned}$

in the last two lines, $\operatorname{H}(X,Y)$ is the joint entropy. This may seem inconsistent with the definition on the Information gain page, however:

$D_{KL}(P||Q)=H(P,Q)-H(P)$

but you did not fail to quote the important clarification: there, $\operatorname{H}(P,Q)$ is being used as the cross-entropy (as is also the case on the cross-entropy page).

Joint-entropy and Cross-entropy are NOT the same.
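A minimal numerical sketch of the difference, using made-up distributions (all values and helper names here are assumptions for illustration): the cross-entropy takes two distributions over the same outcomes, whereas the joint entropy takes a single distribution over pairs of outcomes.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy -sum_i p_i log2 q_i, with p and q over the same outcomes."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Cross-entropy: P and Q are two distributions over the SAME two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]
h_cross = cross_entropy(p, q)
kl = kl_divergence(p, q)
# The relation D_KL(P||Q) = H(P,Q) - H(P) holds with H(P,Q) read as cross-entropy.
kl_from_cross = h_cross - entropy(p)

# Joint entropy: X and Y are two variables described by ONE joint table.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
h_joint = entropy([pxy for row in joint for pxy in row])
h_x = entropy([sum(row) for row in joint])
h_y = entropy([sum(col) for col in zip(*joint)])
# The relation I(X;Y) = H(X) + H(Y) - H(X,Y) holds with H(X,Y) read as joint entropy.
mi = h_x + h_y - h_joint

print(kl, kl_from_cross, h_joint, mi)
```

The two functions do not even take the same kind of argument, which is a quick way to tell which meaning of $\operatorname{H}(\cdot,\cdot)$ a given formula intends.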

Check out this and this, where this ambiguous notation is addressed and a unique notation for cross-entropy is offered: $H_q(p)$

I would hope to see this notation accepted and the wiki-pages updated.

— אלימלך שרייבר

wonder why the equations are not displayed properly..

— Shaohua Li