Definition of the family of a distribution?

Does a family of a distribution have a different definition in statistics than in other disciplines?

In general, a family of curves is a set of curves, each of which is given by a function or parametrization in which one or more of the parameters is varied. Such families are used, for example, to characterize electronic components.

For statistics, according to one source, a family results from varying the shape parameter. How, then, can we understand that the gamma distribution has a shape and a scale parameter, and only the generalized gamma distribution additionally has a location parameter? Is the family then the result of varying the location parameter? According to @whuber, the meaning of a family is implicit: a "parametrization" of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family.

What, in simple language, is a family for statistical distributions?

A question about the relationships between the statistical properties of distributions from the same family has already generated considerable controversy on a different question, so it seems worthwhile to explore the meaning.

That this is not necessarily a simple question is shown by the use of the phrase exponential family, which has nothing to do with a family of curves. Instead it concerns changing the form of the PDF of a distribution by reparameterization, not only of parameters but also by substituting functions of independent random variables.

— Carl

By the phrasing "family of a distribution", do you mean something different from "a family of distributions"? An exponential family is a family of distributions (with certain properties). If the PDF of each distribution is interpreted as a curve, it even corresponds to a family of curves, so the last paragraphs seem confused.

— Juho Kokkala

@JuhoKokkala It seems confusing because the meaning of "family" is context dependent. For example, a normal distribution of unknown mean and known variance is in the exponential family. A normal distribution has infinite support $(-\infty,+\infty)$, and an exponential distribution has semi-infinite support $[0,+\infty)$, so there is no family of curves for an exponential distribution that covers the range of a normal distribution; they never have the same shape ...

— Carl

@JuhoKokkala ... and an exponential PDF does not even have a location parameter, whereas a normal distribution cannot do without one. See the link above for the substitutions needed and the context in which a normal PDF is in the exponential family.

— Carl

stats.stackexchange.com/questions/129990/… may be relevant. "The normal distribution of unknown mean and known variance is in the exponential family" is, to my knowledge, an abuse of terminology (although a somewhat common one). To be exact, an exponential family is a family of distributions with certain properties. The family of normal distributions with unknown mean and known variance is an exponential family; the family of exponential distributions is another exponential family, and so on.

— Juho Kokkala

@JuhoKokkala: That "family" is so commonly (ab)used, in one special case, to mean "family of families" is perhaps worth pulling out into another answer. (I can't think of any other cases …

Answers:

The statistical and mathematical concepts are exactly the same, understanding that "family" is a generic mathematical term with technical variations adapted to different circumstances:

A parametric family is a curve (or surface or other finite-dimensional generalization thereof) in the space of all distributions.

The rest of this post explains what that means. As an aside, I don't think any of this is controversial, either mathematically or statistically (apart from one minor issue which is noted below). In support of this opinion I have supplied many references (mostly to Wikipedia articles).

This terminology of "families" tends to be used when studying classes $\mathcal C_Y$ of functions into a set $Y$, or "maps." Given a domain $X$, a family $\mathcal F$ of maps on $X$ parameterized by some set $\Theta$ (the "parameters") is a function

$\mathcal F : X\times \Theta\to Y$

for which (1) for each $\theta\in\Theta$, the function $\mathcal{F}_\theta:X\to Y$ given by $\mathcal{F}_\theta(x)=\mathcal{F}(x,\theta)$ is in $\mathcal{C}_Y$ and (2) $\mathcal F$ itself has certain "nice" properties.

The idea is that we want to vary functions from $X$ to $Y$ in a "smooth" or controlled manner. Property (1) means that each $\theta$ designates such a function, while the details of property (2) capture the sense in which a "small" change in $\theta$ induces a sufficiently "small" change in $\mathcal{F}_\theta$.

A standard mathematical example, close to the one mentioned in the question, is a homotopy. In this case $\mathcal{C}_Y$ is the category of continuous maps from topological spaces $X$ into the topological space $Y$; $\Theta=[0,1]\subset\mathbb{R}$ is the unit interval with its usual topology, and we require that $\mathcal{F}$ be a continuous map from the topological product $X \times \Theta$ into $Y$. It can be thought of as a "continuous deformation of the map $\mathcal{F}_0$ to $\mathcal{F}_1$." When $X=[0,1]$ is itself an interval, such maps are curves in $Y$ and the homotopy is a smooth deformation from one curve to another.

For statistical applications, $\mathcal{C}_Y$ is the set of all distributions on $\mathbb{R}$ (or, in practice, on $\mathbb{R}^n$ for some $n$, but to keep the exposition simple I will focus on $n=1$). We may identify it with the set of all non-decreasing càdlàg functions $\mathbb{R}\to [0,1]$ where the closure of their range includes both $0$ and $1$: these are the cumulative distribution functions, or simply distribution functions. Thus, $X=\mathbb R$ and $Y=[0,1]$.

A family of distributions is any subset of $\mathcal{C}_Y$. Another name for a family is statistical model. It consists of all distributions that we suppose govern our observations, but we do not otherwise know which distribution is the actual one.

A family can be empty.
$\mathcal{C}_Y$ itself is a family.
A family may consist of a single distribution or just a finite number of them.

These abstract set-theoretic characteristics are of relatively little interest or utility. It is only when we consider additional (relevant) mathematical structure on $\mathcal{C}_Y$ that this concept becomes useful. But what properties of $\mathcal{C}_Y$ are of statistical interest? Some that show up frequently are:

$\mathcal{C}_Y$ is a convex set: given any two distributions ${F}, {G}\in \mathcal{C}_Y$, we may form the mixture distribution $(1-t){F}+t{G}\in \mathcal{C}_Y$ for all $t\in[0,1]$. This is a kind of "homotopy" from $F$ to $G$.
Large parts of $\mathcal{C}_Y$ support various pseudo-metrics, such as the Kullback-Leibler divergence or the closely related Fisher Information metric.
$\mathcal{C}_Y$ has an additive structure: corresponding to any two distributions $F$ and $G$ is their sum, ${F}\star {G}$.
$\mathcal{C}_Y$ supports many useful, natural functions, often termed "properties." These include any fixed quantile (such as the median) as well as the cumulants.
$\mathcal{C}_Y$ is a subset of a function space. As such, it inherits many useful metrics, such as the sup norm ( $L^\infty$ norm) given by
$||F-G||_\infty = \sup_{x\in\mathbb{R}}|F(x)-G(x)|.$
Natural group actions on $\mathbb R$ induce actions on $\mathcal{C}_Y$ . The commonest actions are translations $T_\mu:x \to x+\mu$ and scalings $S_\sigma:x\to x\sigma$ for $\sigma\gt 0$ . The effect these have on a distribution is to send $F$ to the distribution given by $F^{\mu,\sigma}(x) = F((x-\mu)/\sigma)$ . These lead to the concepts of location-scale families and their generalizations. (I don't supply a reference, because extensive Web searches turn up a variety of different definitions: here, at least, may be a tiny bit of controversy.)
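To make a few of these structures concrete, here is a minimal Python sketch (not part of the original answer) of the mixture, the location-scale action, and the sup-norm distance. The function names are made up for illustration, the Normal CDF is just a convenient test case, and the supremum is only approximated on a finite grid.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the Normal(mu, sigma^2) distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture(F, G, t):
    """Convex combination (1 - t) F + t G of two distribution functions."""
    return lambda x: (1.0 - t) * F(x) + t * G(x)

def location_scale(F, mu, sigma):
    """Send F to the distribution function x -> F((x - mu) / sigma), sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: F((x - mu) / sigma)

def sup_distance(F, G, grid):
    """Approximate the sup-norm distance by maximizing |F - G| over a finite grid."""
    return max(abs(F(x) - G(x)) for x in grid)

# Illustrative grid on [-10, 10]; an assumption, not a true supremum over R.
grid = [i / 100.0 for i in range(-1000, 1001)]

F = normal_cdf
G = location_scale(normal_cdf, mu=1.0, sigma=2.0)  # Normal(1, 2^2)
H = mixture(F, G, 0.25)                            # a point on the segment from F to G

d_FG = sup_distance(F, G, grid)
d_FH = sup_distance(F, H, grid)
```

Because $F - H = t(F - G)$, the distance from $F$ to the mixture should be $t$ times the distance from $F$ to $G$, which is one way to see the mixture as a straight-line path in the space of distributions.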

The properties that matter depend on the statistical problem and on how you intend to analyze the data. Addressing all the variations suggested by the preceding characteristics would take too much space for this medium. Let's focus on one common important application.

Take, for instance, Maximum Likelihood. In most applications you will want to be able to use Calculus to obtain an estimate. For this to work, you must be able to "take derivatives" in the family.

(Technical aside: The usual way in which this is accomplished is to select a domain $\Theta\subset \mathbb{R}^d$ for $d\ge 0$ and specify a continuous, locally invertible function $p$ from $\Theta$ into $\mathcal{C}_Y$ . (This means that for every $\theta\in\Theta$ there exists a ball $B(\theta, \epsilon)$ , with $\epsilon\gt 0$ for which $p\mid_{B(\theta,\epsilon)}: B(\theta,\epsilon)\cap \Theta \to \mathcal{C}_Y$ is one-to-one. In other words, if we alter $\theta$ by a sufficiently small amount we will always get a different distribution.))

Consequently, in most ML applications we require that $p$ be continuous (and hopefully, almost everywhere differentiable) in the $\Theta$ component. (Without continuity, maximizing the likelihood generally becomes an intractable problem.) This leads to the following likelihood-oriented definition of a parametric family:

A parametric family of (univariate) distributions is a locally invertible map
$\mathcal{F}:\mathbb{R}\times\Theta \to [0,1],$ with $\Theta\subset \mathbb{R}^n$ , for which (a) each $\mathcal{F}_\theta$ is a distribution function and (b) for each $x\in\mathbb R$ , the function $\mathcal{L}_x: \theta\to [0,1]$ given by $\mathcal{L}_x(\theta) = \mathcal{F}(x,\theta)$ is continuous and almost everywhere differentiable.

Note that a parametric family $\mathcal F$ is more than just the collection of $\mathcal{F}_\theta$ : it also includes the specific way in which parameter values $\theta$ correspond to distributions.
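As a rough illustration of why differentiability in $\theta$ matters for Maximum Likelihood, here is a small Python sketch for the Poisson family. It is only a sketch under assumptions: the data are made up, the helper names are hypothetical, and the likelihood is written with the probability mass function rather than the distribution function used in the definition above.

```python
import math

def poisson_log_likelihood(lam, data):
    """Log-likelihood of independent Poisson(lam) counts."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

def poisson_score(lam, data):
    """Derivative of the log-likelihood with respect to lam."""
    return sum(data) / lam - len(data)

def solve_score_by_bisection(score, data, lo, hi, tol=1e-10):
    """Find lam in [lo, hi] where the score crosses zero.

    Assumes score(lo) > 0 > score(hi), which holds here because the
    Poisson score is strictly decreasing in lam.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [2, 0, 3, 1, 4, 2, 2]  # made-up counts, for illustration only
lam_hat = solve_score_by_bisection(poisson_score, data, lo=1e-6, hi=100.0)
sample_mean = sum(data) / len(data)
```

Setting the derivative to zero recovers the sample mean, as the closed-form solution predicts. Without a derivative in $\lambda$ there would be no score equation to solve.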

Let's end up with some illustrative examples.

Let $\mathcal{C}_Y$ be the set of all Normal distributions. As given, this is not a parametric family: it's just a family. To be parametric, we have to choose a parameterization. One way is to choose $\Theta = \{(\mu,\sigma)\in\mathbb{R}^2\mid \sigma \gt 0\}$ and to map $(\mu,\sigma)$ to the Normal distribution with mean $\mu$ and variance $\sigma^2$ .
The set of Poisson $(\lambda)$ distributions is a parametric family with $\lambda\in\Theta=(0,\infty)\subset\mathbb{R}^1$ .
The set of Uniform $(\theta, \theta+1)$ distributions (which features prominently in many textbook exercises) is a parametric family with $\theta\in\mathbb{R}^1$ . In this case, $F_\theta(x) = \max(0, \min(1, x-\theta))$ is differentiable in $\theta$ except for $\theta\in\{x, x-1\}$ .
Let $F$ and $G$ be any two distributions. Then $\mathcal{F}(x,\theta)=(1-\theta)F(x)+\theta G(x)$ is a parametric family for $\theta\in[0,1]$ . (Proof: the image of $\mathcal F$ is a set of distributions and its partial derivative in $\theta$ equals $-F(x)+G(x)$ which is defined everywhere.)
The Pearson family is a four-dimensional family, $\Theta\subset\mathbb{R}^4$ , which includes (among others) the Normal distributions, Beta distributions, and Inverse Gamma distributions. This illustrates the fact that any one given distribution may belong to many different distribution families. This is perfectly analogous to observing that any point in a (sufficiently large) space may belong to many paths that intersect there. This, together with the previous construction, shows us that no distribution uniquely determines a family to which it belongs.
The family $\mathcal{C}_Y$ of all finite-variance absolutely continuous distributions is not parametric. The proof requires a deep theorem of topology: if we endow $\mathcal{C}_Y$ with any topology (whether statistically useful or not) and $p: \Theta\to\mathcal{C}_Y$ is continuous and locally has a continuous inverse, then locally $\mathcal{C}_Y$ must have the same dimension as that of $\Theta$ . However, in all statistically meaningful topologies, $\mathcal{C}_Y$ is infinite dimensional.
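The claim about the Uniform $(\theta, \theta+1)$ family can be checked numerically. The following Python sketch (helper names are my own, and the step size `h` is an arbitrary choice) compares forward and backward difference quotients of $\theta \mapsto F_\theta(x)$ at an interior point and at the two exceptional points $\theta = x$ and $\theta = x-1$.

```python
def uniform_cdf(x, theta):
    """F_theta(x) for the Uniform(theta, theta + 1) distribution."""
    return max(0.0, min(1.0, x - theta))

def one_sided_derivatives(f, theta, h=1e-6):
    """Forward and backward difference quotients of f at theta."""
    forward = (f(theta + h) - f(theta)) / h
    backward = (f(theta) - f(theta - h)) / h
    return forward, backward

x = 0.7  # a fixed observation; any real value would do

def L_x(theta):
    """The map theta -> F_theta(x) for the fixed x above."""
    return uniform_cdf(x, theta)

# For x - 1 < theta < x the map is x - theta, so both quotients are near -1.
interior = one_sided_derivatives(L_x, 0.2)

# At theta = x and theta = x - 1 the two quotients disagree: a kink.
kink_at_x = one_sided_derivatives(L_x, x)
kink_at_x_minus_1 = one_sided_derivatives(L_x, x - 1.0)
```

At the two kinks the one-sided quotients are close to $0$ on one side and $-1$ on the other, so the derivative does not exist there, but those are only two points, which is why the family still counts as almost everywhere differentiable.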

— whuber

It will take me about a day to digest your answer. I will have to chew slowly. Meanwhile, thank you.

— Carl

(+1) OK, I slogged through it. So is $\mathcal{F}:\mathbb{R}\times\Theta \to [0,1]$ a Polish space or not? Can we do a simple answer so people know how to avoid using the word family improperly, please? @JuhoKokkala related, for example, that Wikipedia abused language in their exponential family; that needs clarification.

— Carl

Doesn't the second sentence of this answer serve that request for simplicity?

— whuber

IMHO, however uninformed, no, it does not due to incompleteness, it doesn't say what a family isn't. The concept "in the space of all distributions" seems to relate to statistics only.

— Carl

I have accepted your answer. You have enough information in it that I could apply it to the question in question.

— Carl

To address a specific point brought up in the question: "exponential family" does not denote a set of distributions. (The standard, say, exponential distribution is a member of the family of exponential distributions, an exponential family; of the family of gamma distributions, also an exponential family; of the family of Weibull distributions, not an exponential family; & of any number of other families you might dream up.) Rather, "exponential" here refers to a property possessed by a family of distributions. So we shouldn't talk of "distributions in the exponential family" but of "exponential families of distributions"—the former is an abuse of terminology, as @JuhoKokkala points out. For some reason no-one commits this abuse when talking of location–scale families.
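A quick numerical check of the membership claim, as a Python sketch: the density parametrizations below (rate for the exponential, shape and scale for the gamma and Weibull) are the common textbook ones and are an assumption on my part, since the text does not fix a parametrization.

```python
import math

def exponential_pdf(x, rate):
    """Density of the exponential distribution with the given rate."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def gamma_pdf(x, shape, scale):
    """Density of the gamma distribution (shape-scale parametrization)."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def weibull_pdf(x, shape, scale):
    """Density of the Weibull distribution (shape-scale parametrization)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

# Compare the standard exponential with the shape = 1, scale = 1 member
# of the gamma and Weibull families at a few positive points.
xs = [0.1, 0.5, 1.0, 2.0, 5.0]
max_gap = max(
    max(
        abs(exponential_pdf(x, 1.0) - gamma_pdf(x, 1.0, 1.0)),
        abs(exponential_pdf(x, 1.0) - weibull_pdf(x, 1.0, 1.0)),
    )
    for x in xs
)
```

The gap is zero up to rounding, so the same distribution sits in all three families at once, which is why "exponential" has to be a property of a family rather than of a single distribution.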

— Scortchi - Reinstate Monica

Thanks to @whuber there is enough information to summarize in what I hope is a simpler form relating to the question from which this post arose. "Another name for a family [Sic, statistical family] is [a] statistical model."

From that Wikipedia entry: A statistical model consists of all distributions that we suppose govern our observations, but we do not otherwise know which distribution is the actual one. What distinguishes a statistical model from other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model specified via mathematical equations, some of the variables do not have specific values, but instead have probability distributions; i.e., some of the variables are stochastic. A statistical model is usually thought of as a pair $(S, \mathcal{P})$ , where $S$ is the set of possible observations, i.e., the sample space, and $\mathcal{P}$ is a set of probability distributions on $S$ .

Suppose that we have a statistical model $(S, \mathcal{P})$ with $\mathcal{P}=\{P_{\theta} : \theta \in \Theta\}$ . The model is said to be a Parametric model if $\Theta$ has a finite dimension. In notation, we write that $\Theta \subseteq \mathbb{R}^d$ where $d$ is a positive integer ( $\mathbb{R}$ denotes the real numbers; other sets can be used, in principle). Here, $d$ is called the dimension of the model.

As an example, if we assume that data arise from a univariate Gaussian distribution, then we are assuming that

$\mathcal{P}=\left\{P_{\mu,\sigma }(x) \equiv \frac{1}{\sqrt{2 \pi} \sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2}\right) : \mu \in \mathbb{R}, \sigma > 0 \right\}.$ In this example, the dimension, $d$ , equals 2, end quote.

Thus, if we reduce the dimensionality by assigning, for the example above, $\mu=0$ , we can show a family of curves by plotting $\sigma=1,2,3,4,5$ or whatever choices for $\sigma$ .
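For instance, a minimal Python sketch of that one-parameter family of curves, with $\mu=0$ fixed and $\sigma$ varied (the grid of $x$ values is an arbitrary choice, and plotting is left out so the sketch has no dependencies):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the Normal(mu, sigma^2) distribution."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

sigmas = [1, 2, 3, 4, 5]
xs = [i / 2.0 for i in range(-20, 21)]  # evaluation points from -10 to 10

# One curve per value of sigma, all with mu = 0.
curves = {s: [normal_pdf(x, 0.0, s) for x in xs] for s in sigmas}

# Each curve peaks at x = 0 with height 1 / (sigma * sqrt(2 * pi)).
peaks = {s: max(curves[s]) for s in sigmas}
```

Each value of $\sigma$ picks out one curve, and the peak height shrinks in proportion to $1/\sigma$ as the curve spreads out.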

— Carl