Is it possible for two random variables from the same family of distributions to have the same expectation and variance, but different higher moments?

I was thinking about the meaning of a location-scale family. My understanding is that for any $X$ belonging to a location-scale family with location parameter $a$ and scale parameter $b$, the distribution of $Z=(X-a)/b$ does not depend on any parameters and is the same for every $X$ in that family.

So my question is: can you provide an example where two random variables from the same family of distributions are standardized, but this does not result in random variables with the same distribution?

Suppose $X$ and $Y$ come from the same family of distributions (where by family I mean, for example, both Normal, or both Gamma, and so on) and share the same mean $\mu$ and standard deviation $\sigma$. Define:

$Z_1 = \dfrac{X-\mu}{\sigma}$

$Z_2 = \dfrac{Y-\mu}{\sigma}$

We know that both $Z_1$ and $Z_2$ have the same expectation and variance, $\mu_Z=0$, $\sigma^2_Z=1$.

But can they have different higher moments?

My attempt at an answer is that this could happen if the distribution of $X$ and $Y$ depends on more than 2 parameters; I am thinking of the generalized Student's $t$, which has 3 parameters.

However, if the number of parameters is $\le 2$ and $X$ and $Y$ come from the same family of distributions with the same expectation and variance, does this mean that $Z_1$ and $Z_2$ have the same distribution (and hence the same higher moments)?

— gioxc88
Yes, they can. In a generalized distribution, however, you would need at least 3 parameters.

— Carl

@Carl One parameter will suffice.

— whuber

@Carl It is unclear what you mean by "same distribution." Literally, this would refer to a unique distribution with a single law, and hence to a unique expectation, a unique variance and unique moments (where they are defined). If you mean "same distribution family," then your remark is meaningless, because the family is whatever you define it to be.

— whuber

@HardCore Since you appear to feel that your question has been answered, please read What should I do when someone answers my question?

— Glen_b -State Monica

@Carl I have also upvoted your answer. The OP's usage seems to support the notion that $Z=(X-a)/b$ has the same standard distribution for all choices of $X$ in the family. Let's see which answer the OP accepts (if the OP ever reads and responds to Glen_b's comment).

— Dilip Sarwate

There seems to be some confusion about what a family of distributions is and about how free parameters are counted versus free plus fixed (assigned) parameters. These questions are a side issue that has nothing to do with the intent of the OP or of this answer. I do not use the word family here because it is confusing. For example, according to one source, a family is the result of varying the shape parameter. @whuber states that a "parameterization" of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family. I will use the word form, which covers both the intended use of the word family and the identification and counting of parameters. For example, the formula $x^2-2x+4$ has the form of a quadratic formula, i.e. $a_2x^2+a_1x+a_0$, and if $a_1=0$ the formula still has a quadratic form. However, if $a_2=0$ the formula is linear, and the form is no longer complete enough to contain a quadratic term. Those who wish to use the word family in an appropriate statistical context are invited to contribute to this separate question.

Let us answer the question "Can they have different higher moments?" There are many such examples. We note in passing that the question appears to concern symmetric PDFs, for which the simple two-parameter case tends to be location and scale. The logic: suppose there are two density functions of different forms with the same two parameters (location, scale). Then either there is a shape parameter that adjusts the form, or the density functions share no shape parameter and are thus density functions without a common form.

Here is an example of how the shape parameter is represented in the generalized error density function, and here is an answer that appears to have freely selectable kurtosis.

By Skbkekas - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6057753

The PDF (AKA "probability" density function; note that the word "probability" is redundant) is

$\dfrac{\beta}{2\alpha\Gamma\Big(\dfrac{1}{\beta}\Big)} \; e^{-\Big(\dfrac{|x-\mu|}{\alpha}\Big)^\beta}$

The mean and location are $\mu$, the scale is $\alpha$ and $\beta$ is the shape. Note that symmetric PDFs are easier to present, because they often have location and scale as their simplest two parameters, whereas asymmetric PDFs such as the gamma PDF tend to have shape and scale as their simplest-case parameters. Continuing with the error density function, the variance is $\dfrac{\alpha^2\Gamma\Big(\dfrac{3}{\beta}\Big)}{\Gamma\Big(\dfrac{1}{\beta}\Big)}$, the skewness is $0$ and the excess kurtosis is $\dfrac{\Gamma\Big(\dfrac{5}{\beta}\Big)\Gamma\Big(\dfrac{1}{\beta}\Big)}{\Gamma\Big(\dfrac{3}{\beta}\Big)^2}-3$. So if we set the variance to 1, we assign $\alpha$ from $\alpha ^2=\dfrac{\Gamma \left(\dfrac{1}{\beta }\right)}{\Gamma \left(\dfrac{3}{\beta }\right)}$ while varying $\beta>0$, so that the excess kurtosis can be chosen anywhere above $-1.2$ (the uniform-distribution value, approached as $\beta\to\infty$) up to $\infty$.
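As a numerical cross-check of these formulas, here is a minimal Python sketch using only the standard library. The helper names `ged_pdf`, `unit_variance_alpha` and `excess_kurtosis` are hypothetical, not taken from any package; the sketch picks $\alpha$ so that the variance is 1 and then evaluates the excess kurtosis for several shapes $\beta$.

```python
import math

# Sketch only: helper names are illustrative, not from a library.
# Generalized error density: beta / (2 alpha Gamma(1/beta)) * exp(-(|x - mu| / alpha)**beta)

def ged_pdf(x, mu, alpha, beta):
    """Generalized error density with location mu, scale alpha > 0, shape beta > 0."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-(abs(x - mu) / alpha) ** beta)

def unit_variance_alpha(beta):
    """Scale with alpha^2 = Gamma(1/beta) / Gamma(3/beta), i.e. variance 1."""
    # lgamma avoids overflow of Gamma(1/beta) when beta is small
    return math.exp(0.5 * (math.lgamma(1.0 / beta) - math.lgamma(3.0 / beta)))

def excess_kurtosis(beta):
    """Gamma(5/beta) Gamma(1/beta) / Gamma(3/beta)^2 - 3 (independent of alpha)."""
    lg = math.lgamma
    return math.exp(lg(5.0 / beta) + lg(1.0 / beta) - 2.0 * lg(3.0 / beta)) - 3.0

if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 4.0, 10.0, 100.0):
        print(f"beta={beta:>6}: alpha={unit_variance_alpha(beta):.6f}, "
              f"excess kurtosis={excess_kurtosis(beta):+.6f}")
```

At the printed values the excess kurtosis runs from large positive values at small $\beta$, through 3 at $\beta=1$ (Laplace) and 0 at $\beta=2$ (normal, where $\alpha=\sqrt2$ corresponds to $\sigma=1$), toward $-1.2$ as $\beta$ grows.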

That is, if we want to vary higher-order moments while keeping a mean of zero and a variance of one, we must vary the shape. This implies three parameters, which in general are 1) the mean or some other appropriate measure of location, 2) the scale, to adjust the variance or some other measure of variability, and 3) the shape. It takes at least three parameters to do it.

Note that if we make the substitutions $\beta=2$ and $\alpha=\sqrt{2}\sigma$ in the above PDF, we obtain

$\dfrac{e^{-\frac{(x-\mu )^2}{2 \sigma ^2}}}{\sqrt{2 \pi } \sigma }\;,$

which is the density function of a normal distribution. Thus the generalized error density function is a generalization of the normal density function. There are many ways to generalize the normal density function. Another example, but one that contains the normal density only as a limit, rather than at a mid-range substitution value as the generalized error density function does, is Student's $t$ density function. Using Student's $t$ density function we would have a somewhat restricted choice of kurtosis (the excess kurtosis is $6/(\textit{df}-4)$ for $\textit{df}>4$, so it can only be positive), and $\textit{df}>2$ is the shape parameter, since the second moment does not exist for $\textit{df}\le2$. Moreover, df is not restricted to positive integer values but can in general be any positive real number. Student's $t$ is normal only in the limit as $\textit{df}\rightarrow\infty$, which is why I did not choose it as an example. It is neither a good example nor a counterexample, and on this point I disagree with @Xi'an and @whuber.
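To make the kurtosis restriction concrete, here is a short Python sketch of the standard closed forms for Student's $t$ with $\nu=\textit{df}$: variance $\nu/(\nu-2)$ for $\nu>2$ and excess kurtosis $6/(\nu-4)$ for $\nu>4$. The helper names are illustrative only.

```python
import math

# Sketch: closed-form moments of Student's t with nu (= df) degrees of freedom.
# Helper names are illustrative only.

def t_variance(nu):
    """Variance nu / (nu - 2) for nu > 2; infinite for 1 < nu <= 2."""
    if nu <= 1:
        raise ValueError("the mean does not exist for nu <= 1, so neither does the variance")
    return nu / (nu - 2) if nu > 2 else math.inf

def t_excess_kurtosis(nu):
    """Excess kurtosis 6 / (nu - 4) for nu > 4; infinite for 2 < nu <= 4."""
    if nu <= 2:
        raise ValueError("kurtosis is undefined when the variance is not finite")
    return 6 / (nu - 4) if nu > 4 else math.inf

if __name__ == "__main__":
    # Kurtosis is scale-free, so these values also hold after rescaling
    # T by sqrt((nu - 2) / nu) to unit variance.
    for nu in (3, 4.5, 5, 10, 30, 1000):
        print(f"df={nu}: variance={t_variance(nu):.4f}, excess kurtosis={t_excess_kurtosis(nu)}")
```

Because kurtosis is unchanged by rescaling, every unit-variance member of this family has strictly positive excess kurtosis, unlike the generalized error distribution, which can also reach negative values.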

Let me explain this further. One can pick two of many arbitrary two-parameter density functions so that they have, for example, a mean of zero and a variance of one. However, they will not all have the same form. The question, though, concerns density functions of the same form, not of different forms. It has been claimed that which density functions share the same form is an arbitrary assignment, because it is a matter of definition; my opinion differs on this. I disagree that it is arbitrary, because either one can make a substitution that transforms one density function into another, or one cannot. In the first case the density functions are similar, and if we can show by substitution that the density functions are not equivalent, then these density functions have different forms.

So, for the example of Student's $t$ PDF, the choices are either to regard it as a generalization of a normal PDF, in which case a normal PDF is an admissible form of Student's $t$ PDF, or not, in which case Student's $t$ PDF is a different form from the normal PDF and is thus irrelevant to the question asked.

We can argue this in many ways. My opinion is that a normal PDF is a sub-selected form of Student's $t$ PDF, but that a normal PDF is not a sub-selection of a gamma PDF, even though a limiting case of a gamma PDF can be shown to be a normal PDF. My reason is that in the normal/Student's $t$ case the support is the same, whereas in the normal/gamma case the support is infinite versus semi-infinite, which constitutes the necessary incompatibility.

— Carl
(-1) As already noted in other comments, the issue is "what is meant by a family of distributions?" I can easily define a new "family" of distributions consisting simply of t-distributions rescaled to have mean = 0 and sd = 1, with a single parameter: df (with df > 2). Then the 1st and 2nd moments are the same for all df, but different values of df give different higher moments.
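A minimal numerical sketch of this one-parameter family, assuming df > 2 so that the rescaling exists: if $T\sim t_{\textit{df}}$, then $Y=T\sqrt{(\textit{df}-2)/\textit{df}}$ has mean 0 and variance 1, while $E[Y^4]=3+6/(\textit{df}-4)$ for df > 4. The helper names and the integration settings (range ±200, 40,000 Simpson intervals) are arbitrary illustrative choices.

```python
import math

# Sketch of the rescaled-t family: Y = T * sqrt((nu - 2) / nu), T ~ t(nu), nu > 2.
# Names and integration settings are illustrative choices, not from the comment.

def std_t_pdf(y, nu):
    """Density of Y = T * s with s = sqrt((nu - 2) / nu)."""
    s = math.sqrt((nu - 2.0) / nu)
    t = y / s
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu)) / s

def moments(nu, half_width=200.0, n=40_000):
    """Composite Simpson approximations of E[Y], E[Y^2], E[Y^4] on [-half_width, half_width]."""
    h = 2 * half_width / n  # n must be even for Simpson's rule
    m1 = m2 = m4 = 0.0
    for i in range(n + 1):
        y = -half_width + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        wf = w * std_t_pdf(y, nu)
        m1 += y * wf
        m2 += y * y * wf
        m4 += y ** 4 * wf
    return m1 * h / 3, m2 * h / 3, m4 * h / 3

if __name__ == "__main__":
    for nu in (8.0, 12.0):
        m1, m2, m4 = moments(nu)
        # Theory: mean 0, variance 1, E[Y^4] = 3 + 6 / (nu - 4)
        print(f"df={nu}: mean={m1:.2e}, var={m2:.6f}, E[Y^4]={m4:.4f}, "
              f"theory={3 + 6 / (nu - 4):.4f}")
```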

— Cliff AB

Hard Core, this comment is hard to fathom, since your title itself contains the word "family"! Moreover, if you deny that a family has any meaning, the question makes no sense. Please clarify this by editing your question to reflect your intentions.

— whuber

-1, because you begin by saying "The answer is NO." and then give an example that effectively answers yes (for another example see kjetilbhalvorsen's answer, which you mention positively). That makes no sense to me. I think the mathematics here is clear to all of us, so my downvote is only because of the lack of consistency in the presentation.

— amoeba says Reinstate Monica

Carl, there is a strong inconsistency between the question and Hard Core's comments. The question is explicit: "provide an example where two random [variables] from the same family of distributions are standardized, but this does not result in ... random variables with the same distribution." Clearly some meaning of "family" is intended. The usual meaning is clear, although there are various technical variants, and the (easily demonstrated) correct answer is "yes, there are many such examples."

— whuber

Thank you. You clearly have a good idea of what you are writing about, but unfortunately your post sows quite a bit of confusion about what the meanings of "distribution," "shape," "form," and "parameter" might be. As one example of the subtleties, consider a family of distributions generated by a distribution law $F$ that has a nonzero third central moment. The family is indexed by two real numbers $(\mu,\sigma\ne 0)$ and consists of all laws $x\to F(\sigma x+\mu)$. It is a location-scale family, but the shapes of these laws differ depending on the sign of $\sigma$.

— whuber

If you want an example that is an "officially named parameterized family of distributions," you can look at the generalized gamma distribution: https://en.wikipedia.org/wiki/Generalized_gamma_distribution . This family of distributions has three parameters, so you can fix the mean and variance and still have the freedom to vary higher moments. On the wiki page the algebra does not look inviting; I would rather do it numerically. For statistical applications, search this site for gamlss, an extension of gam (generalized additive models, themselves a generalization of glm's) that has parameters for "location, scale and shape."
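Following the suggestion to do it numerically, here is a minimal Python sketch. It assumes the $(a, d, p)$ parameterization (scale $a$, shapes $d, p$), in which $E[X^r]=a^r\,\Gamma((d+r)/p)/\Gamma(d/p)$. For each chosen $p$ it solves for $d$ by bisection (assuming the coefficient of variation changes sign across the bracket) so that the coefficient of variation is 1, then sets $a$ so that the mean is 1. The helper names are hypothetical.

```python
import math

# Sketch, assuming the (a, d, p) parameterization of the generalized gamma, for which
#   E[X^r] = a**r * Gamma((d + r) / p) / Gamma(d / p).
# Helper names are illustrative only.

def log_moment(r, d, p):
    """log(E[X^r] / a^r)."""
    return math.lgamma((d + r) / p) - math.lgamma(d / p)

def cv2(d, p):
    """Squared coefficient of variation Var/mean^2 (free of the scale a)."""
    return math.exp(log_moment(2, d, p) - 2 * log_moment(1, d, p)) - 1

def solve_d(p, target=1.0, lo=1e-3, hi=1e3):
    """Bisection on log(d), assuming cv2(d, p) - target changes sign on [lo, hi]."""
    g = lambda d: cv2(d, p) - target
    if g(lo) * g(hi) > 0:
        raise ValueError("target coefficient of variation not bracketed")
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

def unit_mean_unit_variance_member(p):
    """Return (d, a, mean, variance, skewness) with mean = variance = 1."""
    d = solve_d(p)
    a = math.exp(-log_moment(1, d, p))  # scale chosen so that E[X] = 1
    m1, m2, m3 = (a ** r * math.exp(log_moment(r, d, p)) for r in (1, 2, 3))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    return d, a, m1, var, skew

if __name__ == "__main__":
    for p in (1.0, 2.0):
        d, a, mean, var, skew = unit_mean_unit_variance_member(p)
        print(f"p={p}: d={d:.6f}, a={a:.6f}, mean={mean:.6f}, var={var:.6f}, skewness={skew:.4f}")
```

With $p=1$ it recovers the exponential distribution ($d=1$, skewness 2); with $p=2$ the mean and variance are again 1, but the skewness comes out noticeably smaller (roughly 1.4).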

Another example is the $t$-distributions, extended to be a location-scale family. Then the third parameter will be the degrees of freedom, which will vary the shape for a fixed location and scale.

— kjetil b halvorsen
Although the generalized error distribution may have been a better choice.

— Carl

Thank you very much for your answer!! I chose Carl's because it was more detailed, but this one was fine too... thank you very much!!!

— gioxc88

There is an infinite number of distributions with mean zero and variance one, hence take $\epsilon_1$ distributed from one of these distributions, say the $\mathcal{N}(0,1)$, and $\epsilon_2$ from another of these distributions, say the Student's $t$ with 3 degrees of freedom rescaled by $\sqrt{1/3}$ so that its variance is one; then

$X=\mu+\sigma\epsilon_1\qquad\text{and}\qquad Y=\mu+\sigma\epsilon_2$ enjoy the properties you mention. The "number" of parameters is irrelevant to the property.
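A quick closed-form check of this counterexample, as a Python sketch using only the standard library (helper names are illustrative). Both $\epsilon_1$ and $\epsilon_2$ have mean 0 and variance 1, yet $P(|\epsilon|>c)$ differs, and $\epsilon_2$ has no finite fourth moment because a $t$ with 3 degrees of freedom has none. The $t_3$ CDF used is the standard closed form $F(t)=\tfrac12+\tfrac1\pi\big[\tfrac{t/\sqrt3}{1+t^2/3}+\arctan(t/\sqrt3)\big]$.

```python
import math

# Sketch: compare tail probabilities of the two standardized errors in the example.
# eps1 ~ N(0, 1); eps2 = T / sqrt(3) with T ~ t_3, so Var(eps2) = (1/3) * 3 / (3 - 2) = 1.

def normal_two_sided_tail(c):
    """P(|eps1| > c) for eps1 ~ N(0, 1)."""
    return math.erfc(c / math.sqrt(2.0))

def t3_cdf(t):
    """Closed-form CDF of Student's t with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

def scaled_t3_two_sided_tail(c):
    """P(|eps2| > c) = P(|T| > c * sqrt(3)) for T ~ t_3."""
    return 2.0 * (1.0 - t3_cdf(c * math.sqrt(3.0)))

if __name__ == "__main__":
    for c in (1.0, 2.0, 3.0, 4.0):
        print(f"c={c}: normal {normal_two_sided_tail(c):.5f}, "
              f"scaled t_3 {scaled_t3_two_sided_tail(c):.5f}")
```

At $c=3$ the two probabilities are roughly 0.0027 and 0.014, although $X$ and $Y$ share the same mean and variance.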

Obviously, if you impose further rules in the definition of this family, for instance stating that there exists a fixed density $f$ such that the density of $X$ is $\frac{1}{\sigma^d} f\big((x-\mu)/\sigma\big)$ (with $d$ the dimension of $x$), you may end up with a single possible distribution.

— Xi'an
Thank you for the answer, but I think that this is not what I asked.

— gioxc88

I think it does, because if the family of distributions is defined as the union of both the distributions of the $X$'s and the $Y$'s, then you have a contradiction to the property. A "family" of distributions is quite a vague notion.

— Xi'an

Yes, in fact it is quite vague, but if you read my question, I wrote that in this context by family I mean, for example, both Normal or both Gamma and so on. You gave an example with one normal and one Student's t.

— gioxc88

Hard Core, you seem to confuse the name of a family with its concept. This answer is a fine one and nicely illustrates the concept. Your question doesn't ask that the solution be a location-scale family. If you need it to be one, you can always take this answer--or any other answer--and extend it to a location-scale family by allowing arbitrary translations and rescalings. Xi'an's point about the number of parameters still holds.

— whuber

@whuber I think it is confused as an answer. Student's t by itself would be a better answer, rather than using the extreme values $df=3,\infty$ without specifying them. Indeed, it is $df$ that is the third parameter.

— Carl

I think you are asking whether two random variables coming from the same location-scale family can have the same mean and variance, but at least one different higher moment. The answer is no.

Proof: Let $X_1$ and $X_2$ be two such random variables. Since $X_1$ and $X_2$ are in the same location-scale family, there exist a random variable $X$ and real numbers $a_1>0, a_2>0, b_1, b_2$ such that $X_1 \stackrel{d}{=} a_1 X + b_1$ and $X_2 \stackrel{d}{=} a_2 X + b_2$ . Since $X_1$ and $X_2$ have the same mean and variance, we have:

(1) $E[X_1] = E[X_2] \implies a_1 E[X] + b_1 = a_2 E[X] + b_2$.
(2) $\operatorname{Var}[X_1] = \operatorname{Var}[X_2] \implies a_1^2 \operatorname{Var}[X] = a_2^2 \operatorname{Var}[X]$.

If $\operatorname{Var}[X] = 0$ , then $X_1=E[X_1]=X_2=E[X_2]$ with probability $1$ , and hence the higher moments of $X_1$ and $X_2$ are all equal. So we may assume that $\operatorname{Var}[X] \neq 0$ . Using this, (2) implies that $|a_1|=|a_2|$ . Since $a_1>0$ and $a_2>0$ , we have in fact that $a_1=a_2$ . In turn, (1) above now implies that $b_1=b_2$ . We therefore have that:

$E[X_1^k] = E[(a_1X+b_1)^k] = E[(a_2X+b_2)^k] = E[X_2^k]$ for any $k$, i.e., the moments of $X_1$ and $X_2$ are all equal.

— yyzz
(+1) I cannot find fault with this answer. Apparently someone does, and they also find fault with mine. I do not understand this unexplained behaviour.

— Carl

@Carl This answer is incorrect--that's why it's being downvoted. Xi'an has already provided a counterexample.

— whuber

@whuber Please see my comments under Xi'an's answer. I do not agree with him but did not downvote because both he and you have a right to your opinion, even if I consider it to be incorrect.

— Carl

@Carl After re-reading this answer, I need to retract my original assessment: this answer is correct (and +1 for that), and it is correct because it clearly explains how it is interpreting the original question. (Specifically, there is a common yet narrow concept of a "location-scale family" as consisting of just a single standard distribution along with all its translates and positive rescalings.) I believe the original question was intended to ask something a little different; the basis of that belief is the reference to more than two parameters in the post.

— whuber

I am sorry if I have not been very clear and I thank you for the time you have spent for looking into this but that is not what I asked.

— gioxc88

Since the question can be interpreted in multiple ways, I will split this answer into two parts.

A: distribution families.
B: location-scale distribution families.

The problem with case A can be easily answered/demonstrated by many families with a shape parameter.

The problem with case B is more difficult, since one and a half parameters seem to be sufficient to specify location and scale (location in $\mathbb{R}$ and scale in $\mathbb{R}_{>0}$), and the problem becomes whether two parameters can additionally encode (multiple) shapes as well. This is not so trivial. We can easily come up with specific two-parameter location-scale families and demonstrate that they do not contain different shapes, but that does not prove that this is a fixed rule for every two-parameter location-scale family.

A: Can two different distributions from the same 2 parameter distribution family have the same mean and variance?

The answer is yes and it can already be shown using one of the explicitly mentioned examples: the normalized Gamma distribution

Family of normalized gamma distributions

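Let $Z = \frac{X-\mu}{\sigma}$ with $X$ a Gamma-distributed variable with shape $k$, mean $\mu$ and standard deviation $\sigma$. Because the scale parameter cancels in the standardization, the (cumulative) distribution of $Z$ depends only on $k$: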

$F_Z(z;k) = \begin{cases} 0 & \quad \text{if } z < -\sqrt{k}\\ \frac{1}{\Gamma(k)} \gamma(k, z\sqrt{k}+k) & \quad \text{if } z \geq -\sqrt{k} \end{cases}$

where $\gamma$ is the lower incomplete gamma function.

So here it is clearly the case that different $Z_1$ and $Z_2$ (distributions from the family of normalized gamma distributions) can have the same mean and variance (namely $\mu=0$ and $\sigma=1$) but differ depending on the parameter $k$ (often called the 'shape' parameter). This is closely linked to the fact that the family of gamma distributions is not a location-scale family.
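A minimal Python sketch of this family, using only the standard library, so the regularized lower incomplete gamma function $P(k,x)=\gamma(k,x)/\Gamma(k)$ is computed from its power series; the helper names are hypothetical. It evaluates $F_Z(z;k)$ and the gamma distribution's skewness $2/\sqrt{k}$ and excess kurtosis $6/k$, which standardization does not change: every member has mean 0 and variance 1, but the shape still depends on $k$.

```python
import math

# Sketch of the normalized gamma family (helper names are illustrative).
# P(a, x) = gamma(a, x) / Gamma(a) is computed with the standard power series
#   P(a, x) = x**a * exp(-x) * sum_{n >= 0} x**n / Gamma(a + n + 1),
# which is adequate for moderate x.

def reg_lower_gamma(a, x, max_terms=1000):
    if x <= 0:
        return 0.0
    total = 0.0
    for n in range(max_terms):
        term = math.exp((a + n) * math.log(x) - x - math.lgamma(a + n + 1))
        total += term
        if term < 1e-17 * total:
            break
    return total

def normalized_gamma_cdf(z, k):
    """F_Z(z; k) for Z = (X - mu) / sigma with X ~ Gamma(shape k, any scale)."""
    if z < -math.sqrt(k):
        return 0.0
    return reg_lower_gamma(k, z * math.sqrt(k) + k)

def normalized_gamma_shape(k):
    """(skewness, excess kurtosis) = (2 / sqrt(k), 6 / k); mean 0 and variance 1 for all k."""
    return 2.0 / math.sqrt(k), 6.0 / k

if __name__ == "__main__":
    for k in (1.0, 4.0, 25.0):
        skew, exkurt = normalized_gamma_shape(k)
        print(f"k={k}: F_Z(0)={normalized_gamma_cdf(0.0, k):.4f}, "
              f"skewness={skew:.3f}, excess kurtosis={exkurt:.3f}")
```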

B: Can two different distributions from the same 2 parameter location-scale distribution family have the same mean and variance?

I believe that the answer is no if we consider only smooth families (smooth: a small change in the parameters results in a small change of the distribution/function/curve). But that answer is not so trivial, and if we allow more general (non-smooth) families we can say yes, although such families exist only in theory and have no practical relevance.

Generating a location-scale family from a single distribution by translation and scaling

From any particular single distribution we can generate a location-scale family by translation and scaling. If $f(x)$ is the probability density function of the single distribution, then the probability density function for a member of the family will be

$f(x;\mu,\sigma) = \frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right)$

For a location-scale family that can be generated in such way we have:

for any two members $f(x;\mu_1,\sigma_1)$ and $f(x;\mu_2,\sigma_2)$ if their means and variances are equal, then $f(x;\mu_1,\sigma_1) = f(x;\mu_2,\sigma_2)$
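The reason behind this statement is that a member's mean is $\mu+\sigma m_0$ and its standard deviation is $\sigma s_0$, where $m_0$ and $s_0>0$ are those of the generating density $f$; the mean and variance therefore pin down $(\mu,\sigma)$ and hence the whole density. A minimal Python sketch, assuming $f$ has a finite, nonzero variance (helper names are hypothetical, and the exponential base is chosen only as an example):

```python
import math

# Sketch (helper names illustrative): a location-scale family generated from one
# base density f with finite mean m0 and standard deviation s0 > 0.

def make_member(base_pdf, mu, sigma):
    """Density x -> (1/sigma) * f((x - mu) / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: base_pdf((x - mu) / sigma) / sigma

def params_from_moments(mean, sd, m0, s0):
    """Solve mean = mu + sigma*m0 and sd = sigma*s0; the solution is unique."""
    sigma = sd / s0
    return mean - sigma * m0, sigma

def exp_pdf(x):
    """Standard exponential density (m0 = 1, s0 = 1), used only as an example base."""
    return math.exp(-x) if x >= 0 else 0.0

if __name__ == "__main__":
    mu, sigma = params_from_moments(mean=5.0, sd=2.0, m0=1.0, s0=1.0)
    g = make_member(exp_pdf, mu, sigma)
    print(mu, sigma, g(3.0), g(5.0))
```

Any other member with mean 5 and standard deviation 2 would have to use the same $(\mu,\sigma)=(3,2)$, hence the same density.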

Can for all two parameter location-scale families their member distributions be generated from a single member distribution by translation and scaling?

So translation and scaling can convert a single distribution into a location-scale family. The question is whether the reverse is true and whether every two parameter location-scale family (where the parameters $\theta_1$ and $\theta_2$ do not necessarily need to coincide with the location $\mu$ and scale $\sigma$ ) can be described by a translation and scaling of a single member from that family.

For particular two parameter location-scale families like the family of normal distributions it is not too difficult to show that they can be generated according to the process above (scaling and translating of single example member).

One may wonder whether it is possible for every two parameter location-scale family to be generated out of a single member by translation and scaling. Or a conflicting statement: "Can a two parameter location-scale family contain two different member distributions with the same mean and variance?", for which it would be necessary that the family is a union of multiple subfamilies that are each generated by translation and scaling.

Case 1: Family of generalized Student's t-distributions, parameterized by two variables

A contrived example occurs when we make some mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$ (cf. the cardinality of $\mathbb{R}$ and $\mathbb{R}^2$), which allows the freedom to use two parameters $\theta_1$ and $\theta_2$ to describe a union of multiple subfamilies that are generated by translation and scaling.

Let's use the (three parameter) generalized Student's t-distribution:

$f(x;\nu,\mu,\sigma) = \frac{\Gamma \left( \frac{\nu + 1}{2} \right) }{\Gamma \left( \frac{\nu}{2} \right) \sqrt{\pi\nu}\sigma} \left(1 + \frac{1}{\nu} \left( \frac{x-\mu}{\sigma} \right)^2 \right)^{-\frac{\nu+1}{2}}$

with the three parameters changed as follows (for $\theta_2>0$ and $\theta_1>\pi/2$ with $\theta_1\ne\pi/2+n\pi$, so that $\sigma>0$, $\nu\ge 1$ and $\tan\theta_1$ is defined):

$\begin{array}{rcl} \mu &=& \tan (\theta_1)\\ \sigma &=& \theta_2\\ \nu &=& \lfloor 0.5+\theta_1/\pi \rfloor \end{array}$

then we have

$f(x;\theta_1,\theta_2) = \frac{\Gamma \left( \frac{\lfloor 0.5+\theta_1/\pi \rfloor + 1}{2} \right) }{\Gamma \left( \frac{\lfloor 0.5+\theta_1/\pi \rfloor}{2} \right) \sqrt{\pi\lfloor 0.5+\theta_1/\pi \rfloor}\theta_2} \left(1 + \frac{1}{\lfloor 0.5+\theta_1/\pi \rfloor} \left( \frac{x-\tan(\theta_1)}{\theta_2} \right)^2 \right)^{-\frac{\lfloor 0.5+\theta_1/\pi \rfloor+1}{2}}$

which may be considered a two parameter location-scale family (albeit not very useful) that can not be generated by translation and scaling of only a single member.
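A small Python sketch of this mapping (function names are illustrative). Two values of $\theta_1$ that differ by a multiple of $\pi$ give the same location $\tan\theta_1$ but different degrees of freedom; choosing $\theta_2=\sqrt{(\nu-2)/\nu}$ gives both members variance 1 (for $\nu>2$), so they share mean and variance while their excess kurtosis $6/(\nu-4)$ differs.

```python
import math

# Sketch of the contrived map (theta1, theta2) -> (nu, mu, sigma) from the answer:
#   mu = tan(theta1), sigma = theta2, nu = floor(0.5 + theta1 / pi).
# Helper names are illustrative only.

def theta_to_params(theta1, theta2):
    nu = math.floor(0.5 + theta1 / math.pi)
    return nu, math.tan(theta1), theta2

def t_mean_var_exkurt(nu, mu, sigma):
    """Mean, variance and excess kurtosis of mu + sigma * T with T ~ t(nu), nu > 4."""
    return mu, sigma ** 2 * nu / (nu - 2), 6.0 / (nu - 4)

if __name__ == "__main__":
    for n in (5, 6):
        theta1 = n * math.pi + 0.3          # same tan(theta1) for every n
        nu = math.floor(0.5 + theta1 / math.pi)
        theta2 = math.sqrt((nu - 2) / nu)   # makes the variance 1
        print(n, t_mean_var_exkurt(*theta_to_params(theta1, theta2)))
```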

Case 2: Location-scale families generated by negative scaling of a single distribution with nonzero skew

A less contrived example than the tan-function construction is given by whuber in the comments under Carl's answer. We can have a family $x \mapsto f(x/b + a)$ where flipping the sign of $b$ (with a suitable choice of $a$) keeps the mean and variance unchanged but can change the sign of the odd central moments, such as the third. This gives, a bit more easily, a two-parameter location-scale family in which members with the same mean and variance can have different higher-order moments. whuber's example can be split into two subfamilies, each of which can be generated out of a single member by translation and scaling.
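A minimal Python sketch of this sign flip, assuming (only for illustration) a standard exponential base variable $X$ with raw moments $E[X]=1$, $E[X^2]=2$, $E[X^3]=6$: $X$ and $2-X$ both have mean 1 and variance 1, but their third central moments are $+2$ and $-2$. The helper name is hypothetical.

```python
# Sketch: flipping the sign of the scale flips the odd central moments.
# X is a standard exponential variable (an illustrative choice, not from the thread),
# with raw moments E[X] = 1, E[X^2] = 2, E[X^3] = 6.

def affine_moments(raw, a, b):
    """Mean, variance and third central moment of Y = a*X + b from X's raw moments."""
    m1, m2, m3 = raw
    var_x = m2 - m1 ** 2
    mu3_x = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return a * m1 + b, a ** 2 * var_x, a ** 3 * mu3_x

EXP_RAW = (1.0, 2.0, 6.0)

if __name__ == "__main__":
    print(affine_moments(EXP_RAW, 1.0, 0.0))    # Y = X      -> (1.0, 1.0, 2.0)
    print(affine_moments(EXP_RAW, -1.0, 2.0))   # Y = 2 - X  -> (1.0, 1.0, -2.0)
```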

Smooth families

If we try to make a single smooth two-parameter distribution family (smooth: a small change in the parameters results in a small change of the distribution/function/curve) by somehow composing two or more families that are each generated by translation and scaling, then we run into trouble getting the two parameters to cover both the variation of 'mean' and 'variance' and that of a third parameter, 'shape'. A formal proof would have to go along the same lines as the answer to the question: is there a smooth surjective function $f:\mathbb{R}^2 \to \mathbb{R}^3$? (The answer is no for smooth, i.e. infinitely differentiable, functions, although there are continuous functions that would do the job, such as Peano curves.)

Intuition: imagine there were some parameters $\theta_1$, $\theta_2$ that describe the distributions in some location-scale distribution family and by which we can change the mean and variance as well as some other moments. Then we should be able to express $\theta_1$ and $\theta_2$ in terms of the mean $\mu$ and standard deviation $\sigma$:

$\begin{array}{rcl} \theta_1 &=& f_{\theta_1}(\mu,\sigma) \\ \theta_2 &=& f_{\theta_2}(\mu,\sigma)\end{array}$

but these would need to be multivalued functions, and these cannot make continuous transitions: the different values of $f_{\theta_1}(\mu,\sigma)$ for a particular $\mu$ and $\sigma$ are not connected continuously, so they will not be able to model a continuous shape parameter.

I am actually not so sure about this final part. We could possibly use a space-filling curve (such as the Peano curve, if only we knew how to map coordinates on the curve to coordinates of the hypercube) to have a single parameter $\theta_1$ completely model multiple features like mean and variance, without giving up the property that a small change of the parameter $\theta_1$ is equivalent to a small change of the function $f(x;\theta_1)$ at every $x$.

— Sextus Empiricus
I stopped reading after the initial definitions because they are so unclear and contradictory. By "integrate" you of course mean integration over $x$ only. By "$f$," though, you must mean the CDF and not the PDF, because the division by $b\ne 1$ changes the integral. By not imposing any restrictions on how $f$ can vary with $\theta$ you also adopt a much broader concept of "family" than is usual. Only that allows you to discuss a "map from $R^2$ to $R^3$." The problem with these "maps" is they cannot be continuous and will have no statistical meaning.

— whuber

I'm not objecting to simplicity or the language, but to the confusion that is being sown. The problem with your $R^2\to R^3$ map points out why you need to impose additional mathematical structure--a suitable topology--on the family. Allowing the distributions to change in such a (violently) discontinuous manner with $\theta$ is not only impractical and meaningless, it would likely invalidate useful methods and theorems for no good reason. For instance, MLE is almost always performed under the assumption that the distribution varies with $\theta$ in a piecewise differentiable manner.

— whuber

The second bullet is incorrect: it neither follows from any of the assumptions nor is it part of the definition of a location-scale family.

— whuber

It is tremendously confusing because now all references to the $\theta_i$ are superfluous. I believe the quantifiers now in your statement might not convey correctly the idea you have. Why not just drop the $\theta_i$ and simply state that the family consists of the set of distributions $x \to F(bx + a)$ for one given $F$ and all $(a,b)\in\mathbb{R}^2$ with $b\gt 0$? There's no need to refer to means and variances, either--that's just a distraction from the essential idea, which does not require $F$ to have any moments at all.

— whuber

@whuber if you are generating a location-scale family from one single example, then indeed it would seem much easier to use $\mu$ and $\sigma$. Here, however, I am imagining that we already have a family of curves parameterized by some alternative $\theta_1$ and $\theta_2$, and I wonder whether such a family could contain more curves than just the curves created by scaling one member with $\mu$ and $\sigma$ (as in the transformation with the tangent). I will see if I can change the formulation somehow again (do you disagree with the idea or with the formulation?).

— Sextus Empiricus