A problem on estimability of parameters

Let $Y_1,Y_2,Y_3$ and $Y_4$ be four random variables such that $E(Y_1)=\theta_1-\theta_3;\space\space E(Y_2)=\theta_1+\theta_2-\theta_3;\space\space E(Y_3)=\theta_1-\theta_3;\space\space E(Y_4)=\theta_1-\theta_2-\theta_3$, where $\theta_1,\theta_2,\theta_3$ are unknown parameters. Also assume that $Var(Y_i)=\sigma^2$, $i=1,2,3,4.$ Which of the following is true?

A. $\theta_1,\theta_2,\theta_3$ are estimable.

B. $\theta_1+\theta_3$ is estimable.

C. $\theta_1-\theta_3$ is estimable, and $\dfrac{1}{2}(Y_1+Y_3)$ is the best linear unbiased estimate of $\theta_1-\theta_3$.

D. $\theta_2$ is estimable.

The given answer is C, which looks strange to me (because I got D).

Why did I get D? Because $E(Y_2-Y_4)=2\theta_2$.

And why can't I see how C could be the answer? I can see that $\dfrac{Y_1+Y_2+Y_3+Y_4}{4}$ is an unbiased estimator of $\theta_1-\theta_3$, and its variance is smaller than that of $\dfrac{Y_1+Y_3}{2}$.
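A quick numerical check of this comparison, as a sketch that assumes the $Y_i$ are uncorrelated (the problem only states their variances). Under that assumption $\sum_i\lambda_iY_i$ has variance $\sigma^2\sum_i\lambda_i^2$, and it is unbiased for $\theta_1-\theta_3$ exactly when $\lambda'X=(1,0,-1)$, where the rows of `X` hold the coefficients of $(\theta_1,\theta_2,\theta_3)$ in each $E(Y_i)$. The array names are illustrative.

```python
import numpy as np

# Rows: coefficients of (theta1, theta2, theta3) in E(Y1), ..., E(Y4)
X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)
p = np.array([1.0, 0.0, -1.0])  # target: theta1 - theta3

lam_c = np.array([0.5, 0.0, 0.5, 0.0])  # (Y1 + Y3) / 2, option C
lam_avg = np.full(4, 0.25)              # (Y1 + Y2 + Y3 + Y4) / 4

for name, lam in [("C", lam_c), ("mean", lam_avg)]:
    unbiased = np.allclose(lam @ X, p)  # E[sum lam_i Y_i] = theta1 - theta3?
    var_over_sigma2 = np.sum(lam ** 2)  # valid only for uncorrelated Y_i
    print(name, unbiased, var_over_sigma2)
```

This prints `C True 0.5` and `mean True 0.25`: both estimators are unbiased, and the average has half the variance.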

Please tell me where I am going wrong.

Also posted here: /math/2568894/a-problem-on-estimability-of-parameters

self-study estimation inference

— Stat_prob_001

Add a self-study tag, or someone will come along and close your question.

— Carl

@Carl Done, but why?

— Stat_prob_001

These are the site's rules, not my rules.

— Carl

Is $Y_1\neq Y_3$?

— Carl

@Carl You can think of it as follows: $Y_1=\theta_1-\theta_3+\epsilon_1$, where $\epsilon_1$ is a random variable with mean $0$ and variance $\sigma^2$. And $Y_3=\theta_1-\theta_3+\epsilon_3$, where $\epsilon_3$ is a random variable with mean $0$ and variance $\sigma^2$.

— Stat_prob_001

Answers:

This answer emphasizes checking estimability. The minimum-variance property is a secondary consideration for me.

First, summarize the information in matrix form as a linear model:

$\begin{align} Y := \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 1 & -1 & -1 \\ \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}:= X\beta + \varepsilon, \tag{1} \end{align}$

where $E(\varepsilon) = 0$ and $\text{Var}(\varepsilon) = \sigma^2 I$. (To discuss estimability, the sphericity assumption on $\varepsilon$ is not required. To discuss the Gauss-Markov property, however, we do need to assume sphericity of $\varepsilon$.)
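To see the rank deficiency concretely, here is a small NumPy sketch (an illustration, not part of the original answer) that computes $\text{rank}(X)$ and a basis for the null space of $X$ from the SVD. Because $X(1,0,1)' = 0$, any $\beta$ and $\beta + c\,(1,0,1)'$ produce the same mean vector $X\beta$.

```python
import numpy as np

# Design matrix X from model (1); columns correspond to theta1, theta2, theta3
X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)

rank = np.linalg.matrix_rank(X)
print(rank)  # 2, so X is rank-deficient (k = 3)

# Null space from the SVD: right singular vectors beyond the rank
_, s, vt = np.linalg.svd(X)
null_basis = vt[rank:]                # a single row, proportional to (1, 0, 1)
print(null_basis / null_basis[0, 0])  # approximately [[1, 0, 1]]
```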

If the design matrix $X$ has full rank, then the original parameter $\beta$ has a unique least-squares estimate $\hat{\beta} = (X'X)^{-1}X'Y$. Consequently, every parametric functional $\phi$, defined as a linear function $\phi(\beta)$ of $\beta$, is estimable in the sense that it is uniquely estimated from the data through the least-squares estimate $\hat{\beta}$, namely as $\hat{\phi} = p'\hat{\beta}$.

The subtlety arises when $X$ does not have full rank. To allow a thorough discussion, I first fix some notation and terms (I follow the conventions of The Coordinate-Free Approach to Linear Models, Section 4.8; some of the terms sound unnecessarily technical). The discussion applies to the general linear model $Y = X\beta + \varepsilon$ with $X \in \mathbb{R}^{n \times k}$ and $\beta \in \mathbb{R}^k$.

A regression manifold is the collection of mean vectors as $\beta$ varies over $\mathbb{R}^k$: $M = \{X\beta: \beta \in \mathbb{R}^k\}.$

A parametric functional $\phi = \phi(\beta)$ is a linear functional of $\beta$: $\phi(\beta) = p'\beta = p_1\beta_1 + \cdots + p_k\beta_k.$

As mentioned above, when $\text{rank}(X) < k$, not every parametric functional $\phi(\beta)$ is estimable. But wait, what is the technical definition of the term estimable? It seems hard to give a clear definition without a little linear algebra. The definition I find most intuitive is the following (from the same reference as above):

Definition 1. A parametric functional $\phi(\beta)$ is estimable if it is uniquely determined by $X\beta$, in the sense that $\phi(\beta_1) = \phi(\beta_2)$ whenever $\beta_1,\beta_2 \in \mathbb{R}^k$ satisfy $X\beta_1 = X\beta_2$.

Interpretation. The above definition requires the mapping $X\beta \mapsto \phi(\beta)$ from the regression manifold $M$ to the parameter space of $\phi$ to be well defined (single-valued), which is guaranteed when $\text{rank}(X) = k$ (i.e., when $X$ itself is one-to-one). When $\text{rank}(X) < k$, we know that there exist $\beta_1 \neq \beta_2$ such that $X\beta_1 = X\beta_2$. The estimability definition above rules out those structurally deficient parametric functionals that take different values even for the same point of $M$, which naturally makes no sense. On the other hand, an estimable parametric functional $\phi(\cdot)$ does allow $\phi(\beta_1) = \phi(\beta_2)$ with $\beta_1 \neq \beta_2$, as long as the condition $X\beta_1 = X\beta_2$ holds.

There are other equivalent conditions to check the estimability of a parametric functional given in the same reference, Proposition 8.4.
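One standard equivalent condition (not necessarily the form stated in Proposition 8.4) is that $p$ lies in the row space of $X$, i.e. $p' = a'X$ for some vector $a$; equivalently, appending $p'$ as an extra row does not increase the rank. A minimal sketch, where the helper name `is_estimable` is hypothetical and the numerical rank uses NumPy's default tolerance:

```python
import numpy as np

X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)

def is_estimable(X, p):
    """p'beta is estimable iff p lies in the row space of X."""
    # Stacking p under X leaves the rank unchanged exactly when p is a
    # linear combination of the rows of X.
    stacked = np.vstack([X, p])
    return bool(np.linalg.matrix_rank(stacked) == np.linalg.matrix_rank(X))

candidates = {
    "theta1": [1, 0, 0],
    "theta3": [0, 0, 1],
    "theta1 + theta3": [1, 0, 1],
    "theta1 - theta3": [1, 0, -1],
    "theta2": [0, 1, 0],
}
for name, p in candidates.items():
    print(name, is_estimable(X, np.array(p, dtype=float)))
```

Only $\theta_1-\theta_3$ and $\theta_2$ come out estimable, since the rows of $X$ span $\{(1,0,-1),(0,1,0)\}$.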

After such a verbose background introduction, let's come back to your question.

A. $\beta$ itself is not estimable, because $\text{rank}(X) = 2 < 3$, so there exist $\beta_1 \neq \beta_2$ with $X\beta_1 = X\beta_2$. Although the above definition is given for scalar functionals, it generalizes easily to vector-valued functionals.

B. $\phi_1(\beta) = \theta_1 + \theta_3 = (1, 0, 1)'\beta$ is not estimable. Take $\beta_1 = (0, 1, 0)'$ and $\beta_2 = (1, 1, 1)'$, which gives $X\beta_1 = X\beta_2$ but $\phi_1(\beta_1) = 0 + 0 = 0 \neq \phi_1(\beta_2) = 1 + 1 = 2$.
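This counterexample can be checked directly; a short NumPy sketch (the variable names are illustrative):

```python
import numpy as np

X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)
p_b = np.array([1.0, 0.0, 1.0])    # phi_1(beta) = theta1 + theta3

beta_1 = np.array([0.0, 1.0, 0.0])
beta_2 = np.array([1.0, 1.0, 1.0])

print(X @ beta_1, X @ beta_2)      # identical mean vectors
print(p_b @ beta_1, p_b @ beta_2)  # 0.0 versus 2.0
```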

C. $\phi_2(\beta) = \theta_1 - \theta_3 = (1, 0, -1)'\beta$ is estimable: the first entry of $X\beta$ is exactly $\theta_1 - \theta_3$, so $X\beta_1 = X\beta_2$ trivially implies $\theta_1^{(1)} - \theta_3^{(1)} = \theta_1^{(2)} - \theta_3^{(2)}$, i.e., $\phi_2(\beta_1) = \phi_2(\beta_2)$.

D. $\phi_3(\beta) = \theta_2 = (0, 1, 0)'\beta$ is also estimable. The derivation from $X\beta_1 = X\beta_2$ to $\phi_3(\beta_1) = \phi_3(\beta_2)$ is also trivial, since the second entry of $X\beta$ minus the fourth entry equals $2\theta_2$.

Once estimability is verified, a theorem (Proposition 8.16, same reference) establishes the Gauss-Markov property of $\phi(\hat{\beta})$. Based on that theorem, the second part of option C is incorrect: the best linear unbiased estimate is $\bar{Y} = (Y_1 + Y_2 + Y_3 + Y_4)/4$, by the theorem below.

Theorem. Let $\phi(\beta) = p'\beta$ be an estimable parametric functional, then its best linear unbiased estimate (aka, Gauss-Markov estimate) is $\phi(\hat{\beta})$ for any solution $\hat{\beta}$ to the normal equations $X'X\hat{\beta} = X'Y$ .

The proof goes as follows:

Proof. A straightforward calculation shows that the normal equations are

$\begin{equation} \begin{bmatrix} 4 & 0 & -4 \\ 0 & 2 & 0 \\ -4 & 0 & 4 \end{bmatrix} \hat{\beta} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -1 \\ -1 & -1 & -1 & -1 \end{bmatrix} Y, \end{equation}$

which, after dividing each row by 4 and writing $\phi(\hat{\beta}) = \hat{\theta}_1 - \hat{\theta}_3$, becomes

$\begin{equation} \begin{bmatrix} \phi(\hat{\beta}) \\ \hat{\theta}_2/2 \\ -\phi(\hat{\beta}) \end{bmatrix} = \begin{bmatrix} \bar{Y} \\ (Y_2 - Y_4)/4 \\ -\bar{Y} \end{bmatrix}, \end{equation}$

i.e., $\phi(\hat{\beta}) = \bar{Y}$.
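Because $X'X$ is singular, the normal equations have infinitely many solutions, but the theorem says $\phi(\hat\beta)$ does not depend on which one is used. A sketch that checks this on arbitrary simulated data (the random data and the shift size 5.0 are illustrative choices), using the Moore-Penrose pseudoinverse to obtain one solution and adding a null-space vector of $X$ to obtain another:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)
p = np.array([1.0, 0.0, -1.0])  # phi(beta) = theta1 - theta3
y = rng.normal(size=4)          # arbitrary data, for illustration only

# One solution of X'X beta = X'y (the minimum-norm one), via the pseudoinverse
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y
# Adding a null-space vector of X gives another solution
beta_alt = beta_hat + 5.0 * np.array([1.0, 0.0, 1.0])

for b in (beta_hat, beta_alt):
    assert np.allclose(X.T @ X @ b, X.T @ y)  # both solve the normal equations
    print(p @ b, y.mean())                    # same value: phi(beta_hat) = Ybar
```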

Therefore, option D is the only correct answer.

Addendum: The connection of estimability and identifiability

When I was at school, a professor briefly mentioned that the estimability of the parametric functional $\phi$ corresponds to model identifiability. I took this claim for granted then. However, the equivalence needs to be spelled out more explicitly.

According to A.C. Davison's monograph Statistical Models p.144,

Definition 2. A parametric model in which each parameter $\theta$ generates a different distribution is called identifiable.

Linear model $(1)$, regardless of the sphericity condition $\text{Var}(\varepsilon) = \sigma^2 I$, can be reformulated as

$\begin{equation} E[Y] = X\beta, \quad \beta \in \mathbb{R}^k. \tag{2} \end{equation}$

It is such a simple model that we only specify the first moment of the response vector $Y$. When $\text{rank}(X) = k$, model $(2)$ is identifiable, since $\beta_1 \neq \beta_2$ implies $X\beta_1 \neq X\beta_2$ (the word "distribution" in the original definition naturally reduces to "mean" under model $(2)$).

Now suppose that $\text{rank}(X) < k$ and consider a given parametric functional $\phi(\beta) = p'\beta$. How do we reconcile Definition 1 and Definition 2?

Well, by manipulating notation and words, we can show (the "proof" is rather trivial) that the estimability of $\phi(\beta)$ is equivalent to the identifiability of model $(2)$ when it is parametrized by $\phi = \phi(\beta) = p'\beta$ (the design matrix $X$ is likely to change accordingly). Write $\phi_i = p'\beta_i$. Suppose $\phi(\beta)$ is estimable, so that $X\beta_1 = X\beta_2$ implies $p'\beta_1 = p'\beta_2$; by definition, this is $\phi_1 = \phi_2$, hence model $(2)$ is identifiable when indexed by $\phi$. Conversely, suppose model $(2)$ is identifiable when indexed by $\phi$, so that $X\beta_1 = X\beta_2$ implies $\phi_1 = \phi_2$, which is exactly $\phi(\beta_1) = \phi(\beta_2)$.

Intuitively, when $X$ is rank-deficient, the model parametrized by $\beta$ is parameter-redundant (it has too many parameters), so a non-redundant, lower-dimensional reparametrization (which could consist of a collection of linear functionals) is possible. When is such a new representation possible? The key is estimability.

To illustrate the above statements, let's reconsider your example. We have verified that the parametric functionals $\phi_2(\beta) = \theta_1 - \theta_3$ and $\phi_3(\beta) = \theta_2$ are estimable. Therefore, we can rewrite model $(1)$ in terms of the reparametrized parameter $\gamma = (\phi_2, \phi_3)'$ as follows:

$\begin{equation} E[Y] = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \phi_2 \\ \phi_3 \end{bmatrix} = \tilde{X}\gamma. \end{equation}$

Clearly, since $\tilde{X}$ has full column rank, the model with the new parameter $\gamma$ is identifiable.
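A quick check of this reparametrization, as a sketch: with the matrix `A` (introduced here for illustration) mapping $\beta = (\theta_1,\theta_2,\theta_3)'$ to $\gamma = (\phi_2,\phi_3)'$, we have $X = \tilde{X}A$, so $X\beta = \tilde{X}\gamma$, and $\tilde{X}$ has rank 2, equal to its number of columns.

```python
import numpy as np

X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)
X_tilde = np.array([[1, 0],
                    [1, 1],
                    [1, 0],
                    [1, -1]], dtype=float)
# A maps beta = (theta1, theta2, theta3)' to gamma = (phi_2, phi_3)'
A = np.array([[1, 0, -1],
              [0, 1, 0]], dtype=float)

print(np.allclose(X, X_tilde @ A))     # True: X beta = X_tilde (A beta)
print(np.linalg.matrix_rank(X_tilde))  # 2 = number of columns: full column rank
```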

— Zhanxiong

If you need a proof for the second part of option C, I will supplement my answer.

— Zhanxiong

Thanks for such a detailed answer! Now, about the second part of C: I know that "best" refers to minimum variance. So why is $\dfrac{1}{4}(Y_1+Y_2+Y_3+Y_4)$ not "best"?

— Stat_prob_001

Oh, I don't know why I thought it was the estimator in C. Actually, $(Y_1 + Y_2 + Y_3 + Y_4)/4$ is the best estimator. I will edit my answer.

— Zhanxiong

Apply the definitions.

I will provide details to demonstrate how you can use elementary techniques: you don't need to know any special theorems about estimation, nor will it be necessary to assume anything about the (marginal) distributions of the $Y_i$ . We will need to supply one missing assumption about the moments of their joint distribution.

Definitions

All linear estimates are of the form $t_\lambda(Y) = \sum_{i=1}^4 \lambda_i Y_i$ for constants $\lambda = (\lambda_i)$.

An estimator of $\theta_1-\theta_3$ is unbiased if and only if its expectation is $\theta_1-\theta_3$ . By linearity of expectation,

$\eqalign{ \theta_1 - \theta_3 &= E[t_\lambda(Y)] = \sum_{i=1}^4 \lambda_i E[Y_i]\\ & = \lambda_1(\theta_1-\theta_3) + \lambda_2(\theta_1+\theta_2-\theta_3) + \lambda_3(\theta_1-\theta_3) + \lambda_4(\theta_1-\theta_2-\theta_3) \\ &=(\lambda_1+\lambda_2+\lambda_3+\lambda_4)(\theta_1-\theta_3) + (\lambda_2-\lambda_4)\theta_2. }$

Comparing coefficients of the unknown quantities $\theta_i$ reveals

$\lambda_2-\lambda_4=0\text{ and }\lambda_1+\lambda_2+\lambda_3+\lambda_4=1.\tag{1}$

In the context of linear unbiased estimation, "best" always means with least variance. The variance of $t_\lambda$ is

$\operatorname{Var}(t_\lambda) = \sum_{i=1}^4 \lambda_i^2 \operatorname{Var}(Y_i) + \sum_{i\ne j}^4 \lambda_i\lambda_j \operatorname{Cov}(Y_i,Y_j).$

The only way to make progress is to add an assumption about the covariances: most likely, the question intended to stipulate they are all zero. (This does not imply the $Y_i$ are independent. Furthermore, the problem can be solved by making any assumption that stipulates those covariances up to a common multiplicative constant. The solution depends on the covariance structure.)

Since $\operatorname{Var}(Y_i)=\sigma^2,$ we obtain

$\operatorname{Var}(t_\lambda) =\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2).\tag{2}$

The problem therefore is to minimize $(2)$ subject to constraints $(1)$ .
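As a numerical cross-check by a different technique from the substitution used next (a minimum-norm solution via the Moore-Penrose pseudoinverse, swapped in here for illustration): constraints $(1)$ say exactly that $X'\lambda = (1,0,-1)'$, where `X` (introduced here) holds the coefficients of $(\theta_1,\theta_2,\theta_3)$ in each $E(Y_i)$, and for a consistent linear system the pseudoinverse returns the solution with the smallest $\sum_i\lambda_i^2$, i.e. the minimizer of $(2)$.

```python
import numpy as np

# Coefficients of (theta1, theta2, theta3) in E(Y1), ..., E(Y4)
X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)
target = np.array([1.0, 0.0, -1.0])  # theta1 - theta3

# Constraints (1) read X' lambda = target; the pseudoinverse gives the
# solution of smallest Euclidean norm, which minimizes (2) up to sigma^2.
lam = np.linalg.pinv(X.T) @ target
print(lam)                             # approximately [0.25 0.25 0.25 0.25]
print(np.allclose(X.T @ lam, target))  # constraints satisfied
```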

Solution

The constraints $(1)$ permit us to express all the $\lambda_i$ in terms of just two linear combinations of them. Let $u=\lambda_1-\lambda_3$ and $v=\lambda_1+\lambda_3$ (which are linearly independent). These determine $\lambda_1$ and $\lambda_3$ while the constraints determine $\lambda_2$ and $\lambda_4$ . All we have to do is minimize $(2)$ , which can be written

$\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2) = \frac{\sigma^2}{4}\left(2u^2 + (2v-1)^2 + 1\right).$

No constraints apply to $(u,v)$ . Assume $\sigma^2 \ne 0$ (so that the variables aren't just constants). Since $u^2$ and $(2v-1)^2$ are smallest only when $u=2v-1=0$ , it is now obvious that the unique solution is

$\lambda = (\lambda_1,\lambda_2,\lambda_3,\lambda_4) = (1/4,1/4,1/4,1/4).$
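The algebraic identity in $(u, v)$ can be spot-checked numerically. A sketch, with the helper `lambdas_from_uv` introduced here for illustration: it recovers $\lambda_1 = (v+u)/2$, $\lambda_3 = (v-u)/2$, and $\lambda_2 = \lambda_4 = (1-v)/2$ from the constraints, then compares both sides at random $(u, v)$ with $\sigma^2$ factored out.

```python
import numpy as np

def lambdas_from_uv(u, v):
    """Recover lambda from u = l1 - l3, v = l1 + l3 and constraints (1)."""
    l1, l3 = (v + u) / 2, (v - u) / 2
    l2 = l4 = (1 - v) / 2  # from l2 = l4 and l1 + l2 + l3 + l4 = 1
    return np.array([l1, l2, l3, l4])

rng = np.random.default_rng(1)
for u, v in rng.normal(size=(5, 2)):
    lam = lambdas_from_uv(u, v)
    lhs = np.sum(lam ** 2)
    rhs = (2 * u**2 + (2 * v - 1) ** 2 + 1) / 4
    assert np.isclose(lhs, rhs)  # identity holds (sigma^2 factored out)

print(lambdas_from_uv(0.0, 0.5))  # the minimizer u = 0, v = 1/2
```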

Option (C) is false because it does not give the best linear unbiased estimator. Option (D), although it doesn't give full information, is nevertheless correct, because

$\theta_2 = E[t_{(0,1/2,0,-1/2)}(Y)]$

is the expectation of a linear estimator.
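Under the same covariance assumption, this particular $\lambda$ is also the least-variance choice for $\theta_2$. A sketch using the Moore-Penrose pseudoinverse to find the minimum-norm $\lambda$ satisfying $X'\lambda = (0,1,0)'$, where `X` (introduced here for illustration) holds the coefficients of $(\theta_1,\theta_2,\theta_3)$ in each $E(Y_i)$:

```python
import numpy as np

X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)
target = np.array([0.0, 1.0, 0.0])  # theta2

# Minimum-norm lambda with X' lambda = target, i.e. least variance among
# linear unbiased estimators of theta2 when Cov(Y) = sigma^2 I
lam = np.linalg.pinv(X.T) @ target
print(np.round(lam, 12))            # (0, 1/2, 0, -1/2) up to rounding
```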

It is easy to see that neither (A) nor (B) can be correct, because the space of expectations of linear estimators is spanned by $\{\theta_2, \theta_1-\theta_3\}$, and none of $\theta_1$, $\theta_3$, or $\theta_1+\theta_3$ lies in that space.

Consequently (D) is the unique correct answer.

— whuber