For which models does the bias of the MLE fall faster than the variance?

Let $\hat\theta$ be the maximum likelihood estimate of the true parameter $\theta^*$ of some model. As the number of data points $n$ increases, the error $\lVert\hat\theta-\theta^*\rVert$ typically decreases as $O(1/\sqrt n)$. In that case both the bias $\lVert \mathbb E\hat\theta - \theta^*\rVert$ and the deviation $\lVert \mathbb E\hat\theta - \hat\theta\rVert$ decrease at the same $O(1/\sqrt{n})$ rate.

I am interested in models whose bias shrinks faster than $O(1/\sqrt n)$, but whose error does not shrink at this faster rate because the deviation still shrinks as $O(1/\sqrt n)$. In particular, I would like to know sufficient conditions under which a model's bias shrinks at the rate $O(1/n)$.

— Mike Izbicki

Is $\lVert\hat\theta-\theta^*\rVert = (\hat\theta-\theta^*)^2$? Or?

— Alecos Papadopoulos

I was asking specifically about the L2 norm, yes. But I would also be interested in other norms if that makes the question easier to answer.

— Mike Izbicki

$(\hat \theta -\theta^*)^2$ is $O_p(1/n)$.

— Alecos Papadopoulos

Sorry, I misread your comment. For the L2 norm in $d$ dimensions, $\Vert a-b\Vert = \sqrt{\sum_{i=1}^d (a_i-b_i)^2}$, and so the convergence is at the rate $O(1/\sqrt n)$. I agree that if we squared it, it would converge as $O(1/n)$.

— Mike Izbicki

Have you seen the ridge regression paper (Hoerl & Kennard 1970)? I believe it gives conditions on the design matrix + penalty under which this is expected.

— dcl

Answers:

In general, you need models where the MLE is not asymptotically normal but converges to some other distribution (and does so at a faster rate). This usually happens when the parameter being estimated lies on the boundary of the parameter space. Intuitively, this means that the MLE approaches the parameter "only from one side", so it "improves" its convergence speed because it is not "distracted" by going "back and forth" around the parameter.

A standard example is the MLE for $\theta$ in an i.i.d. sample of $U(0,\theta)$ uniform random variables. The MLE here is the maximum order statistic,

$\hat \theta_n = u_{(n)}$

Its finite-sample distribution is

$F_{\hat \theta_n} = \frac {(\hat \theta_n)^n}{\theta ^n},\;\;\; f_{\hat \theta_n}=n\frac {(\hat \theta_n)^{n-1}}{\theta ^n}$

$\mathbb E(\hat \theta_n) = \frac {n}{n+1}\theta \implies B(\hat \theta_n) = -\frac {1}{n+1}\theta$

So $B(\hat \theta_n) = O(1/n)$. But the same increased rate also holds for the variance.

One can also verify that, to obtain a limiting distribution, we need to look at the variable $n(\theta - \hat \theta_n)$ (i.e. we need to scale by $n$), since

$P[n(\theta - \hat \theta_n)\leq z] = 1-P[\hat \theta_n\leq \theta - (z/n)]$

$=1-\frac 1 {\theta^n}\cdot \left(\theta + \frac{-z}{n}\right)^n = 1-\frac {\theta^n} {\theta^n}\cdot \left(1 + \frac{-z/\theta}{n}\right)^n$

$\to 1- e^{-z/\theta}$

which is the CDF of the Exponential distribution.
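A rough Monte Carlo check of this example, as a sketch only: the function names `simulate_uniform_mle` and `summarize`, the value $\theta = 2$, the seed and the replication count are all assumptions chosen for illustration. It draws the MLE through the inverse of the finite-sample CDF $(x/\theta)^n$ and prints $n$ times the empirical bias and standard deviation, which should both stay roughly constant in $n$ if both are $O(1/n)$, next to the empirical CDF of $n(\theta-\hat\theta_n)$ at $z=1$ and the limit $1-e^{-z/\theta}$.

```python
import math
import random


def simulate_uniform_mle(theta, n, reps, seed=0):
    """Draw `reps` copies of the MLE max(u_1, ..., u_n) for U(0, theta).

    The MLE has CDF (x / theta)^n on [0, theta], so inverting it gives
    theta * U**(1/n) with U ~ U(0, 1). (Hypothetical helper, not from the thread.)
    """
    rng = random.Random(seed)
    return [theta * rng.random() ** (1.0 / n) for _ in range(reps)]


def summarize(theta, n, reps=100_000, z=1.0):
    """Return (empirical bias, empirical sd, empirical CDF of n*(theta - mle) at z)."""
    draws = simulate_uniform_mle(theta, n, reps)
    mean = sum(draws) / reps
    bias = mean - theta
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / reps)
    ecdf = sum(n * (theta - d) <= z for d in draws) / reps
    return bias, sd, ecdf


if __name__ == "__main__":
    theta = 2.0  # assumed true parameter, for illustration only
    limit = 1 - math.exp(-1.0 / theta)  # Exponential CDF at z = 1
    print("n  n*bias  n*sd  ecdf(z=1)  limit")
    for n in (10, 100, 1000):
        bias, sd, ecdf = summarize(theta, n)
        print(n, round(n * bias, 3), round(n * sd, 3),
              round(ecdf, 3), round(limit, 3))
```

If the two scaled columns flatten out as $n$ grows, that is consistent with bias and standard deviation sharing the $O(1/n)$ rate, which is why this example speeds up both quantities rather than the bias alone.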

I hope this gives some direction.

— Alecos Papadopoulos

This is getting close, but I am specifically interested in situations where the bias shrinks faster than the variance.

— Mike Izbicki

@MikeIzbicki Hmm... the convergence of the bias depends on the first moment of the distribution, and the (square root of the) variance is also a "first-order" magnitude. I am not sure, then, that this is possible, because it would seem to require the moments of the limiting distribution to "arise" at convergence rates that are not compatible with each other... I will think about it, though.

— Alecos Papadopoulos

Following the comments on my other answer (and the title of the OP's question!), here is a not very rigorous theoretical exploration of the issue.

We want to determine whether the bias $B(\hat \theta_n) = \mathbb E(\hat \theta_n) - \theta$ can have a different convergence rate than the square root of the variance,

$B(\hat \theta_n) = O(1/n^{\delta}),\;\;\; \sqrt {\text{Var}(\hat \theta_n)} = O(1/n^{\gamma}), \;\;\gamma \neq \delta \;???$

We have (taking $\theta = 0$ to simplify notation, so that the bias equals $\mathbb E(\hat \theta_n)$; shifting the estimator by a constant does not change its variance)

$B(\hat \theta_n) = O(1/n^{\delta}) \implies \lim n^{\delta}\mathbb E(\hat \theta_n) < K \implies \lim n^{2\delta}[\mathbb E(\hat \theta_n)]^2 < K'$

$\implies [\mathbb E(\hat \theta_n)]^2 = O(1/n^{2\delta}) \tag{1}$

while

$\sqrt {\text{Var}(\hat \theta_n)} = O(1/n^{\gamma}) \implies \lim n^{\gamma}\sqrt{\mathbb E (\hat \theta_n^2) - [\mathbb E(\hat \theta_n)]^2 }<M$

$\implies \lim \sqrt{n^{2\gamma}\mathbb E (\hat \theta_n^2) - n^{2\gamma}[\mathbb E(\hat \theta_n)]^2 }<M$

$\implies \lim n^{2\gamma}\mathbb E (\hat \theta_n^2) - \lim n^{2\gamma}[\mathbb E(\hat \theta_n)]^2 < M' \tag{2}$

We see that $(2)$ may hold if

A) both components are $O(1/n^{2\gamma})$ , in which case we can only have $\gamma = \delta$ .

B) But it may also hold if

$\lim n^{2\gamma}[\mathbb E(\hat \theta_n)]^2 \to 0 \implies [\mathbb E(\hat \theta_n)]^2 = o(1/n^{2\gamma}) \tag{3}$

For $(3)$ to be compatible with $(1)$ , we must have

$n^{2\gamma} < n^{2\delta} \implies \delta > \gamma\tag {4}$

So it appears that in principle it is possible to have the Bias converging at a faster rate than the square root of the variance. But we cannot have the square root of the variance converging at a faster rate than the Bias.
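
One concrete instance of case B, offered here as an illustration rather than as part of the argument above: for an i.i.d. sample $x_1,\dots,x_n$ from $N(\mu,\sigma^2)$ with both parameters unknown, the MLE of the variance is $\hat\sigma^2_n = \frac 1n \sum_{i=1}^n (x_i-\bar x)^2$, and the standard results for it give

$\mathbb E(\hat\sigma^2_n) = \frac{n-1}{n}\sigma^2 \implies B(\hat\sigma^2_n) = -\frac{1}{n}\sigma^2 = O(1/n)$

$\sqrt{\text{Var}(\hat\sigma^2_n)} = \frac{\sqrt{2(n-1)}}{n}\sigma^2 = O(1/\sqrt n)$

so in this example $\delta = 1 > \gamma = 1/2$.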

— Alecos Papadopoulos

How would you reconcile this with the existence of unbiased estimators like ordinary least squares? In that case, $B(\hat\theta)=0$, but $\sqrt{\text{Var}(\hat\theta)} = O(1/\sqrt n)$.

— Mike Izbicki

@MikeIzbicki Is the concept of convergence/big-O applicable in this case? Because here $B(\hat \theta)$ is not "$O()$-anything" to begin with.

— Alecos Papadopoulos

In this case, $\mathbb E\hat\theta=\theta^*$, so $B(\hat\theta) = \lVert \mathbb E \hat\theta - \theta^*\rVert = 0 = O(1) = O(1/n^0)$.

— Mike Izbicki

@MikeIzbicki But also $B(\hat \theta) = O(n)$ or $B(\hat \theta) =O(1/\sqrt{n})$ or any other you care to write down. So which one is the rate of convergence here?

— Alecos Papadopoulos

@MikeIzbicki I have corrected my answer to show that it is possible in principle to have the Bias converging faster, although I still think the "zero-bias" example is problematic.

— Alecos Papadopoulos