EM maximum likelihood estimation for the Weibull distribution

24

Note: I am posting a question on behalf of a former student of mine who, for technical reasons, is unable to post it himself.

Given an iid sample $x_1,\ldots,x_n$ from a Weibull distribution with pdf

$f_k(x) = k x^{k-1} e^{-x^k} \quad x>0$

is there a useful missing-variable representation

$f_k(x) = \int_\mathcal{Z} g_k(x,z)\,\text{d}z$

and hence an associated EM (expectation-maximization) algorithm that could be used to find the MLE of $k$, instead of using straightforward numerical optimisation?

— Xi'an

2

Is there any censoring?

— 14.

2

What's wrong with Newton-Raphson?

— probabilityislogic

2

@probabilityislogic: there is nothing wrong with anything! My student would simply like to know whether there is an EM version, that's all...

— Xi'an

1

Can you give an example of what you are looking for in a different, simpler context, e.g. with observations of a Gaussian or a uniform random variable? When all of the data are observed, I (and some of the other posters, judging by their comments) don't see how EM is relevant to your question.

— ahfoss

1

@probabilityislogic I think you should have said, "Oh, you mean you want to USE Newton-Raphson?". Weibulls are regular families... I think ML solutions are unique. Therefore, EM has nothing to "E" over, so you are just "M"-ing... and finding roots of the score equations is the best way to do that!

— AdamO

7

I think the answer is yes, if I have understood the question correctly.

Write $z_i = x_i^k$. Then an EM-algorithm type of iteration, starting with for example $\hat k = 1$, is

E step: ${\hat z}_i = x_i^{\hat k}$
M step: $\hat k = \frac{n}{\left[\sum({\hat z}_i - 1)\log x_i\right]}$

This is a special case (the case of no censoring and no covariates) of the iteration suggested for Weibull proportional hazards models by Aitkin and Clayton (1980). It can also be found in Section 6.11 of Aitkin et al. (1989).
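The iteration stated above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original answer: the function names `em_type_weibull_shape` and `score`, the tolerance, the iteration cap, and the simulated sample (shape 2, scale 1) are all hypothetical choices. In this sketch the iterates tend to alternate around their limit and approach it slowly, so the iteration cap is deliberately generous.

```python
import math
import random

def em_type_weibull_shape(x, k_init=1.0, tol=1e-10, max_iter=1000):
    """Iterate z_i = x_i**k, then k = n / sum((z_i - 1) * log(x_i))."""
    n = len(x)
    log_x = [math.log(xi) for xi in x]
    k = k_init
    for _ in range(max_iter):
        # "E" step as written in the answer: plug the current k into z_i = x_i**k
        z = [xi ** k for xi in x]
        # "M" step as written in the answer
        k_new = n / sum((zi - 1.0) * li for zi, li in zip(z, log_x))
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k  # last iterate if the tolerance was not reached

def score(x, k):
    """Derivative in k of the log-likelihood of f_k(x) = k x^(k-1) exp(-x^k)."""
    n = len(x)
    return (n / k
            + sum(math.log(xi) for xi in x)
            - sum(xi ** k * math.log(xi) for xi in x))

# Hypothetical data: 2000 draws with scale 1 and shape 2 (assumed values).
rng = random.Random(0)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(2000)]
k_hat = em_type_weibull_shape(sample)
```

A fixed point of the update satisfies `score(sample, k_hat) == 0` up to rounding, which gives a simple way to check the result against the likelihood equation.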

Aitkin, M. and Clayton, D., 1980. The fitting of exponential, Weibull and extreme value distributions to complex censored survival data using GLIM. Applied Statistics, pp. 156-163.
Aitkin, M., Anderson, D., Francis, B. and Hinde, J., 1989. Statistical Modelling in GLIM. Oxford University Press. New York.

— DavidF

Thank you, David! Treating $x_i^k$ as the missing variable never occurred to me...!

— Xi'an

7

The Weibull MLE is only numerically solvable:

Let

$f_{\lambda,\beta}(x) = \begin{cases} \frac{\beta}{\lambda}\left(\frac{x}{\lambda}\right)^{\beta-1}e^{-\left(\frac{x}{\lambda}\right)^{\beta}} & ,\,x\geq0 \\ 0 &,\, x<0 \end{cases}$

with $\beta,\,\lambda>0$.

1) Likelihood function:

$\mathcal{L}_{\hat{x}}(\lambda, \beta) =\prod_{i=1}^N f_{\lambda,\beta}(x_i) =\prod_{i=1}^N \frac{\beta}{\lambda}\left(\frac{x_i}{\lambda}\right)^{\beta-1}e^{-\left(\frac{x_i}{\lambda}\right)^{\beta}} = \frac{\beta^N}{\lambda^{N \beta}} e^{-\sum_{i=1}^N\left(\frac{x_i}{\lambda}\right)^{\beta}} \prod_{i=1}^N x_i^{\beta-1}$

Log-likelihood function:

$\ell_{\hat{x}}(\lambda, \beta):= \ln \mathcal{L}_{\hat{x}}(\lambda, \beta)=N\ln \beta-N\beta\ln \lambda-\sum_{i=1}^N \left(\frac{x_i}{\lambda}\right)^\beta+(\beta-1)\sum_{i=1}^N \ln x_i$

2) MLE problem:

$\begin{aligned} \underset{(\lambda,\beta) \in \mathbb{R}^2}{\text{max}}\quad & \ell_{\hat{x}}(\lambda, \beta) \\ \text{s.t.}\quad & \lambda>0 \\ & \beta > 0 \end{aligned}$

3) Maximization by $0$-gradients:

$\begin{aligned} \frac{\partial \ell}{\partial \lambda}&=-N\beta\frac{1}{\lambda}+\beta\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta+1}}&\stackrel{!}{=} 0\\ \frac{\partial \ell}{\partial \beta}&=\frac{N}{\beta}-N\ln\lambda-\sum_{i=1}^N \ln\left(\frac{x_i}{\lambda}\right)e^{\beta \ln\left(\frac{x_i}{\lambda}\right)}+\sum_{i=1}^N \ln x_i&\stackrel{!}{=}0 \end{aligned}$

It follows:

$\begin{aligned} -N\beta\frac{1}{\lambda}+\beta\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta+1}} &= 0\\ -\beta\frac{1}{\lambda}N +\beta\frac{1}{\lambda}\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta}} &= 0\\ -1+\frac{1}{N}\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta}}&=0\\ \frac{1}{N}\sum_{i=1}^N x_i^\beta&=\lambda^\beta \end{aligned}$

$\Rightarrow\lambda^*=\left(\frac{1}{N}\sum_{i=1}^N x_i^{\beta^*}\right)^\frac{1}{\beta^*}$

Plugging $\lambda^*$ into the second 0-gradient condition:

$\Rightarrow \beta^*=\left[\frac{\sum_{i=1}^N x_i^{\beta^*}\ln x_i}{\sum_{i=1}^N x_i^{\beta^*}}-\overline{\ln x}\right]^{-1}$

This equation can only be solved numerically, e.g. with the Newton-Raphson algorithm. The resulting $\hat{\beta}^*$ can then be plugged into the expression for $\lambda^*$ to complete the ML estimator for the Weibull distribution.
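As a rough illustration of the last step, the equation for $\beta^*$ can be rewritten as $h(\beta) = \frac{\sum_i x_i^{\beta}\ln x_i}{\sum_i x_i^{\beta}} - \overline{\ln x} - \frac{1}{\beta} = 0$ and solved by Newton-Raphson, after which $\lambda^*$ follows from the closed-form expression above. The sketch below assumes this reformulation; the function name `weibull_mle`, the starting value, the tolerance, the positivity safeguard, and the simulated data (scale 3, shape 1.5) are hypothetical choices made for the example only.

```python
import math
import random

def weibull_mle(x, beta_init=1.0, tol=1e-12, max_iter=100):
    """Newton-Raphson on h(beta) = S1/S0 - mean(log x) - 1/beta, then lambda in closed form."""
    n = len(x)
    log_x = [math.log(xi) for xi in x]
    mean_log = sum(log_x) / n
    beta = beta_init
    for _ in range(max_iter):
        w = [xi ** beta for xi in x]
        s0 = sum(w)                                          # sum of x_i^beta
        s1 = sum(wi * li for wi, li in zip(w, log_x))        # sum of x_i^beta * ln x_i
        s2 = sum(wi * li * li for wi, li in zip(w, log_x))   # sum of x_i^beta * (ln x_i)^2
        h = s1 / s0 - mean_log - 1.0 / beta
        dh = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (beta * beta)  # h'(beta) > 0
        beta_new = beta - h / dh
        if beta_new <= 0.0:
            # Safeguard (an assumption of this sketch): keep beta positive
            beta_new = beta / 2.0
        converged = abs(beta_new - beta) < tol
        beta = beta_new
        if converged:
            break
    # lambda* = (mean of x_i^beta*)^(1 / beta*)
    lam = (sum(xi ** beta for xi in x) / n) ** (1.0 / beta)
    return lam, beta

# Hypothetical data: 2000 draws with scale 3 and shape 1.5 (assumed values).
rng = random.Random(0)
sample = [rng.weibullvariate(3.0, 1.5) for _ in range(2000)]
lam_hat, beta_hat = weibull_mle(sample)
```

Substituting `lam_hat` and `beta_hat` back into the two gradient expressions should give values close to zero, which is a quick sanity check on the derivation.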

— emcor

11

Unfortunately, this does not appear to answer the question in any discernible way. The OP is very clearly aware of Newton-Raphson and related approaches. The feasibility of N-R in no way precludes the existence of a missing-variable representation or associated EM algorithm. In my estimation, the question is not concerned at all with numerical solutions, but rather is probing for insight that might become apparent if an interesting missing-variable approach were demonstrated.

— cardinal

@cardinal It is one thing to say there is only a numerical solution, and another thing to show that there is only a numerical solution.

— emcor

5

Dear @emcor, I think you may be misunderstanding what the question is asking. Perhaps reviewing the other answer and associated comment stream would be helpful. Cheers.

— cardinal

@cardinal I agree it is not a direct answer, but it gives the exact expressions for the MLEs, which can be used, e.g., to verify the EM.

— emcor

4

Though this is an old question, it looks like there is an answer in a paper published here: http://home.iitk.ac.in/~kundu/interval-censoring-REVISED-2.pdf

In this work the analysis of interval-censored data, with Weibull distribution as the underlying lifetime distribution has been considered. It is assumed that censoring mechanism is independent and non-informative. As expected, the maximum likelihood estimators cannot be obtained in closed form. In our simulation experiments it is observed that the Newton-Raphson method may not converge many times. An expectation maximization algorithm has been suggested to compute the maximum likelihood estimators, and it converges almost all the times.

— user3204720

1

Can you post a full citation for the paper at the link, in case it goes dead?

— gung - Reinstate Monica

1

This is an EM algorithm, but does not do what I believe the OP wants. Rather, the E-step imputes the censored data, after which the M-step uses a fixed point algorithm with the complete data set. So the M-step is not in closed form (which I think is what the OP is looking for).

— Cliff AB

1

@CliffAB: thank you for the link (+1) but indeed the EM is naturally induced in this paper by the censoring part. My former student was looking for a plain uncensored iid Weibull likelihood optimisation via EM.

— Xi'an

-1

In this case the MLE and EM estimators are equivalent, since the MLE estimator is actually just a special case of the EM estimator. (I am assuming a frequentist framework in my answer; this isn't true for EM in a Bayesian context in which we're talking about MAP's). Since there is no missing data (just an unknown parameter), the E step simply returns the log likelihood, regardless of your choice of $k^{(t)}$ . The M step then maximizes the log likelihood, yielding the MLE.

EM would be applicable, for example, if you had observed data from a mixture of two Weibull distributions with parameters $k_1$ and $k_2$ , but you didn't know which of these two distributions each observation came from.

— ahfoss

6

I think you may have misinterpreted the point of the question, which is: Does there exist some missing-variable interpretation from which one would obtain the given Weibull likelihood (and which would allow an EM-like algorithm to be applied)?

— cardinal

4

The question statement in @Xi'an's post is quite clear. I think the reason it hasn't been answered is because any answer is likely nontrivial. (It's interesting, so I wish I had more time to think about it.) At any rate, your comment appears to betray a misunderstanding of the EM algorithm. Perhaps the following will serve as an antidote:

— cardinal

6

Let $f(x) = \pi \varphi(x-\mu_1) + (1-\pi) \varphi(x-\mu_2)$ where $\varphi$ is the standard normal density function. Let $F(x) = \int_{-\infty}^x f(u)\,\mathrm{d}u$. With $U_1,\ldots,U_n$ iid standard uniform, take $X_i = F^{-1}(U_i)$. Then, $X_1,\ldots,X_n$ is a sample from a Gaussian mixture model. We can estimate the parameters by (brute-force) maximum likelihood. Is there any missing data in our data-generation process? No. Does it have a latent-variable representation allowing for the use of an EM algorithm? Yes, absolutely.
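To make the point concrete, here is a minimal Python sketch of the construction: the sample is generated purely by inverting $F$ at uniform draws, with no component labels ever created, and the standard two-component EM updates (unit variances assumed known) are then applied to it. The helper names, the bisection-based inverse, the initialisation at the sample extremes, and the parameter values $\pi = 0.3$, $\mu_1 = -2$, $\mu_2 = 2$ are assumptions made for the illustration.

```python
import math
import random

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mixture_cdf(x, pi, mu1, mu2):
    """F(x) for the density pi*phi(x - mu1) + (1 - pi)*phi(x - mu2)."""
    return pi * std_normal_cdf(x - mu1) + (1.0 - pi) * std_normal_cdf(x - mu2)

def mixture_quantile(u, pi, mu1, mu2, n_bisect=100):
    """Numerical F^{-1}(u) by bisection; no latent label is drawn anywhere."""
    lo, hi = min(mu1, mu2) - 10.0, max(mu1, mu2) + 10.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, pi, mu1, mu2) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def em_two_gaussians(x, n_iter=200):
    """EM for a two-component mixture of unit-variance Gaussians."""
    pi, mu1, mu2 = 0.5, min(x), max(x)  # crude initial values (assumption)
    n = len(x)
    for _ in range(n_iter):
        # E step: posterior probability that x_i belongs to component 1
        r = []
        for xi in x:
            a = pi * std_normal_pdf(xi - mu1)
            b = (1.0 - pi) * std_normal_pdf(xi - mu2)
            r.append(a / (a + b))
        # M step: weighted proportion and weighted means
        s = sum(r)
        pi = s / n
        mu1 = sum(ri * xi for ri, xi in zip(r, x)) / s
        mu2 = sum((1.0 - ri) * xi for ri, xi in zip(r, x)) / (n - s)
    return pi, mu1, mu2

rng = random.Random(1)
true_pi, true_mu1, true_mu2 = 0.3, -2.0, 2.0
sample = [mixture_quantile(rng.random(), true_pi, true_mu1, true_mu2)
          for _ in range(2000)]
pi_hat, mu1_hat, mu2_hat = em_two_gaussians(sample)
```

The latent labels appear only inside `em_two_gaussians`, as a device of the algorithm; the data-generating code never uses them.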

— cardinal

4

I apologize, @cardinal; I think I misunderstood two things about your latest post. Yes, in the GMM problem you could search $\mathbb{R}^2 \times [0,1]$ via a brute-force ML approach. Also, I now see that the original problem is looking for a solution that introduces a latent variable allowing an EM approach to estimating the parameter $k$ in the given density $k x^{k-1}e^{-x^k}$. An interesting problem. Are there any examples of using EM in such a simple context? Most of my exposure to EM has been in the context of mixture problems and data imputation.

— ahfoss

3

@ahfoss: (+1) to your latest comment. Yes! You got it. As for examples: (i) it shows up in censored data problems, (ii) classical applications like hidden Markov models, (iii) simple threshold models like probit models (e.g., imagine observing the latent $Z_i$ instead of Bernoulli $X_i = \mathbf{1}_{(Z_i > \mu)}$), (iv) estimating variance components in one-way random effects models (and much more complex mixed models), and (v) finding the posterior mode in a Bayesian hierarchical model. The simplest is probably (i) followed by (iii).

— cardinal