Influence functions and OLS

13

I am trying to understand how influence functions work. Could someone explain them in the context of a simple OLS regression

$\begin{equation} y_i = \alpha + \beta \cdot x_i + \varepsilon_i \end{equation}$

where I want the influence function for $\beta$?

regression least-squares

— stevejb

2

There is no specific question here yet: would you like to see how the influence function is computed? Would you like a concrete empirical example? A heuristic explanation of what it means?

— whuber

1

Look up Frank Critchley's 1986 paper "Influence functions in principal components" (I can't remember the exact name of the paper). He defines the influence function for ordinary regression there, which may or may not prove my answer wrong.

— probabilityislogic

14

Influence functions are basically an analytical tool that can be used to assess the effect (or "influence") of removing an observation on the value of a statistic without having to recalculate that statistic. They can also be used to create asymptotic variance estimates. If the influence equals $I$, then the asymptotic variance is $\frac{I^2}{n}$ (with $I^2$ averaged over the distribution).
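As a small illustration of the variance claim, here is a sketch for the simplest statistic, the sample mean, whose influence value for observation $x_i$ is $x_i-\bar{x}$. The simulated data, the seed, and the variable names are assumptions made for this example only; the point is just that averaging the squared influence values and dividing by $n$ reproduces the familiar variance of the mean.

```python
import numpy as np

# Simulated data (an assumption for illustration, not from the thread).
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=400)
n = x.size

# Influence value of each observation on the sample mean: psi_i = x_i - mean.
psi = x - x.mean()

# Influence-based variance estimate: average squared influence divided by n.
var_from_influence = np.mean(psi ** 2) / n

# Textbook variance of the mean, using the 1/n variance convention.
var_textbook = x.var() / n

print(var_from_influence, var_textbook)
```

The two numbers agree because, for the mean, the average squared influence is exactly the (1/n) sample variance.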

The way I understand influence functions is as follows. You have some sort of theoretical CDF, denoted by $F_{i}(y)=Pr(Y_{i}<y_{i})$. For simple OLS, you have

$Pr(Y_{i}<y_{i})=Pr(\alpha+\beta x_{i} + \epsilon_{i} < y_{i})=\Phi\left(\frac{y_{i}-(\alpha+\beta x_{i})}{\sigma}\right)$

where $\Phi(z)$ is the standard normal CDF and $\sigma^2$ is the error variance. Now you can show that any statistic is a function of this CDF, hence the notation $S(F)$ (i.e. some function of $F$). Now suppose we change the function $F$ by a "little bit", to

$F_{(i)}(z)=(1+\zeta)F(z)-\zeta \delta_{(i)}(z)$

where $\delta_{(i)}(z)=I(y_{i}<z)$ and $\zeta=\frac{1}{n-1}$. Thus $F_{(i)}$ represents the CDF of the data with the "ith" data point removed. We can expand $S[F_{(i)}(z,\zeta)]$ in a Taylor series about $\zeta=0$. This gives:

$S[F_{(i)}(z,\zeta)] \approx S[F_{(i)}(z,0)]+\zeta\left[\frac{\partial S[F_{(i)}(z,\zeta)]}{\partial \zeta}|_{\zeta=0}\right]$

Note that $F_{(i)}(z,0)=F(z)$ so we get:

$S[F_{(i)}(z,\zeta)] \approx S[F(z)]+\zeta\left[\frac{\partial S[F_{(i)}(z,\zeta)]}{\partial \zeta}|_{\zeta=0}\right]$

The partial derivative here is called the influence function. It represents an approximate "first order" correction to a statistic due to deleting the "ith" observation. Note that in regression the remainder does not go to zero asymptotically, so this is only an approximation to the changes you may actually get. Now write $\beta$ as:

$\beta=\frac{\frac{1}{n}\sum_{j=1}^{n}(y_{j}-\overline{y})(x_{j}-\overline{x})}{\frac{1}{n}\sum_{j=1}^{n}(x_{j}-\overline{x})^2}$

Thus $\beta$ is a function of two statistics: the variance of X and the covariance between X and Y. These two statistics have representations in terms of the CDF as:

$cov(X,Y)=\int(X-\mu_x(F))(Y-\mu_y(F))dF$

$var(X)=\int(X-\mu_x(F))^{2}dF$

where

$\mu_x=\int xdF$

To remove the ith observation we replace $F\rightarrow F_{(i)}=(1+\zeta)F-\zeta \delta_{(i)}$ in both integrals to give:

$\mu_{x(i)}=\int xd[(1+\zeta)F-\zeta \delta_{(i)}]=\mu_x-\zeta(x_{i}-\mu_x)$

$Var(X)_{(i)}=\int(X-\mu_{x(i)})^{2}dF_{(i)}=\int(X-\mu_x+\zeta(x_{i}-\mu_x))^{2}d[(1+\zeta)F-\zeta \delta_{(i)}]$

Ignoring terms of order $\zeta^{2}$ and simplifying, we get:

$Var(X)_{(i)}\approx Var(X)-\zeta\left[(x_{i}-\mu_x)^2-Var(X)\right]$

Similarly for the covariance:

$Cov(X,Y)_{(i)}\approx Cov(X,Y)-\zeta\left[(x_{i}-\mu_x)(y_{i}-\mu_y)-Cov(X,Y)\right]$
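The "ignoring terms and simplifying" step for the variance can be spelled out as follows. This is a sketch using the shorthand $d_i=x_i-\mu_x$, which is introduced here and is not part of the original answer. The cross term drops out because $\int(X-\mu_x)dF=0$, and the point mass $\delta_{(i)}$ evaluates the integrand at $X=x_i$:

$\int(X-\mu_x+\zeta d_i)^{2}dF=Var(X)+\zeta^{2}d_i^{2}$

$\int(X-\mu_x+\zeta d_i)^{2}d\delta_{(i)}=(1+\zeta)^{2}d_i^{2}$

$Var(X)_{(i)}=(1+\zeta)\left[Var(X)+\zeta^{2}d_i^{2}\right]-\zeta(1+\zeta)^{2}d_i^{2}=Var(X)-\zeta\left[d_i^{2}-Var(X)\right]+O(\zeta^{2})$

The covariance follows the same pattern, with $d_i^{2}$ replaced by $(x_{i}-\mu_x)(y_{i}-\mu_y)$.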

So we can now express $\beta_{(i)}$ as a function of $\zeta$ . This is:

$\beta_{(i)}(\zeta)\approx \frac{Cov(X,Y)-\zeta\left[(x_{i}-\mu_x)(y_{i}-\mu_y)-Cov(X,Y)\right]}{Var(X)-\zeta\left[(x_{i}-\mu_x)^2-Var(X)\right]}$

We can now use the Taylor series:

$\beta_{(i)}(\zeta)\approx \beta_{(i)}(0)+\zeta\left[\frac{\partial \beta_{(i)}(\zeta)}{\partial \zeta}\right]_{\zeta=0}$

Simplifying this gives:

$\beta_{(i)}(\zeta)\approx \beta-\zeta\left[\frac{(x_{i}-\mu_x)(y_{i}-\mu_y)}{Var(X)}-\beta\frac{(x_{i}-\mu_x)^2}{Var(X)}\right]$
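For readers who want the intermediate step, the derivative can be worked out with the quotient rule. The labels $a_i=(x_{i}-\mu_x)(y_{i}-\mu_y)-Cov(X,Y)$ and $b_i=(x_{i}-\mu_x)^2-Var(X)$ are shorthand introduced here for illustration only, and $\beta=Cov(X,Y)/Var(X)$ as above:

$\left[\frac{\partial}{\partial\zeta}\frac{Cov(X,Y)-\zeta a_i}{Var(X)-\zeta b_i}\right]_{\zeta=0}=\frac{-a_i Var(X)+b_i Cov(X,Y)}{Var(X)^{2}}=-\frac{a_i}{Var(X)}+\beta\frac{b_i}{Var(X)}$

$-\frac{a_i}{Var(X)}+\beta\frac{b_i}{Var(X)}=-\frac{(x_{i}-\mu_x)(y_{i}-\mu_y)}{Var(X)}+\beta+\beta\frac{(x_{i}-\mu_x)^2}{Var(X)}-\beta$

The two constant $\beta$ terms cancel, which leaves the bracket in the expression above.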

And plugging in the values of the statistics $\mu_y$ , $\mu_x$ , $var(X)$ , and $\zeta=\frac{1}{n-1}$ we get:

$\beta_{(i)}\approx \beta-\frac{x_{i}-\overline{x}}{n-1}\left[\frac{y_{i}-\overline{y}}{\frac{1}{n}\sum_{j=1}^{n}(x_{j}-\overline{x})^2}-\beta\frac{x_{i}-\overline{x}}{\frac{1}{n}\sum_{j=1}^{n}(x_{j}-\overline{x})^2}\right]$

And you can see how the effect of removing a single observation can be approximated without having to re-fit the model. You can also see that an observation whose x equals the average has no influence on the slope of the line. Think about this and you will see how it makes sense. You can also write this more succinctly in terms of the standardised values $\tilde{x}=\frac{x-\overline{x}}{s_{x}}$ (similarly for y):

$\beta_{(i)}\approx \beta-\frac{\tilde{x}_{i}}{n-1}\left[\tilde{y}_{i}\frac{s_y}{s_x}-\tilde{x}_{i}\beta\right]$
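To see how close the first-order formula gets in practice, here is a minimal sketch that compares it against actually re-fitting the slope with each observation left out. The simulated data, the seed, and the helper name `ols_slope` are assumptions for this example and are not part of the answer.

```python
import numpy as np

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)

beta = ols_slope(x, y)
dx = x - x.mean()
dy = y - y.mean()
var_x = np.mean(dx ** 2)  # 1/n convention, as in the formula above

# First-order approximation to the leave-one-out slope, no re-fitting.
beta_loo_approx = beta - dx / (n - 1) * (dy - beta * dx) / var_x

# Exact leave-one-out slopes, obtained by re-fitting n times.
beta_loo_exact = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_loo_exact[i] = ols_slope(x[keep], y[keep])

max_err = np.max(np.abs(beta_loo_exact - beta_loo_approx))
max_change = np.max(np.abs(beta_loo_exact - beta))
print(max_err, max_change)
```

In this setup the approximation error is a small fraction of the actual change in the slope. The gap is largest for observations far from $\bar{x}$, which is where the neglected higher-order terms matter most.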

— probabilityislogic

So the story is about the influence of an additional data point? I am more used to the impulse response for time series data; in a statistical context all influence would be described by the marginal effect or (a better choice) the beta coefficient from a standardized regression. I really need more context to judge the question and answer, but this one is nice, I think (no +1 yet, but awaiting).

— Dmitrij Celov

@dmitrij - That is what was implied (or what I inferred) from the link: it is about the robustness properties of a statistic. Influence functions are slightly more general than one data point: you can redefine the delta function to be a sum of them (so many observations). I would think of it as a "cheap jackknife" to some degree, because you don't need to re-fit the model.

— probabilityislogic

9

Here is a super general way to talk about influence functions of a regression. First I'm going to tackle one way of presenting influence functions:

Suppose $F$ is a distribution on $\Sigma$ . The contaminated distribution function, $F_\epsilon(x)$ can be defined as:

$F_\epsilon(x)=(1-\epsilon)F+\epsilon\delta_x$

where $\delta_x$ is the probability measure on $\Sigma$ which assigns probability 1 to $\{x\}$ and 0 to all other elements of $\Sigma$.

From this we can define the influence function fairly easily:

The influence function of $\hat{\theta}$ at $F$ , $\psi_i:\mathcal{X}\to\Gamma$ is defined as:

$\begin{equation} \psi_{\hat{\theta},F}(x)=\lim\limits_{\epsilon\to 0}\dfrac{\hat{\theta}(F_\epsilon(x))-\hat{\theta}(F)}{\epsilon} \end{equation}$

From here it's possible to see that an influence function is the Gateaux derivative of $\hat\theta$ at $F$ in the direction of $\delta_x$ . This makes the interpretation of influence functions (for me) a little bit clearer: An influence function tells you the effect that a particular observation has on the estimator.
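A quick numerical sketch of this limit, using the variance as the functional: the closed form $(x-\mu)^2-Var(X)$ is the standard influence function of the variance, and the code checks it against the difference quotient in the definition for a small $\epsilon$. The simulated sample, the evaluation point `x0`, and the function names are assumptions for this example.

```python
import numpy as np

# Empirical distribution F: equal weights on a simulated sample (assumption).
rng = np.random.default_rng(2)
sample = rng.normal(size=1000)
weights = np.full(sample.size, 1.0 / sample.size)

def variance_functional(points, w):
    """Variance of the discrete distribution putting mass w on points."""
    mu = np.sum(w * points)
    return np.sum(w * (points - mu) ** 2)

def gateaux_fd(x, eps=1e-6):
    """Difference quotient (T(F_eps) - T(F)) / eps with F_eps = (1-eps)F + eps*delta_x."""
    pts = np.append(sample, x)
    w_eps = np.append((1 - eps) * weights, eps)
    base = variance_functional(sample, weights)
    return (variance_functional(pts, w_eps) - base) / eps

x0 = 2.5
closed_form = (x0 - sample.mean()) ** 2 - sample.var()
fd = gateaux_fd(x0)
print(fd, closed_form)
```

The difference quotient and the closed form agree up to a term of order $\epsilon$, which is what the limit in the definition says should happen.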

The OLS estimate is a solution to the problem:

$\hat\theta=\arg\min_\theta E[(Y-X\theta)^T(Y-X\theta)]$

Imagine a contaminated distribution which puts a little more weight on observation $(x,y)$:

$\hat\theta_\epsilon = \arg\min_\theta (1-\epsilon)E[(Y-X\theta)^T(Y-X\theta)]+\epsilon (y-x\theta)^T(y-x\theta)$

Taking first order conditions:

$\left\{(1-\epsilon)E[X^TX]+\epsilon x^Tx\right\}\hat\theta_\epsilon = (1-\epsilon)E[X^TY]+\epsilon x^Ty$

Since the influence function is just a Gateaux derivative, differentiating both sides with respect to $\epsilon$ and evaluating at $\epsilon=0$ gives:

$-(E[X^TX]-x^Tx)\hat\theta_\epsilon + E[X^TX]\psi_{\theta}(x,y) = -E[X^TY] + x^Ty$

At $\epsilon=0$ , $\hat\theta_\epsilon=\hat\theta=E[X^TX]^{-1}E[X^TY]$ , so:

$\psi_{\theta}(x,y)=E[X^TX]^{-1}x^T(y-x\theta)$

The finite sample counterpart of this influence function is:

$\psi_{\theta}(x,y)=\left(\dfrac{1}{N}\sum_i X_i^TX_i\right)^{-1}x^T(y-x\theta)$

In general I find this framework (working with influence functions as Gateaux derivatives) easier to deal with.
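The finite-sample formula above can be checked numerically. The sketch below computes $\psi_\theta(x,y)$ for a regression with an intercept and one regressor, and compares it with the difference quotient obtained by solving the contaminated first order conditions for a small $\epsilon$. The simulated data, the test point `(x0, y0)`, and the function names are assumptions for illustration.

```python
import numpy as np

# Simulated data; each row X_i = (1, x_i) includes the intercept (assumption).
rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

A = X.T @ X / N   # (1/N) sum_i X_i^T X_i
b = X.T @ Y / N   # (1/N) sum_i X_i^T Y_i
theta = np.linalg.solve(A, b)

def influence(x_row, y_val):
    """Finite-sample influence function: A^{-1} x^T (y - x theta)."""
    return np.linalg.solve(A, x_row * (y_val - x_row @ theta))

def theta_contaminated(x_row, y_val, eps):
    """Solve the first order conditions under (1 - eps) F + eps * delta_(x, y)."""
    A_eps = (1 - eps) * A + eps * np.outer(x_row, x_row)
    b_eps = (1 - eps) * b + eps * x_row * y_val
    return np.linalg.solve(A_eps, b_eps)

x0, y0 = np.array([1.0, 3.0]), 0.0
eps = 1e-6
psi = influence(x0, y0)
fd = (theta_contaminated(x0, y0, eps) - theta) / eps
print(psi, fd)

# Influence values of the sample points average to zero, because the OLS
# residuals are orthogonal to the regressors.
psi_all = np.linalg.solve(A, (X * (Y - X @ theta)[:, None]).T).T
print(psi_all.mean(axis=0))
```

The second component of `psi` is the influence on the slope, which is the quantity asked about in the question.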

— jayk