Der Titel "Fehler in Variablen" und der Inhalt der Frage scheinen unterschiedlich zu sein, da er fragt, warum wir die Variation in X nicht berücksichtigenX bei der Modellierung der bedingten Antwort, dh der Inferenz für Regressionsparameter, nicht berücksichtigen. Diese beiden Themen scheinen mir orthogonal zu sein, deshalb antworte ich hier auf den Inhalt.
I have answered a similar question before: What is the difference between conditioning on regressors vs. treating them as fixed?, so here I will copy part of my answer there:
I will try to give a somewhat more formal argument for conditioning on regressors. Let (Y,X) be a random vector, and suppose interest is in the regression of Y on X, where by regression we mean the conditional expectation of Y given X. Under multinormal assumptions this is a linear function, but our argument does not depend on that. We start by factoring the joint density in the usual way
f(y,x) = f(y∣x) f(x)

but those functions are not known, so we use a parametrized model

f(y,x; θ, ψ) = fθ(y∣x) fψ(x)

where θ parametrizes the conditional distribution of Y given X and ψ the marginal distribution of X. In the normal linear model we can take θ = (β, σ²). Now we only assume that (θ, ψ) ∈ Θ × Ψ, a Cartesian product, so the two parameters have no part in common.
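To make the conditioning step explicit, here is a small sketch (my own addition, with i.i.d. observations (yᵢ, xᵢ), i = 1, …, n, as notation) of how the log-likelihood separates under this factorization:

```latex
% Log-likelihood of an i.i.d. sample (y_1, x_1), ..., (y_n, x_n)
% under f(y, x; \theta, \psi) = f_\theta(y \mid x) f_\psi(x):
\ell(\theta, \psi)
  = \sum_{i=1}^{n} \log f_\theta(y_i \mid x_i)
  + \sum_{i=1}^{n} \log f_\psi(x_i)
% Because (\theta, \psi) ranges over the product \Theta \times \Psi,
% the two sums can be maximized separately; likelihood inference about
% \theta uses only the conditional factor, exactly as if the x_i were fixed.
```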
This can be interpreted as a factorization of the statistical experiment (or of the data-generating process, DGP): first X is generated according to fψ(x), and then, as a second step, Y is generated according to the conditional density fθ(y∣X=x). Note that the first step does not use any knowledge of θ; that enters only in the second step. The statistic X is ancillary for θ, see https://en.wikipedia.org/wiki/Ancillary_statistic.
But, depending on the result of the first step, the second step can be more or less informative about θ. If the distribution given by fψ(x) has very low variance, say, the observed x's will be concentrated in a small region, and it will be more difficult to estimate θ. So the first part of this two-step experiment determines the precision with which θ can be estimated. It is therefore natural to condition on X=x in inference about the regression parameters. That is the conditionality argument, and the outline above makes its assumptions clear.
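To illustrate, here is a minimal simulation sketch I am adding (the parameter values, sample size, and the through-the-origin model are my own choices, not from the answer above):

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = 2.0, 1.0   # theta = (beta, sigma^2); values chosen for illustration
n, n_reps = 50, 5000

def slope_estimates(sd_x):
    """Two-step experiment: step 1 draws X ~ N(0, sd_x^2) using only psi,
    step 2 draws Y | X = x ~ N(beta * x, sigma^2) using only theta."""
    est = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.normal(0.0, sd_x, size=n)              # step 1 (theta not involved)
        y = beta * x + rng.normal(0.0, sigma, size=n)  # step 2
        est[r] = np.sum(x * y) / np.sum(x * x)         # OLS slope through the origin
    return est

for sd_x in (0.1, 1.0):
    print(f"sd(X) = {sd_x}: empirical sd of beta_hat = {slope_estimates(sd_x).std():.3f}")
```

With these settings the slope estimate from the low-variance design has a roughly ten times larger standard deviation, which is exactly the sense in which the first step fixes the attainable precision without itself carrying information about θ.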
In designed experiments this assumption will mostly hold; with observational data, often not. One example of a problem is regression with lagged responses as predictors: conditioning on the predictors then also conditions on the response, as spelled out below. (I will add more examples.)
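To spell out why the lagged-response case breaks the argument, take a first-order autoregression as a concrete instance (my choice of example; the answer names the problem but does not work it out):

```latex
% AR(1): Y_t = \phi Y_{t-1} + \varepsilon_t, with predictor X_t = Y_{t-1}.
% The joint density still factors,
f(y_t, x_t; \phi, \sigma^2) = f_{\phi,\sigma^2}(y_t \mid x_t)\, f(x_t; \phi, \sigma^2)
% but the marginal of the regressor X_t = Y_{t-1} (under stationarity,
% N(0, \sigma^2 / (1 - \phi^2))) depends on the same parameters, so the
% parameter space is not a Cartesian product \Theta \times \Psi and the
% regressor is no longer ancillary for the regression parameter.
```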
One book which discusses these problems in a lot of detail is Information and Exponential Families in Statistical Theory by O. E. Barndorff-Nielsen; see especially chapter 4. The author notes that this separation principle is, however, seldom made explicit, but gives the following references: R. A. Fisher (1956), Statistical Methods and Scientific Inference, §4.3, and Sverdrup (1966), "The present state of the decision theory and the Neyman-Pearson theory".
The factorization used here is somewhat similar in spirit to the factorization theorem for sufficient statistics. If the focus is on the regression parameters θ, and the distribution of X does not depend on θ, then how could the distribution of (or the variation in) X contain information about θ?
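For comparison, the factorization theorem alluded to here is the Neyman–Fisher criterion (a standard statement, added for reference):

```latex
% T is sufficient for \theta iff the density factors as
f(x; \theta) = g_\theta\!\big(T(x)\big)\, h(x)
% with h free of \theta. The regression factorization mirrors this:
% f_\theta(y \mid x) carries all dependence on \theta, while f_\psi(x)
% plays the role of the \theta-free factor h(x).
```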
This separation argument is also helpful because it points to the cases where it cannot be used, for instance regression with lagged responses as predictors.