Das Folgende beantwortet die Frage nicht genau. Es kann Ihnen jedoch einige Ideen geben. Dies habe ich kürzlich getan, um die Anpassung mehrerer Regressionsmodelle mithilfe von ein bis vier unabhängigen Variablen zu bewerten (die abhängige Variable befand sich in der ersten Spalte des df1-Datenrahmens).
# create the combinations of the 4 independent variables
library(foreach)
xcomb <- foreach(i=1:4, .combine=c) %do% {combn(names(df1)[-1], i, simplify=FALSE) }
# create formulas
formlist <- lapply(xcomb, function(l) formula(paste(names(df1)[1], paste(l, collapse="+"), sep="~")))
Der Inhalt von as.character (Formularliste) war
[1] "price ~ sqft" "price ~ age"
[3] "price ~ feats" "price ~ tax"
[5] "price ~ sqft + age" "price ~ sqft + feats"
[7] "price ~ sqft + tax" "price ~ age + feats"
[9] "price ~ age + tax" "price ~ feats + tax"
[11] "price ~ sqft + age + feats" "price ~ sqft + age + tax"
[13] "price ~ sqft + feats + tax" "price ~ age + feats + tax"
[15] "price ~ sqft + age + feats + tax"
Dann habe ich einige nützliche Indizes gesammelt
# R squared
models.r.sq <- sapply(formlist, function(i) summary(lm(i))$r.squared)
# adjusted R squared
models.adj.r.sq <- sapply(formlist, function(i) summary(lm(i))$adj.r.squared)
# MSEp
models.MSEp <- sapply(formlist, function(i) anova(lm(i))['Mean Sq']['Residuals',])
# Full model MSE
MSE <- anova(lm(formlist[[length(formlist)]]))['Mean Sq']['Residuals',]
# Mallow's Cp
models.Cp <- sapply(formlist, function(i) {
SSEp <- anova(lm(i))['Sum Sq']['Residuals',]
mod.mat <- model.matrix(lm(i))
n <- dim(mod.mat)[1]
p <- dim(mod.mat)[2]
c(p,SSEp / MSE - (n - 2*p))
})
df.model.eval <- data.frame(model=as.character(formlist), p=models.Cp[1,],
r.sq=models.r.sq, adj.r.sq=models.adj.r.sq, MSEp=models.MSEp, Cp=models.Cp[2,])
Der endgültige Datenrahmen war
model p r.sq adj.r.sq MSEp Cp
1 price~sqft 2 0.71390776 0.71139818 42044.46 49.260620
2 price~age 2 0.02847477 0.01352823 162541.84 292.462049
3 price~feats 2 0.17858447 0.17137907 120716.21 351.004441
4 price~tax 2 0.76641940 0.76417343 35035.94 20.591913
5 price~sqft+age 3 0.80348960 0.79734865 33391.05 10.899307
6 price~sqft+feats 3 0.72245824 0.71754599 41148.82 46.441002
7 price~sqft+tax 3 0.79837622 0.79446120 30536.19 5.819766
8 price~age+feats 3 0.16146638 0.13526220 142483.62 245.803026
9 price~age+tax 3 0.77886989 0.77173666 37884.71 20.026075
10 price~feats+tax 3 0.76941242 0.76493500 34922.80 21.021060
11 price~sqft+age+feats 4 0.80454221 0.79523470 33739.36 12.514175
12 price~sqft+age+tax 4 0.82977846 0.82140691 29640.97 3.832692
13 price~sqft+feats+tax 4 0.80068220 0.79481991 30482.90 6.609502
14 price~age+feats+tax 4 0.79186713 0.78163109 36242.54 17.381201
15 price~sqft+age+feats+tax 5 0.83210849 0.82091573 29722.50 5.000000
Schließlich ein Cp-Plot (mit Bibliothek wle)