[R] variable selection in linear regression

Syaiba Balqish syaibabalqish at gmail.com
Tue Jun 7 09:13:19 CEST 2011


Hello,

I hope you are well. I would like to ask about some commands in R
for variable selection in linear regression.
R has a built-in function called "step", which
selects variables according to AIC.
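
For reference, a minimal sketch of how step() is usually called for forward selection by AIC. The data frame here is simulated purely for illustration (the name mydata and the coefficients are assumptions, not the real data), with y in the first column as in the code further down.

```r
# Simulated stand-in for the real data: y in column 1, x1..x4 after it.
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 1 + 2 * x[, "x4"] + 0.5 * x[, "x3"] + rnorm(n)
mydata <- data.frame(y = y, x)

# Start from the intercept-only model and let step() add terms by AIC.
null.fit <- lm(y ~ 1, data = mydata)
aic.fit <- step(null.fit, scope = ~ x1 + x2 + x3 + x4,
                direction = "forward", trace = 0)
formula(aic.fit)  # the model chosen by AIC
```

Note that step() ranks candidates by AIC, not by the partial F statistic, so it is not guaranteed to follow the same path as an F-based procedure with a fixed F-to-enter cutoff.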

Let's say I have data [y, x1, x2, x3, x4].
We start with the intercept-only model y ~ b0.
I compute the partial F statistic for each candidate variable and choose
the one with the maximum partial F to enter the model, say
x4, with partial F = 58.02377.
Our next model is therefore y ~ b0 + b4*x4.
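
The first step described above can be sketched with add1() and test = "F", which reports the partial F statistic for each single-term addition to the current model. The data are simulated for illustration only (an assumption), so the F values will not match the 58.02377 quoted above.

```r
# Simulated stand-in for the real data: y in column 1, x1..x4 after it.
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 1 + 2 * x[, "x4"] + 0.5 * x[, "x3"] + rnorm(n)
mydata <- data.frame(y = y, x)

# Intercept-only model y ~ b0
fit0 <- lm(y ~ 1, data = mydata)

# One row per candidate; the "F value" column is the partial F for adding it.
tab <- add1(fit0, scope = ~ x1 + x2 + x3 + x4, test = "F")
tab

# Candidate with the largest partial F (the "<none>" row is NA and is skipped).
best <- rownames(tab)[which.max(tab[["F value"]])]
best
```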

My questions:

1. How should I write the code so that x4 is added to the model at the next step?
2. The formula for the partial F test is
F* = ((SSE(reduced model) - SSE(full model)) / (dfR - dfF)) / (SSE(full model) / dfF)
which can be written more simply as
F* = MSR(xi | x1,x2,...,xi-1,xi+1) / MSE(x1,x2,...,xi-1,xi,xi+1)
If I want to use the simplified formula, how can I compute it
for every xi not yet in the model, conditional
on the x's already in the model?
For example, suppose I want to select among the other variables (x1, x2, x3) after x4 has been
selected; for x3 this would be
F* = MSR(x3 | x4) / MSE(x3, x4)
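
One possible way to approach both questions, as a sketch under assumptions rather than a definitive answer: update() adds x4 to the current model, and anova() on the nested pair (model with x4, model with x4 plus xj) gives F* = MSR(xj | x4) / MSE(xj, x4). The data are simulated for illustration, and the names fit1, candidates and pf are hypothetical.

```r
# Simulated stand-in for the real data: y in column 1, x1..x4 after it.
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 1 + 2 * x[, "x4"] + 0.5 * x[, "x3"] + rnorm(n)
mydata <- data.frame(y = y, x)

# Question 1: add x4 to the intercept-only model.
fit0 <- lm(y ~ 1, data = mydata)
fit1 <- update(fit0, . ~ . + x4)

# Question 2: partial F for each remaining candidate, conditional on x4.
candidates <- c("x1", "x2", "x3")
pf <- sapply(candidates, function(v) {
  full <- update(fit1, as.formula(paste(". ~ . +", v)))
  anova(fit1, full)[2, "F"]  # row 2 compares the full model with fit1
})
pf
names(which.max(pf))  # candidate that would enter next
```

The same values should be obtainable in one call with add1(fit1, scope = ~ x1 + x2 + x3 + x4, test = "F"); the explicit loop is shown only because it mirrors the formula above.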

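Below is my simple code:

library(MASS)  # for ginv()

p <- ncol(mydata)              # number of columns: y plus the predictors
d <- p - 1                     # number of candidate predictors
n <- nrow(mydata)              # number of observations
x <- as.matrix(mydata[, 2:p])  # predictors x1, ..., xd
y <- as.matrix(mydata[, 1])    # response (first column)
X <- as.matrix(rep(1, n))      # design matrix of the intercept-only model

# Intercept-only (reduced) model y ~ b0
fit0 <- lm(y ~ 1, data = mydata)
b <- fit0$coefficients
yhat <- X %*% b
res <- y - yhat
sigma.hat <- sqrt(sum(res^2) / (n - ncol(X)))
cv <- sigma.hat^2 * ginv(t(X) %*% X)
se <- sqrt(diag(cv))

# Quantities that do not depend on the candidate variable
sseR <- sum(fit0$residuals^2)  # SSE of the reduced model
dfR <- n - 1
dfF <- n - 2                   # intercept plus one predictor

pc <- matrix(0, nrow = 1, ncol = d)    # correlation of each xj with y
resF <- matrix(0, nrow = n, ncol = d)  # residuals of each one-variable model
sseF <- matrix(0, nrow = 1, ncol = d)  # SSE of each one-variable model
pf <- matrix(0, nrow = 1, ncol = d)    # partial F of each candidate
for (j in 1:d) {
  pc[, j] <- cor(x = x[, j], y = mydata[, 1])
  resF[, j] <- lsfit(x[, j], y)$residuals  # fit y ~ b0 + bj*xj
  sseF[, j] <- sum(resF[, j]^2)
  pf[, j] <- ((sseR - sseF[, j]) / (dfR - dfF)) / (sseF[, j] / dfF)
}
max.pf <- max(pf)
max.pc <- max(pc)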


Thank you; I look forward to your replies.
Sincerely,

Iba
Universiti Putra Malaysia
 

