[R] How to calculate vif of each term of model in R?

PIKAL Petr petr.pikal at precheza.cz
Mon Apr 27 10:15:39 CEST 2015


Hi

It is better to post the output of dput(xsys), as it preserves the structure of your actual data.
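A small illustration of why dput() output is preferred. This is only a sketch: the data frame `d` is a made-up stand-in (two column names are borrowed from the post, the values are invented). The text printed by dput() can be pasted into a mail and evaluated back into an equivalent object.

```r
# Hypothetical stand-in for the confidential data; values are invented.
d <- data.frame(Engine.Speed = c(900, 1050, 1200),
                NG.LHV       = c(20.1, 20.4, 19.8))

# dput() writes an R expression that recreates the object,
# including column names and column types.
txt <- capture.output(dput(d))

# A reader of the mail can rebuild the data by evaluating that text.
d2 <- eval(parse(text = txt))
all.equal(d, d2)
```

For a large data frame, dput(head(xsys, 20)) or a small anonymised subset is usually enough to reproduce a problem.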

See my other answers inline.

> -----Original Message-----
> From: Methekar, Pushpa (GE Transportation, Non-GE)
> [mailto:pushpa.methekar at ge.com]
> Sent: Monday, April 27, 2015 8:37 AM
> To: PIKAL Petr
> Subject: RE: How to calculate vif of each term of model in R?
>
> Hi petr,
> Thanks for your help ,I really appreciate it.
> My data is confidential so I am attaching sample data.
> Hope it would be useful for u to solve my problem.
>
> I ll tell u in brief what are my tasks.......
>
> ###### read in data
> xsys=read.csv(file.choose(),header = T)
>
> ####   x and y elements
> y1=xsys$Pre.Turb.Temp.L


<snip>


> y9=xsys$Emiss..1..EPA.MAF..Dry.
> ####
> x1=xsys$Engine.Speed


<snip>


> x9=xsys$NG.LHV
>
>
> ##### making models
> model1<-lm(y1~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)

Your variables x and y are defined outside your original data frame, hence there is no need for the data argument. If you instead renamed the variables inside your data frame, you could use their positions when creating the models.

vif(model1)
Error in vif.default(model1) :
  there are aliased coefficients in the model

Probably because there are fewer observations than terms.
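To see where aliased coefficients come from, here is a minimal base-R sketch with invented data (all names and values below are assumptions, not taken from xsys). A term is aliased when lm() cannot estimate it, either because it is an exact linear combination of other terms or because there are fewer rows than coefficients. lm() then reports NA for that coefficient, and that is the situation in which car's vif() stops with the error above.

```r
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                      # exact linear combination of x1 and x2
y  <- 1 + x1 - x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)

# An aliased term cannot be estimated, so its coefficient is NA.
coef(fit)

# alias() reports which terms depend linearly on the others.
alias(fit)$Complete

# Fewer observations than coefficients has the same effect:
# 3 rows cannot determine an intercept plus 4 slopes.
small <- data.frame(y = y[1:3], a = x1[1:3], b = x2[1:3],
                    u = rnorm(3), v = rnorm(3))
few <- lm(y ~ a + b + u + v, data = small)
sum(is.na(coef(few)))
```

Checking any(is.na(coef(model))) before calling vif() is a cheap way to detect the problem in advance.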

I would use a list for storing the models. There are plenty of examples of how to use lists.

Something like (untested)

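models <- vector(mode="list", 9)   # create the list before filling it
for (i in 1:9) {
# assumes the responses are columns 1 to 9 of xsys
models[[i]] <- lm(xsys[,i]~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
}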

This can be polished by building each formula with as.formula together with paste (see the examples in ?formula).
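A sketch of the as.formula/paste construction, assuming a data frame in which the responses are named y1, y2, ... and the predictors x1, x2, ... The data frame `dat` and its values are invented for illustration; only the naming pattern follows the post.

```r
# Hypothetical data: responses y1..y3 and predictors x1..x4, random values.
set.seed(42)
dat <- as.data.frame(matrix(rnorm(30 * 7), ncol = 7))
names(dat) <- c(paste0("y", 1:3), paste0("x", 1:4))

responses  <- paste0("y", 1:3)
predictors <- paste0("x", 1:4)

# One list element per response, named after it.
models <- vector(mode = "list", length = length(responses))
names(models) <- responses

for (r in responses) {
  # Builds e.g. y1 ~ x1 + x2 + x3 + x4 from character strings.
  f <- as.formula(paste(r, "~", paste(predictors, collapse = " + ")))
  models[[r]] <- lm(f, data = dat)
}

formula(models[["y2"]])
```

reformulate(predictors, response = r) from the stats package builds the same formula without paste, which some people find easier to read.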

After that, something like this may work.

final.model <- vector(mode="list", 9)
for(i in 1:9) {
v <- vif(models[[i]])
# v is a vector, so test its maximum rather than v itself
while(max(v)>=10) {
wmvif <- which.max(v)
# drop the term with the largest vif and refit
models[[i]] <- update(models[[i]], as.formula(paste(". ~ . -",names(v)[wmvif])))
v <- vif(models[[i]])
}
final.model[[i]] <-models[[i]]
}
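The loop above can also be wrapped into a function that records the maximum vif at each step, so the values can be plotted afterwards. The following is a self-contained sketch on invented data, not a tested solution for xsys. The names simple_vif and drop_high_vif are hypothetical, and simple_vif is a base-R stand-in for vif() that is only valid for plain numeric main-effect terms (no factors, poly() or interaction terms). Where the car package is available, its vif() can be used instead.

```r
# Stand-in for vif(): VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal
# element of the inverse correlation matrix of the predictors.
simple_vif <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]    # drop the intercept column
  setNames(diag(solve(cor(X))), colnames(X))
}

# Remove the term with the largest VIF until all VIFs are below the threshold.
drop_high_vif <- function(model, threshold = 10) {
  dropped <- numeric(0)                            # max VIF found at each step
  repeat {
    v <- simple_vif(model)
    # Stop when nothing exceeds the threshold or only one term is left.
    if (length(v) < 2 || max(v) < threshold) break
    worst   <- names(v)[which.max(v)]
    dropped <- c(dropped, setNames(max(v), worst))
    model   <- update(model, as.formula(paste(". ~ . -", worst)))
  }
  list(model = model, dropped = dropped)
}

# Invented data: x3 is almost a copy of x1, so both have very large VIFs.
set.seed(123)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$x3 <- d$x1 + rnorm(50, sd = 0.01)
d$y  <- 1 + d$x1 + d$x2 + rnorm(50)

res <- drop_high_vif(lm(y ~ x1 + x2 + x3, data = d))
formula(res$model)
res$dropped
```

res$dropped is a named numeric vector, so plot(res$dropped, type = "b") shows how the maximum vif fell from step to step. Note that update() re-evaluates the original lm() call, so the data frame must still be visible from where the function is called.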

Further operations, like saving and plotting, are easily done with lists and their manipulation.
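As a rough illustration of such list manipulation, the sketch below uses two invented models in place of the real final.model list (the data and the model formulas are assumptions). It extracts one number per model, lists the surviving terms, and saves the whole list to a file.

```r
# Invented data and models, for illustration only.
set.seed(7)
dat <- data.frame(y1 = rnorm(25), y2 = rnorm(25),
                  x1 = rnorm(25), x2 = rnorm(25))
final.model <- list(
  y1 = lm(y1 ~ x1 + x2, data = dat),
  y2 = lm(y2 ~ x1,      data = dat)
)

# One number per model: sapply() returns a named vector.
r2 <- sapply(final.model, function(m) summary(m)$r.squared)

# Terms kept in each model: lapply() returns a list, since lengths differ.
kept <- lapply(final.model, function(m) attr(terms(m), "term.labels"))

# The whole list can be written to disk and read back in one step.
f <- tempfile(fileext = ".rds")
saveRDS(final.model, f)
restored <- readRDS(f)
names(restored)
```

A named vector such as r2 can be passed directly to barplot() or plot().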

Cheers
Petr



> model2<-lm(y2~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
> model3<-lm(y3~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
> model4<-lm(y4~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
> model5<-lm(y5~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
> model6<-lm(y6~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
> model7<-lm(y7~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
> model8<-lm(y8~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
> model9<-lm(y9~x1+x2+x3+x4+x5+x6+x7+x8+x9,data=xsys)
>
>
> ###making secondary models
> secmodel1<-lm(y1
> ~poly(x1,2)+poly(x2,2)+poly(x3,2)+poly(x4,2)+poly(x5,2)+poly(x6,2)
>               +poly(x7,2)+poly(x8,2)+poly(x9,2)
>               +x1*(x2+x3+x4+x5+x6+x7+x8+x9)
>               +x2*(x3+x4+x5+x6+x7+x8+x9)
>               +x3*(x4+x5+x6+x7+x8+x9)
>               +x4*(x5+x6+x7+x8+x9)
>               +x5*(x6+x7+x8+x9)
>               +x6*(x7+x8+x9)
>               +x7*(x8+x9)
>               +x8*(x9),data=xsys)
>
> secmodel2<-lm(y2
> ~poly(x1,2)+poly(x2,2)+poly(x3,2)+poly(x4,2)+poly(x5,2)+poly(x6,2)
>               +poly(x7,2)+poly(x8,2)+poly(x9,2)
>               +x1*(x2+x3+x4+x5+x6+x7+x8+x9)
>               +x2*(x3+x4+x5+x6+x7+x8+x9)
>               +x3*(x4+x5+x6+x7+x8+x9)
>               +x4*(x5+x6+x7+x8+x9)
>               +x5*(x6+x7+x8+x9)
>               +x6*(x7+x8+x9)
>               +x7*(x8+x9)
>               +x8*(x9),data=xsys)
> ...........................up to
> secmodel9<-lm(y9
> ~poly(x1,2)+poly(x2,2)+poly(x3,2)+poly(x4,2)+poly(x5,2)+poly(x6,2)
>               +poly(x7,2)+poly(x8,2)+poly(x9,2)
>               +x1*(x2+x3+x4+x5+x6+x7+x8+x9)
>               +x2*(x3+x4+x5+x6+x7+x8+x9)
>               +x3*(x4+x5+x6+x7+x8+x9)
>               +x4*(x5+x6+x7+x8+x9)
>               +x5*(x6+x7+x8+x9)
>               +x6*(x7+x8+x9)
>               +x7*(x8+x9)
>               +x8*(x9),data=xsys)
>
>
>
> ### now find out vif of each term until vif(term)<=10 and remove maximum vif term.
>
>
> My problem-
> I want to take each model in loop so that I can find out one maximum
> vif term in all terms .and remove it .again I ll find out, again remove
> it till vif(term)<=10 and stop.
>
>
> What u said in last mail is working fine.................for one time .
> I have to check it every time...
> Also I want to save each maximum vif in array  and plot() it.
> Hope you will understand my problem.
>
>
>
> Thanks,
> Pushpa
>
>
> -----Original Message-----
> From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> Sent: Thursday, April 23, 2015 7:15 PM
> To: Methekar, Pushpa (GE Transportation, Non-GE)
> Cc: r-help at r-project.org
> Subject: RE: How to calculate vif of each term of model in R?
>
> Well. Your function results in error.
>
> > f1<-function(model){
> + vfs<<-vif(model)
> + vfs
> + ex<<-subset(vfs,vfs>=10)
> + print(ex)
> + maxx<<-which.max(ex)
> + print(maxx)
> + mm<<-vector(mode = "numeric",length = 50)
> +
> + mm<<-maxx
> + maxindex<<-which.max(ex)
> + print(maxindex)
> +
> + }
> > F1(model1)
> Error: could not find function "F1"
> >
> > f1(model1)
> Error in vcov(fit, regcoef.only = TRUE) : object 'model1' not found
> >
>
> Beside, why do you use global assignment.
>
> <<-
>
> within your function? I use R for more than 10 years and do not
> remember that I needed to use it.
>
> You did not provide any data nor try to work on my suggestions. If you
> kept your mail copies to list you probably would get the answer more
> quickly.
>
> You can use as.formula construction to programmatically select terms
> for update.
>
> > fit
>
> Call:
> lm(formula = barviv ~ rutil + sio2 + teklat + fe, data = prov)
>
> Coefficients:
> (Intercept)        rutil         sio2       teklat           fe
>    1230.154        5.956       15.123       55.322     5571.923
>
> > vif(fit)
>    rutil     sio2   teklat       fe
> 1.249975 1.702475 1.504633 1.094505
> > which.max(vif(fit))
> sio2
>    2
> > wmvif <- which.max(vif(fit))
> > update(fit, as.formula(paste(". ~ . -",names(vif(fit)[wmvif]))))
>
> Call:
> lm(formula = barviv ~ rutil + teklat + fe, data = prov)
>
> Coefficients:
> (Intercept)        rutil       teklat           fe
>     1220.25         6.04        97.44      5802.31
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: Methekar, Pushpa (GE Transportation, Non-GE)
> > [mailto:pushpa.methekar at ge.com]
> > Sent: Friday, April 17, 2015 2:47 PM
> > To: PIKAL Petr
> > Subject: RE: How to calculate vif of each term of model in R?
> >
> > Hey ,
> > .........- names(vif(model1))[vmax])
> > Is not working actually
> > Instead
> > Update(model1,.~.-x8) is working fine.
> >
> >
> >
> > For that any option would you tell me.
> > As per your concern
> > Update(model1,.~.-names(vif(model1))[vmax] means
> >
> > Update(model1,.~.-"x8")
> > But it's not going to work.
> >
> > Look here is my program
> >
> > > require(rms)
> > f1<-function(model){
> > vfs<<-vif(model)
> > vfs
> > ex<<-subset(vfs,vfs>=10)
> > print(ex)
> > maxx<<-which.max(ex)
> > print(maxx)
> > mm<<-vector(mode = "numeric",length = 50)
> >
> > mm<<-maxx
> > maxindex<<-which.max(ex)
> > print(maxindex)
> >
> > }
> > F1(model1)
> >
> >
> >
> > Output:
> >    f1(model1)
> >        x7        x8        x9
> >  13.87063 220.96963 214.03413
> > [1] 220.9696
> > x8
> >  2
> >
> >
> > Now I have to do outside function explicitly
> > model1<-update(model1,.~.-x8)
> > then only I can remove my variable x8
> >
> > but I want to do in inside function so that automatically maximum
> > element get eliminated .
> >
> >
> > Thanks,
> > Pushpa
> >
> >
> >
> > -----Original Message-----
> > From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> > Sent: Friday, April 17, 2015 5:25 PM
> > To: Methekar, Pushpa (GE Transportation, Non-GE)
> > Cc: r-help at r-project.org
> > Subject: RE: How to calculate vif of each term of model in R?
> >
> > Hi
> >
> > > -----Original Message-----
> > > From: Methekar, Pushpa (GE Transportation, Non-GE)
> > > [mailto:pushpa.methekar at ge.com]
> > > Sent: Friday, April 17, 2015 1:12 PM
> > > To: PIKAL Petr
> > > Subject: RE: How to calculate vif of each term of model in R?
> > >
> > > Hi Petr,
> > > You got my problem ,the solution which u specified is little good
> > > but with
> > >
> > >
> > > >update(model1,.~. - names(vif(model1))[vmax])
> > >
> > >
> > >
> > > I won't be able to update my model .
> > > Instead I have to write like
> > >
> > >
> > > > update(model1,.~. - x8)
> > >
> >
> > maybe something like
> >
> > fun = function(model1) {
> >
> > vmax <- which.max(vif(model1))
> >
> > while( vif(model1)[vmax]>=10) {
> >
> > model1 <- update(model1,.~. - names(vif(model1))[vmax])
> > vmax <- which.max(vif(model1))
> >
> > }
> >
> > return(model1)
> >
> > }
> >
> > I am not sure about cycle (i did not use while in R for a while).
> >
> > Without some data I cannot check syntax so it is up to you.
> >
> > Cheers
> > Petr
> >
> >
> > >
> > > i.e my x which having highest vif value.
> > > Each time for removing highest x .i  have to explicitly write
> > > ......-
> > > x8) So is there any way to avoid this?
> > >
> > >
> > >
> > >
> > > Thanks,
> > > Pushpa
> > >
> > > -----Original Message-----
> > > From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> > > Sent: Friday, April 17, 2015 4:16 PM
> > > To: Methekar, Pushpa (GE Transportation, Non-GE)
> > > Subject: RE: How to calculate vif of each term of model in R?
> > >
> > > Comments to your first mail which are among lines and directly related
> > > to your code.
> > >
> > > I specifically mentioned it
> > >
> > > > answers and comments in line
> > >
> > > If you have my first response, you can find them more easily there than
> > > now, as they are buried within your mail.
> > >
> > > Cheers
> > > Petr
> > >
> > >
> > > > -----Original Message-----
> > > > From: Methekar, Pushpa (GE Transportation, Non-GE)
> > > > [mailto:pushpa.methekar at ge.com]
> > > > Sent: Friday, April 17, 2015 12:10 PM
> > > > To: PIKAL Petr
> > > > Subject: RE: How to calculate vif of each term of model in R?
> > > >
> > > > What comments are you talking about?
> > > >
> > > > -----Original Message-----
> > > > From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> > > > Sent: Friday, April 17, 2015 3:38 PM
> > > > To: Methekar, Pushpa (GE Transportation, Non-GE)
> > > > Cc: r-help at r-project.org
> > > > Subject: RE: How to calculate vif of each term of model in R?
> > > >
> > > > Did you follow my advice/comments?
> > > >
> > > > Cheers
> > > > Petr
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Methekar, Pushpa (GE Transportation, Non-GE)
> > > > > [mailto:pushpa.methekar at ge.com]
> > > > > Sent: Friday, April 17, 2015 11:43 AM
> > > > > To: PIKAL Petr
> > > > > Subject: RE: How to calculate vif of each term of model in R?
> > > > >
> > > > > Car package
> > > > >
> > > > > -----Original Message-----
> > > > > From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> > > > > Sent: Friday, April 17, 2015 3:11 PM
> > > > > > To: Methekar, Pushpa (GE Transportation, Non-GE); r-help at r-project.org
> > > > > Subject: RE: How to calculate vif of each term of model in R?
> > > > >
> > > > > Hi
> > > > >
> > > > > I did not see any answer so I try.
> > > > >
> > > > > Your question lacks some info:
> > > > >
> > > > > Which vif - car or HH?
> > > > >
> > > > > answers and comments in line
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf
> > > > > > Of Methekar, Pushpa (GE Transportation, Non-GE)
> > > > > > Sent: Wednesday, April 08, 2015 10:24 AM
> > > > > > To: r-help at r-project.org
> > > > > > Subject: [R] How to calculate vif of each term of model in R?
> > > > > >
> > > > > >
> > > > > > > I am beginner in R doing modelling in R, I loaded excel sheet in R,
> > > > > > > i have chosen x elements and y elements then fitted model for
> > > > > > > linear and second order regression. Now I have both models. I am
> > > > > > > bit confused how to calculate vif for each term in model like
> > > > > > >
> > > > > > > e.g model1<-lm(y1~x1+x2+.....x9) when I am using rms package then
> > > > > > > it's giving me like
> > > > > >
> > > > > > >     vif(model1)
> > > > > > >
> > > > > > >         x1         x2         x3         x4         x5         x6         x7
> > > > > > >   6.679692   1.520271   1.667125   3.618439   4.931810   2.073879  13.870630
> > > > > > >         x8         x9
> > > > > > > 220.969628 214.034135
> > > > > >
> > > > > > > now i want to compare each term with std vif as vif>=10 and which
> > > > > > > will satisfy this condition i want to delete that term and update
> > > > > > > model1. i have done something like this
> > > > > >
> > > > > > fun = function(model1) {
> > > > > >
> > > > > >  for(i in 1:length(model1))    {
> > > > > >
> > > > > >       v=vif(model1)
> > > > > >
> > > > > >          ss=any(v[i]>=10)
> > > > >
> > > > > here you select only one item from vif, Why do you use any?
> > > > >
> > > > > >
> > > > > >                 if(ss==1){update(model1,.~.,-v[i])}
> > > > > >
> > > > > >                 else{print("no update")}
> > > > > >
> > > > >
> > > > > Why do you change i here?
> > > > >
> > > > > >                  i<-i+1
> > > > > >
> > > > > >     }
> > > > > >
> > > > > >
> > > > > >
> > > > > >         return(model1)
> > > > > >
> > > > > >       }
> > > > > >
> > > > >
> > > > > > if you want to get rid of all terms bigger than some threshold at
> > > > > > once you can use
> > > > >
> > > > > sel <- which(vif(model1)>10)
> > > > >
> > > > > and select values for update possibly by
> > > > >
> > > > > update(model1,.~. - names(vif(model1))[sel])
> > > > >
> > > > > or if you want to get rid one by one you can use
> > > > >
> > > > > vmax <- which.max(vif(model1))
> > > > > and check if max vif value is bigger than 10.
> > > > >
> > > > > vif(model1)[vmax]>=10
> > > > >
> > > > > If it is just update with
> > > > >
> > > > > - names(vif(model1))[vmax])
> > > > >
> > > > > if it is not do not update.
> > > > >
> > > > > All of this untested.
> > > > >
> > > > > Cheers
> > > > > Petr
> > > > >
> > > > > > fun(model1)
> > > > > >
> > > > > > but giving error as
> > > > > >
> > > > > > > Error in if (ss == 1) { : missing value where TRUE/FALSE needed.
> > > > > >
> > > > > > please tell me how do i solve this problem.
> > > > > >
> > > > > >
> > > > > >


