[R] random effects model

Tue Jan 8 00:40:01 CET 2013

HI,
BP.stack5 is the one without missing values.
na.omit(....).  Otherwise, I have to use the option na.action=.. in the ?geese() statement

You need to read about the correlation structures.  IN unstructured option, more number of parameters needs to be estimated,  In repeated measures design, when the underlying structure is not known, it would be better to compare using different options (exchangeable is similar to compound symmetry) and select the one which provide the least value for AIC or BIC.  Have a look at 

http://stats.stackexchange.com/questions/21771/how-to-perform-model-selection-in-gee-in-r
It's not clear to me  "reference to write about missing values".    
A.K.

----- Original Message -----
From: Usha Gurunathan <usha.nathan at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Monday, January 7, 2013 6:12 PM
Subject: Re: [R] random effects model

Hi AK

2)I shall try putting exch. and check when I get home. Btw, what is
BP.stack5? is it with missing values or only complete cases?

I guess I am still not clear about the unstructured and exchangeable
options, as in which one is better.

1)Rgding the summary(p): NA thing, I tried putting one of my gee equation.

Can you suggest me a reference to write about" missing values and the
implications for my results"

Thanks.

On 1/8/13, arun <smartpink111 at yahoo.com> wrote:
> HI,
>
> Just to add:
> fit3<-geese(hibp~MaternalAge*time,id=CODEA,data=BP.stack5,family=binomial,corstr="exch",scale.fix=TRUE)
> #works
>  summary(fit3)$mean["p"]
> #                             p
> #(Intercept)         0.00000000
> #MaternalAge4        0.49099242
> #MaternalAge5        0.04686295
> #time21              0.86164351
> #MaternalAge4:time21 0.59258221
> #MaternalAge5:time21 0.79909832
>
> fit4<-geese(hibp~MaternalAge*time,id=CODEA,data=BP.stack5,family=binomial,corstr="unstructured",scale.fix=TRUE)
> #when the correlation structure is changed to "unstructured"
> #Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>  # contrasts can be applied only to factors with 2 or more levels
> #In addition: Warning message:
> #In is.na(rows) : is.na() applied to non-(list or vector) of type 'NULL'
>
>
> Though, it works with data(Ohio)
>
> fit1<-geese(resp~age+smoke+age:smoke,id=id,data=ohio1,family=binomial,corstr="unstructured",scale.fix=TRUE)
>  summary(fit1)$mean["p"]
> #                      p
> #(Intercept)  0.00000000
> #age-1        0.60555454
> #age0         0.45322698
> #age1         0.01187725
> #smoke1       0.86262269
> #age-1:smoke1 0.17239050
> #age0:smoke1  0.32223942
> #age1:smoke1  0.36686706
>
>
>
> By checking:
>  with(BP.stack5,table(MaternalAge,time))
> #           time
> #MaternalAge   14   21
>   #        3 1104  864
>    #       4  875  667
>     #     5   67   53 #less number of observations
>
>
>  BP.stack6 <- BP.stack5[order(BP.stack5$CODEA, BP.stack5$time),]
>  head(BP.stack6)  # very few IDs with  MaternalAge==5
> #       X CODEA Sex MaternalAge Education Birthplace AggScore IntScore
> #1493 3.1     3   2           3         3          1        0        0
> #3202 3.2     3   2           3         3          1        0        0
> #1306 7.1     7   2           4         6          1        0        0
> #3064 7.2     7   2           4         6          1        0        0
> #1    8.1     8   2           4         4          1        0        0
> #2047 8.2     8   2           4         4          1        0        0
>  #         Categ time Obese Overweight hibp
> #1493 Overweight   14     0          0    0
> #3202 Overweight   21     0          1    0
> #1306      Obese   14     0          0    0
> #3064      Obese   21     1          1    0
> #1        Normal   14     0          0    0
> #2047     Normal   21     0          0    0
> BP.stack7<-BP.stack6[BP.stack6$MaternalAge!=5,]
>  BP.stack7$MaternalAge<-factor(as.numeric(as.character(BP.stack7$MaternalAge)
>  fit5<-geese(hibp~MaternalAge*time,id=CODEA,data=BP.stack7,family=binomial,corstr="unstructured",scale.fix=TRUE)
> #Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>  # contrasts can be applied only to factors with 2 or more levels
>
>  with(BP.stack7,table(MaternalAge,time))  #It looks like the combinations
> are still there
> #           time
> #MaternalAge   14   21
>  #         3 1104  864
>    #       4  875  667
>
> It works also with corstr="ar1".   Why do you gave the option
> "unstructured"?
> A.K.
>
>
>
>
>
>
> ----- Original Message -----
> From: rex2013 <usha.nathan at gmail.com>
> To: r-help at r-project.org
> Cc:
> Sent: Monday, January 7, 2013 6:15 AM
> Subject: Re: [R] random effects model
>
> Hi A.K
>
> Below is the comment I get, not sure why.
>
> BP.sub3 is the stacked data without the missing values.
>
> BP.geese3 <- geese(HiBP~time*MaternalAge,data=BP.sub3,id=CODEA,
> family=binomial, corstr="unstructured", na.action=na.omit)Error in
> `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>   contrasts can be applied only to factors with 2 or more levels
>
> Even though age has 3 levels; time has 14 years & 21 years; HIBP is a
> binary response outcome.
>
> 2) When you mentioned summary(m1)$mean["p"] what did the p mean? i
> used this in one of the gee command, it produced NA as answer?
>
> Many thanks
>
>
>
> On Mon, Jan 7, 2013 at 5:26 AM, arun kirshna [via R] <
> ml-node+s789695n4654795h72 at n4.nabble.com> wrote:
>
>> Hi,
>>
>> I am  not very familiar with the geese/geeglm().  Is it from
>> library(geepack)?
>> Regarding your question:
>> "
>> Can you tell me if I can use the geese or geeglm function with this data
>> eg: : HIBP~ time* Age
>> Here age is a factor with 3 levels, time: 2 levels, HIBP = yes/no.
>>
>> From your original data:
>> BP_2b<-read.csv("BP_2b.csv",sep="\t")
>> head(BP_2b,2)
>> #  CODEA Sex MaternalAge Education Birthplace AggScore IntScore Obese14
>> #1     1  NA           3         4          1       NA       NA      NA
>> #2     3   2           3         3          1        0        0       0
>>  # Overweight14 Overweight21 Obese21 hibp14 hibp21
>> #1           NA           NA      NA     NA     NA
>> #2            0            1       0      0      0
>>
>> If I understand your new classification:
>> BP.stacknormal<- subset(BP_2b,Obese14==0 & Overweight14==0 & Obese21==0 &
>> Overweight21==0)
>> BP.stackObese <- subset(BP_2b,(Obese14==1& Overweight14==0 &
>> Obese14==1&Overweight14==1)|(Obese14==1&Overweight14==1 & Obese21==1 &
>> Overweight21==0)|(Obese14==1&Overweight14==0 & Obese21==0 &
>> Overweight21==0)|(Obese14==0 & Overweight14==0 & Obese21==1 &
>> Overweight21==0)|(Obese14==0 & Overweight14==0 & Obese21==1 &
>> Overweight21==1)|(Obese14==0 & Overweight14==1 & Obese21==1
>> &Overweight21==1)|(Obese14==1& Overweight14==1 & Obese21==1&
>> Overweight21==1)) #check whether there are more classification that fits
>> to
>> #Obese
>>  BP.stackOverweight <- subset(BP_2b,(Obese14==0 & Overweight14==1 &
>> Obese21==0 & Overweight21==1)|(Obese14==0 &Overweight14==1 & Obese21==0 &
>> Overweight21==0)|(Obese14==0 & Overweight14==0 & Obese21==0 &
>> Overweight21==1))
>> BP.stacknormal$Categ<-"Normal"
>> BP.stackObese$Categ<-"Obese"
>> BP.stackOverweight$Categ <- "Overweight"
>>
>> BP.newObeseOverweightNormal<-na.omit(rbind(BP.stacknormal,BP.stackObese,BP.stackOverweight))
>>
>>  nrow(BP.newObeseOverweightNormal)
>> #[1] 1581
>> BP.stack3 <-
>> reshape(BP.newObeseOverweightNormal,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21"),c("hibp14","hibp21")),v.names=c("Obese","Overweight","hibp"),direction="long")
>>
>> library(car)
>> BP.stack3$time<-recode(BP.stack3$time,"1=14;2=21")
>> head(BP.stack3,2)
>>   #  CODEA Sex MaternalAge Education Birthplace AggScore IntScore  Categ
>> time
>> #8.1     8   2           4         4          1        0        0 Normal
>> 14
>> #9.1     9   1           3         6          2        0        0 Normal
>> 14
>>   #  Obese Overweight hibp
>> #8.1     0          0    0
>>
>> Now, your formula: (HIBP~time*Age), is it MaternalAge?
>> If it is, it has three values
>> unique(BP.stack3$MaternalAge)
>> #[1] 4 3 5
>> and for time (14,21) # If it says that geese/geeglm, contrasts could be
>> applied with factors>=2 levels, what is the problem?
>> If you take "Categ" variable, it also has 3 levels (Normal, Obese,
>> Overweight).
>>
>>  BP.stack3$MaternalAge<-factor(BP.stack3$MaternalAge)
>>  BP.stack3$time<-factor(BP.stack3$time)
>>
>> library(geepack)
>> For your last question about how to get the p-values:
>> # Using one of the example datasets:
>> data(seizure)
>>      seiz.l <- reshape(seizure,
>>                        varying=list(c("base","y1", "y2", "y3", "y4")),
>>                        v.names="y", times=0:4, direction="long")
>>      seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),]
>>      seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2)
>>      seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1)
>>      m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
>>                  data=seiz.l, corstr="exch", family=poisson)
>>      summary(m1)
>>
>>  summary(m1)$mean["p"]
>> #                    p
>> #(Intercept) 0.0000000
>> #x           0.3347040
>> #trt         0.9011982
>> #x:trt       0.6236769
>>
>>
>> #If you need the p-values of the scale
>>    summary(m1)$scale["p"]
>>  #                   p
>> #(Intercept) 0.0254634
>>
>> Hope it helps.
>>
>> A.K.
>>
>>
>>
>>
>>
>>
>> ----- Original Message -----
>> From: rex2013 <[hidden
>> email]<http://user/SendEmail.jtp?type=node&node=4654795&i=0>>
>>
>> To: [hidden email] <http://user/SendEmail.jtp?type=node&node=4654795&i=1>
>> Cc:
>> Sent: Sunday, January 6, 2013 4:55 AM
>> Subject: Re: [R] random effects model
>>
>> Hi A.K
>>
>> Regarding my question on comparing normal/ obese/overweight with blood
>> pressure change, I did finally as per the first suggestion of stacking the
>> data and creating a normal category . This only gives me a obese not obese
>> 14, but when I did with the wide format hoping to  get  a
>> obese14,normal14,overweight 14 Vs hibp 21, i could not complete any of the
>> models.
>> This time I classified obese=1 & overweight=1 as obese itself.
>>
>> Can you tell me if I can use the geese or geeglm function with this data
>> eg: : HIBP~ time* Age
>> Here age is a factor with 3 levels, time: 2 levels, HIBP = yes/no.
>>
>> It says geese/geeglm: contrast can be applied only with factor with 2 or
>> more levels. What is the way to overcome this. Can I manipulate the data
>> to
>> make it work.
>>
>> I need to know if the demogrphic variables affect change in blood pressure
>> status over time?
>>
>> How to get the p values with gee model?
>>
>> Thanks
>> On Thu, Jan 3, 2013 at 5:06 AM, arun kirshna [via R] <
>> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654795&i=2>>
>> wrote:
>>
>> > HI Rex,
>> > If I take a small subset from your whole dataset, and go through your
>> > codes:
>> > BP_2b<-read.csv("BP_2b.csv",sep="\t")
>> >  BP.sub<-BP_2b[410:418,c(1,8:11,13)] #deleted the columns that are not
>> > needed
>> >  BP.stacknormal<- subset(BP.subnew,Obese14==0 & Overweight14==0)
>> > BP.stackObese <- subset(BP.subnew,Obese14==1)
>> >  BP.stackOverweight <- subset(BP.subnew,Overweight14==1)
>> > BP.stacknormal$Categ<-"Normal14"
>> > BP.stackObese$Categ<-"Obese14"
>> > BP.stackOverweight$Categ <- "Overweight14"
>> >
>> BP.newObeseOverweightNormal<-rbind(BP.stacknormal,BP.stackObese,BP.stackOverweight)
>>
>> >
>> >  BP.newObeseOverweightNormal
>> > #    CODEA Obese14 Overweight14 Overweight21 Obese21 hibp21        Categ
>> > #411   541       0            0            0       0      0     Normal14
>> > #415   545       0            0            1       1      1     Normal14
>> > #418   549       0            0            1       0      0     Normal14
>> > #413   543       1            0            1       1      0      Obese14
>> > #417   548       0            1            1       0      0 Overweight14
>> > BP.newObeseOverweightNormal$Categ<-
>> > factor(BP.newObeseOverweightNormal$Categ)
>> > BP.stack3 <-
>> >
>> reshape(BP.newObeseOverweightNormal,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),direction="long")
>>
>> >
>> > library(car)
>> > BP.stack3$time<-recode(BP.stack3$time,"1=14;2=21")
>> > BP.stack3 #Here Normal14 gets repeated even at time==21.  Given that you
>> > are using the "Categ" and "time" #columns in the analysis, it will give
>> > incorrect results.
>> > #      CODEA hibp21        Categ time Obese Overweight
>> > #541.1   541      0     Normal14   14     0          0
>> > #545.1   545      1     Normal14   14     0          0
>> > #549.1   549      0     Normal14   14     0          0
>> > #543.1   543      0      Obese14   14     1          0
>> > #548.1   548      0 Overweight14   14     0          1
>> > #541.2   541      0     Normal14   21     0          0
>> > #545.2   545      1     Normal14   21     1          1
>> > #549.2   549      0     Normal14   21     0          1
>> > #543.2   543      0      Obese14   21     1          1
>> > #548.2   548      0 Overweight14   21     0          1
>> > #Even if I correct the above codes, this will give incorrect
>> > results/(error as you shown) because the response variable (hibp21) gets
>> > #repeated when you reshape it from wide to long.
>> >
>> > The correct classification might be:
>> > BP_2b<-read.csv("BP_2b.csv",sep="\t")
>> >  BP.sub<-BP_2b[410:418,c(1,8:11,13)]
>> >
>> BP.subnew<-reshape(BP.sub,idvar="CODEA",timevar="time",sep="",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),direction="long")
>>
>> >
>> > BP.subnew$time<-recode(BP.subnew$time,"1=14;2=21")
>> >  BP.subnew<-na.omit(BP.subnew)
>> >
>> > BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==14 &
>> > BP.subnew$Obese==0]<-"Overweight14"
>> > BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==21 &
>> > BP.subnew$Obese==0]<-"Overweight21"
>> >  BP.subnew$Categ[BP.subnew$Obese==1 & BP.subnew$time==14 &
>> > BP.subnew$Overweight==0]<-"Obese14"
>> >  BP.subnew$Categ[BP.subnew$Obese==1 & BP.subnew$time==21 &
>> > BP.subnew$Overweight==0]<-"Obese21"
>> >  BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==21&
>> > BP.subnew$Obese==1]<-"ObeseOverweight21"
>> >  BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==14&
>> > BP.subnew$Obese==1]<-"ObeseOverweight14"
>> > BP.subnew$Categ[BP.subnew$Overweight==0 & BP.subnew$Obese==0
>> > &BP.subnew$time==14]<-"Normal14"
>> >  BP.subnew$Categ[BP.subnew$Overweight==0 & BP.subnew$Obese==0
>> > &BP.subnew$time==21]<-"Normal21"
>> >
>> > BP.subnew$Categ<-factor(BP.subnew$Categ)
>> > BP.subnew$time<-factor(BP.subnew$time)
>> > BP.subnew
>> > #      CODEA hibp21 time Obese Overweight             Categ
>> > #541.1   541      0   14     0          0          Normal14
>> > #543.1   543      0   14     1          0           Obese14
>> > #545.1   545      1   14     0          0          Normal14
>> > #548.1   548      0   14     0          1      Overweight14
>> > #549.1   549      0   14     0          0          Normal14
>> > #541.2   541      0   21     0          0          Normal21
>> > #543.2   543      0   21     1          1 ObeseOverweight21
>> > #545.2   545      1   21     1          1 ObeseOverweight21
>> > #548.2   548      0   21     0          1      Overweight21
>> > #549.2   549      0   21     0          1      Overweight21
>> >
>> > #NOw with the whole dataset:
>> > BP.sub<-BP_2b[,c(1,8:11,13)] #change here and paste the above lines:
>> >  head(BP.subnew)
>> >     # CODEA hibp21 time Obese Overweight    Categ
>> > #3.1      3      0   14     0          0 Normal14
>> > #7.1      7      0   14     0          0 Normal14
>> > #8.1      8      0   14     0          0 Normal14
>> > #9.1      9      0   14     0          0 Normal14
>> > #14.1    14      1   14     0          0 Normal14
>> > #21.1    21      0   14     0          0 Normal14
>> >
>> > tail(BP.subnew)
>> >   #     CODEA hibp21 time Obese Overweight             Categ
>> > #8485.2  8485      0   21     1          1 ObeseOverweight21
>> > #8506.2  8506      0   21     0          1      Overweight21
>> > #8520.2  8520      0   21     0          0          Normal21
>> > #8529.2  8529      1   21     1          1 ObeseOverweight21
>> > #8550.2  8550      0   21     1          1 ObeseOverweight21
>> > #8554.2  8554      0   21     0          0          Normal21
>> >
>> > summary(lme.1 <- lme(hibp21~time+Categ+ time*Categ,
>> > data=BP.subnew,random=~1|CODEA, na.action=na.omit))
>> > #Error in MEEM(object, conLin, control$niterEM) :
>> >   #Singularity in backsolve at level 0, block 1
>> > #May be because of the reasons I mentioned above.
>> >
>> > #YOu didn't mention the library(gee)
>> > BP.gee8 <- gee(hibp21~time+Categ+time*Categ,
>> > data=BP.subnew,id=CODEA,family=binomial,
>> > corstr="exchangeable",na.action=na.omit)
>> > #Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
>> > #Error in gee(hibp21 ~ time + Categ + time * Categ, data = BP.subnew, id
>> =
>> > CODEA,  :
>> >   #rank-deficient model matrix
>> > With your codes, it might have worked, but the results may be inaccurate
>> > # After running your whole codes:
>> >  BP.gee8 <- gee(hibp21~time+Categ+time*Categ,
>> > data=BP.stack3,id=CODEA,family=binomial,
>> > corstr="exchangeable",na.action=na.omit)
>> > #Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
>> > #running glm to get initial regression estimate
>> >    #        (Intercept)                   time           CategObese14
>> >      #    -2.456607e+01           9.940875e-15           2.087584e-13
>> >     # CategOverweight14      time:CategObese14 time:CategOverweight14
>> >       #    2.087584e-13          -9.940875e-15          -9.940875e-15
>> > #Error in gee(hibp21 ~ time + Categ + time * Categ, data = BP.stack3, id
>> =
>> > CODEA,  :
>> >  # Cgee: error: logistic model for probability has fitted value very
>> close
>> > to 1.
>> > #estimates diverging; iteration terminated.
>> >
>> > In short, I think it would be better to go with the suggestion in my
>> > previous email with adequate changes in "Categ" variable (adding
>> > ObeseOverweight14, ObeseOverweight21 etc) as I showed here.
>> >
>> > A.K.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > ------------------------------
>> >  If you reply to this email, your message will be added to the
>> discussion
>> > below:
>> >
>>
>> > .
>> > NAML<
>> http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>
>> >
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654776.html
>> Sent from the R help mailing list archive at Nabble.com.
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email]
>> <http://user/SendEmail.jtp?type=node&node=4654795&i=3>mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ______________________________________________
>> [hidden email]
>> <http://user/SendEmail.jtp?type=node&node=4654795&i=4>mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ------------------------------
>>  If you reply to this email, your message will be added to the discussion
>> below:
>> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654795.html
>>  To unsubscribe from random effects model, click
>> here<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4654346&code=dXNoYS5uYXRoYW5AZ21haWwuY29tfDQ2NTQzNDZ8MjAyMjE1NDI0>
>> .
>> NAML<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654826.html
> Sent from the R help mailing list archive at Nabble.com.
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>