[R] Re: [S] LME - fixed effects model and missing values
Jacob Wegelin
jawegelin at ucdavis.edu
Wed Nov 5 01:16:32 CET 2003
Here is an answer to a 1999 post. I didn't find a direct answer anywhere
on the Web, perhaps because it is "obvious" once one sees it.
Suppose you have data from a longitudinal study, where each subject was
measured *up to* four times, with missing measurements, so that the data
look like this:
> MAT<- structure(c(23, 24, 6, 19, 16, 20, 13, 11, NA, 8, NA, 21, 19, 15,
11, NA, 10, 12, 16, 30, 13, 16, NA, NA, NA, NA, 28, 6), .Dim = c(7, 4),
.Dimnames = list(c("SUB1", "SUB2", "SUB3", "SUB4", "SUB5", "SUB6",
"SUB7"), c("BDI0", "BDI3", "BDI6", "BDI12")))
> MAT
BDI0 BDI3 BDI6 BDI12
SUB1 23 11 11 16
SUB2 24 NA NA NA
SUB3 6 8 10 NA
SUB4 19 NA 12 NA
SUB5 16 21 16 NA
SUB6 20 19 30 28
SUB7 13 15 13 6
Let
> x<- c(0,3,6,12)
Then it's a simple matter to regress each row of MAT on x, in spite of
missing values. Take for instance a row that has three missing values:
> lm(MAT["SUB2",] ~ x)
Call:
lm(formula = MAT["SUB2", ] ~ x)
Coefficients:
(Intercept) x
24 NA
But now suppose we want to fit a linear mixed-effects model, borrowing
strength from all the data in estimating each individual's intercept
and slope.
As in the naive lm approach, NAs are not a problem. Unlike the naive lm
approach, however, we obtain values for slope and intercept even for
subjects for whom we cannot get slope via the lm approach, such as SUB2.
First, you must create a data frame with one row for each time point
within each subject, and you need to create a column to keep track of the
time variable. For instance:
d<-data.frame(sub=I(rep("", length(MAT))), time=rep(NA, length(MAT)), BDI=rep(NA, length(MAT)) )
count<-0
for (i in 1:nrow(MAT)) for (j in 1:ncol(MAT)) {
count<-count+1
d[count, "sub"]<- dimnames(MAT) [[1]][i]
d[count, "time"]<- c(0,3,6,12) [j]
d[count, "BDI"]<- MAT[i,j]
}
d$sub<-factor(d$sub)
Then you call lme, telling it to throw away rows with NAs in them. Note
that this is *not* complete case analysis. We are not throwing away
subjects (cases) who happen to have a missing BDI, just the time points
that have missing BDI.
> library("nlme")
> mymodel<-lme( BDI ~ time, random=~ 1 + time | sub, data=d, na.action=na.omit)
> mymodel
Linear mixed-effects model fit by REML
Data: d
Log-restricted-likelihood: -63.70296
Fixed: BDI ~ time
(Intercept) time
16.8317095 -0.1449387
Random effects:
Formula: ~1 + time | sub
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 3.5621406 (Intr)
time 0.4671455 0.886
Residual 4.1972943
Number of Observations: 21
Number of Groups: 7
We even get a slope for SUB2, for whom we have only one observation.
> mymodel$coef$random
$sub
(Intercept) time
SUB1 -0.4909769 -0.08181262
SUB2 3.0012854 0.34872811
SUB3 -4.5733486 -0.52161792
SUB4 -0.7908920 -0.13009388
SUB5 0.7516454 0.09528318
SUB6 4.4620873 0.61630713
SUB7 -2.3598006 -0.32679398
(Version: R 1.8.0 on Windows.)
This responds to the following email, which may be found at
http://www.biostat.wustl.edu/archives/html/s-news/1999-01/msg00122.html
Jake Wegelin
On Wed, 20 Jan 1999, ziv shkedy wrote:
>
> Dear Jose and all the other S+ users,
> First I want to thank you and Prof. Brian Ripley for your helpful answers.
>
> I'll described the missing values problem in more details.
> I'm modeling longitudinal data where each one of the subjects was measured
> at few time points.
> Some of the subjects dropout from the study, i.e. that the RESPONSE of these
> subjects is missing.
> The analysis in proc MIXED is likelihood based ignorable analysis (i.e. we
> using all available cases at each time point),
> in lme ,if i understood you correctly, if the response is missing then the
> observation is omitted from the dataset. This is a complete case analysis.
> My question is if there is a way to overcome this problem and to use all
> observation at each time point (and in that way the output from proc MIXED
> and lme will be the same).
>
> Now, I'm on my way to get the gls function.
> Thank you once again for your help, Ziv.
>
> =========================================================
> Ziv Shkedy
> Biostatistics
> Center for Statistics
> Limburgs Universitair Centrum
> Universitaire Campus, department WNI
> B-3590 Diepenbeek, Belgium
> Tel: +32-(0)11-26.82.57
> e-mail: ziv.shkedy at luc.ac.be
> =========================================================
>
More information about the R-help
mailing list