[R-sig-ME] partly crossed design
Douglas Bates
bates at stat.wisc.edu
Tue Jul 24 17:16:44 CEST 2007
> ---------- Forwarded message ----------
> From: Iasonas Lamprianou <lamprianou at yahoo.com>
> To: r-sig-mixed-models at r-project.org
> Date: Tue, 24 Jul 2007 06:38:28 -0700 (PDT)
> Subject: partly crossed design
> Hi all,
> I have a dataset (attached) which shows the score of pupils on
> different subjects. For example, pupil 13 may have been tested on
> subjects 1 (Greek), subject 4 (History) and 5 (Latin) whereas pupil
> 124 may have been tested on subjects 1 (Greek), 38 (Physics) and 19
> (Chemistry). All pupils took Greek (subject with code 1).
Apparently not. There are 9409 student id's and only 9403 scores for Greek.
It would help if you could give us the complete set of codes so the
labels could be changed to Greek, History, Latin, etc. instead of 9
arbitrary numbers.
> I attach the
> file with some data. Below, I show some of the R commands that I used.
> My intention is to estimate which of the subjects was more difficult,
> and to estimate the % of variance because of pupils and because of
> subjects.
The coefficients in the model that I fit to these data are typical
scores on the subjects, adjusted for student abilities, as measured by
the random effects.
> Please give me some help, also send me any commands you may
> want to suggest and some explanations. I realized that SPSS 15 has
> included a new mixed models component, is it much better than R?
I'll let others handle that question. :-)
With this message I enclose another transcript that contains a model
fit with random effects for subject and for student. This would be
another way of assessing difficulty of subject adjusted for student.
In this case there are partially crossed random effects. (BTW, I
would be interested in the results from SPSS v15 on such a model and
in how long it takes to fit such a model. I should say that I am using
the development version of the lme4 package in fitting these models.
If I were to use the released version of the package I would replace
lmer by lmer2.)
...
> this is the model for the analysis I used, but how do I show that the
> pupils take more than one test?
I'm not sure what you mean by that. You could tabulate the table of
the number of observations per student as shown in this transcript or
you can show the results from a selection of students as was done
here.
> mod2 <- lmer(arxiki ~ mathima + (1|app_aa), Dataset)
> how about mod3 <- lmer(arxiki ~ 1 + (1|mathima) + (1|mathima:app_aa), Dataset)
That won't work because the number of levels of the mathima:app_aa
interaction is the same as the number of observations so the
per-observation noise term is completely confounded with the random
effect you are trying to specify.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sdata.pdf
Type: application/pdf
Size: 25938 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20070724/c727dac5/attachment.pdf>
-------------- next part --------------
R version 2.6.0 Under development (unstable) (2007-07-24 r42297)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(lme4)
Loading required package: Matrix
Loading required package: lattice
> load(url("http://www.stat.wisc.edu/~bates/sdata.rda"))
> pdf(file = "sdata.pdf")
> str(sdata)
'data.frame': 25120 obs. of 3 variables:
$ app_aa : Factor w/ 9409 levels "1","10","1,000",..: 1 1093 2178 4362 5448 6536 7628 8721 2 109 ...
$ mathima: Factor w/ 9 levels "1","19","21",..: 1 1 1 1 1 1 1 1 1 1 ...
$ score : int 107 43 47 26 37 112 51 72 88 83 ...
Warning message:
closing unused connection 3 (gzcon(http://www.stat.wisc.edu/~bates/sdata.rda))
> xtabs(~ mathima, sdata)
mathima
1 19 21 37 38 4 43 5 6
9403 448 1659 3204 1834 1736 5019 177 1640
> with(sdata, table(table(app_aa)))
1 2 3 4 5
1104 2860 3542 1845 58
> set.seed(123454321)
> subs <- subset(sdata, app_aa %in% sample(levels(app_aa), 20))
> subs[order(subs$app_aa),]
app_aa mathima score
1151 1,173 1 14
1200 1,223 1 84
13616 1,223 21 10
15454 1,223 37 41
18486 1,223 38 23
1998 2,036 1 110
9766 2,036 4 74
21207 2,036 43 130
2082 2,124 1 77
11683 2,124 6 129
21250 2,124 43 130
2526 2,580 1 69
15912 2,580 37 24
2929 2,990 1 97
21706 2,990 43 128
3202 3,267 1 82
21844 3,267 43 110
3296 3,363 1 44
21894 3,363 43 87
3657 3,731 1 94
16297 3,731 37 104
18985 3,731 38 104
3764 3,841 1 8
22138 3,841 43 21
3845 3,923 1 75
16373 3,923 37 106
19028 3,923 38 108
5982 6,108 1 5
7061 7,209 1 71
12567 7,209 6 97
23881 7,209 43 23
8182 8,346 1 141
10906 8,346 4 147
24477 8,346 43 191
8320 8,486 1 12
8401 8,567 1 35
24591 8,567 43 31
850 867 1 19
20580 867 43 40
8790 8,965 1 146
11024 8,965 4 170
24794 8,965 43 197
9086 9,261 1 98
18153 9,261 37 77
20042 9,261 38 106
9107 9,282 1 43
11084 9,282 4 27
12914 9,282 6 124
24966 9,282 43 14
> densityplot(~ score, sdata, plot.points = FALSE)
> densityplot(~ score|mathima, sdata, plot.points = FALSE)
> system.time(m1 <- lmer(score ~ 0 + mathima + (1|app_aa),
+ sdata, control = list(msV = 1)))
0: 258960.61: 0.999416
1: 258081.58: 1.99942
2: 257669.40: 1.73209
3: 257601.36: 1.48965
4: 257583.86: 1.57219
5: 257583.55: 1.56285
6: 257583.54: 1.56229
7: 257583.54: 1.56230
user system elapsed
2.220 0.036 2.255
> system.time(m1 <- lmer(score ~ 0 + mathima + (1|app_aa), sdata))
user system elapsed
0.904 0.036 0.942
> m1
Linear mixed-effects model fit by REML
Formula: score ~ 0 + mathima + (1 | app_aa)
Data: sdata
AIC BIC logLik MLdeviance REMLdeviance
257604 257685 -128792 257597 257584
Random effects:
Groups Name Variance Std.Dev.
app_aa 1947.05 44.125
Residual 797.84 28.246
Number of obs: 25120, groups: app_aa, 9409
Fixed effects:
Estimate Std. Error t value
mathima1 81.0323 0.5402 150.01
mathima19 75.0738 1.5907 47.20
mathima21 86.2260 0.9119 94.55
mathima37 86.0341 0.7428 115.82
mathima38 91.0371 0.8998 101.18
mathima4 80.6845 0.9060 89.05
mathima43 109.5310 0.6398 171.21
mathima5 118.4503 2.4699 47.96
mathima6 126.3008 0.9205 137.20
Correlation of Fixed Effects:
mathm1 mthm19 mthm21 mthm37 mthm38 mathm4 mthm43 mathm5
mathima19 0.241
mathima21 0.420 0.206
mathima37 0.516 0.243 0.341
mathima38 0.426 0.198 0.291 0.455
mathima4 0.423 0.113 0.230 0.242 0.194
mathima43 0.599 0.167 0.351 0.338 0.276 0.423
mathima5 0.155 0.040 0.078 0.085 0.069 0.165 0.157
mathima6 0.416 0.118 0.229 0.274 0.206 0.308 0.391 0.094
> system.time(m2 <- lmer(score ~ 1 + (1|mathima) + (1|app_aa),
+ sdata, control = list(msV = 1)))
0: 260067.68: 0.0309098 0.999416
1: 258490.09: 1.02452 1.11227
2: 257665.35: 1.02320 1.61227
3: 257658.21: 1.02195 1.57729
4: 257657.49: 1.02005 1.56186
5: 257657.48: 1.01875 1.56236
6: 257657.32: 0.996976 1.56620
7: 257657.03: 0.948383 1.57035
8: 257656.33: 0.801040 1.57692
9: 257655.81: 0.645518 1.57769
10: 257655.25: 0.660122 1.57025
11: 257655.04: 0.636126 1.56172
12: 257655.04: 0.654799 1.56199
13: 257655.03: 0.649502 1.56216
14: 257655.03: 0.649233 1.56214
user system elapsed
1.261 0.008 1.266
> system.time(m2 <- lmer(score ~ 1 + (1|mathima) + (1|app_aa), sdata))
user system elapsed
1.236 0.020 1.258
> m2
Linear mixed-effects model fit by REML
Formula: score ~ 1 + (1 | mathima) + (1 | app_aa)
Data: sdata
AIC BIC logLik MLdeviance REMLdeviance
257661 257685 -128828 257661 257655
Random effects:
Groups Name Variance Std.Dev.
mathima 336.32 18.339
app_aa 1946.80 44.123
Residual 797.89 28.247
Number of obs: 25120, groups: mathima, 9; app_aa, 9409
Fixed effects:
Estimate Std. Error t value
(Intercept) 94.902 6.141 15.45
> ranef(m2)$mathima
(Intercept)
1 -13.865747
19 -19.678775
21 -8.649562
37 -8.845464
38 -3.841410
4 -14.212544
43 14.613201
5 23.142992
6 31.337310
>
>
>
>
> proc.time()
user system elapsed
19.637 0.320 19.987
More information about the R-sig-mixed-models
mailing list