[R-sig-ME] partly crossed design

Douglas Bates bates at stat.wisc.edu
Tue Jul 24 17:16:44 CEST 2007


> ---------- Forwarded message ----------
> From: Iasonas Lamprianou <lamprianou at yahoo.com>
> To: r-sig-mixed-models at r-project.org
> Date: Tue, 24 Jul 2007 06:38:28 -0700 (PDT)
> Subject: partly crossed design

> Hi all,

> I have a dataset (attached) which shows the score of pupils on
> different subjects. For example, pupil 13 may have been tested on
> subjects 1 (Greek), subject 4 (History) and 5 (Latin) whereas pupil
> 124 may have been tested on subjects 1 (Greek), 38 (Physics) and 19
> (Chemistry). All pupils took Greek (subject with code 1).

Apparently not.  There are 9409 student id's and only 9403 scores for Greek.

It would help if you could give us the complete set of codes so the
labels could be changed to Greek, History, Latin, etc. instead of 9
arbitrary numbers.

> I attach the
> file with some data. Below, I show some of the R commands that I used.

> My intention is to estimate which of the subjects was more difficult,
> and to estimate the % of variance because of pupils and because of
> subjects.

The coefficients in the model that I fit to these data are typical
scores on the subjects, adjusted for student abilities, as measured by
the random effects.

> Please give me some help, also send me any commands you may
> want to suggest and some explanations. I realized that SPSS 15 has
> included a new mixed models component, is it much better than R?

I'll let others handle that question. :-)

With this message I enclose another transcript that contains a model
fit with random effects for subject and for student.   This would be
another way of assessing difficulty of subject adjusted for student.
In this case there are partially crossed random effects.  (BTW, I
would be interested in the results from SPSS v15 on such a model and
in how long it takes to fit such a model. I should say that I am using
the development version of the lme4 package in fitting these models.
If I were to use the released version of the package I would replace
lmer by lmer2.)

...

> this is the model for the analysis I used, but how do I show that the
> pupils take more than one test?

I'm not sure what you mean by that.  You could tabulate the table of
the number of observations per student as shown in this transcript or
you can show the results from a selection of students as was done
here.

> mod2 <- lmer(arxiki ~ mathima + (1|app_aa), Dataset)

> how about mod3 <- lmer(arxiki ~ 1 + (1|mathima) + (1|mathima:app_aa), Dataset)

That won't work because the number of levels of the mathima:app_aa
interaction is the same as the number of observations so the
per-observation noise term is completely confounded with the random
effect you are trying to specify.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sdata.pdf
Type: application/pdf
Size: 25938 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20070724/c727dac5/attachment.pdf>
-------------- next part --------------

R version 2.6.0 Under development (unstable) (2007-07-24 r42297)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(lme4)
Loading required package: Matrix
Loading required package: lattice
> load(url("http://www.stat.wisc.edu/~bates/sdata.rda"))
> pdf(file = "sdata.pdf")
> str(sdata)
'data.frame':	25120 obs. of  3 variables:
 $ app_aa : Factor w/ 9409 levels "1","10","1,000",..: 1 1093 2178 4362 5448 6536 7628 8721 2 109 ...
 $ mathima: Factor w/ 9 levels "1","19","21",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ score  : int  107 43 47 26 37 112 51 72 88 83 ...
Warning message:
closing unused connection 3 (gzcon(http://www.stat.wisc.edu/~bates/sdata.rda)) 
> xtabs(~ mathima, sdata)
mathima
   1   19   21   37   38    4   43    5    6 
9403  448 1659 3204 1834 1736 5019  177 1640 
> with(sdata, table(table(app_aa)))

   1    2    3    4    5 
1104 2860 3542 1845   58 
> set.seed(123454321)
> subs <- subset(sdata, app_aa %in% sample(levels(app_aa), 20))
> subs[order(subs$app_aa),]
      app_aa mathima score
1151   1,173       1    14
1200   1,223       1    84
13616  1,223      21    10
15454  1,223      37    41
18486  1,223      38    23
1998   2,036       1   110
9766   2,036       4    74
21207  2,036      43   130
2082   2,124       1    77
11683  2,124       6   129
21250  2,124      43   130
2526   2,580       1    69
15912  2,580      37    24
2929   2,990       1    97
21706  2,990      43   128
3202   3,267       1    82
21844  3,267      43   110
3296   3,363       1    44
21894  3,363      43    87
3657   3,731       1    94
16297  3,731      37   104
18985  3,731      38   104
3764   3,841       1     8
22138  3,841      43    21
3845   3,923       1    75
16373  3,923      37   106
19028  3,923      38   108
5982   6,108       1     5
7061   7,209       1    71
12567  7,209       6    97
23881  7,209      43    23
8182   8,346       1   141
10906  8,346       4   147
24477  8,346      43   191
8320   8,486       1    12
8401   8,567       1    35
24591  8,567      43    31
850      867       1    19
20580    867      43    40
8790   8,965       1   146
11024  8,965       4   170
24794  8,965      43   197
9086   9,261       1    98
18153  9,261      37    77
20042  9,261      38   106
9107   9,282       1    43
11084  9,282       4    27
12914  9,282       6   124
24966  9,282      43    14
> densityplot(~ score, sdata, plot.points = FALSE)
> densityplot(~ score|mathima, sdata, plot.points = FALSE)
> system.time(m1 <- lmer(score ~ 0 + mathima + (1|app_aa),
+                        sdata, control = list(msV = 1)))
  0:     258960.61: 0.999416
  1:     258081.58:  1.99942
  2:     257669.40:  1.73209
  3:     257601.36:  1.48965
  4:     257583.86:  1.57219
  5:     257583.55:  1.56285
  6:     257583.54:  1.56229
  7:     257583.54:  1.56230
   user  system elapsed 
  2.220   0.036   2.255 
> system.time(m1 <- lmer(score ~ 0 + mathima + (1|app_aa), sdata))
   user  system elapsed 
  0.904   0.036   0.942 
> m1
Linear mixed-effects model fit by REML 
Formula: score ~ 0 + mathima + (1 | app_aa) 
   Data: sdata 
    AIC    BIC  logLik MLdeviance REMLdeviance
 257604 257685 -128792     257597       257584
Random effects:
 Groups   Name Variance Std.Dev.
 app_aa        1947.05  44.125  
 Residual       797.84  28.246  
Number of obs: 25120, groups: app_aa, 9409

Fixed effects:
          Estimate Std. Error t value
mathima1   81.0323     0.5402  150.01
mathima19  75.0738     1.5907   47.20
mathima21  86.2260     0.9119   94.55
mathima37  86.0341     0.7428  115.82
mathima38  91.0371     0.8998  101.18
mathima4   80.6845     0.9060   89.05
mathima43 109.5310     0.6398  171.21
mathima5  118.4503     2.4699   47.96
mathima6  126.3008     0.9205  137.20

Correlation of Fixed Effects:
          mathm1 mthm19 mthm21 mthm37 mthm38 mathm4 mthm43 mathm5
mathima19 0.241                                                  
mathima21 0.420  0.206                                           
mathima37 0.516  0.243  0.341                                    
mathima38 0.426  0.198  0.291  0.455                             
mathima4  0.423  0.113  0.230  0.242  0.194                      
mathima43 0.599  0.167  0.351  0.338  0.276  0.423               
mathima5  0.155  0.040  0.078  0.085  0.069  0.165  0.157        
mathima6  0.416  0.118  0.229  0.274  0.206  0.308  0.391  0.094 
> system.time(m2 <- lmer(score ~ 1 + (1|mathima) + (1|app_aa),
+                        sdata, control = list(msV = 1)))
  0:     260067.68: 0.0309098 0.999416
  1:     258490.09:  1.02452  1.11227
  2:     257665.35:  1.02320  1.61227
  3:     257658.21:  1.02195  1.57729
  4:     257657.49:  1.02005  1.56186
  5:     257657.48:  1.01875  1.56236
  6:     257657.32: 0.996976  1.56620
  7:     257657.03: 0.948383  1.57035
  8:     257656.33: 0.801040  1.57692
  9:     257655.81: 0.645518  1.57769
 10:     257655.25: 0.660122  1.57025
 11:     257655.04: 0.636126  1.56172
 12:     257655.04: 0.654799  1.56199
 13:     257655.03: 0.649502  1.56216
 14:     257655.03: 0.649233  1.56214
   user  system elapsed 
  1.261   0.008   1.266 
> system.time(m2 <- lmer(score ~ 1 + (1|mathima) + (1|app_aa), sdata))
   user  system elapsed 
  1.236   0.020   1.258 
> m2
Linear mixed-effects model fit by REML 
Formula: score ~ 1 + (1 | mathima) + (1 | app_aa) 
   Data: sdata 
    AIC    BIC  logLik MLdeviance REMLdeviance
 257661 257685 -128828     257661       257655
Random effects:
 Groups   Name Variance Std.Dev.
 mathima        336.32  18.339  
 app_aa        1946.80  44.123  
 Residual       797.89  28.247  
Number of obs: 25120, groups: mathima, 9; app_aa, 9409

Fixed effects:
            Estimate Std. Error t value
(Intercept)   94.902      6.141   15.45
> ranef(m2)$mathima
   (Intercept)
1   -13.865747
19  -19.678775
21   -8.649562
37   -8.845464
38   -3.841410
4   -14.212544
43   14.613201
5    23.142992
6    31.337310
> 
> 
> 
> 
> proc.time()
   user  system elapsed 
 19.637   0.320  19.987 


More information about the R-sig-mixed-models mailing list