[R] Two-way Unbalanced multiple sample ANOVA

Wed Mar 7 17:28:47 CET 2007

Hello all,

I was wondering if anyone could help me formulate a Two-way ANOVA for 
unbalanced multiple sample data?

We have a new computer-based study method intended to help students 
prepare for tests. (I am a computer scientist, hence my 
soon-to-be-apparent lack of statistical knowledge.)

To test this study method we devised a user study in which 30 participants 
attended 2 lectures, lecture1 and lecture2. Two tests were created, test1 
and test2.

test1 corresponds to the material in lecture1 and test2 corresponds to 
the material in lecture2.

The 30 participants were split into two groups, group1 and group2.

group1 used our new study method to review for lecture1 and their 
existing study method to review the material from lecture2.
group2 used our new study method to review for lecture2 and their 
existing study method to review the material from lecture1.

Each group then took the two tests.

This is a repeated-measures experiment because we have 2 exam scores for 
each participant: one after studying with our new method and one after 
studying with their existing method.

The data is unbalanced because participants did not take the same test 
twice.

From what I understand, balanced data would look like this:
ID    TEST     SYSTEM     SCORE
1       1        1         80
1       1        0         70
1       2        1         90
1       2        0         95
2       1        1         70
2       1        0         75
2       2        1         80
2       2        0         75

But instead our data look like this:
ID    TEST     SYSTEM     SCORE
1       1        1         80
1       2        0         95
2       1        0         75
2       2        1         80

So participant 2 never took test1 using our system.

Anyway, I want to see whether our new study method had an impact on 
test results. I also want to see whether the test number had an impact on 
the exam results.

Here is some sample data:

------------
 >dataSet <- data.frame(
    particID=factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8)),
    whichExam=factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
    studyMethod=factor(c(1,0,1,0,1,0,1,0,0,1,0,1,0,1,0,1)),
    score=c(90,80,75,70,70,58,73,68,69,87,68,79,80,80,99,95))
------------
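
A quick way to confirm that the sample data really has the layout described above is to cross-tabulate the design. This is only a minimal sketch using base R (`xtabs`, `ftable`, `table`); the object names `design` and `obsPerParticipant` are made up for illustration and are not part of the analysis itself.

```r
# Same sample data as above, repeated so the snippet runs on its own
dataSet <- data.frame(
    particID=factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8)),
    whichExam=factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
    studyMethod=factor(c(1,0,1,0,1,0,1,0,0,1,0,1,0,1,0,1)),
    score=c(90,80,75,70,70,58,73,68,69,87,68,79,80,80,99,95))

# Count observations in every participant x exam x method cell
design <- xtabs(~ particID + whichExam + studyMethod, data = dataSet)
print(ftable(design))

# Number of scores recorded per participant
obsPerParticipant <- table(dataSet$particID)
print(obsPerParticipant)
```

Each participant should show exactly two filled cells out of the four possible exam/method combinations, which is the "missing cells" pattern described earlier.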

From what I have read, this should be how to compute an ANOVA on this data:

------------
 > summary(aov(score~whichExam*studyMethod+Error(particID),data=dataSet))

Error: particID
                      Df  Sum Sq Mean Sq F value Pr(>F)
whichExam:studyMethod  1  333.06  333.06  1.8211 0.2259
Residuals              6 1097.38  182.90              

Error: Within
            Df  Sum Sq Mean Sq F value  Pr(>F) 
whichExam    1   3.062   3.062  0.1072 0.75445 
studyMethod  1 203.062 203.062  7.1094 0.03721 *
Residuals    6 171.375  28.562                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------

Is this the correct way to do an ANOVA test for this data?
From what I can tell, this means that the study method did have a 
statistically significant impact on the scores (p = 0.037); is that correct? 
It also seems to show that it did not matter which test the subject took 
(p = 0.754), which I read as the two tests being about equally difficult.
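
As a descriptive companion to the table above, the raw cell means and the per-participant difference between the two study methods can be computed directly. This is a rough sketch only: the names `cellMeans`, `ord`, `newScore`, `oldScore` and `methodDiff` are illustrative, and the raw differences do not adjust for which exam was taken, so they are not a substitute for the ANOVA.

```r
# Same sample data as above, repeated so the snippet runs on its own
dataSet <- data.frame(
    particID=factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8)),
    whichExam=factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
    studyMethod=factor(c(1,0,1,0,1,0,1,0,0,1,0,1,0,1,0,1)),
    score=c(90,80,75,70,70,58,73,68,69,87,68,79,80,80,99,95))

# Mean score in each exam x method cell (4 participants per cell)
cellMeans <- with(dataSet,
    tapply(score, list(exam = whichExam, method = studyMethod), mean))
print(cellMeans)

# Per-participant difference: new method (1) minus existing method (0).
# Sorting by participant keeps the two score vectors aligned.
ord <- dataSet[order(dataSet$particID), ]
newScore <- ord$score[ord$studyMethod == "1"]
oldScore <- ord$score[ord$studyMethod == "0"]
methodDiff <- newScore - oldScore
names(methodDiff) <- levels(ord$particID)
print(methodDiff)
print(mean(methodDiff))
```

Looking at `methodDiff` participant by participant gives a feel for how consistent the method effect is before leaning on the F test.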

What exactly do the titles "Error ..." mean?
What are "Residuals"?
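
One possible way to explore why `whichExam:studyMethod` is reported under "Error: particID" is to check whether that combination is the same thing as group membership, which is fixed for each participant. The sketch below is an assumption-laden illustration, not a recommended analysis: the `group` column and the names `sameLevel`, `groupByParticipant` and `fit` are hypothetical additions to the sample data.

```r
# Same sample data as above, repeated so the snippet runs on its own
dataSet <- data.frame(
    particID=factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8)),
    whichExam=factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
    studyMethod=factor(c(1,0,1,0,1,0,1,0,0,1,0,1,0,1,0,1)),
    score=c(90,80,75,70,70,58,73,68,69,87,68,79,80,80,99,95))

# Hypothetical helper column:
#   group1 = new method used for exam 1, existing method for exam 2
#   group2 = new method used for exam 2, existing method for exam 1
sameLevel <- (dataSet$whichExam == "1") == (dataSet$studyMethod == "1")
dataSet$group <- factor(ifelse(sameLevel, "group1", "group2"))

# Each participant should fall in exactly one group
groupByParticipant <- table(dataSet$particID, dataSet$group)
print(groupByParticipant)

# Refit with group as an explicit between-participant term
fit <- aov(score ~ group + whichExam + studyMethod + Error(particID),
           data = dataSet)
print(summary(fit))
```

If the `group` line in the particID stratum matches the `whichExam:studyMethod` line in the output above, that would suggest the "interaction" is really a comparison between the two groups of participants, while the Within stratum holds the comparisons made inside each participant.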

Can anyone recommend a good book on R that covers this material? All 
I can find are books on SPSS.