[R] using tapply with multiple variables

Jim Lemon jim at bitwrit.com.au
Sun May 1 14:29:51 CEST 2011

On 05/01/2011 05:28 AM, Kevin Burnham wrote:
> Hi All,
> I have a long data file generated from a minimal pair test that I gave to
> learners of Arabic before and after a phonetic training regime.  For each of
> thirty some subjects there are 800 rows of data, from each of 400 items at
> pre and posttest.  For each item the subject got correct, there is a 'C' in
> the column 'Correct'.  The line:
> tapply(ALLDATA$Correct, ALLDATA$Subject, function(x)sum(x=="C"))
> gives me the sum of correct answers for each subject.
> However, I would like to have that sum separated by Time (pre or post).  Is
> there a simple way to do that?
> What if I further wish to separate by Group (T or C)?
Hi Kevin,
When I looked at this, I immediately thought of the brkdnNest function 
in the plotrix package (which uses tapply internally). To get the counts 
with the current version of the function, I had to create a new variable 
(newcorrect) marking the correct answers. However, I liked the idea of 
counting matches directly so much that I have programmed it into the 
code (thanks).
Here is a way to get your summary by Subject and Time:

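  # start of this call is missing from the archived copy; completed
  # here on the assumption that it was a two-factor tapply
  tapply(ALLDATA$Correct, list(ALLDATA$Subject, ALLDATA$Time),
   function(x) sum(x=="C"))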

To get the three level breakdown, add another factor:
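
The listing that belonged here is missing from the archived message. As a rough sketch only, assuming the columns Subject, Time, Group and Correct described in the question, a three-way breakdown with base R's tapply could look like the following. The data frame `toy` and its values are invented for illustration and stand in for ALLDATA.

```r
# Hypothetical toy data standing in for ALLDATA; all values are invented.
toy <- data.frame(
  Subject = rep(c("s1", "s2"), each = 4),
  Group   = rep(c("T", "C"), each = 4),
  Time    = rep(c("pre", "pre", "post", "post"), times = 2),
  Correct = c("C", "", "C", "C", "", "", "C", ""),
  stringsAsFactors = FALSE
)

# One grouping factor per array dimension: Subject x Time x Group.
# Cells for combinations that never occur come back as NA.
counts3 <- tapply(
  toy$Correct,
  list(Subject = toy$Subject, Time = toy$Time, Group = toy$Group),
  function(x) sum(x == "C")
)
counts3
```

Dropping Group from the list gives the Subject by Time table; individual cells can be read by name, for example counts3["s1", "post", "T"].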


Notice that this lists every subject under each Group, even subjects 
who were not in that Group. I'll work on that one, as I have just 
switched to using "tapply" for this breakdown because it doesn't discard 
NA values (the cause of the minor bug in barNest).
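
Until then, one possible workaround in base R (not the brkdnNest fix mentioned above, just a sketch under the same assumptions about column names) is to build a logical variable in the spirit of newcorrect and count with aggregate(), which returns only the combinations that actually occur in the data. The data frame `toy` and the column name `is_correct` are made up for this example.

```r
# Hypothetical toy data standing in for ALLDATA; all values are invented.
toy <- data.frame(
  Subject = rep(c("s1", "s2"), each = 4),
  Group   = rep(c("T", "C"), each = 4),
  Time    = rep(c("pre", "pre", "post", "post"), times = 2),
  Correct = c("C", "", "C", "C", "", "", "C", ""),
  stringsAsFactors = FALSE
)

# Logical flag first, so that sum() counts the correct answers.
toy$is_correct <- toy$Correct == "C"

# The formula interface keeps only Subject/Time/Group combinations
# that are present, so subjects do not appear under the other Group.
observed <- aggregate(is_correct ~ Subject + Time + Group,
                      data = toy, FUN = sum)
observed
```

The result is a long-format data frame with one row per observed combination rather than a three-dimensional array.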

