[R] getting data frame rows out of a by object

Douglas Bates bates at stat.wisc.edu
Thu Apr 8 14:25:26 CEST 2004


Ed L Cashin <ecashin at uga.edu> writes:

> Julian Taylor <julian.taylor at adelaide.edu.au> writes:
> 
> ...
> > You are better off using other tools to give you the right subsets. Try
> >
> > d <- do.call("rbind",
> >              lapply(split(d, factor(paste(d$a, d$b, sep = ""))),
> >                     function(el) el[el$c == max(el$c), ]))

Just as a precaution I would use sep=":" or sep="/" or some other
character that is unlikely to occur in the levels of the factors a and
b[1], to avoid possible ambiguities from using sep="".  If, for
example, both factors a and b had levels "1" to "11", then with
sep="" you cannot tell whether the pasted string "111" originated as
"1" pasted to "11" or as "11" pasted to "1".
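A minimal R sketch of the ambiguity just described, using two hypothetical character vectors a and b that hold the problematic pairs ("1", "11") and ("11", "1"):

```r
# Hypothetical vectors: row 1 is the pair ("1", "11"), row 2 is ("11", "1")
a <- c("1", "11")
b <- c("11", "1")

keys_empty <- paste(a, b, sep = "")   # both rows become "111"
keys_colon <- paste(a, b, sep = ":")  # "1:11" and "11:1" stay distinct

length(unique(keys_empty))  # 1: the two groups would be merged by split()
length(unique(keys_colon))  # 2: the groups remain separate
```

With sep="" the two different pairs produce the same key, so split() would silently put both rows in one group; any separator that never appears in the levels avoids this.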

> That does work, thanks.  I am a bit mystified by the use of paste,
> factor, and split together like that.  By concatenating the columns as
> strings, you are coming up with values that aren't in any one column
> of the data frame, but split doesn't care.  

The point is that factor(paste(...)) returns a factor of length
nrow(d) whose levels correspond to the combinations of levels of a and
b that occur in the data (provided that you don't get ambiguities as
described above).  The split function does not require that the factor
determining the splits be a column of the data frame being split; it
can be given explicitly, as it is here.
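The following R sketch walks through the idiom on a small, made-up data frame d with factors a and b and a numeric column c (the data values are hypothetical). It builds the grouping factor outside the data frame, splits on it, and keeps the row with the largest c in each group:

```r
# Hypothetical toy data: grouping factors a, b and a numeric column c
d <- data.frame(a = factor(c("x", "x", "x", "y", "y")),
                b = factor(c("p", "p", "q", "p", "p")),
                c = c(3, 7, 5, 2, 9))

# Grouping factor built outside d: one level per observed (a, b) pair
g <- factor(paste(d$a, d$b, sep = ":"))
length(g) == nrow(d)   # TRUE
levels(g)              # "x:p" "x:q" "y:p"

# split() accepts g even though it is not a column of d
pieces <- split(d, g)

# Keep the row(s) with the maximal c in each group, then stack them again
best <- do.call("rbind",
                lapply(pieces, function(el) el[el$c == max(el$c), ]))
best
```

The observed combination ("y", "q") does not occur in d, so g has no level for it and split() produces no empty piece for it.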

[1] If you really want to be cautious, you could use an octal escape
like sep="\007" (the ASCII BEL control character) to get a character
that is very unlikely to occur in a factor level.
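One way to follow the footnote's advice is to check the separator against the levels before pasting. The helper below, combine_factors(), is hypothetical (it is not part of base R); it is a minimal sketch assuming two grouping vectors and defaulting to sep = "\007":

```r
# Hypothetical helper: paste two grouping vectors into one factor, but
# stop if the separator occurs in any level of either vector
combine_factors <- function(a, b, sep = "\007") {
  lev <- c(levels(factor(a)), levels(factor(b)))
  if (any(grepl(sep, lev, fixed = TRUE)))
    stop("separator occurs in a factor level; choose another")
  factor(paste(a, b, sep = sep))
}

a <- c("1", "11", "1")
b <- c("11", "1", "11")
g <- combine_factors(a, b)
nlevels(g)       # 2: ("1", "11") and ("11", "1") are kept apart
nchar("\007")    # 1: "\007" is a single character
```

The explicit check turns a silent merging of groups into an error, which is easier to notice than subtly wrong subsets.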