[R] ave(x, y, FUN=length) produces character output when x is character

Mike Miller mbmiller+l at gmail.com
Wed Dec 24 20:30:41 CET 2014


R 3.0.1 on Linux 64...

I was working with someone else's code.  They were using ave() in a way 
that I guess is nonstandard:  Isn't FUN always supposed to be a variant of 
mean()?  The idea was to count, for every element of a factor vector, how 
many times that element's level occurs in the vector.


gl() makes a factor:

> gl(2,2,5)
[1] 1 1 2 2 1
Levels: 1 2


ave() applies FUN to produce the desired count, and it works:

> ave( 1:5, gl(2,2,5), FUN=length )
[1] 3 3 2 2 3


The elements of the first vector are irrelevant because they are only 
counted, so we should get the same result if it were a character vector, 
but we don't:

> ave( as.character(1:5), gl(2,2,5), FUN=length )
[1] "3" "3" "2" "2" "3"

The output has character type, but it is supposed to be a collection of 
vector lengths.
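
My guess at the mechanism, which I have not confirmed against the ave() 
source: the per-group results are written back into a copy of x, so they 
take on the type of x.  A minimal sketch that reproduces the same coercion 
using split() and its replacement form directly (the names x, g, counts 
and res are only for illustration):

```r
x <- as.character(1:5)
g <- gl(2, 2, 5)

# The per-group lengths are still integers at this point
counts <- lapply(split(x, g), length)

# Writing them back into a character vector coerces them to character
res <- x
split(res, g) <- counts
res
```

If that is what ave() does internally, the character output would be a 
consequence of the assignment rather than of FUN itself.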


Two questions:

(1) Is that a bug in ave()?  It certainly is unexpected.

(2) What is the best way to do this sort of thing?

The truth is that we start with a character vector and we want to create 
an integer vector that tells us for every element of the character vector 
how many times that string occurs.  Here are two vectors of length 6 that 
should give the same result:

> intvec <- c(4,5,6,5,6,6)
> charvec <- c("A","B","C","B","C","C")

The code was used like this with numeric vectors and it seemed to work:

> ave( intvec, intvec, FUN=length )
[1] 1 2 3 2 3 3

When a character vector came along, it would fail by producing a character 
vector as output:

> ave( charvec, charvec, FUN=length )
[1] "1" "2" "3" "2" "3" "3"

This seems more appropriate, and it might always work, but is it OK?

> ave( rep(1, length(charvec)), as.factor(charvec), FUN=sum )
[1] 1 2 3 2 3 3
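
A variant of the same idea, assuming only that ave() returns something of 
the same type as its first argument: pass seq_along() of the vector, which 
is always integer, so FUN=length can stay as it was.  The name n_per_elem 
is just for illustration:

```r
charvec <- c("A", "B", "C", "B", "C", "C")

# The first argument only supplies positions to be counted; it is integer
# whatever the type of charvec is
n_per_elem <- ave(seq_along(charvec), charvec, FUN = length)
n_per_elem
```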

I suspect that ave() isn't the best choice, but what is the best way to do 
this?
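
Two candidates I have sketched that avoid ave() altogether.  I have only 
tried them on the toy vector above, and I assume there are no NA or empty 
strings, which the name lookup in the first option would not handle.  The 
names tab, idx, n1 and n2 are only for illustration:

```r
charvec <- c("A", "B", "C", "B", "C", "C")

# Option 1: tabulate the strings, then look each element up by name
tab <- table(charvec)
n1 <- as.vector(tab[charvec])

# Option 2: map each string to the position of its first occurrence,
# count how often each position is hit, then index the counts
idx <- match(charvec, charvec)
n2 <- tabulate(idx, nbins = length(charvec))[idx]
```

Both return integer vectors regardless of the type of the input, which is 
the property the ave() call was missing.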


Thanks in advance.

Mike


