[R] ave(x, y, FUN=length) produces character output when x is character

Mike Miller mbmiller+l at gmail.com
Wed Dec 24 21:39:59 CET 2014


On Wed, 24 Dec 2014, Bert Gunter wrote:

> You said:
> "The elements of the first vector are irrelevant because they are only
> counted, so we should get the same result if it were a character
> vector, but we don't: "
>
> You don't get to invent your own rules! ?ave -- always nice to read the 
> Help docs **before posting** -- clearly states that the x argument must 
> be __numeric__. So if you choose to ignore what you are told, you do so 
> at your own risk. Who knows what you'll get? -- it's a user error, not a 
> bug.

I guess the goal is to humiliate the person who posted the question. 
I've had trouble convincing doctoral students in biostat to post questions 
here because they are afraid of being treated like dirt.  It doesn't 
bother me personally, but I see it as counterproductive.  The code I was 
working with was written by such a student and it has been in CRAN for a 
couple of years.  I'm just trying to fix it.  Your comment is helpful, but 
it would have been even better without the hostile tone.

Regarding the way ave() works -- why doesn't it check that the input 
vector is numeric?  Apparently, integer input is acceptable.  Does numeric 
sometimes mean "numeric" and sometimes "either 'integer' or 'numeric'"? 
Either way, if character is unacceptable, it could throw an error instead 
of pumping out an almost-correct answer.  That made it much harder to 
track down the bug in the code base I was working on.
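
For what it's worth, the character output seems to follow from how ave() puts each group's result back into a copy of x, so the counts are coerced to whatever type x has. Here is a minimal sketch of that behavior. The seq_along() workaround is my own assumption about what is acceptable here, not something the help page prescribes, and the names chr_counts and int_counts are only for illustration:

```r
charvec <- c("A","A","B","C","C","C")

# The per-group lengths are assigned back into x, so they take on x's type.
chr_counts <- ave(charvec, charvec, FUN = length)
# character: "2" "2" "1" "3" "3" "3"

# Workaround (assumption): pass an integer x of the same length, so the
# lengths stay integer.  Only the grouping vector matters for counting.
int_counts <- ave(seq_along(charvec), charvec, FUN = length)
# integer: 2 2 1 3 3 3
```

The values in x are never used by length(), so any integer vector of the right length would do as the first argument.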

Also, regarding the sacred text, "x A numeric." is a bit terse.  The same 
text later refers to length(x), so I suspect that "A numeric" is short for 
"A numeric vector", but that might not mean "a vector of 'numeric' type."

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ave.html


> And if (my understanding of) what you say is the case, this whole post 
> is silly. See ?table to do exactly what you claim is wanted without 
> trying to invent square wheels.

table() counts the elements, but the counts still have to be repeated in 
the proper pattern.

For every element of a vector we want to know how many times it occurs in 
that vector.  So if the vector is c("A","A","B","C","C","C") the output 
should be c(2,2,1,3,3,3).  I'm sure we all know that table() will count 
the elements, but it doesn't place them in a vector as desired.  I can do 
this with a character vector:

> charvec <- c("A","A","B","C","C","C")
> as.vector(table( charvec )[charvec])
[1] 2 2 1 3 3 3

It's slightly trickier with a vector of integer values, because numeric 
subscripts are taken as positions in the table rather than as names:

> intvec <- c(4,4,5,6,6,6)
> table( intvec )[intvec]
intvec
<NA> <NA> <NA> <NA> <NA> <NA>
   NA   NA   NA   NA   NA   NA
> as.vector(table( intvec )[as.character(intvec)])
[1] 2 2 1 3 3 3

So I think this will always work for vectors of either type:

as.vector(table( as.character(vec) )[as.character(vec)])

To me that looks like the right way to do it.  Think so?
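
Wrapped up as a function, the idea might look like the sketch below. The names count_each and count_each2 are hypothetical helpers, not base R functions, and the second version is just an alternative I believe gives the same counts without going through names:

```r
# count_each() is a hypothetical helper name, not part of base R.
count_each <- function(vec) {
  key <- as.character(vec)
  # Index the table by name so each element picks up its own count.
  as.vector(table(key)[key])
}

count_each(c("A","A","B","C","C","C"))   # 2 2 1 3 3 3
count_each(c(4,4,5,6,6,6))               # 2 2 1 3 3 3

# Alternative sketch: map each element to the position of its first
# occurrence, count those positions, then look the counts back up.
count_each2 <- function(vec) {
  idx <- match(vec, unique(vec))
  tabulate(idx)[idx]
}
```

One caveat: table() drops NA by default, so with count_each() any NA elements come back as NA rather than as a count, whereas count_each2() counts them like any other value.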

Best,
Mike


> On Wed, Dec 24, 2014 at 11:30 AM, Mike Miller <mbmiller+l at gmail.com> wrote:
>> R 3.0.1 on Linux 64...
>>
>> I was working with someone else's code.  They were using ave() in a way that
>> I guess is nonstandard:  Isn't FUN always supposed to be a variant of
>> mean()?  The idea was to count for every element of a factor vector how many
>> times the level of that element occurs in the factor vector.
>>
>>
>> gl() makes a factor:
>>
>>> gl(2,2,5)
>>
>> [1] 1 1 2 2 1
>> Levels: 1 2
>>
>>
>> ave() applies FUN to produce the desired count, and it works:
>>
>>> ave( 1:5, gl(2,2,5), FUN=length )
>>
>> [1] 3 3 2 2 3
>>
>>
>> The elements of the first vector are irrelevant because they are only
>> counted, so we should get the same result if it were a character vector, but
>> we don't:
>>
>>> ave( as.character(1:5), gl(2,2,5), FUN=length )
>>
>> [1] "3" "3" "2" "2" "3"
>>
>> The output has character type, but it is supposed to be a collection of
>> vector lengths.
>>
>>
>> Two questions:
>>
>> (1) Is that a bug in ave()?  It certainly is unexpected.
>>
>> (2) What is the best way to do this sort of thing?
>>
>> The truth is that we start with a character vector and we want to create an
>> integer vector that tells us for every element of the character vector how
>> many times that string occurs.  Here are two vectors of length 6 that should
>> give the same result:
>>
>>> intvec <- c(4,5,6,5,6,6)
>>> charvec <- c("A","B","C","B","C","C")
>>
>>
>> The code was used like this with integer vectors and it seemed to work:
>>
>>> ave( intvec, intvec, FUN=length )
>>
>> [1] 1 2 3 2 3 3
>>
>> When a character vector came along, it would fail by producing a character
>> vector as output:
>>
>>> ave( charvec, charvec, FUN=length )
>>
>> [1] "1" "2" "3" "2" "3" "3"
>>
>> This seems more appropriate, and it might always work, but is it OK?:
>>
>>> ave( rep(1, length(charvec)), as.factor(charvec), FUN=sum )
>>
>> [1] 1 2 3 2 3 3
>>
>> I suspect that ave() isn't the best choice, but what is the best way to do
>> this?
>>
>>
>> Thanks in advance.
>>
>> Mike
>>


