[Rd] vctrs: a type system for the tidyverse

Hadley Wickham h@wickh@m @ending from gm@il@com
Wed Aug 8 19:40:48 CEST 2018


>> So we say that a
>> factor `x` has finer resolution than factor `y` if the levels of `y`
>> are contained in `x`. So to find the common type of two factors, we
>> take the union of the levels of each factor, given a factor that has
>> finer resolution than both.
>
> I'm not so sure. I think a more useful definition of resolution may be
> that it is about increasing the precision of information. In that case,
> a factor with 4 levels each of which is present has a higher resolution
> than the same data with additional-but-absent levels on the factor object.
> Now that may be different when the the new levels are not absent, but
> my point is that its not clear to me that resolution is a useful way of
> talking about factors.

An alternative way of framing factors is that they're about tracking
possible values, particular possible values that don't exist in the
data that you have. Thinking about factors in that way, makes unioning
the levels more natural.

> If users want unrestricted character type behavior, then IMHO they should
> just be using characters, and it's quite easy for them to do so in any case
> I can easily think of where they have somehow gotten their hands on a factor.
> If, however, they want a factor, it must be - I imagine - because they actually
> want the the semantics and behavior specific to factors.

I think this is true in the tidyverse, which will never give you a
factor unless you explicitly ask for one, but the default in base R
(at least as soon as a data frame is involved) is to turn character
vectors into factors.

Hadley

-- 
http://hadley.nz



More information about the R-devel mailing list