[Rd] vctrs: a type system for the tidyverse

Hadley Wickham
Wed Aug 8 19:37:05 CEST 2018


>     > I now have a better argument, I think:
>
>     > If you squint your brain a little, I think you can see
>     > that each set of automatic coercions is about increasing
>     > resolution. Integers are low resolution versions of
>     > doubles, and dates are low resolution versions of
>     > date-times. Logicals are low resolution versions of
>     > integers because there's a strong convention that `TRUE`
>     > and `FALSE` can be used interchangeably with `1` and `0`.
>
>     > But what is the resolution of a factor? We must take a
>     > somewhat pragmatic approach because base R often converts
>     > character vectors to factors, and we don't want to be
>     > burdensome to users. So we say that a factor `x` has finer
>     > resolution than factor `y` if the levels of `y` are
>     > contained in `x`. So to find the common type of two
>     > factors, we take the union of the levels of each factor,
>     > giving a factor that has finer resolution than
>     > both. Finally, you can think of a character vector as a
>     > factor with every possible level, so factors and character
>     > vectors are coercible.
>
>     > (extracted from the in-progress vignette explaining how to
>     > extend vctrs to work with your own vctrs, now that vctrs
>     > has been rewritten to use double dispatch)
>
> I like this argumentation, and find it very nice indeed!
> It confirms my own gut feeling which had led me to agreeing
> with you, Hadley, that taking the union of all factor levels
> should be done here.

That's great to hear :)
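
To make the union-of-levels rule concrete, here is a minimal base R
sketch. It is not how vctrs implements it; `combine_factors()` is a
hypothetical helper used only for illustration:

```r
# Hypothetical helper (not part of vctrs): combine two factors by taking
# the union of their levels, so the result has finer resolution than both.
combine_factors <- function(x, y) {
  stopifnot(is.factor(x), is.factor(y))
  # union() keeps the levels of x first, then any new levels from y
  lv <- union(levels(x), levels(y))
  factor(c(as.character(x), as.character(y)), levels = lv)
}

x <- factor(c("a", "b"))
y <- factor(c("b", "c"))
z <- combine_factors(x, y)
levels(z)
#> [1] "a" "b" "c"
```

Going through character vectors in the middle mirrors the idea that a
character vector is a factor with every possible level.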

> As Gabe mentioned (and you've explained about) the term "type"
> is really confusing here.  As you know, the R internals are all
> about SEXPs, TYPEOF(), etc, and that's what the R level
> typeof(.) also returns.  As you want to use something slightly
> different, it should be different naming, ideally something not
> existing yet in the R / S world, maybe 'kind' ?

Agreed - I've been using type in the sense of "type system"
(particularly as it relates to algebraic data types), but that's not
obvious from the current presentation, and as you note, is confusing
with existing notions of type in R. I like your suggestion of kind,
but I think it might be possible to just talk about classes, and
instead emphasise that while the components of the system are classes
(and indeed it's implemented using S3), the coercion/casting
relationships do not strictly follow the subclass/superclass
relationships.

A good motivating example is now ordered vs factor - I don't think you
can say that ordered or factor have greater resolution than the other
so:

vec_c(factor("a"), ordered("a"))
#> Error: No common type for factor and ordered

This is not what you'd expect from an _object_ system since ordered is
a subclass of factor.
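
The subclass relationship itself can be checked in base R alone (no
vctrs needed); this only shows the S3 class vectors, not anything about
how vctrs picks a common type:

```r
# Base R only: an ordered factor carries both classes, so S3 dispatch
# treats "ordered" as a subclass of "factor".
f <- factor("a")
o <- ordered("a")

class(o)
#> [1] "ordered" "factor"
inherits(o, "factor")
#> [1] TRUE
inherits(f, "ordered")
#> [1] FALSE
```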

Hadley

-- 
http://hadley.nz


