[R] tidyverse: read_csv() misses column
Rich Shepard
r@hep@rd @end|ng |rom @pp|-eco@y@@com
Mon Nov 1 18:54:58 CET 2021
On Mon, 1 Nov 2021, Bill Dunlap wrote:
> Use the col_types argument to specify your column types. [Why would you
> expect '2009' to be read as a string instead of a number?]. It looks like
> an initial zero causes an otherwise numeric looking entry to be considered
> a string (handy for zip codes in the northeastern US).
> help(read_csv) says the column type guessing is "not robust" and its
> algorithm doesn't seem to be documented in the help file:
Bill,
That makes sense. I read that in the book and forgot about it. I'll specify
the col_types for each column in the read_csv() call.
Specifying column types got me much closer:
> cor_disc <- read_csv("../data/cor-disc.csv", col_names = TRUE, col_types = c("c","c","c","c","c","c","i"))
> cor_disc
# A tibble: 415,263 × 8
   site_nbr  year mon   day   hr    min   tz     disc
   <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
 1 14171600  2009 10    23    00    00    PDT    8750
 2 14171600  2009 10    23    00    15    PDT    8750
 3 14171600  2009 10    23    00    30    PDT    8750
 4 14171600  2009 10    23    00    45    PDT    8750
 5 14171600  2009 10    23    01    00    PDT    8750
 6 14171600  2009 10    23    01    15    PDT    8750
 7 14171600  2009 10    23    01    30    PDT    8750
 8 14171600  2009 10    23    01    45    PDT    8730
 9 14171600  2009 10    23    02    00    PDT    8730
10 14171600  2009 10    23    02    15    PDT    8730
# … with 415,253 more rows
The col_types entry for year was specified as "c" and the one for disc as
"i", but both were read as doubles. That's a non-issue for disc (discharge
in fps), but year should be a character column, as are month, day, etc.
Have I still missed something in specifying column types?
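One possible explanation, offered as an assumption rather than something
confirmed in this thread: the call above passes seven one-letter strings for
eight columns, and readr may use only the first element of an unnamed
character vector, leaving the remaining columns to be guessed. A minimal
sketch of two documented ways to request the types follows. It assumes
readr 2.0 or later (for I() with literal data); the two data rows are copied
from the printout above, and the names csv_text, sample_a and sample_b are
hypothetical.

```r
library(readr)

# Two rows copied from the printout above, supplied as literal data
# so the sketch does not depend on ../data/cor-disc.csv.
csv_text <- I("site_nbr,year,mon,day,hr,min,tz,disc
14171600,2009,10,23,00,00,PDT,8750
14171600,2009,10,23,00,15,PDT,8750
")

# Compact form: a single string with one letter per column (eight here).
# c = character, d = double.
sample_a <- read_csv(csv_text, col_types = "cccccccd")

# Explicit form: everything defaults to character, disc is a double.
sample_b <- read_csv(
  csv_text,
  col_types = cols(.default = col_character(), disc = col_double())
)

# Show the class of each column.
sapply(sample_a, class)
```

With either form, year and the other date/time fields should come back as
character and disc as a double; substituting "i" or col_integer() for disc
would request an integer column instead.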
Regards,
Rich