[R] readr to generate tibble from a character matrix
Ben Tupper
btupper at bigelow.org
Fri Apr 7 15:08:27 CEST 2017
Thanks!
I put together a small benchmark of five ways to convert a character matrix to a tibble: dumping to a file and reading it back, pasting everything into one big string, using pipes, using as.data.frame, and using a pipeless version. Ulrik's or your solutions are clearly the ones to use: on as.matrix(flights) they take roughly 0.5 s (median), compared with about 2.4 s for dumping the matrix to a file and reading it back and about 5.7 s for pasting the matrix into one honking big string.
The methods differ only in how they handle the date-time column: readr parses time_hour as POSIXct, while the type.convert-based methods leave it as character. Otherwise the results are all identical.
Cheers,
Ben
#### START
library(nycflights13)
library(tibble)
library(magrittr)
library(readr)
library(microbenchmark)
m <- as.matrix(flights)
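# Round trip through a temporary CSV file
via_file <- function(m){
  filename <- tempfile(fileext = '.csv')
  write.csv(m, file = filename, row.names = FALSE, quote = FALSE)
  readr::read_csv(filename)
}
# Paste the header and all rows into one newline-delimited string
via_paste <- function(m){
  s <- paste(
    c(paste(colnames(m), collapse = ","), apply(m, 1, paste, collapse = ",")),
    collapse = "\n")
  # readr 1.0.0 treats a string containing newlines as literal data;
  # newer readr releases document wrapping such input in I()
  readr::read_csv(s)
}
# Ulrik's solution with as.is = TRUE
via_pipes <- function(m){
  m %>%
    tibble::as_tibble() %>%
    lapply(type.convert, as.is = TRUE) %>%
    tibble::as_tibble()
}
# David's variant that goes through a data.frame
via_dataframe <- function(m){
  mm <- lapply(data.frame(m, stringsAsFactors=FALSE), type.convert, as.is=TRUE)
  tibble::as_tibble(mm)
}
# Same as via_pipes, without the pipes
via_pipeless <- function(m){
  tibble::as_tibble(lapply(tibble::as_tibble(m), type.convert, as.is=TRUE))
}
# Run each method once and collect the results for comparison
X <- list(
  file=via_file(m),
  paste=via_paste(m),
  pipes=via_pipes(m),
  dataframe=via_dataframe(m),
  pipeless=via_pipeless(m))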
sapply(names(X), function(n) all.equal(X[[n]], X[[1]]))
# $file
# [1] TRUE
# $paste
# [1] TRUE
# $pipes
# [1] "Incompatible type for column time_hour1: x character, y POSIXct"
# [2] "Incompatible type for column time_hour2: x character, y POSIXt"
# $dataframe
# [1] "Incompatible type for column time_hour1: x character, y POSIXct"
# [2] "Incompatible type for column time_hour2: x character, y POSIXt"
# $pipeless
# [1] "Incompatible type for column time_hour1: x character, y POSIXct"
# [2] "Incompatible type for column time_hour2: x character, y POSIXt"
microbenchmark(
  via_file(m),
  via_paste(m),
  via_pipes(m),
  via_dataframe(m),
  via_pipeless(m),
  times = 5
)
# Unit: milliseconds
#              expr       min        lq      mean    median        uq       max neval
#       via_file(m) 2362.7778 2396.2277 2415.9207 2413.0772 2439.5752 2467.9457     5
#      via_paste(m) 5287.8176 5305.6228 5622.1432 5666.0165 5919.3568 5931.9023     5
#      via_pipes(m)  461.4782  464.5656  506.4157  509.5532  542.1091  554.3726     5
#  via_dataframe(m)  507.4674  514.2550  553.1791  515.9132  518.0807  710.1794     5
#   via_pipeless(m)  448.9529  470.1074  499.4392  470.6874  500.6027  606.8459     5
sessionInfo()
# R version 3.3.1 (2016-06-21)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)
# Running under: OS X 10.11.6 (El Capitan)
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
# other attached packages:
# [1] microbenchmark_1.4-2.1 readr_1.0.0 magrittr_1.5 tibble_1.2 nycflights13_0.2.0
# loaded via a namespace (and not attached):
# [1] colorspace_1.2-6 scales_0.4.1 plyr_1.8.4 assertthat_0.1 tools_3.3.1 gtable_0.2.0 Rcpp_0.12.9 ggplot2_2.1.0 grid_3.3.1 munsell_0.4.3
### END
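For anyone puzzled by the time_hour mismatch in the all.equal() output above, here is a minimal base-R sketch of what appears to be going on: type.convert() has no date-time type to fall back on, so timestamp strings stay character and need an explicit conversion. The sample timestamps and the UTC time zone are assumptions for illustration, not values taken from flights.

```r
# Made-up timestamp strings (illustrative only, not taken from flights)
stamps <- c("2013-01-01 05:00:00", "2013-01-01 06:00:00")

# type.convert() tries logical, integer, double and complex,
# then gives up and keeps the strings as character
guessed <- type.convert(stamps, as.is = TRUE)
class(guessed)

# An explicit conversion is needed to get a date-time class;
# the time zone here is an assumption
parsed <- as.POSIXct(guessed, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(parsed)
```

So after any of the type.convert-based methods, a date-time column would still need its own conversion step like the as.POSIXct() call here.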
> On Apr 6, 2017, at 3:34 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>
> Ulrik's solution gives you factors. To get them as characters, add as.is=TRUE:
>
>> m %>%
> + as_tibble() %>%
> + lapply(type.convert, as.is=TRUE) %>%
> + as_tibble()
> # A tibble: 4 × 5
>       A     B     C     D     E
>   <chr> <chr> <chr> <int> <dbl>
> 1     a     e     i     1  11.2
> 2     b     f     j     2  12.2
> 3     c     g     k     3  13.2
> 4     d     h     l     4  14.2
>
> Other possibilities:
>
>> mm <- lapply(data.frame(m, stringsAsFactors=FALSE), type.convert, as.is=TRUE)
>> as_tibble(mm)
> # Your solution simplified by converting to a data.frame
>
>> as_tibble(lapply(as_tibble(m), type.convert, as.is=TRUE))
> # Ulrik's solution but without the pipes. Shows why you need 2 as_tibbles()
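A small base-R sketch of the as.is point above, using made-up vectors rather than the matrix from the thread. as.is is passed explicitly in both calls because the behaviour when it is left out has differed between R versions.

```r
chr <- c("a", "b", "c")

# as.is = FALSE turns non-numeric strings into a factor
as_factor <- type.convert(chr, as.is = FALSE)
class(as_factor)

# as.is = TRUE leaves them as character
as_char <- type.convert(chr, as.is = TRUE)
class(as_char)

# Numeric-looking strings are converted either way
ints <- type.convert(c("1", "2", "3"), as.is = TRUE)
dbls <- type.convert(c("11.2", "12.2"), as.is = TRUE)
class(ints)
class(dbls)
```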
>
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ben Tupper
> Sent: Thursday, April 6, 2017 11:42 AM
> To: Ulrik Stervbo <ulrik.stervbo at gmail.com>
> Cc: R-help Mailing List <r-help at r-project.org>
> Subject: Re: [R] readr to generate tibble from a character matrix
>
> Hi,
>
> Thanks for this solution! Very slick!
>
> I see what you mean about the two calls to as_tibble(). I suppose I could do the following, but I doubt it is a gain...
>
> mm <- lapply(colnames(m), function(nm, m) type.convert(m[,nm], as.is = TRUE), m=m)
> names(mm) <- colnames(m)
> as_tibble(mm)
>
> # # A tibble: 4 × 5
> #       A     B     C     D     E
> #   <chr> <chr> <chr> <int> <dbl>
> # 1     a     e     i     1  11.2
> # 2     b     f     j     2  12.2
> # 3     c     g     k     3  13.2
> # 4     d     h     l     4  14.2
>
> I'll benchmark these with writing to a temporary file and pasting together a string.
>
> Cheers and thanks,
> Ben
>
> On Apr 6, 2017, at 11:15 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>>
>> Hi Ben,
>>
>> type.convert should do the trick:
>>
>> m %>%
>> as_tibble() %>%
>> lapply(type.convert) %>%
>> as_tibble()
>>
>> I am not too happy about the double 'as_tibble', but it gets the job done.
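A sketch of why the second conversion is needed, and of one possible way around it. It uses a plain data.frame with made-up values so that it runs in base R; the same `[] <-` idiom should also work on a tibble, but that is an assumption not tested here.

```r
df <- data.frame(A = c("a", "b"), D = c("1", "2"),
                 stringsAsFactors = FALSE)

# lapply() walks the columns but returns a plain list,
# so the data frame class is lost and has to be restored
converted <- lapply(df, type.convert, as.is = TRUE)
class(converted)

# Assigning into df[] replaces the columns in place and keeps the
# container, so no second conversion call is needed
df[] <- lapply(df, type.convert, as.is = TRUE)
class(df)
sapply(df, class)
```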
>>
>> HTH
>> Ulrik
>>
>> On Thu, 6 Apr 2017 at 16:41 Ben Tupper <btupper at bigelow.org> wrote:
>> Hello,
>>
>> I have a workflow that yields a character matrix, which I convert to a tibble. Here is a simple example.
>>
>> library(tibble)
>> library(readr)
>>
>> m <- matrix(c(letters[1:12], 1:4, (11:14 + 0.2)), ncol = 5)
>> colnames(m) <- LETTERS[1:5]
>>
>> x <- as_tibble(m)
>>
>> # # A tibble: 4 × 5
>> #       A     B     C     D     E
>> #   <chr> <chr> <chr> <chr> <chr>
>> # 1     a     e     i     1  11.2
>> # 2     b     f     j     2  12.2
>> # 3     c     g     k     3  13.2
>> # 4     d     h     l     4  14.2
>>
>> The workflow's output columns can be any mix drawn from a known set of columns. Some of the columns really should be converted to non-character types before I proceed. Right now I explicitly set the column classes with something like this...
>>
>> mode(x[['D']]) <- 'integer'
>> mode(x[['E']]) <- 'numeric'
>>
>> # # A tibble: 4 × 5
>> #       A     B     C     D     E
>> #   <chr> <chr> <chr> <int> <dbl>
>> # 1     a     e     i     1  11.2
>> # 2     b     f     j     2  12.2
>> # 3     c     g     k     3  13.2
>> # 4     d     h     l     4  14.2
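The explicit approach above could be wrapped up roughly as follows when the set of possible columns is known in advance but not all of them are present. The helper name set_column_modes, the lookup vector and the sample data are all hypothetical, introduced only for illustration.

```r
# Hypothetical helper: apply mode<- to whichever known columns are present
set_column_modes <- function(x, modes) {
  for (nm in intersect(names(modes), names(x))) {
    mode(x[[nm]]) <- modes[[nm]]
  }
  x
}

# Made-up character data; column F is deliberately absent
x <- data.frame(A = c("a", "b"), D = c("1", "2"), E = c("11.2", "12.2"),
                stringsAsFactors = FALSE)

# Lookup of known columns and the mode each should have
known <- c(D = "integer", E = "numeric", F = "logical")

y <- set_column_modes(x, known)
sapply(y, class)
```

Columns that are not in the lookup (A here) are left untouched, and lookup entries with no matching column (F here) are skipped.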
>>
>>
>> I wonder if there is a way to use the read_* functions in the readr package to read the character matrix into a tibble directly, which would leverage readr's excellent column class guessing. I can see in the vignette ( https://cran.r-project.org/web/packages/readr/vignettes/readr.html ) that I'm not too far off in thinking this could be done (step 1 tantalizingly says 'The flat file is parsed into a rectangular matrix of strings.')
>>
>> I know that I could either write the matrix to a file or paste it all into a character vector and then use read_* functions, but I confess I am looking for a straighter path by simply passing the matrix to a function like readr::read_matrix() or the like.
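One more option that may be close to what is being asked for here: readr exports type_convert(), which re-runs its column guessing on the character columns of an existing data frame. This is only a sketch and was not part of the benchmark in this thread; the guessed type for column D (integer or double) may depend on the readr version, so no output is shown.

```r
library(tibble)
library(readr)

# Same toy matrix as in the original question
m <- matrix(c(letters[1:12], 1:4, (11:14 + 0.2)), ncol = 5)
colnames(m) <- LETTERS[1:5]

# Convert to an all-character tibble, then let readr guess column types
typed <- type_convert(as_tibble(m))
sapply(typed, class)
```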
>>
>> Thanks!
>> Ben
>>
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 60 Bigelow Drive, P.O. Box 380
>> East Boothbay, Maine 04544
>> http://www.bigelow.org
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org