[R] Efficient way to subset rows in R for dataset with 10^7 columns

Jeff Newmiller
Sat Apr 14 03:08:27 CEST 2018


You have 10^7 columns? Row subsetting on a table that wide is bound to be slow.
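
A data.frame or data.table stores each column as its own vector, so taking a subset of rows means subsetting every one of the 10^7 columns separately. If (and this is an assumption, since the post does not say) all the predictor columns are numeric, one option is to hold them in a matrix and keep the outcome in a separate vector. The sketch below uses made-up names and a much smaller number of columns so that it runs quickly, and it uses a plain random split rather than the stratified split that caret::createDataPartition produces. Treat it as an illustration, not a measured fix for your data.

```r
# Scaled-down stand-in for the 100 x 10^7 table.
# All names and sizes here are hypothetical, not taken from the original post.
set.seed(1)
n_rows <- 100
n_cols <- 1e4   # the post has 1e7 columns; kept small so the sketch runs fast

# Assumption: every predictor is numeric, so a matrix can hold them all.
m <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)

# The outcome is kept apart from the predictors, as a factor.
status <- factor(sample(c("a", "b"), n_rows, replace = TRUE))

# Plain 90/10 random split on row indices (NOT stratified by status).
train_idx <- sort(sample(n_rows, size = round(0.9 * n_rows)))

# A matrix is a single vector with a dim attribute, so this is one
# subsetting operation instead of one per column.
outer_train <- m[train_idx, , drop = FALSE]
outer_test  <- m[-train_idx, , drop = FALSE]

# Subset the outcome with the same indices so rows stay aligned.
status_train <- status[train_idx]
status_test  <- status[-train_idx]

dim(outer_train)
dim(outer_test)
```

Wrapping the two subsetting lines in system.time() on your real data, once for the data.table and once for the matrix, would show whether this helps in your case.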

On April 13, 2018 5:31:32 PM PDT, Jack Arnestad <jackarnestad using gmail.com> wrote:
>I have a data.table with dimensions 100 by 10^7.
>
>When I do
>
>    trainIndex <-
>      caret::createDataPartition(
>        df$status,
>        p = .9,
>        list = FALSE,
>        times = 1
>      )
>    outerTrain <- df[trainIndex]
>    outerTest  <- df[-trainIndex]
>
>Subsetting the rows of df takes over 20 minutes.
>
>What is the best way to efficiently subset this?
>
>Thanks!
