[R] Is this kind of removing of elements from data.frame (in)efficient?

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Sep 13 16:43:42 CEST 2016


Your example is not reproducible [1], so the apparent error in it is distracting... perhaps you meant

kidmomhs <- kidmomhs[kidmomhs$kid_score != min(kidmomhs$kid_score),]

Yes, this creates a copy, and because the object name is re-used on the left side, the original memory is returned to the memory pool the next time garbage collection occurs.
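
To make this concrete, here is a minimal reproducible sketch. The data frame is a made-up stand-in (the real kidmomhs was not posted), so the values and the mom_hs column are assumptions for illustration only.

```r
# Hypothetical stand-in for the poster's data; the real kidmomhs was not posted.
kidmomhs <- data.frame(
  kid_score = c(65, 20, 98, 20, 83),
  mom_hs    = c(1, 0, 1, 1, 0)
)

# Logical index: TRUE for every row whose score is not the minimum.
keep <- kidmomhs$kid_score != min(kidmomhs$kid_score)

# Subsetting builds a new data frame. Assigning it to the same name leaves
# the old object unreferenced, so it becomes eligible for garbage collection.
kidmomhs <- kidmomhs[keep, ]

nrow(kidmomhs)  # 3: both rows with the minimum score (20) are gone
```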

While this may seem inefficient, this (functional) programming model is much less likely to lead to programming errors than in-place approaches. My advice is to refrain from premature optimization and get the algorithm right first; later you could rewrite it using something like the data.table package if the standard functional model proves too slow for a particular application.
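
One way to judge whether the standard approach is "too slow" is simply to time it on data of a realistic size. The sketch below uses simulated data (the size, the score range, and the mom_hs column are all assumptions); the timing itself will vary by machine.

```r
# Simulated data of an assumed size; replace with your own data frame.
set.seed(1)
n <- 1e6
big <- data.frame(
  kid_score = sample(20:140, n, replace = TRUE),
  mom_hs    = rbinom(n, 1, 0.8)
)

# Time the copy-on-subset approach before deciding it needs optimizing.
timing <- system.time(
  trimmed <- big[big$kid_score != min(big$kid_score), ]
)

timing["elapsed"]  # seconds of wall-clock time on this machine
```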

In addition, I tend to find that not re-using the object name (and so not releasing the memory) aids debugging and traceability, which is often an advantage if you are aiming for reproducible research.
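
A small sketch of what that looks like in practice, again with made-up data; the names kidmomhs_raw and kidmomhs_trimmed are only illustrative.

```r
# Hypothetical data; keep the original under its own name.
kidmomhs_raw <- data.frame(
  kid_score = c(65, 20, 98, 20, 83),
  mom_hs    = c(1, 0, 1, 1, 0)
)

# Store the filtered result under a new name instead of overwriting.
kidmomhs_trimmed <- kidmomhs_raw[
  kidmomhs_raw$kid_score != min(kidmomhs_raw$kid_score),
]

# Both versions are still available, so the effect of the step can be checked.
n_removed <- nrow(kidmomhs_raw) - nrow(kidmomhs_trimmed)
removed_rows <- setdiff(rownames(kidmomhs_raw), rownames(kidmomhs_trimmed))

n_removed      # 2
removed_rows   # "2" "4"
```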

[1] see e.g. http://adv-r.had.co.nz/Reproducibility.html
-- 
Sent from my phone. Please excuse my brevity.

On September 13, 2016 12:37:18 AM PDT, mviljamaa <mviljamaa at kapsi.fi> wrote:
>So I'm a beginner in R and I was testing the removal of elements from a
>data.frame.
>
>The way I remove the element(s) with the minimum value in kid_score 
>variable is to do:
>
>kidmomhs <- data[kidmomhs$kid_score != min(kidmomhs$kid_score),]
>
>So now kidmomhs is the same data, but without the row(s) with the 
>minimum value of kid_score.
>
>Judging by the syntax this looks as if R might be creating a copy of the
>data array, just without the rows that were removed.
>
>The question however is, is this the most efficient way to remove 
>elements from data structures in R? And is the above inefficient? Does 
>the above create copies of almost the entire data structure?
>
>In other programming languages I've become accustomed to doing removal 
>of elements by changing them to NULL and then e.g. reordering the data 
>structure. Rather than having to take copies of almost the entire data 
>structure.
>


