[R] ncol() vs. length() on data.frames

Ivan Calandra calandra at rgzm.de
Mon Apr 6 08:48:23 CEST 2020


Thank you, Greg, for the insights!

I agree with you that the small speed advantage of length() is not worth
the loss in readability, so I'll change my length() calls to ncol().

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 03/04/2020 17:45, Greg Snow wrote:
> As others have pointed out, ncol() ends up calling length() (ncol(x) is
> dim(x)[2L], and dim() on a data frame uses length()), so you are pretty
> safe in getting the same result when applying either function to the
> output of functions like read.csv(). There will be a big difference,
> however, if you ever apply them to a matrix or some other data structure.
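A minimal sketch of the difference mentioned above, using a small made-up data frame (df), matrix (m) and list (l). On the data frame both functions count the columns; on the matrix length() counts every element; on the list ncol() returns NULL because a list has no dim attribute.

```r
# Made-up example objects, for illustration only
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
m  <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, 6 elements
l  <- list(1, 2, 3, 4)        # a plain list has no dim attribute

# Data frame: both count the columns (2 and 2)
print(c(ncol = ncol(df), length = length(df)))

# Matrix: ncol() counts columns (3), length() counts all elements (6)
print(c(ncol = ncol(m), length = length(m)))

# List: dim(l) is NULL, so ncol(l) is NULL while length(l) is 4
print(ncol(l))
print(length(l))
```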
>
> One thing that I have not seen yet is a comparison of timing, so here goes:
>
>> library(microbenchmark)
>> microbenchmark(
> + length = length(iris),
> + ncol = ncol(iris)
> + )
> Unit: nanoseconds
>    expr  min   lq mean median   uq   max neval
>  length  700  750  869    800  800  7400   100
>    ncol 2400 2500 2981   2600 2700 31900   100
>
> So ncol takes about 3 times as long to run as length on the iris data
> frame (5 columns). You can rerun the code above with data frames closer
> to the size you will be using to see whether that makes any difference.
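One way to rerun the comparison on a larger data frame is sketched below. The 10,000 x 200 size is an arbitrary assumption; adjust it to your own data. The sketch uses microbenchmark if it is installed and otherwise falls back to base R's system.time() over a loop of calls.

```r
# Arbitrary "realistic" size: 10,000 rows x 200 numeric columns (assumption)
wide <- as.data.frame(matrix(runif(1e4 * 200), nrow = 1e4))

# Same answer from both functions on an ordinary data frame
stopifnot(ncol(wide) == length(wide))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  # Same comparison as above, just on the larger data frame
  print(microbenchmark::microbenchmark(
    length = length(wide),
    ncol   = ncol(wide),
    times  = 1000
  ))
} else {
  # Base R fallback: time many repeated calls of each function
  n <- 1e5
  print(system.time(for (i in seq_len(n)) length(wide)))
  print(system.time(for (i in seq_len(n)) ncol(wide)))
}
```

Since length() on a list and dim() on a data frame only read stored metadata rather than the data itself, I would expect the timings to change little with the number of columns, but that is worth checking on your own machine.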
> Also notice, though, that the units are nanoseconds: the median time for
> ncol to run is less than the time it takes light to travel a kilometer
> in a vacuum, or about the time it takes light to travel 1/3 of a mile
> through a fiber-optic cable (en.wikipedia.org/wiki/Microsecond). If
> this is used as part of a simulation or other repeated procedure and is
> called one million times, switching to ncol will add about 2 seconds
> (roughly 1.8 microseconds extra per call) to the overall run. If
> length/ncol will be called fewer than 10 times in your code, nobody is
> going to notice.
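The arithmetic behind these figures can be checked directly from the medians in the benchmark table (800 ns for length, 2600 ns for ncol). This is only a back-of-the-envelope sketch; the speed of light used is the defined SI value for a vacuum.

```r
# Median times in nanoseconds, taken from the microbenchmark table
median_length_ns <- 800
median_ncol_ns   <- 2600

# Extra cost of ncol() per call, and over one million calls (in seconds)
extra_per_call_ns <- median_ncol_ns - median_length_ns   # 1800 ns
extra_total_s     <- extra_per_call_ns * 1e6 / 1e9       # 1.8 s

# Distance light covers in a vacuum during one median ncol() call (metres)
c_vacuum_m_per_s <- 299792458
light_m <- c_vacuum_m_per_s * median_ncol_ns * 1e-9      # about 779 m

cat(sprintf("extra per call: %.0f ns; over 1e6 calls: %.1f s\n",
            extra_per_call_ns, extra_total_s))
cat(sprintf("light in vacuum during one ncol() call: %.0f m\n", light_m))
```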
>
> So the trade-off in moving from length to ncol is a slight decrease in
> speed for an increase in readability. I think I would go with
> readability myself.
>
> On Tue, Mar 31, 2020 at 8:11 AM Ivan Calandra <calandra at rgzm.de> wrote:
>> Thanks, Ivan, for the answer.
>>
>> So it confirms my first thought that these two functions are equivalent
>> when applied to a "simple" data.frame.
>>
>> The reason I was asking is that I have gotten used to using length() in
>> my scripts. It works perfectly and I understand it easily. But to be
>> honest, ncol() is more intuitive to most users (especially novices),
>> so I was thinking about switching to this function instead (all my
>> data.frames are created with read.csv() or similar functions, so there
>> should not be any issue). But before doing that, I want to be sure that
>> it is not going to produce unexpected results.
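Since the data frames here come from read.csv(), a quick sanity check is easy to sketch. The CSV text below is made up for illustration; read.csv() only creates atomic vector columns, so ncol() and length() should agree on its output.

```r
# Made-up CSV content, read from a string instead of a file
csv_text <- "site,length_mm,width_mm
A,12.1,4.3
B,10.7,3.9
C,11.5,4.1"

df <- read.csv(text = csv_text)

# Every column is an atomic vector (no list or matrix columns)...
print(all(vapply(df, is.atomic, logical(1))))

# ...so both functions report the same number of columns (3)
stopifnot(ncol(df) == length(df))
```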
>>
>> Thank you,
>> Ivan
>>
>> On 31/03/2020 16:00, Ivan Krylov wrote:
>>> On Tue, 31 Mar 2020 14:47:54 +0200
>>> Ivan Calandra <calandra at rgzm.de> wrote:
>>>
>>>> On a simple data.frame (i.e. each element is a vector), ncol() and
>>>> length() will give the same result.
>>>> Are they simply equivalent on such objects, or are there differences in
>>>> some cases?
>>> I am not aware of any exceptions to ncol(dataframe) == length(dataframe).
>>> In fact, ncol(x) is dim(x)[2L], and ?dim says that dim(dataframe)
>>> returns c(length(attr(dataframe, 'row.names')), length(dataframe)). But
>>> watch out for AsIs columns, which can have columns of their own:
>>>
>>> x <- data.frame(I(volcano))
>>> dim(x)
>>> # [1] 87  1
>>> length(x)
>>> # [1] 1
>>> dim(x[,1])
>>> # [1] 87 61
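Both points above can be checked in a few lines. The first check confirms the ?dim identity on iris; the second shows that an AsIs matrix column counts as a single column. leaf_ncol() is a hypothetical helper written only for this sketch, not a base R function; it counts the columns inside matrix-like columns via NCOL(), which returns 1 for plain vectors.

```r
# ?dim: for a data frame, dim() is c(number of rows, length())
stopifnot(identical(dim(iris),
                    c(length(attr(iris, "row.names")), length(iris))))

# An AsIs matrix column is ONE column for ncol()/length(),
# even though it holds the 61 columns of volcano
x <- data.frame(I(volcano))

# Hypothetical helper (not in base R): sum NCOL() over all columns,
# so a matrix column contributes its own number of columns
leaf_ncol <- function(d) sum(vapply(d, NCOL, integer(1)))

print(c(ncol = ncol(x), length = length(x), leaf = leaf_ncol(x)))  # 1, 1, 61
print(leaf_ncol(iris))                                             # 5
```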
>>>
>>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>


