[Rd] [R] "[.data.frame" and lapply

Romain Francois romain.francois at dbmail.com
Thu Mar 26 11:42:14 CET 2009


[moving this from R-help to R-devel]

Hi,

Right, so when you call `[`, the dispatch happens internally:

 > d <- data.frame( x = 1:5, y = rnorm(5), z = rnorm(5) )
 > trace( `[.data.frame` )
 > d[ , 1:2]   # ensures 1:2 is passed as j and i is passed as missing
Tracing `[.data.frame`(d, , 1:2) on entry
  x           y
1 1  0.98946922
2 2  0.05323895
3 3 -0.21803664
4 4 -0.47607043
5 5  1.23366151

 > d[ 1:2] # only one argument, so it goes in i
Tracing `[.data.frame`(d, 1:2) on entry
  x           y
1 1  0.98946922
2 2  0.05323895
3 3 -0.21803664
4 4 -0.47607043
5 5  1.23366151

But that does not explain why this is happening:

 > d[ i = 1:2]
Tracing `[.data.frame`(d, i = 1:2) on entry
  x           y
1 1  0.98946922
2 2  0.05323895
3 3 -0.21803664
4 4 -0.47607043
5 5  1.23366151

 > d[ j = 1:2]
Tracing `[.data.frame`(d, j = 1:2) on entry
  x           y          z
1 1  0.98946922 -0.5233134
2 2  0.05323895  1.3646683
3 3 -0.21803664 -0.4998344
4 4 -0.47607043 -1.8849618
5 5  1.23366151  0.6723562
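A minimal sketch of what the internal dispatch actually hands to a method, using a hypothetical toy class (`toy` and `[.toy` are made up for this illustration and are not part of base R). The method ignores x and only reports nargs() and which of i and j are missing:

```r
# Hypothetical toy class, for illustration only: its `[` method reports
# what the primitive `[` passes along when it dispatches internally.
toy <- structure(list(), class = "toy")

`[.toy` <- function(x, i, j, ...) {
  # nargs() counts blank positional arguments too (e.g. the empty i in x[, j])
  c(nargs = nargs(), missing_i = missing(i), missing_j = missing(j))
}

toy[, 1:2]    # 3 arguments: i left empty, j supplied
toy[1:2]      # 2 arguments: the index is matched to i
toy[i = 1:2]  # 2 arguments: the name i is kept
toy[j = 1:2]  # 2 arguments: the name j is kept, i is missing
```

The last call is the d[j = 1:2] case: the name j survives dispatch, but nargs() is 2, which is the count `[.data.frame` uses to choose list-like indexing.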

Arguments are passed to `[.data.frame` with their names, and 
`[.data.frame` gets confused. I'm not suggesting that named arguments be 
allowed, since that already works; what does not work is how 
`[.data.frame` treats them. That needs to be changed: this is a bug.
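For the original lapply() question, one workaround (a sketch, not an official idiom) is to avoid passing i or j by name through `[` and instead write the full index inside an anonymous function, so `[.data.frame` always sees the number of arguments it expects. The list `dl` below is made up for illustration, with three columns so that c(1, 3) is a valid column selection:

```r
# Illustrative list of data frames (three columns so c(1, 3) is valid).
set.seed(1)
dl <- lapply(1:4, function(k) data.frame(x = rnorm(5), y = rnorm(5), z = rnorm(5)))

# Columns 1 and 3 of each element: both index positions are written out,
# so `[.data.frame` is called with three arguments (x, empty i, j).
cols <- lapply(dl, function(df) df[, c(1, 3)])

# First two rows of each element.
rows <- lapply(dl, function(df) df[1:2, ])

# Single positional index: list-style indexing, which selects columns.
cols2 <- lapply(dl, "[", c(1, 3))
```

The last form works because a single positional index is treated as list-style column selection, just like d[1:2] above.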

Romain

 > version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status         Under development (unstable)
major          2
minor          9.0
year           2009
month          03
day            09
svn rev        48093
language       R
version.string R version 2.9.0 Under development (unstable) (2009-03-09 r48093)




baptiste auguie wrote:
> Hi,
>
> I got an off-line clarification from Martin Morgan which makes me 
> believe it's not a bug (admittedly, I was close to suggesting it before).
>
> Basically, "[" is a .Primitive, for which the help page says,
>
>
>> The advantage of .Primitive over .Internal functions is the potential
>> efficiency of argument passing. However, this is done by ignoring
>> argument names and using positional matching of arguments (unless
>> arranged differently for specific primitives such as rep), so this is
>> discouraged for functions of more than one argument.
>
> This explains why in my tests the argument names i and j were 
> completely ignored and only the number and order of arguments changed 
> the result. 
>
> I've learnt my lesson here, but I wonder what could be done to make 
> this discovery easier for others:
>
> - add a note in the documentation of each .Primitive function (at 
> least a link to ?.Primitive)
>
> - add such an example to the lapply help page (all its examples use 
> named arguments)
>
> - emit a warning when named arguments are passed to a .Primitive
>
> - allow for named arguments as you suggest
>
> I'm not sure the last two would be possible without some cost in 
> efficiency.
>
>
> Many thanks,
>
> baptiste
>
>
>
>
> On 26 Mar 2009, at 07:46, Romain Francois wrote:
>
>>
>> Hi,
>>
>> This is a bug I think. [.data.frame treats its arguments differently
>> depending on the number of arguments.
>>
>>> d <- data.frame(x = rnorm(5), y = rnorm(5), z = rnorm(5) )
>>> d[, 1:2]
>>             x           y
>> 1   0.45141341  0.03943654
>> 2  -0.87954548  1.83690210
>> 3  -0.91083710  0.22758584
>> 4   0.06924279  1.26799176
>> 5  -0.20477052 -0.25873225
>>> base:::`[.data.frame`( d, j=1:2)
>>             x           y          z
>> 1   0.45141341  0.03943654 -0.8971957
>> 2  -0.87954548  1.83690210  0.9083281
>> 3  -0.91083710  0.22758584 -0.3104906
>> 4   0.06924279  1.26799176  1.2625699
>> 5  -0.20477052 -0.25873225  0.5228342
>> but also:
>>> d[ j=1:2]
>>            x           y          z
>> 1  0.45141341  0.03943654 -0.8971957
>> 2 -0.87954548  1.83690210  0.9083281
>> 3 -0.91083710  0.22758584 -0.3104906
>> 4  0.06924279  1.26799176  1.2625699
>> 5 -0.20477052 -0.25873225  0.5228342
>>
>> `[.data.frame` is only called with two arguments in the second case, so
>> the following condition is true:
>>
>> if(Narg < 3L) {  # list-like indexing or matrix indexing
>>
>> The function then assumes that the argument it has been passed is i,
>> and eventually calls NextMethod("["), which I think calls
>> `[.listof`(x, i, ...). Since i is missing in `[.data.frame`, it is not
>> passed on to `[.listof`, so you get something equivalent to as.list(d)[].
>>
>> I think we can replace the condition with this one:
>>
>> if(Narg < 3L && !has.j) {  # list-like indexing or matrix indexing
>>
>> or this:
>>
>> if(Narg < 3L) {  # list-like indexing or matrix indexing
>>        if(has.j) i <- j
>>
>>> `[.data.frame`(d, j=1:2)
>>            x           y
>> 1  0.45141341  0.03943654
>> 2 -0.87954548  1.83690210
>> 3 -0.91083710  0.22758584
>> 4  0.06924279  1.26799176
>> 5 -0.20477052 -0.25873225
>>
>> However, we would still have this, which is expected (same as d[1:2]):
>>
>>> `[.data.frame`(d, i=1:2)
>>            x           y
>> 1  0.45141341  0.03943654
>> 2 -0.87954548  1.83690210
>> 3 -0.91083710  0.22758584
>> 4  0.06924279  1.26799176
>> 5 -0.20477052 -0.25873225
>>
>> Romain
>>
>> baptiste auguie wrote:
>>> Dear all,
>>>
>>>
>>> Trying to extract a few rows for each element of a list of
>>> data.frames, I'm puzzled by the following behaviour,
>>>
>>>
>>>> d <- lapply(1:4,  function(i) data.frame(x=rnorm(5), y=rnorm(5)))
>>>> str(d)
>>>>
>>>> lapply(d, "[", i= c(1)) # fine, this extracts the first column
>>>> lapply(d, "[", j= c(1, 3)) # does nothing ?!
>>>>
>>>> library(plyr)
>>>>
>>>> llply(d, "[", j= c(1, 3)) # same
>>>
>>>
>>> Am I misinterpreting the meaning of "j", which I thought was an
>>> argument of the method "[.data.frame"?
>>>
>>>
>>>> args(`[.data.frame`)
>>>> function (x, i, j, drop = if (missing(i)) TRUE else length(cols) ==
>>>>   1)
>>>>
>>>
>>> Many thanks,
>>>
>>> baptiste
>>>
>>> _____________________________
>>>
>>> Baptiste Auguié
>>>
>>> School of Physics
>>> University of Exeter
>>> Stocker Road,
>>> Exeter, Devon,
>>> EX4 4QL, UK
>>>
>>> Phone: +44 1392 264187
>>>
>>> http://newton.ex.ac.uk/research/emag
>>>
>>>
>>>
>>
>>
>> -- 
>> Romain Francois
>> Independent R Consultant
>> +33(0) 6 28 91 30 30
>> http://romainfrancois.blog.free.fr
>>
>>
>
>




