[R] Corrupt data frame construction - bug?

Duncan Murdoch murdoch at stats.uwo.ca
Thu Apr 30 02:09:51 CEST 2009


On 29/04/2009 6:41 PM, Steven McKinney wrote:
> Hi useRs,
> 
> A recent coding infelicity along these lines
> yielded a corrupt data frame.
> 
> foo <- matrix(1:12, nrow = 3)
> bar <- data.frame(foo)
> bar$NewCol <- foo[foo[, 1] == 4, 4]
> bar
> lapply(bar, length)
> 
> 
> 
> 
>> foo <- matrix(1:12, nrow = 3)
>> bar <- data.frame(foo)
>> bar$NewCol <- foo[foo[, 1] == 4, 4]
>> bar
>   X1 X2 X3 X4 NewCol
> 1  1  4  7 10   <NA>
> 2  2  5  8 11   <NA>
> 3  3  6  9 12   <NA>
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs
>> lapply(bar, length)
> $X1
> [1] 3
> 
> $X2
> [1] 3
> 
> $X3
> [1] 3
> 
> $X4
> [1] 3
> 
> $NewCol
> [1] 0
> 
> 
> Is this a bug in the data.frame machinery?
> If an attempt is made to add a new column
> to a data frame, and the new object does
> not have length = number of rows of data frame,
> or cannot be made to have such length via recycling,
> shouldn't an error be thrown?
> 
> Instead in this example I end up with a
> "corrupt data frame" having one zero-length column.
> 
> 
> Should this be reported as a bug, or did I misinterpret
> the documentation?

I don't think "$" uses any data.frame machinery.  You are working at a 
lower level.
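Here is a minimal sketch of what working "at a lower level" amounts to. The value being assigned is empty because column 1 of foo holds 1, 2, 3, so the logical index is all FALSE. The sketch assumes that current R versions may check the replacement length when "$" is used on a data frame, so the original one-liner might now stop with an error. For that reason it rebuilds the corrupt object by hand: strip the class, append the element as you would to a plain list, and put the data.frame label back on. The names sel and tmp are illustrative only.

```r
# Why the new column is empty: no row of foo has a 4 in column 1.
foo <- matrix(1:12, nrow = 3)
sel <- foo[foo[, 1] == 4, 4]   # all-FALSE logical index selects nothing
length(sel)                    # 0

# Treat the data frame as a plain list, then relabel it.
bar <- data.frame(foo)
tmp <- unclass(bar)            # plain list; names and row.names attributes kept
tmp$NewCol <- sel              # list assignment does no row-count check
class(tmp) <- "data.frame"     # labelled a data.frame, but not a valid one

# Column lengths no longer match the row count stored in row.names.
vapply(tmp, length, integer(1))   # X1..X4 are 3, NewCol is 0
nrow(tmp)                         # 3
```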

If you had added the new column using

bar <- data.frame(bar, NewCol=foo[foo[, 1] == 4, 4])

you would have seen the error:

Error in data.frame(bar, NewCol = foo[foo[, 1] == 4, 4]) :
   arguments imply differing number of rows: 3, 0
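If you want to keep adding one column at a time but still get a check like the one data.frame() performs, a small wrapper will do it. The function add_col_checked() below is hypothetical, not part of base R. It enforces the rule described in the original question: the new value must either have as many elements as the data frame has rows, or be non-empty and recycle evenly into that number.

```r
# Hypothetical helper: add a column only if its length fits the data frame.
add_col_checked <- function(df, name, value) {
  n <- nrow(df)
  k <- length(value)
  if (k == n) {
    df[[name]] <- value
  } else if (k > 0L && n %% k == 0L) {
    df[[name]] <- rep(value, length.out = n)   # recycle evenly
  } else {
    stop(sprintf("replacement has %d elements, data frame has %d rows", k, n))
  }
  df
}

foo <- matrix(1:12, nrow = 3)
bar <- data.frame(foo)
bar <- add_col_checked(bar, "Ones", 1)   # length 1 recycles to 3 rows

# The zero-length value from the original post is rejected.
err <- tryCatch(add_col_checked(bar, "NewCol", foo[foo[, 1] == 4, 4]),
                error = function(e) conditionMessage(e))
err
```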

But since you treated it as a list, it let you go ahead and create 
something that was labelled as a data.frame but wasn't.  This is one of 
the reasons some people prefer S4 methods:  it's easier to protect 
against people who mislabel things.
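As an illustration of the S4 point, here is a sketch using a validity method. The class name CheckedFrame and its single slot cols are hypothetical. The validity function requires every column to have the same length, so new() refuses to build an object with a zero-length column. Note that S4 validity is checked by new() and validObject(), not on every slot assignment with "@<-". The protection therefore applies when objects are constructed, or wherever validObject() is called explicitly.

```r
library(methods)

# Hypothetical S4 class: a list of columns that must all share one length.
setClass("CheckedFrame",
         slots = c(cols = "list"),
         validity = function(object) {
           lens <- vapply(object@cols, length, integer(1))
           if (length(unique(lens)) > 1L)
             paste("columns have differing lengths:",
                   paste(unique(lens), collapse = ", "))
           else
             TRUE
         })

foo <- matrix(1:12, nrow = 3)
ok <- new("CheckedFrame", cols = as.list(data.frame(foo)))

# Building an object with the empty column fails the validity check.
res <- tryCatch(
  new("CheckedFrame",
      cols = c(ok@cols, list(NewCol = foo[foo[, 1] == 4, 4]))),
  error = function(e) conditionMessage(e)
)
res
```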

Duncan Murdoch

> 
> 
> 
> 
>> sessionInfo()
> R version 2.9.0 (2009-04-17) 
> powerpc-apple-darwin8.11.1 
> 
> locale:
> en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] nlme_3.1-90
> 
> loaded via a namespace (and not attached):
> [1] grid_2.9.0      lattice_0.17-22 tools_2.9.0    
> 
> 
> Also occurs on Windows box with R 2.8.1
> 
> 
> 
> Steven McKinney
> 
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
> 
> email: smckinney +at+ bccrc +dot+ ca
> 
> tel: 604-675-8000 x7561
> 
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C. 
> V5Z 1L3
> Canada
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



