[Rd] subscripting a data.frame (without changing row order) changes internal row.names
Dr Gregory Jefferis
jefferis at mrc-lmb.cam.ac.uk
Mon Nov 10 19:35:18 CET 2014
Dear R-devel,
Can anyone help me to understand this? It seems that subscripting the
rows of a data.frame, without actually changing their order, somehow
changes an internal representation of row.names that is revealed by e.g.
dput/dump/serialize.
I have read the docs and inspected the (R) code for data.frame,
rownames, row.names and dput without enlightenment.
df=data.frame(a=1:10, b=1)
dput(df)
df2=df[1:nrow(df), ]
# R thinks they are equal (so do I!)
all.equal(df, df2)
dput(df2)
Looking at the output of the dputs:
> dput(df)
structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("a",
"b"), row.names = c(NA, -10L), class = "data.frame")
> dput(df2)
structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("a",
"b"), row.names = c(NA, 10L), class = "data.frame")
we have row.names = c(NA, -10L) in the first case and row.names = c(NA,
10L) in the second, so somehow these objects have different internal
representations.
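The stored form can also be inspected without deparsing. The following is a minimal sketch, assuming only base R: .row_names_info() with type = 0L returns the internal "row.names" attribute unexpanded, and with type = 1L it returns the number of rows, negated when the row names are 'automatic'. The public accessors expand both forms, which is consistent with all.equal() seeing no difference.

```r
df  <- data.frame(a = 1:10, b = 1)
df2 <- df[1:nrow(df), ]

# type = 0L: the internal "row.names" attribute, not expanded
.row_names_info(df, 0L)
.row_names_info(df2, 0L)

# type = 1L: number of rows, with a negative sign for 'automatic' row names
.row_names_info(df, 1L)
.row_names_info(df2, 1L)

# row.names() and attr() never expose the compact form, so they agree
identical(row.names(df), row.names(df2))
identical(attr(df, "row.names"), attr(df2, "row.names"))
```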
Can anyone explain why? This has come up because
> library(digest)
> digest(df)==digest(df2)
[1] FALSE
digest uses serialize under the hood, but serialize, dput and dump all
show the same effect (I've pasted an example below using dump and md5sum
from base R).
Many thanks for any enlightenment! More generally, is there any way to
calculate a digest of a data.frame that gets round this issue, or is
that not possible?
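One possible direction, offered as a sketch under stated assumptions rather than a definitive answer, is to normalise the row names before hashing. The helper normalise_row_names below is hypothetical (it is not part of base R or digest). It relies on the documented behaviour that assigning NULL to rownames() of a data.frame resets it to automatic row names, and it only does so when the existing row names are exactly 1..n, so meaningful row names are left alone.

```r
# Hypothetical helper (not part of base R or digest): store row names that are
# exactly 1..n as 'automatic' row names; leave any other row names untouched.
normalise_row_names <- function(x) {
  stopifnot(is.data.frame(x))
  rn <- attr(x, "row.names")  # attr() returns the expanded form, never c(NA, n)
  if (is.integer(rn) && identical(rn, seq_len(nrow(x)))) {
    rownames(x) <- NULL       # resets to automatic row names
  }
  x
}

df  <- data.frame(a = 1:10, b = 1)
df2 <- df[1:nrow(df), ]

# After normalisation the two objects carry the same internal row.names
identical(.row_names_info(normalise_row_names(df), 0L),
          .row_names_info(normalise_row_names(df2), 0L))
identical(normalise_row_names(df), normalise_row_names(df2))
```

Hashing normalise_row_names(df) instead of df would remove this particular source of difference. Whether the serialize() output is then byte-identical in every R version is not checked here, so the resulting digests should be compared before relying on this.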
Best wishes,
Greg.
A digest using base R:
library(tools)  # provides md5sum
td=tempfile()
dir.create(td)
tempfiles=file.path(td,c("df", "df2"))
# write the deparsed form of each object to its own file
dump("df",tempfiles[1])
dump("df2",tempfiles[2])
md5sum(tempfiles)
# the two md5sums differ
> sessionInfo() # for my laptop but also observed on R 3.1.2
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] nat_1.5.14      nat.utils_0.4.2 digest_0.6.4    Rvcg_0.9        devtools_1.6.1  igraph_0.7.1
[7] testthat_0.9.1  rgl_0.93.1098
loaded via a namespace (and not attached):
[1] codetools_0.2-9   filehash_2.2-2    nabor_0.4.3       parallel_3.1.1    plyr_1.8.1
[6] Rcpp_0.11.3       rstudio_0.98.1062 rstudioapi_0.1    XML_3.98-1.1      yaml_2.1.13
--
Gregory Jefferis, PhD
Division of Neurobiology
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge, CB2 OQH, UK
http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
http://jefferislab.org
http://flybrain.stanford.edu