[Rd] duplicated.data.frame() is broken on data frames containing \r

Hervé Pagès hpages at fhcrc.org
Mon Jul 29 20:52:36 CEST 2013


Hi,

The trick used by duplicated.data.frame() is to transform the supplied
data.frame into a character vector by pasting its columns together row
by row, using "\r" as the separator. However, no precautions are taken
against "\r" occurring in the data itself, so two different rows can
end up with the same pasted string. As a consequence, it's easy to
construct situations where duplicated.data.frame() returns an incorrect
answer:

   > df <- data.frame(a=c("AA", "AA\r"), b=c("\rBBB", "BBB"))
   > df
        a     b
   1   AA \rBBB
   2 AA\r   BBB
   > duplicated(df)
   [1] FALSE  TRUE
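To make the failure mode concrete, here is a minimal sketch. It assumes the row keys are built roughly as described above (columns pasted with "\r"); the actual base R code may differ in detail. It rebuilds those keys for the two rows and shows that they collide. It then contrasts this with a separator-free comparison (a hypothetical alternative, not the base R implementation) that turns each row into a list and calls duplicated() on the list of rows. stringsAsFactors = FALSE is an added assumption that keeps the example independent of the R version's default.

```r
# Same two rows as above; character columns, not factors (assumption).
df <- data.frame(a = c("AA", "AA\r"), b = c("\rBBB", "BBB"),
                 stringsAsFactors = FALSE)

# Approximation of the separator trick: one string key per row,
# with the columns joined by "\r".
keys <- do.call(paste, c(unname(df), sep = "\r"))
print(keys[1] == keys[2])   # TRUE: both rows become "AA\r\rBBB"

# Hypothetical separator-free alternative: represent each row as an
# unnamed list and let duplicated() compare the lists element by element.
rows <- lapply(seq_len(nrow(df)), function(i) {
  unname(as.list(df[i, , drop = FALSE]))
})
print(duplicated(rows))     # FALSE FALSE: the rows are correctly distinct
```

Comparing whole rows as lists avoids the problem because no character has to be reserved as a separator. The trade-off is that building one list per row is likely slower than hashing a single character vector when the data frame is large.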

Cheers,
H.

 > sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the R-devel mailing list