[Rd] Bug in "$<-.data.frame" yields corrupt data frame (PR#13724)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu May 28 08:50:09 CEST 2009
> Would the above modification work to fix this problem?
Yes thank you, and I've incorporated it in R-patched and R-devel.
It does catch 3 packages, DescribeDisplay, rgcvpack and BioC:rHVDM.
On Wed, 27 May 2009, smckinney at bccrc.ca wrote:
> Full_Name: Steven McKinney
> Version: 2.9.0
> OS: Mac OS X 10.5.6
> Submission from: (NULL) (142.103.207.10)
>
>
>
> A corrupt data frame can be constructed as follows:
> foo <- matrix(1:12, nrow = 3)
> bar <- data.frame(foo)
> bar$NewCol <- foo[foo[, 1] == 4, 4]
> bar
> lapply(bar, length)
>
>
>
>
>> foo <- matrix(1:12, nrow = 3)
>> bar <- data.frame(foo)
>> bar$NewCol <- foo[foo[, 1] == 4, 4] bar
> X1 X2 X3 X4 NewCol
> 1 1 4 7 10 <NA>
> 2 2 5 8 11 <NA>
> 3 3 6 9 12 <NA>
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
> corrupt data frame: columns will be truncated or padded with NAs
>> lapply(bar, length)
> $X1
> [1] 3
>
> $X2
> [1] 3
>
> $X3
> [1] 3
>
> $X4
> [1] 3
>
> $NewCol
> [1] 0
>
>
> The data.frame method is
>
>> getAnywhere("$<-.data.frame" )
> A single object matching '$<-.data.frame' was found It was found in the
> following places
> package:base
> registered S3 method for $<- from namespace base
> namespace:base
> with value
>
> function (x, i, value)
> {
> cl <- oldClass(x)
> class(x) <- NULL
> nrows <- .row_names_info(x, 2L)
> if (!is.null(value)) {
> N <- NROW(value)
> if (N > nrows)
> stop(gettextf("replacement has %d rows, data has %d",
> N, nrows), domain = NA)
> if (N < nrows && N > 0L)
> if (nrows%%N == 0L && length(dim(value)) <= 1L)
> value <- rep(value, length.out = nrows)
> else stop(gettextf("replacement has %d rows, data has %d",
> N, nrows), domain = NA)
> if (is.atomic(value))
> names(value) <- NULL
> }
> x[[i]] <- value
> class(x) <- cl
> return(x)
> }<environment: namespace:base>
>>
>
>
> I placed a browser() command before return(x) and did some poking
> around. The issue is that the example above creates an object with
> N < nrows but N == 0L, so either an else clause to check for this
> condition is needed, or, it appears to me, the N > 0L part of the
> conditional clause needs to be moved to the next if clause.
>
> I modified the rows
> if (N < nrows && N > 0L)
> if (nrows%%N == 0L && length(dim(value)) <= 1L)
> to read
> if (N < nrows)
> if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L)
>
> as in
>
> "$<-.data.frame" <-
> function (x, i, value)
> {
> cl <- oldClass(x)
> class(x) <- NULL
> nrows <- .row_names_info(x, 2L)
> if (!is.null(value)) {
> N <- NROW(value)
> if (N > nrows)
> stop(gettextf("replacement has %d rows, data has %d",
> N, nrows), domain = NA)
> if (N < nrows)
> if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L)
> value <- rep(value, length.out = nrows)
> else stop(gettextf("replacement has %d rows, data has %d",
> N, nrows), domain = NA)
> if (is.atomic(value))
> names(value) <- NULL
> }
> x[[i]] <- value
> class(x) <- cl
> return(x)
> }
>
> Now it detects the problem above:
>
>> foo <- matrix(1:12, nrow = 3)
>> bar <- data.frame(foo)
>> bar$NewCol <- foo[foo[, 1] == 4, 4]
> Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) :
> replacement has 0 rows, data has 3
>
> It doesn't appear to stumble on weird data frames (these from the
> ?data.frame help page)
>
>
>> L3 <- LETTERS[1:3]
>> (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10,
> replace=TRUE)))
>> (d0 <- d[, FALSE]) # NULL data frame with 10 rows
>
>> (d.0 <- d[FALSE, ]) # <0 rows> data frame (3 cols)
>
>> (d00 <- d0[FALSE,]) # NULL data frame with 0 rows
>
>> d0$NewCol <- foo[foo[, 1] == 4, 4]
> Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) :
> replacement has 0 rows, data has 10
>
> ### Catches this problem above alright.
>
>> d.0$NewCol <- foo[foo[, 1] == 4, 4]
>> d.0
> [1] x y fac NewCol
> <0 rows> (or 0-length row.names)
>
> ### Lets the above one through alright.
>
>> d00$NewCol <- foo[foo[, 1] == 4, 4]
>>
>> d00
> [1] NewCol
> <0 rows> (or 0-length row.names)
> ### Lets the above one through alright.
>
>
> Would the above modification work to fix this problem?
>
>
>
>
>
>
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> powerpc-apple-darwin8.11.1
>
> locale:
> en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] nlme_3.1-90
>
> loaded via a namespace (and not attached):
> [1] grid_2.9.0 lattice_0.17-22 tools_2.9.0
>
>
> Also occurs on Windows box with R 2.8.1
>
>
>
> Steven McKinney
>
> Statistician
> Molecular Oncology and Breast Cancer Program British Columbia Cancer
> Research Centre
>
> email: smckinney +at+ bccrc +dot+ ca
>
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-devel
mailing list