[Rd] Bug in "$<-.data.frame" yields corrupt data frame (PR#13724)

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu May 28 08:50:09 CEST 2009


> Would the above modification work to fix this problem?

Yes, thank you. I've incorporated it in R-patched and R-devel.

It does catch 3 packages: DescribeDisplay, rgcvpack and BioC:rHVDM.

On Wed, 27 May 2009, smckinney at bccrc.ca wrote:

> Full_Name: Steven McKinney
> Version: 2.9.0
> OS: Mac OS X 10.5.6
> Submission from: (NULL) (142.103.207.10)
>
>
>
> A corrupt data frame can be constructed as follows:
> foo <- matrix(1:12, nrow = 3)
> bar <- data.frame(foo)
> bar$NewCol <- foo[foo[, 1] == 4, 4]
> bar
> lapply(bar, length)
>
>
>
>
>> foo <- matrix(1:12, nrow = 3)
>> bar <- data.frame(foo)
>> bar$NewCol <- foo[foo[, 1] == 4, 4]
>> bar
>   X1 X2 X3 X4 NewCol
> 1  1  4  7 10   <NA>
> 2  2  5  8 11   <NA>
> 3  3  6  9 12   <NA>
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs
>> lapply(bar, length)
> $X1
> [1] 3
>
> $X2
> [1] 3
>
> $X3
> [1] 3
>
> $X4
> [1] 3
>
> $NewCol
> [1] 0
>
>
> The data.frame method is
>
>> getAnywhere("$<-.data.frame" )
> A single object matching '$<-.data.frame' was found
> It was found in the following places
>  package:base
>  registered S3 method for $<- from namespace base
>  namespace:base
> with value
>
> function (x, i, value)
> {
>    cl <- oldClass(x)
>    class(x) <- NULL
>    nrows <- .row_names_info(x, 2L)
>    if (!is.null(value)) {
>        N <- NROW(value)
>        if (N > nrows)
>            stop(gettextf("replacement has %d rows, data has %d",
>                N, nrows), domain = NA)
>        if (N < nrows && N > 0L)
>            if (nrows%%N == 0L && length(dim(value)) <= 1L)
>                value <- rep(value, length.out = nrows)
>            else stop(gettextf("replacement has %d rows, data has %d",
>                N, nrows), domain = NA)
>        if (is.atomic(value))
>            names(value) <- NULL
>    }
>    x[[i]] <- value
>    class(x) <- cl
>    return(x)
> }
> <environment: namespace:base>
>>
>
>
> I placed a browser() command before return(x) and did some poking
> around.  The issue is that the example above creates an object with
> N < nrows but N == 0L, so either an else clause to check for this
> condition is needed, or, it appears to me, the N > 0L part of the
> conditional clause needs to be moved to the next if clause.
>
> I modified the rows
>        if (N < nrows && N > 0L)
>            if (nrows%%N == 0L && length(dim(value)) <= 1L)
> to read
>        if (N < nrows)
>            if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L)
>
> as in
>
> "$<-.data.frame" <-
> function (x, i, value)
> {
>    cl <- oldClass(x)
>    class(x) <- NULL
>    nrows <- .row_names_info(x, 2L)
>    if (!is.null(value)) {
>        N <- NROW(value)
>        if (N > nrows)
>            stop(gettextf("replacement has %d rows, data has %d",
>                N, nrows), domain = NA)
>        if (N < nrows)
>            if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L)
>                value <- rep(value, length.out = nrows)
>            else stop(gettextf("replacement has %d rows, data has %d",
>                N, nrows), domain = NA)
>        if (is.atomic(value))
>            names(value) <- NULL
>    }
>    x[[i]] <- value
>    class(x) <- cl
>    return(x)
> }
>
> Now it detects the problem above:
>
>> foo <- matrix(1:12, nrow = 3)
>> bar <- data.frame(foo)
>> bar$NewCol <- foo[foo[, 1] == 4, 4]
> Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) :
>  replacement has 0 rows, data has 3
>
> It doesn't appear to stumble on weird data frames (these from the
> ?data.frame help page)
>
>
>> L3 <- LETTERS[1:3]
>> (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE)))
>> (d0  <- d[, FALSE]) # NULL data frame with 10 rows
>
>> (d.0 <- d[FALSE, ]) # <0 rows> data frame  (3 cols)
>
>> (d00 <- d0[FALSE,])  # NULL data frame with 0 rows
>
>> d0$NewCol <- foo[foo[, 1] == 4, 4]
> Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) :
>  replacement has 0 rows, data has 10
>
> ### Catches this problem above alright.
>
>> d.0$NewCol <- foo[foo[, 1] == 4, 4]
>> d.0
> [1] x      y      fac    NewCol
> <0 rows> (or 0-length row.names)
>
> ### Lets the above one through alright.
>
>> d00$NewCol <- foo[foo[, 1] == 4, 4]
>>
>> d00
> [1] NewCol
> <0 rows> (or 0-length row.names)
> ### Lets the above one through alright.
>
>
> Would the above modification work to fix this problem?
>
>
>
>
>
>
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> powerpc-apple-darwin8.11.1
>
> locale:
> en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] nlme_3.1-90
>
> loaded via a namespace (and not attached):
> [1] grid_2.9.0      lattice_0.17-22 tools_2.9.0
>
>
> Also occurs on Windows box with R 2.8.1
>
>
>
> Steven McKinney
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney +at+ bccrc +dot+ ca
>
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
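
For anyone who wants to exercise the row-count logic of the quoted patch in isolation, the following is a minimal sketch only. The helper name check_replacement is hypothetical and not part of base R; it takes nrows as an argument instead of calling .row_names_info(), and it leaves out the class handling of the real method.

```r
# Hypothetical helper (not part of base R): mirrors the row-count checks
# of the patched `$<-.data.frame`, with nrows passed in explicitly.
check_replacement <- function(value, nrows) {
    if (is.null(value)) return(value)   # the real method skips the checks for NULL
    N <- NROW(value)
    if (N > nrows)
        stop(gettextf("replacement has %d rows, data has %d", N, nrows),
             domain = NA)
    if (N < nrows) {
        # N > 0L is tested here, so N == 0 falls through to the error branch
        if (N > 0L && nrows %% N == 0L && length(dim(value)) <= 1L)
            value <- rep(value, length.out = nrows)
        else
            stop(gettextf("replacement has %d rows, data has %d", N, nrows),
                 domain = NA)
    }
    if (is.atomic(value)) names(value) <- NULL
    value
}

recycled <- check_replacement(1:2, 4L)        # shorter vector is recycled
empty_ok <- check_replacement(integer(0), 0L) # 0 rows into 0 rows is allowed
msg <- tryCatch(check_replacement(integer(0), 3L),
                error = function(e) conditionMessage(e))
```

With the original placement of N > 0L in the outer condition, the third call would have returned integer(0) silently, which is how the corrupt data frame in the report arose.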

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


