[Rd] arithmetic with zero-column data.frames

Martin Maechler maechler at stat.math.ethz.ch
Mon Aug 14 14:44:14 CEST 2017


>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Wed, 9 Aug 2017 12:39:26 +0200 writes:

    > So, as often, there is more to it than you first think.
    > Let's consider this an RFC (for experienced, long-time R users):

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Wed, 9 Aug 2017 10:45:56 +0200 writes:

>>>>> William Dunlap via R-devel <r-devel at r-project.org>
>>>>>     on Tue, 8 Aug 2017 11:59:45 -0700 writes:

    >>> Should arithmetic operations work on zero-column data.frames (returning a
    >>> zero-column data.frame with the same number of rows as the data.frame
    >>> argument(s))?   Currently we get:

    >>>> 1 + data.frame(row.names=c("A","B"))
    >>> Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows = FALSE) :
    >>> row names supplied are of the wrong length
    >>>> data.frame(row.names=c("A","B")) * 2
    >>> Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows = FALSE) :
    >>> row names supplied are of the wrong length
    >>>> data.frame(row.names=c("A","B")) / data.frame(row.names=c("A","B"))
    >>> Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows = FALSE) :
    >>> row names supplied are of the wrong length

    >>> Bill Dunlap
    >>> TIBCO Software
    >>> wdunlap tibco.com

    >> Thank you, Bill.

    >> Yes, indeed: as we have the Ops.data.frame and
    >> Math.data.frame group methods (about which I have not always
    >> been so happy, but they are inherited from S),
    >> and as the Math methods work too, we should get this boundary
    >> case working for the Ops as well.

    > Hmm..  This time, I'd be glad for comments, notably from you, Bill:

    > In looking at this, I notice that "^" is treated
    > exceptionally, possibly by accident. E.g.,
    > USArrests ^ 2 returns a matrix, whereas all other arithmetic
    > Ops give a data frame.

    > All non-arithmetic Ops do return a matrix (also undocumented, AFAICS),
    > and currently "^" is treated like them.

    > Note that Math.data.frame always returns a data frame (when it
    > does return), so we currently have this ugly inconsistency:

    >> str(USArrests ^ 0.5)
    > num [1:50, 1:4] 3.63 3.16 2.85 2.97 3 ...
    > - attr(*, "dimnames")=List of 2
    > ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
    > ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
    >> str(sqrt(USArrests))
    > 'data.frame':	50 obs. of  4 variables:
    > $ Murder  : num  3.63 3.16 2.85 2.97 3 ...
    > $ Assault : num  15.4 16.2 17.1 13.8 16.6 ...
    > $ UrbanPop: num  7.62 6.93 8.94 7.07 9.54 ...
    > $ Rape    : num  4.6 6.67 5.57 4.42 6.37 ...
    >> 

    > I propose to add "^" to the other arithmetic ops, which return a
    > data frame. Then, in the above example, '^ 0.5' would give the same
    > result as sqrt(), up to rounding error in the last bit.
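A minimal check of the proposed equivalence between '^ 0.5' and sqrt(), using the USArrests data set that ships with R. Wrapping the '^' result in as.data.frame() is an editorial choice (not from the post) so that the same comparison runs whether '^' returns a matrix, as described above, or a data frame, as proposed; all.equal() absorbs last-bit rounding differences.

```r
## Arith path ('^ 0.5') versus Math path (sqrt()) on the same data.
## as.data.frame() converts a matrix result into a data frame and
## leaves a data frame result unchanged.
pow <- as.data.frame(USArrests ^ 0.5)
rt  <- sqrt(USArrests)            # Math.data.frame: always a data frame

## all.equal() compares values, names and row names with a tolerance
isTRUE(all.equal(pow, rt))
```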

    > - -- - -- - --

    > A further inconsistency is that the Math methods directly refuse
    > to work on a data frame with non-numeric variables, whereas the
    > Ops methods just go along and give warnings and NA's:

    >> sqrt(CO2)
    > Error in Math.data.frame(CO2) : 
    > non-numeric variable in data frame: PlantTypeTreatment

    >> str( CO2 ^ 0.5 )
    > num [1:84, 1:5] NA NA NA NA NA NA NA NA NA NA ...
    > - attr(*, "dimnames")=List of 2
    > ..$ : chr [1:84] "1" "2" "3" "4" ...
    > ..$ : chr [1:5] "Plant" "Type" "Treatment" "conc" ...
    > Warning messages:
    > 1: In Ops.ordered(left, right) : '^' is not meaningful for ordered factors
    > 2: In Ops.factor(left, right) : ‘^’ not meaningful for factors
    > 3: In Ops.factor(left, right) : ‘^’ not meaningful for factors
    >> 

    > One "clean", radical solution would be for the Ops method
    > to signal an error directly, as the Math method does.
    > But that may be undesirable: suppose people have data frame
    > variables of classes for which an Ops method is defined.
    > Then the corresponding "op" is applied to every variable,
    > and the result may well be useful and as desired.

    > So, I'm much less sure what's desirable here.
    > Should we just document this latter inconsistency?
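As a sketch of working with the current Math behavior: testing each column with is.numeric() shows which variables trigger the Math.data.frame error (its message pastes their names together), and restricting the data frame to the numeric columns lets sqrt() go through. The variable name num is illustrative only; CO2 is the data set used in the example above.

```r
## Which CO2 variables are numeric? Plant, Type and Treatment are factors.
num <- vapply(CO2, is.numeric, logical(1))
names(CO2)[!num]      # the variables named in the Math.data.frame error

## Math.data.frame accepts the numeric-only subset (conc, uptake)
head(sqrt(CO2[num]))
```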

As there was no feedback from anyone,
I have now committed -- to R-devel only, svn 73093 -- what I had proposed
above:

- arithmetic for 0-column data frames now works

- "Arith"metic operators return data frames, now also for '^'

- the other "Ops", i.e., "Compare" and "Logic" continue
  to return a logical matrix, and this is now documented.
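A short sketch of the committed behavior, assuming an R build that includes svn r73093 or later. It reuses the zero-column data frame from Bill's report and the USArrests data set; the object names z, r1, r2 and cmp are illustrative only.

```r
## Zero-column data frame with two rows, as in the original report
z <- data.frame(row.names = c("A", "B"))

## Arithmetic now works and keeps the row count (2 rows, 0 columns)
r1 <- 1 + z
r2 <- z / z
dim(r1)
dim(r2)

## '^' now returns a data frame, like the other Arith operators
is.data.frame(USArrests ^ 2)

## Compare (and Logic) operators still return a logical matrix
cmp <- USArrests > 10
is.matrix(cmp) && is.logical(cmp)
```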

Martin Maechler
ETH Zurich and R Core


