[R] vectorization question

Martin Maechler maechler at stat.math.ethz.ch
Fri Aug 15 10:44:31 CEST 2003


>>>>> "Tony" == Tony Plate <tplate at blackmesacapital.com>
>>>>>     on Thu, 14 Aug 2003 11:43:11 -0600 writes:

    Tony> From ?data.frame:
    >> Details:
    >> 
    >> A data frame is a list of variables of the same length with unique
    >> row names, given class `"data.frame"'.

    Tony> Your example constructs an object that does not
    Tony> conform to the definition of a data frame (the new
    Tony> column is not the same length as the old columns).
    Tony> Some data frame functions may work OK with such an
    Tony> object, but others will not.  For example, the print
    Tony> function for data.frame silently handles such an
    Tony> illegal data frame (which could be described as
    Tony> unfortunate.)  It would probably be far easier to
    Tony> construct a correct data frame in the first place than
    Tony> to try to find and fix functions that don't handle
    Tony> illegal data frames.  For adding a new column to a
    Tony> data frame, the expressions "x[,new.column.name] <-
    Tony> value" and "x[[new.column.name]] <- value" will
    Tony> replicate the value so that the new column is the same
    Tony> length as the existing ones, while the "$" operator in
    Tony> an assignment will not replicate the value.  (One
    Tony> could argue that this is a deficiency, but I think it
    Tony> has been that way for a long time, and the behavior is
    Tony> the same in the current version of S-plus.)

    >> x1 <- data.frame(a=1:3)
    >> x2 <- x1
    >> x3 <- x1
    >> x1$b <- 0
    >> x2[,"b"] <- 0
    >> x3[["b"]] <- 0
    >> sapply(x1, length)
    Tony> a b
    Tony> 3 1
    >> sapply(x2, length)
    Tony> a b
    Tony> 3 3
    >> sapply(x3, length)
    Tony> a b
    Tony> 3 3
    >> as.matrix(x2)
    Tony>   a b
    Tony> 1 1 0
    Tony> 2 2 0
    Tony> 3 3 0
    >> as.matrix(x1)
    Tony> Error in as.matrix.data.frame(x1) : dim<- length of dims do not match the 
    Tony> length of object

Thank you, Tony.  This certainly was the most precise
explanation on this thread.

Everyone note, however, that this has been improved (by Brian
Ripley) in the current R-devel {which should become R 1.8 in October}.
There, "$<-" assignment for data frames also checks the value
and in this case will do the same replication as the [,] or [[]]
assignments do.
For backward compatibility (with S-plus and earlier R versions), I'd
still recommend using bracket "[" rather than "$" assignment for
data frames.
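
As a minimal sketch of that recommendation (not taken from the thread):
the bracket forms from Tony's example, plus an explicit rep() for anyone
who prefers "$" and wants code that does not depend on the R version.
The helper name columns_ok is hypothetical, used only for illustration;
it is not part of R.

```r
# Add a constant column to a 3-row data frame using the bracket forms,
# which replicate the value to the number of rows.
x2 <- data.frame(a = 1:3)
x3 <- data.frame(a = 1:3)
x2[, "b"] <- 0    # "[<-" assignment
x3[["b"]] <- 0    # "[[<-" assignment

# With "$", replicate explicitly so the result does not rely on
# "$<-" doing the replication for you.
x4 <- data.frame(a = 1:3)
x4$b <- rep(0, nrow(x4))

# Hypothetical helper (an assumption, not an R function): TRUE if every
# column has exactly as many elements as the data frame has rows.
columns_ok <- function(df) {
  lens <- vapply(unclass(df), length, integer(1))
  all(lens == nrow(df))
}

columns_ok(x2)
columns_ok(x3)
columns_ok(x4)
```

A check like columns_ok() is only a quick sanity test on column lengths;
it says nothing about row names or column types.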

Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><



