[Rd] Changing the generic of as.data.frame

Bill Dunlap bill at insightful.com
Mon May 22 19:16:22 CEST 2006


On Mon, 22 May 2006, Prof Brian Ripley wrote:

> On Mon, 22 May 2006, Bill Dunlap wrote:
>
> > On Mon, 22 May 2006, Prof Brian Ripley wrote:
> >
> >> The other motivation was to allow the option to not convert character
> >> vectors to factors, which needed an additional argument to
> >> as.data.frame.character.  So data.frame now has an argument 'charToFactor'
> >> controlled by a global option (which also controls the default of as.is in
> >> read.table).  More experience will be needed as to whether it is safe to
> >> work with the global option set to FALSE, so that aspect should be
> >> regarded as experimental until 2.4.0 is released or it is withdrawn.
> >
> > Splus's data.frame() and as.data.frame() have had a 'stringsAsFactors'
> > argument since version 6.0 (2001).  Their
> > default values come from options("stringsAsFactors").  read.table() and
> > a few other data.frame-oriented functions have the same argument.
> > It looks like stringsAsFactors has the same functionality as your
> > new charToFactor.  Would it be feasible to change its name to stringsAsFactors?
>
> It would, but then I think we would want to ensure it did precisely the
> same thing.  Is there a description of what exactly
> options("stringsAsFactors") affects?  (?options suggests it is data.frame,
> read.table and importData, and nothing else).

I just noticed that data.frame accepts a vector of logicals for
stringsAsFactors: one element per ... argument.  This is not in
the help file.

   Splus> data.frame
   function(..., row.names = NULL, check.rows = F, check.names = T, na.strings =
           "NA", dup.row.names = F, stringsAsFactors = default.stringsAsFactors(
           ))
   {
           dots <- match.call(expand.dots = F)$...
           n <- length(dots) - 1
           ...
           stringsAsFactors <- rep(stringsAsFactors, len = n)
           for(i in seq(length = n)) {
                   xi <- data.frameAux(eval(as.name(paste("..", i, sep = ""))),
                           na.strings = na.strings, stringsAsFactors =
                           stringsAsFactors[i])
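
The recycling in the excerpt above can be sketched in R roughly as follows.
This is only an illustration of the one-flag-per-argument idea: make_frame()
is a hypothetical helper, not Splus's data.frame(), and it ignores
na.strings and the other arguments.

```r
# Hypothetical helper (not S-PLUS's data.frame()): builds a data frame from
# named vectors, taking one stringsAsFactors flag per argument.
make_frame <- function(..., stringsAsFactors = TRUE) {
  cols <- list(...)
  n <- length(cols)
  # Recycle the flags to one per argument, like rep(stringsAsFactors, len = n).
  flags <- rep_len(stringsAsFactors, n)
  for (i in seq_len(n)) {
    # Only character columns whose flag is TRUE become factors.
    if (is.character(cols[[i]]) && flags[i]) {
      cols[[i]] <- factor(cols[[i]])
    }
  }
  # The columns are already converted, so switch off any further conversion.
  data.frame(cols, stringsAsFactors = FALSE)
}

d <- make_frame(id = c("a", "b", "a"), label = c("x", "y", "z"),
                stringsAsFactors = c(TRUE, FALSE))
is.factor(d$id)        # TRUE
is.character(d$label)  # TRUE
```

A single TRUE or FALSE is recycled to every argument, so the scalar case
behaves as the help file describes.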


Splus's importData(), which outputs a data.frame from various other
file formats or database connections, also uses stringsAsFactors.

It also affects as.data.frame() and the character method for data.frameAux()
(which is not expected to be called directly -- it is a support function for
data.frame() and as.data.frame()).
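
For comparison, here is a small sketch of what the flag does, written
against present-day R rather than Splus (an assumption on my part: R's
data.frame() and as.data.frame() take an argument of the same name, and the
Splus behaviour may differ in detail).

```r
x <- c("b", "a", "b")

# With the flag on, the character vector becomes a factor whose levels are
# the unique strings (factor() sorts them).
f <- data.frame(x = x, stringsAsFactors = TRUE)
levels(f$x)        # "a" "b"

# With the flag off, the column stays character.
s <- data.frame(x = x, stringsAsFactors = FALSE)
class(s$x)         # "character"

# as.data.frame() on a character vector honours the same argument.
a <- as.data.frame(x, stringsAsFactors = TRUE)
is.factor(a[[1]])  # TRUE
```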

In the bigdata library, bdFrame() uses it in the same way that data.frame()
does.

There may be a few stray functions that pass their ... arguments
to data.frame, but I cannot think of any now.

Its default value should always be default.stringsAsFactors(),
but I see that bdFrame() uses just FALSE.  default.stringsAsFactors()
looks at options("stringsAsFactors") and maps NULL and TRUE to TRUE.

    Splus> default.stringsAsFactors
    function()
    {
            val <- .Options$stringsAsFactors
            if(is.null(val))
                    val <- T
            if(!is.logical(val) || is.na(val) || length(val) != 1)
                    stop("options('stringsAsFactors') not set to T or F")
            val
    }
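
The same mapping can be sketched in R as below. resolve_saf() is a
hypothetical name used only for illustration; it takes the option value as
an argument so the cases are easy to try. One deliberate difference from
the Splus source: the length check comes before is.na(), because recent
versions of R raise an error when || is given a condition of length
greater than one.

```r
# Hypothetical stand-in for default.stringsAsFactors(), for illustration.
resolve_saf <- function(val = getOption("stringsAsFactors")) {
  # An unset option counts as TRUE.
  if (is.null(val)) val <- TRUE
  # Anything other than a single non-NA logical is rejected.
  if (!is.logical(val) || length(val) != 1L || is.na(val))
    stop("options('stringsAsFactors') not set to TRUE or FALSE")
  val
}

resolve_saf(NULL)    # TRUE
resolve_saf(FALSE)   # FALSE
# resolve_saf("yes") and resolve_saf(NA) both stop with an error.
```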

I believe that Terry Therneau has been using Splus with
options(stringsAsFactors=FALSE) for quite a while and hasn't reported
any problems.

>
> >
> > Splus> help(data.frame)
> >                     Construct a Data Frame Object
> >
> > USAGE:
> >
> > data.frame(..., row.names, check.rows=F, check.names=T,
> >           na.strings="NA", dup.row.names=F, stringsAsFactors=<<see below>>)
> > data.frameAux(x, ...)
> > as.data.frame(x, row.names=NULL,  stringsAsFactors=<<see below>>, ...)
> > is.data.frame(x)
> > ...
> >   stringsAsFactors
> >          a logical flag; if TRUE then convert character arguments
> >          to factors whose levels are the unique strings in the
> >          argument. This may save time and space if there are many
> >          repeated values in the strings and may make the
> >          statistical modelling functions easier to use. The
> >          default is TRUE, unless one sets options(stringsAsFactors=FALSE).
> > ...
> >
> >
> > ----------------------------------------------------------------------------
> > Bill Dunlap
> > Insightful Corporation
> > bill at insightful dot com
> > 360-428-8146
> >
> > "All statements in this message represent the opinions of the author and do
> > not necessarily reflect Insightful Corporation policy or position."
> >
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."


