[Rd] Changing the generic of as.data.frame
Bill Dunlap
bill at insightful.com
Mon May 22 19:16:22 CEST 2006
On Mon, 22 May 2006, Prof Brian Ripley wrote:
> On Mon, 22 May 2006, Bill Dunlap wrote:
>
> > On Mon, 22 May 2006, Prof Brian Ripley wrote:
> >
> >> The other motivation was to allow the option to not convert character
> >> vectors to factors, which needed an additional argument to
> >> as.data.frame.character. So data.frame now has an argument 'charToFactor'
> >> controlled by a global option (which also controls the default of as.is in
> >> read.table). More experience will be needed as to whether it is safe to
> >> work with the global option set to FALSE, so that aspect should be
> >> regarded as experimental until 2.4.0 is released or it is withdrawn.
> >
> > Splus's data.frame() and as.data.frame() have had a 'stringsAsFactors'
> > argument since version 6.0 (2001). Their default values come from
> > options("stringsAsFactors"). read.table() and a few other
> > data.frame-oriented functions have the same argument.
> > It looks like stringsAsFactors has the same functionality as your
> > new charToFactor. Would it be feasible to change its name to stringsAsFactors?
>
> It would, but then I think we would want to ensure it did precisely the
> same thing. Is there a description of exactly what
> options("stringsAsFactors") affects? (?options suggests it is data.frame,
> read.table and importData, and nothing else).
I just noticed that data.frame accepts a vector of logicals for
stringsAsFactors: one element per ... argument. This is not
documented in the help file.
Splus> data.frame
function(..., row.names = NULL, check.rows = F, check.names = T,
    na.strings = "NA", dup.row.names = F,
    stringsAsFactors = default.stringsAsFactors())
{
    dots <- match.call(expand.dots = F)$...
    n <- length(dots) - 1
    ...
    stringsAsFactors <- rep(stringsAsFactors, len = n)
    for(i in seq(length = n)) {
        xi <- data.frameAux(eval(as.name(paste("..", i, sep = ""))),
            na.strings = na.strings,
            stringsAsFactors = stringsAsFactors[i])
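For anyone who wants to experiment with the per-argument behavior in R, here is a minimal sketch. It is not the Splus source: the function name combine_columns is hypothetical, and it assumes each ... argument is a single named column, which is much simpler than what data.frameAux() handles. It only illustrates recycling the logical flag with rep() and consulting one element per argument.

```r
# Hypothetical sketch (not the Splus source): recycle a logical
# 'stringsAsFactors' so that each ... argument gets its own flag.
# Assumes every ... argument is a single, named column vector.
combine_columns <- function(..., stringsAsFactors = TRUE) {
  args <- list(...)
  n <- length(args)
  # One flag per ... argument, recycled the way Splus's data.frame() uses rep()
  flags <- rep(stringsAsFactors, length.out = n)
  cols <- vector("list", n)
  for (i in seq_len(n)) {
    x <- args[[i]]
    # Convert a character vector to a factor only if this argument's flag is TRUE
    if (is.character(x) && flags[i]) x <- factor(x)
    cols[[i]] <- x
  }
  names(cols) <- names(args)
  # stringsAsFactors = FALSE here so R does not convert the remaining
  # character columns on its own
  data.frame(cols, stringsAsFactors = FALSE)
}

# Usage: convert the first column, leave the second as character
df <- combine_columns(a = c("x", "y"), b = c("u", "v"),
                      stringsAsFactors = c(TRUE, FALSE))
str(df)
```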
Splus's importData(), which creates a data.frame from various other
file formats or from database connections, also uses stringsAsFactors.
It also affects as.data.frame() and the character method for data.frameAux()
(which is not meant to be called directly; it is a support function for
data.frame() and as.data.frame()).
In the bigdata library, bdFrame() uses it in the same way that data.frame()
does.
There may be a few stray functions that pass their ... arguments
to data.frame, but I cannot think of any now.
Its default value should always be default.stringsAsFactors(),
but I see that bdFrame() uses plain FALSE. default.stringsAsFactors()
looks at options("stringsAsFactors"), maps NULL and TRUE to TRUE and
FALSE to FALSE, and signals an error for anything else.
Splus> default.stringsAsFactors
function()
{
    val <- .Options$stringsAsFactors
    if(is.null(val))
        val <- T
    if(!is.logical(val) || is.na(val) || length(val) != 1)
        stop("options('stringsAsFactors') not set to T or F")
    val
}
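For comparison, a rough R translation of the function above. The name default_strings_as_factors and its 'val' argument are assumptions for illustration, not part of Splus or R; the argument exists only so the logic can be exercised without changing global options. The sketch checks length(val) before is.na(val), because recent versions of R signal an error when || receives a value of length greater than one.

```r
# Hypothetical R translation of Splus's default.stringsAsFactors();
# the name and the 'val' argument are illustrative only.
default_strings_as_factors <- function(val = getOption("stringsAsFactors")) {
  # An unset option (NULL) means the historical default, TRUE
  if (is.null(val))
    val <- TRUE
  # Reject anything other than a single non-NA logical; the length test
  # comes first so is.na() is only evaluated on a length-one value
  if (!is.logical(val) || length(val) != 1L || is.na(val))
    stop("options('stringsAsFactors') not set to TRUE or FALSE")
  val
}

default_strings_as_factors(NULL)
default_strings_as_factors(FALSE)
```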
I believe that Terry Therneau has been using Splus with
options(stringsAsFactors=FALSE) for quite a while and hasn't reported
any problems.
>
> >
> > Splus> help(data.frame)
> > Construct a Data Frame Object
> >
> > USAGE:
> >
> > data.frame(..., row.names, check.rows=F, check.names=T,
> > na.strings="NA", dup.row.names=F, stringsAsFactors=<<see below>>)
> > data.frameAux(x, ...)
> > as.data.frame(x, row.names=NULL, stringsAsFactors=<<see below>>, ...)
> > is.data.frame(x)
> > ...
> > stringsAsFactors
> > a logical flag; if TRUE then convert character arguments
> > to factors whose levels are the unique strings in the
> > argument. This may save time and space if there are many
> > repeated values in the strings and may make the
> > statistical modelling functions easier to use. The
> > default is TRUE, unless one sets options(stringsAsFactors=FALSE).
> > ...
> >
> >
> > ----------------------------------------------------------------------------
> > Bill Dunlap
> > Insightful Corporation
> > bill at insightful dot com
> > 360-428-8146
> >
> > "All statements in this message represent the opinions of the author and do
> > not necessarily reflect Insightful Corporation policy or position."
> >
> >
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>