[Rd] should `data` respect default.stringsAsFactors()?
Cook, Malcolm
MEC at stowers.org
Fri Feb 19 16:02:36 CET 2016
Hi,
> Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.
>
> Yes, the doc should probably be fixed. The code probably not.
Agreed.
Is someone on-list authorized and willing to make the documentation change? I suppose I could learn what it takes to be a "player", but for such a trivial fix that is probably overkill. Dissenting opinions?
> -- packages
> loading different data sets depending on user options is an even worse idea
> than having the option in the first place... (I don't mean having the possibility, I
> mean the default.stringsAsFactors thing).
>
> In general, read.table() gets many things wrong
I agree with you that "read.table() gets many things wrong" and I too have my favorite workarounds - but that was not my concern. My concern is that data() does not work as documented.
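The discrepancy can be reproduced without data() at all. Below is a minimal sketch, assuming a throwaway temporary file and a made-up `cook` column (neither is from the thread). The explicit as.is = FALSE mirrors the call Joshua reports further down; stringsAsFactors is passed as an argument here rather than set through options(), purely to keep the example free of global state.

```r
# Hypothetical example data: one string column, written to a temp file.
tmp <- tempfile(fileext = ".tab")
write.table(data.frame(id = 1:3, cook = c("rare", "medium", "well-done")), tmp)

# With as.is left at its default, stringsAsFactors = FALSE is honoured,
# because as.is defaults to !stringsAsFactors.
a <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)

# With as.is = FALSE given explicitly, stringsAsFactors no longer matters.
b <- read.table(tmp, header = TRUE, as.is = FALSE, stringsAsFactors = FALSE)

class(a$cook)  # "character"
class(b$cook)  # "factor"

unlink(tmp)
```

If data() does pass as.is = FALSE, this is why the global option never reaches it.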
~Malcolm
> , if you don't set switches
> and/or postprocess. E.g., even when you do intend to read factors, the
> alphabetical level order is often not desired. My favourite workaround for
> data() is to drop a corresponding foo.R file in the ./data directory. This will be
> run in preference to loading foo.txt (or foo.csv, etc) and can contain, like,
>
> dd <- read.table("foo.txt",.....)
> dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))
>
> etc.
>
> -pd
>
>
>
> > On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
> >
> > On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> >> Hi Peter,
> >>
> >> Sorry if I was not clear. Perhaps an example will make my point:
> >>
> >>> data(iris)
> >>> class(iris$Species)
> >> [1] "factor"
> >>> write.table(iris,'data/myiris.tab')
> >>> data(myiris)
> >>> class(myiris$Species)
> >> [1] "factor"
> >>> rm(myiris)
> >>> options(stringsAsFactors = FALSE)
> >>> data(myiris)
> >>> class(myiris$Species)
> >> [1] "factor"
> >>> myiris<-read.table("data/myiris.tab",header=TRUE)
> >>> class(myiris$Species)
> >> [1] "character"
> >>
> >> I am surprised to find that in the above
> >> setting the global option stringsAsFactors = FALSE does NOT affect how Species is being read in by the `data` function
> >> whereas
> >> setting the global option stringsAsFactors = FALSE DOES affect how Species is being read in by read.table
> >>
> >> especially since data is documented as calling read.table.
> >>
> > To be explicit, it's documented as calling read.table(..., header =
> > TRUE) in this case, but it actually calls read.table(..., header =
> > TRUE, as.is = FALSE), which results in class(myiris$Species) of
> > "factor".
> >
> > R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
> > R> class(myiris$Species)
> > [1] "factor"
> >
> > So it seems like adding as.is = FALSE to the call in the documentation
> > would clear this up.
> >
> >> In my opinion, one or the other should change (the behavior of data, or the documentation).
> >>
> >> <bleep> <bleep>,
> >>
> >> ~ Malcolm
> >>
> >>
> >>> -----Original Message-----
> >>> From: peter dalgaard [mailto:pdalgd at gmail.com]
> >>> Sent: Thursday, February 18, 2016 3:32 PM
> >>> To: Cook, Malcolm <MEC at stowers.org>
> >>> Cc: r-devel at stat.math.ethz.ch
> >>> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
> >>>
> >>> What the <bleep> are you on about? data() does many things, only some of
> >>> which call read.table() et al., and the ones that do have no special treatment
> >>> of stringsAsFactors.
> >>>
> >>> -pd
> >>>
> >>>> On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
> >>>>
> >>>> Hiya,
> >>>>
> >>>> Probably been debated elsewhere....
> >>>>
> >>>> I note that R's `data` function does not respect default.stringsAsFactors
> >>>>
> >>>> By my lights, it should, especially as it is documented to call read.table, which DOES respect it.
> >>>>
> >>>> Oh, but: http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
> >>>>
> >>>> Compelling. I have to agree.
> >>>>
> >>>> So, I change my mind.
> >>>>
> >>>> By my lights, `data` should then be documented to NOT respect default.stringsAsFactors.
> >>>>
> >>>> Else?
> >>>>
> >>>> ~Malcolm Cook
> >>>>
> >>>> ______________________________________________
> >>>> R-devel at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>> --
> >>> Peter Dalgaard, Professor,
> >>> Center for Statistics, Copenhagen Business School
> >>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> >>> Phone: (+45)38153501
> >>> Office: A 4.23
> >>> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
> >>>
> >>
> >
> >
> >
> > --
> > Joshua Ulrich | about.me/joshuaulrich
> > FOSS Trading | www.fosstrading.com
> > R/Finance 2016 | www.rinfinance.com
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com