[Rd] should `data` respect default.stringsAsFactors()?

Cook, Malcolm MEC at stowers.org
Fri Feb 19 16:02:36 CET 2016


Hi,

 > Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.
 > 
 > Yes, the doc should probably be fixed. The code probably not 

Agreed.  

Is someone on-list authorized and willing to make the documentation change?  I suppose I could learn what it takes to be a "player", but for such a trivial fix, it probably is overkill.  Dissenting opinions?

 > -- packages
 > loading different data sets depending on user options is an even worse idea
 > than having the option in the first place... (I don't mean having the possibility, I
 > mean the default.stringsAsFactors thing).
 > 
 > In general, read.table() gets many things wrong

I agree with you that "read.table() gets many things wrong" and I too have my favorite workarounds - but that was not my concern.  My concern is that data() does not work as documented.

~Malcolm

 > , if you don't set switches
 > and/or postprocess. E.g., even when you do intend to read factors, the
 > alphabetical level order is often not desired. My favourite workaround for
 > data() is to drop a corresponding foo.R file in the ./data directory. This will be
 > run in preference to loading foo.txt (or foo.csv, etc) and can contain, like,
 > 
 > dd <- read.table("foo.txt", .....)
 > dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))
 > 
 > etc.
 > 
 > -pd
 > 
 > > On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
 > >
 > > On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
 > >> Hi Peter,
 > >>
 > >> Sorry if I was not clear.  Perhaps an example will make my point:
 > >>
 > >>> data(iris)
 > >>> class(iris$Species)
 > >> [1] "factor"
 > >>> write.table(iris,'data/myiris.tab')
 > >>> data(myiris)
 > >>> class(myiris$Species)
 > >> [1] "factor"
 > >>> rm(myiris)
 > >>> options(stringsAsFactors = FALSE)
 > >>> data(myiris)
 > >>> class(myiris$Species)
 > >> [1] "factor"
 > >>> myiris<-read.table("data/myiris.tab",header=TRUE)
 > >>> class(myiris$Species)
 > >> [1] "character"
 > >>
 > >> I am surprised to find that in the above
 > >>          setting the global option stringsAsFactors = FALSE does NOT affect how Species is being read in by the `data` function
 > >> whereas
 > >>        setting the global option stringsAsFactors = FALSE DOES affect how Species is being read in by read.table
 > >>
 > >> especially since data is documented as calling read.table.
 > >>
 > > To be explicit, it's documented as calling read.table(..., header =
 > > TRUE) in this case, but it actually calls read.table(..., header =
 > > TRUE, as.is = FALSE), which results in class(myiris$Species) of
 > > "factor".
 > >
 > > R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
 > > R> class(myiris$Species)
 > > [1] "factor"
 > >
 > > So it seems like adding as.is = FALSE to the call in the documentation
 > > would clear this up.
 > >
 > >> In my opinion, one or the other should change (the behavior of data, or the documentation).
 > >>
 > >> <bleep> <bleep>,
 > >>
 > >> ~ Malcolm
 > >>
 > >>
 > >>> -----Original Message-----
 > >>> From: peter dalgaard [mailto:pdalgd at gmail.com]
 > >>> Sent: Thursday, February 18, 2016 3:32 PM
 > >>> To: Cook, Malcolm <MEC at stowers.org>
 > >>> Cc: r-devel at stat.math.ethz.ch
 > >>> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
 > >>>
 > >>> What the <bleep> are you on about? data() does many things, only some of
 > >>> which call read.table() et al., and the ones that do have no special treatment
 > >>> of stringsAsFactors.
 > >>>
 > >>> -pd
 > >>>
 > >>>> On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
 > >>>>
 > >>>> Hiya,
 > >>>>
 > >>>> Probably been debated elsewhere....
 > >>>>
 > >>>> I note that R's `data` function does not respect default.stringsAsFactors
 > >>>>
 > >>>> By my lights, it should, especially as it is documented to call read.table, which DOES respect it.
 > >>>>
 > >>>> Oh, but:  http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
 > >>>>
 > >>>> Compelling.  I have to agree.
 > >>>>
 > >>>> So, I change my mind.
 > >>>>
 > >>>> By my lights, `data` should then be documented to NOT respect default.stringsAsFactors.
 > >>>>
 > >>>> Else?
 > >>>>
 > >>>> ~Malcolm Cook
 > >>>>
 > >>>> ______________________________________________
 > >>>> R-devel at r-project.org mailing list
 > >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
 > >>>
 > >>> --
 > >>> Peter Dalgaard, Professor,
 > >>> Center for Statistics, Copenhagen Business School
 > >>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 > >>> Phone: (+45)38153501
 > >>> Office: A 4.23
 > >>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
 > >
 > > --
 > > Joshua Ulrich  |  about.me/joshuaulrich
 > > FOSS Trading  |  www.fosstrading.com
 > > R/Finance 2016 | www.rinfinance.com


