[Rd] should `data` respect default.stringsAsFactors()?
peter dalgaard
pdalgd at gmail.com
Fri Feb 19 16:23:19 CET 2016
On 19 Feb 2016, at 16:02 , Cook, Malcolm <MEC at stowers.org> wrote:
> Hi,
>
>> Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.
>>
>> Yes, the doc should probably be fixed. The code probably not
>
> Agreed.
>
> Is someone on-list authorized and willing to make the documentation change? I suppose I could learn what it takes to be a "player", but for such a trivial fix, it probably is overkill. Dissenting opinions?
I have fixed it for r-devel.
-pd
>
>> -- packages
>> loading different data sets depending on user options is an even worse idea
>> than having the option in the first place... (I don't mean having the possibility, I
>> mean the default.stringsAsFactors thing).
>>
>> In general, read.table() gets many things wrong
>
> I agree with you that "read.table() gets many things wrong" and I too have my favorite workarounds - but that was not my concern. My concern is that data() does not work as documented.
>
> ~Malcolm
>
>> , if you don't set switches
>> and/or postprocess. E.g., even when you do intend to read factors, the
>> alphabetical level order is often not desired. My favourite workaround for
>> data() is to drop a corresponding foo.R file in the ./data directory. This will be
>> run in preference to loading foo.txt (or foo.csv, etc) and can contain, like,
>>
>> dd <- read.table("foo.txt", .....)
>> dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))
>>
>> etc.
>>
>> -pd
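
A minimal, self-contained sketch of what such a script might contain. The temporary file stands in for ./data/foo.txt, and its contents and the column names `id` and `cook` are made up for illustration. The point is only that the level order is set explicitly rather than left alphabetical.

```r
# Stand-in for ./data/foo.txt (hypothetical contents, not from the thread).
foo_txt <- tempfile(fileext = ".txt")
writeLines(c("id cook", "1 rare", "2 well-done", "3 medium"), foo_txt)

# Read the strings as character, so nothing depends on the session default.
dd <- read.table(foo_txt, header = TRUE, stringsAsFactors = FALSE)

# Impose the intended level order instead of the alphabetical one.
dd$cook <- factor(dd$cook, levels = c("rare", "medium", "well-done"))

levels(dd$cook)  # "rare" "medium" "well-done"

unlink(foo_txt)
```

In a real ./data/foo.R the read.table() call would point at the data file itself rather than at a temporary file.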
>>
>>
>>
>>> On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
>>>
>>> On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>>>> Hi Peter,
>>>>
>>>> Sorry if I was not clear. Perhaps an example will make my point:
>>>>
>>>>> data(iris)
>>>>> class(iris$Species)
>>>> [1] "factor"
>>>>> write.table(iris,'data/myiris.tab')
>>>>> data(myiris)
>>>>> class(myiris$Species)
>>>> [1] "factor"
>>>>> rm(myiris)
>>>>> options(stringsAsFactors = FALSE)
>>>>> data(myiris)
>>>>> class(myiris$Species)
>>>> [1] "factor"
>>>>> myiris<-read.table("data/myiris.tab",header=TRUE)
>>>>> class(myiris$Species)
>>>> [1] "character"
>>>>
>>>> I am surprised to find that, in the above,
>>>> setting the global option stringsAsFactors = FALSE does NOT affect
>>>> how Species is read in by the `data` function,
>>>> whereas
>>>> setting the global option stringsAsFactors = FALSE DOES affect how
>>>> Species is read in by read.table,
>>>>
>>>> especially since data is documented as calling read.table.
>>>>
>>> To be explicit, it's documented as calling read.table(..., header =
>>> TRUE) in this case, but it actually calls read.table(..., header =
>>> TRUE, as.is = FALSE), which results in class(myiris$Species) of
>>> "factor".
>>>
>>> R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
>>> R> class(myiris$Species)
>>> [1] "factor"
>>>
>>> So it seems like adding as.is = FALSE to the call in the documentation
>>> would clear this up.
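
The effect of as.is = FALSE can be reproduced without data() at all. Below is a small sketch on a made-up table (the file and the columns `id` and `cook` are hypothetical). Here stringsAsFactors is passed explicitly rather than set through options(), so the result does not depend on the session or the R version.

```r
# Write a small table to a temporary file (made-up data, not from the thread).
tmp <- tempfile(fileext = ".tab")
df <- data.frame(id = 1:3, cook = c("rare", "medium", "well-done"))
write.table(df, tmp)

# stringsAsFactors = FALSE alone leaves the strings as character ...
a <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)

# ... but an explicit as.is = FALSE takes precedence, since stringsAsFactors
# only supplies the default value of as.is.
b <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE, as.is = FALSE)

class(a$cook)  # "character"
class(b$cook)  # "factor"

unlink(tmp)
```

This matches the report above: once as.is = FALSE is passed, the stringsAsFactors setting no longer has any say.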
>>>
>>>> In my opinion, one or the other should change (the behavior of data, or the
>>>> documentation).
>>>>
>>>> <bleep> <bleep>,
>>>>
>>>> ~ Malcolm
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: peter dalgaard [mailto:pdalgd at gmail.com]
>>>>> Sent: Thursday, February 18, 2016 3:32 PM
>>>>> To: Cook, Malcolm <MEC at stowers.org>
>>>>> Cc: r-devel at stat.math.ethz.ch
>>>>> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
>>>>>
>>>>> What the <bleep> are you on about? data() does many things, only some of
>>>>> which call read.table() et al., and the ones that do have no special treatment
>>>>> of stringsAsFactors.
>>>>>
>>>>> -pd
>>>>>
>>>>>> On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
>>>>>>
>>>>>> Hiya,
>>>>>>
>>>>>> Probably been debated elsewhere....
>>>>>>
>>>>>> I note that R's `data` function does not respect default.stringsAsFactors
>>>>>>
>>>>>> By my lights, it should, especially as it is documented to call read.table,
>>>>>> which DOES respect it.
>>>>>>
>>>>>> Oh, but: http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
>>>>>>
>>>>>> Compelling. I have to agree.
>>>>>>
>>>>>> So, I change my mind.
>>>>>>
>>>>>> By my lights, `data` should then be documented to NOT respect
>>>>>> default.stringsAsFactors.
>>>>>>
>>>>>> Else?
>>>>>>
>>>>>> ~Malcolm Cook
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>>> --
>>>>> Peter Dalgaard, Professor,
>>>>> Center for Statistics, Copenhagen Business School
>>>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>>>> Phone: (+45)38153501
>>>>> Office: A 4.23
>>>>> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>>>>
>>>
>>>
>>>
>>> --
>>> Joshua Ulrich | about.me/joshuaulrich
>>> FOSS Trading | www.fosstrading.com
>>> R/Finance 2016 | www.rinfinance.com
>>
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com