[R] Avoiding factors and levels in data frames

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Sep 1 12:23:55 CEST 2008


It's a problem in read.fwf.  It should not set a default for as.is, and in 
R-devel it will not.
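
Until that change is released, one possible workaround is to pass as.is explicitly, so that read.fwf's own default never comes into play. The following is only a minimal sketch: the temporary file, its two sample rows, the widths and the column names are all hypothetical, made up for illustration.

```r
# Hypothetical fixed-width sample: a 6-character name, then a 3-character value
fn <- tempfile(fileext = ".txt")
writeLines(c("Gustav 12", "Hanna   7"), fn)

# Passing as.is = TRUE explicitly keeps text columns as character vectors,
# whatever the stringsAsFactors default happens to be
raw <- read.fwf(fn, widths = c(6, 3), col.names = c("name", "value"),
                as.is = TRUE)

sapply(raw, class)   # inspect the column classes
unlink(fn)           # remove the temporary file
```

Setting as.is per call also keeps the per-case control that a global option would not give.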

On Mon, 1 Sep 2008, Ted.Harding at manchester.ac.uk wrote:

> On 01-Sep-08 08:20:25, ONKELINX, Thierry wrote:
>>
>> Try adding options(stringsAsFactors = FALSE) to your Rprofile.site
>> (in the etc directory). Using as.is = TRUE seems safer than
>> stringsAsFactors = FALSE in the read.fwf function, because as.is
>> is set to FALSE by default and stringsAsFactors is not set.
>>
>> HTH,
>>
>> Thierry
>
> Can I ask for some elucidation about how the code operates here?
> Apparently read.fwf() calls read.table(), and ?read.fwf refers
> you to ?read.table for things like 'as.is' and 'stringsAsFactors'.
>
> When I look at the code for read.table, I see in the parameter
> list:
>
> function (file, .... , as.is = !stringsAsFactors, ... ,
>          stringsAsFactors = default.stringsAsFactors(), ... )
>
> with *no further reference whatever* to 'stringsAsFactors' in the
> body of the function. In particular, there is no test that I can
> see of whether or not 'stringsAsFactors' has been set by the user
> in the call.
>
> The standard result of default.stringsAsFactors() is TRUE.
>
> I've written a tiny test function:
>
>  temp<-function(as.is = !stringsAsFactors,
>        stringsAsFactors = default.stringsAsFactors()){
>  print(c(as.is=as.is, sAF=stringsAsFactors))
>  }
>
>  temp()
> # as.is   sAF
> # FALSE  TRUE
>
>  temp(stringsAsFactors = FALSE)
> # as.is   sAF
> #  TRUE FALSE
>
>  temp(as.is=FALSE,stringsAsFactors = FALSE)
> # as.is   sAF
> # FALSE FALSE
>
> So, if read.table is called with 'as.is=FALSE' (which is the default
> set by read.fwf(), with any reference to 'stringsAsFactors' in the
> call being part of the "..." which is passed to read.table()), then
> read.table will be called with 'as.is=FALSE' regardless of whether
> 'stringsAsFactors=FALSE' has been set explicitly in calling read.fwf().
>
> The only way to get 'as.is' to be TRUE would be to set it explicitly
> in the call to read.fwf() (and in that case one need not bother with
> 'stringsAsFactors', since its only purpose seems to be to determine
> the value of 'as.is'). Or, of course, to set default.stringsAsFactors
> to be FALSE; but in many cases people will want to have per-case
> control over what happens in cases like this.
>
> Well, that's how it seems to me, on reading the code. Is this what
> Thierry really means when he says "stringsAsFactors is not set"?
>
> If that is the case, then it seems to indicate some conflict or
> inconsistency between read.fwf() and read.table() in this respect.
> In any case, it strikes me as something of an undesirable tangle!
>
> With thanks for any comments,
> Ted.
>
>> -----Original message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On behalf of Asher Meir
>> Sent: Sunday 31 August 2008 11:02
>> To: r-help at r-project.org
>> Subject: [R] Avoiding factors and levels in data frames
>>
>> Hello all.
>>
>> I am an experienced R user; I have used R for many years for a wide
>> variety of applications. However, I keep on running into one obstacle:
>> I never want factors or levels in my data frames, but I keep on
>> getting them. Is there any way to globally turn this whole feature of
>> data frames off? Using options(stringAsFactors=FALSE) does not seem to
>> work.
>> Alternatively, if I have a data frame with levels, can I just get rid
>> of them in that data frame?
>>
>> Here is an example: I have a large text file, of which part is in the
>> fixed-width tabular form I need. I created a widths vector and a
>> column names vector. I then read the file as follows:
>>
>> raw1<-read.fwf(fn1,widths=widmax,col.names=headermax,stringsAsFactors=FALSE)
>>
>> But raw1 still has factors! It is an old class data frame:
>>
>>> is(raw1)
>> [1] "data.frame" "oldClass"
>>
>> And it still has levels:
>>> raw1[1,1]
>> [1] Gustav wind
>> 229 Levels: - - - - - - -     - - - - WIN       - - - M ... Z INDICATES C
>>
>> My question is:
>> 1. Can I get rid of the levels in raw1?
>> 2. Even better -- can I stop it getting read in as a data frame with
>> factors?
>> 3. Even better -- can I just tell R to never use factors in my data
>> frames?
>>
>> Or any other solution that occurs to people -- maybe this is the wrong
>> way to go about reading in fixed width data in this kind of file.
>>
>> I would appreciate any help.
>>
>> Asher
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 01-Sep-08                                       Time: 10:22:55
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

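On the first of Asher's questions (getting rid of the levels in a data frame that has already been read), one possible approach is to convert the factor columns back to character. This is only a sketch under stated assumptions: the data frame `df` and its contents are hypothetical and stand in for raw1.

```r
# Hypothetical data frame whose text column ended up as a factor
df <- data.frame(name = factor(c("Gustav", "Hanna")), value = c(12, 7))

# Locate the factor columns and replace each one with its character labels
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

str(df)   # inspect the resulting column types
```

Non-factor columns are left untouched, so the same lines can be applied to a data frame with mixed column types.
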
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


