[R] Avoiding factors and levels in data frames

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon Sep 1 12:13:15 CEST 2008


Hi Thierry,

On 01-Sep-08 09:45:27, ONKELINX, Thierry wrote:
> 
> Dear Ted,
> 
> I noticed that as.is was set by default in read.fwf. So if the
> user sets stringsAsFactors it is passed through ... to read.table.
> But I'm not sure how as.is is passed to read.table when only
> stringsAsFactors is set. If it's the default (FALSE) then it might
> be conflicting with stringsAsFactors. Therefore my suggestion to
> use as.is instead of stringsAsFactors in this case.

Yes, that is how I think I see it too. I have now written two
tiny test functions. temp.table() is the same as temp() before [below],
temp.fwf() uses its arguments in the same way as read.fwf().
Also, temp.fwf() calls temp.table() in the same way as read.fwf()
calls read.table() (as far as 'as.is' and 'stringsAsFactors' are
concerned -- I hope!).

  temp.table<-function(as.is = !stringsAsFactors,
          stringsAsFactors = default.stringsAsFactors()){
    print(c(as.is=as.is, sAF=stringsAsFactors))
    }

  temp.fwf<-function(as.is=FALSE,...){ temp.table(as.is=as.is,...) }

and now:

  temp.fwf(as.is=FALSE,stringsAsFactors=FALSE)
# as.is   sAF 
# FALSE FALSE 

  temp.fwf(as.is=FALSE,stringsAsFactors=TRUE)
# as.is   sAF 
# FALSE  TRUE 

  temp.fwf(as.is=TRUE,stringsAsFactors=FALSE)
# as.is   sAF 
#  TRUE FALSE 

  temp.fwf(as.is=TRUE,stringsAsFactors=TRUE)
# as.is   sAF 
#  TRUE  TRUE 

  temp.fwf(stringsAsFactors=TRUE)
# as.is   sAF 
# FALSE  TRUE 

  temp.fwf(stringsAsFactors=FALSE)
# as.is   sAF 
# FALSE FALSE 

showing that the 'as.is' result from temp.fwf() is independent
of any value of 'stringsAsFactors' set in its parameter list.
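
In practice this means passing 'as.is' explicitly, as Thierry suggested.
A minimal sketch with read.fwf() itself; the file contents, widths and
column names below are made up for illustration, and 'strip.white' is
simply passed on to read.table():

```r
# Write a small fixed-width file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Gustav 110", "Hanna   65", "Ike    145"), tmp)

# Layout: 6 characters of name, 1 skipped blank, 3 characters of wind speed.
# as.is = TRUE is set explicitly so that text columns are not converted
# to factors.
dat <- read.fwf(tmp, widths = c(6, -1, 3),
                col.names = c("storm", "wind"),
                as.is = TRUE, strip.white = TRUE)

str(dat)      # inspect the column classes
unlink(tmp)   # remove the temporary file
```

With 'as.is=TRUE' the text column should come back as character rather
than as a factor, whatever 'stringsAsFactors' happens to be.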

> I suppose it might be a good idea to add stringsAsFactors to the
> argument list of read.fwf and give it the same defaults as read.table.

I was thinking the same, too.
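
A rough sketch of how that might look, using toy functions rather than
the real read.fwf(). The names temp.table2() and temp.fwf2() are
hypothetical, and the hard-coded TRUE default is an assumption standing
in for default.stringsAsFactors():

```r
# Toy stand-in for read.table(): as.is defaults to !stringsAsFactors.
temp.table2 <- function(as.is = !stringsAsFactors,
                        stringsAsFactors = TRUE) {
  c(as.is = as.is, sAF = stringsAsFactors)
}

# Toy stand-in for a read.fwf() that has stringsAsFactors in its own
# argument list, with the same defaults as the function it calls.
temp.fwf2 <- function(as.is = !stringsAsFactors,
                      stringsAsFactors = TRUE, ...) {
  temp.table2(as.is = as.is, stringsAsFactors = stringsAsFactors, ...)
}

r.default  <- temp.fwf2()
r.sAF      <- temp.fwf2(stringsAsFactors = FALSE)
r.explicit <- temp.fwf2(as.is = FALSE, stringsAsFactors = FALSE)

print(r.default)
print(r.sAF)
print(r.explicit)
```

Here 'as.is' follows 'stringsAsFactors' unless it is set explicitly,
which is the behaviour read.table() has when called directly.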
Ted.

> Cheers,
> 
> Thierry
> 
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On behalf of Ted.Harding at manchester.ac.uk
> Sent: Monday 1 September 2008 11:23
> To: r-help at r-project.org
> Subject: Re: [R] Avoiding factors and levels in data frames
> 
> On 01-Sep-08 08:20:25, ONKELINX, Thierry wrote:
>>
>> Try to add options(stringsAsFactors = FALSE) in your Rprofile.site
>> (in the etc directory). Using as.is = TRUE seems safer than
>> stringsAsFactors = FALSE in the read.fwf function. Because as.is
>> is set to FALSE by default and stringsAsFactors is not set.
>>
>> HTH,
>>
>> Thierry
> 
> Can I ask for some elucidation about how the code operates here?
> Apparently read.fwf() calls read.table(), and ?read.fwf refers
> you to ?read.table for things like 'as.is' and 'stringsAsFactors'.
> 
> When I look at the code for read.table, I see in the parameter
> list:
> 
> function (file, .... , as.is = !stringsAsFactors, ... ,
>           stringsAsFactors = default.stringsAsFactors(), ... )
> 
> with *no further reference whatever* to 'stringsAsFactors' in the
> body of the function. In particular, there is no test that I can
> see of whether or not 'stringsAsFactors' has been set by the user
> in the call.
> 
> The standard result of default.stringsAsFactors() is TRUE.
> 
> I've written a tiny test function:
> 
>   temp<-function(as.is = !stringsAsFactors,
>         stringsAsFactors = default.stringsAsFactors()){
>   print(c(as.is=as.is, sAF=stringsAsFactors))
>   }
> 
>   temp()
># as.is   sAF
># FALSE  TRUE
> 
>   temp(stringsAsFactors = FALSE)
># as.is   sAF
>#  TRUE FALSE
> 
>   temp(as.is=FALSE,stringsAsFactors = FALSE)
># as.is   sAF
># FALSE FALSE
> 
> So, if read.table is called with 'as.is=FALSE' (which is the default
> set by read.fwf(), with any reference to 'stringsAsFactors' in the
> call being part of the "..." which is passed to read.table()), then
> read.table will be called with 'as.is=FALSE' regardless of whether
> 'stringsAsFactors=FALSE' has been set explicitly in calling read.fwf().
> 
> The only way to get 'as.is' to be TRUE would be to set it explicitly
> in the call to read.fwf() (and in that case one need not bother with
> 'stringsAsFactors', since its only purpose seems to be to determine
> the value of 'as.is'). Or, of course, to set default.stringsAsFactors
> to be FALSE; but in many cases people will want to have per-case
> control over what happens in cases like this.
> 
> Well, that's how it seems to me, on reading the code. Is this what
> Thierry really means when he says "stringsAsFactors is not set"?
> 
> If that is the case, then it seems to indicate some conflict or
> inconsistency between read.fwf() and read.table() in this respect.
> In any case, it strikes me as something of an undesirable tangle!
> 
> With thanks for any comments,
> Ted.
> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On behalf of Asher Meir
>> Sent: Sunday 31 August 2008 11:02
>> To: r-help at r-project.org
>> Subject: [R] Avoiding factors and levels in data frames
>>
>> Hello all.
>>
>> I am an experienced R user, I have used R for many years for a wide
>> variety of applications. However, I keep on running into one obstacle:
>> I never want factors or levels in my data frames, but I keep on
>> getting them. Is there any way to globally turn this whole feature of
>> data frames off? Using options(stringAsFactors=FALSE) does not seem to
>> work.
>> Alternatively, if I have a data frame with levels, can I just get rid
>> of them in that data frame?
>>
>> Here is an example: I have a large text file, of which part is in the
>> fixed-width tabular form I need. I created a widths vector and a
>> column names vector. I then read the file as follows:
>>
>>
>> raw1<-read.fwf(fn1,widths=widmax,col.names=headermax,stringsAsFactors=FALSE)
>>
>> But raw1 still has factors! It is an old class data frame:
>>
>>> is(raw1)
>> [1] "data.frame" "oldClass"
>>
>> And it still has levels:
>>> raw1[1,1]
>> [1] Gustav wind
>> 229 Levels: - - - - - - -     - - - - WIN       - - - M ... Z INDICATES C
>>
>> My question is:
>> 1. Can I get rid of the levels in raw1?
>> 2. Even better -- can I stop it getting read in as a data frame with
>> factors?
>> 3. Even better -- can I just tell R to never use factors in my data
>> frames?
>>
>> Or any other solution that occurs to people -- maybe this is the wrong
>> way to go about reading in fixed width data in this kind of file.
>>
>> I would appreciate any help.
>>
>> Asher
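
On Asher's question 1, one possible approach is to convert the factor
columns to character after the fact. This is only a sketch on a made-up
data frame (not on his raw1), and the column names are invented:

```r
# Made-up data frame with one factor column and one numeric column.
raw <- data.frame(storm = factor(c("Gustav wind", "Hanna wind")),
                  speed = c(110, 65))

# Find the factor columns and replace each with its character labels.
is.fac <- vapply(raw, is.factor, logical(1))
raw[is.fac] <- lapply(raw[is.fac], as.character)

str(raw)   # the storm column is now character, with no levels
```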

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-08                                       Time: 11:13:12
------------------------------ XFMail ------------------------------


