[R] Problem with a non-factor, non-numeric variable in a data.frame

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Nov 12 08:20:29 CET 2007


I don't know what you mean by 'the latest release', but R does not behave 
this way in a vanilla session in 2.6.0 or R-patched.  Had you given the 
'at a minimum' information asked for in the posting guide, we might have 
been able to deduce what the problem is.  I get

> x1 <- c(1:4,4,4,4)
> f <- factor(x1)
> levels(f)
[1] "1" "2" "3" "4"
> as.numeric(levels(f))
[1] 1 2 3 4
> as.numeric(levels(f))[as.integer(f)]
[1] 1 2 3 4 4 4 4

My guess is that your session has a package attached that is corrupting 
as.numeric.  This is known to happen under certain circumstances: see

https://stat.ethz.ch/pipermail/r-help/2007-October/142367.html

for a warning about not re-installing packages when upgrading: most likely 
you have used one of the packages which make as.numeric S4 generic and 
installed it in R < 2.6.0.
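
A minimal sketch of how one might check a session for this kind of masking. It uses only standard base and utils calls, but what they report depends entirely on which packages are attached, so treat it as a starting point rather than a definitive diagnosis:

```r
# Which packages are attached, and in what order?
print(search())

# Where on the search path is as.numeric found?  In a vanilla session
# this should be package:base only; a second entry suggests masking.
where <- find("as.numeric")
print(where)

# In current versions of R the base as.numeric is a primitive.  If this
# is FALSE, something else (e.g. a generic created by a package) is
# being picked up instead.
prim <- is.primitive(as.numeric)
print(prim)

# R version and attached package versions, i.e. the 'at a minimum'
# information the posting guide asks for.
print(sessionInfo())
```

If find() reports more than one location, re-installing the package named there under the current version of R is the first thing to try.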


On Mon, 12 Nov 2007, Alun Pope wrote:

> Unfortunately it is not quite as simple as this.

Agreed, as the possibility of user error needs to be taken into account.

>   It seems (to me) that
> the change to the behaviour of as.numeric() in the latest release means
> that the advice given in FAQ 7.10 (and the help) is incorrect when the
> factor levels are integers.  The following example illustrates:

Factor levels should always be character, and are in your example.
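
A small illustration, with made-up values chosen so that the levels do not coincide with the internal codes (x2 and f2 are names invented for this example):

```r
x2 <- c(10, 20, 20, 5)
f2 <- factor(x2)

# The levels are character strings, sorted by the original numeric values.
print(levels(f2))
print(is.character(levels(f2)))

# as.integer() on a factor gives the internal codes, not the values.
codes <- as.integer(f2)
print(codes)

# The FAQ 7.10 form: convert the levels once, then index by the codes.
viaLev <- as.numeric(levels(f2))[as.integer(f2)]
print(viaLev)

# The simpler but slightly less efficient alternative.
viaChr <- as.numeric(as.character(f2))
print(viaChr)
```

Both conversions recover the original values, whereas the internal codes only record each value's position among the sorted levels.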

>> x1 <- c(1:4,4,4,4)
>> f <- factor(x1)
>> levels(f)
> [1] "1" "2" "3" "4"
>> as.numeric(levels(f))[as.integer(f)] #this is pasted from FAQ 7.10
> Error in UseMethod("as.double") : no applicable method for "as.double"
>> as.integer(levels(f))[as.integer(f)]
> [1] 1 2 3 4 4 4 4
>
> And (not surprisingly) the other FAQ 7.10 recommendation behaves the
> same way.
>
> Even more unfortunate is the fact that packages may also be affected.
> This seems to be the case for example with spdep, in which the function
> nb2listw() contains statements of the form
>
>    mode(x) <- "numeric"
>
> which I assume is what is causing that function to fail with the same
> error message as above.  I shall post separately to the list for spdep.
>
> Alun
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Matthew Keller
> Sent: Wednesday, 7 November 2007 8:04 AM
> To: Alexandre Santos
> Cc: r-help at r-project.org
> Subject: Re: [R] Problem with a non-factor,non-numeric variable in a
> data.frame
>
> Alexandre,
>
> Try rereading FAQ 7.10, it explains why as.numeric() won't do it:
>
> "In any case, do not call as.numeric() or their likes directly for the
> task at hand as as.numeric() or unclass() give the internal codes"
>
> I.e., the INTERNAL CODE of the factor is what as.numeric() is working
> on rather than the numeric representation that you see.
>
>
>
> On 11/6/07, Alexandre Santos <alexandre.santos at ochipepe.org> wrote:
>> I tested
>>
>> as.numeric(as.character(Ratio))
>>
>> and it works perfectly!
>>
>> I still don't get why as.numeric(Ratio) was not enough, but at least
>> now I know how to deal with it.
>>
>> Thanks for the tip, and sorry for missing the R-FAQ issue 7.10.
>>
>> Cheers,
>> Alexandre Santos
>>
>>
>> 2007/11/6, John Kane <jrkrideau at yahoo.ca>:
>>> Have a look at the R-FAQ issue 7.10.   It's a standard problem.
>>>
>>> For more information about your variable try
>>>
>>> str(variable).
>>>
>>>
>>> --- Alexandre Santos <alexandre.santos at ochipepe.org>
>>> wrote:
>>>
>>>> Dear R list,
>>>>
>>>> I would like to perform an ANOVA in a set of measurements, but I have
>>>> problems formatting the data.
>>>>
>>>> The data is a two dimensional array containing two columns:
>>>> - "Stim" : the type of stimulation (string)
>>>> - "Ratio" : a ratio of two numeric values
>>>>
>>>> Now, because some values are missing in the data (defaulting to zero),
>>>> part of this array will be populated with NA ratios. Maybe this is
>>>> important later.
>>>>
>>>> In order to make the ANOVA analysis, I need to turn my vector into a
>>>> data.frame.
>>>>
>>>> I tried vector.table=as.data.frame(vector)
>>>>
>>>> But I realized that
>>>> is.numeric(Ratio) gives FALSE
>>>> is.factor(Ratio) gives TRUE
>>>>
>>>> After reading the documentation, I tried
>>>>
>>>> vector.table=as.data.frame(vector, stringsAsFactors = FALSE)
>>>>
>>>> This time
>>>>
>>>> is.numeric(Ratio) gives FALSE
>>>> is.factor(Ratio) gives FALSE
>>>>
>>>> So I don't even know what is Ratio, but it's not yet numeric (is this
>>>> due to the NA values?).
>>>>
>>>> How can I get R to understand that Ratio is numeric? Checking the
>>>> documentation it seems you can do it with I(x), but the details are
>>>> not explained. I also tried as.numeric(Ratio), and everything was
>>>> turned into zeros.
>>>>
>>>> Any suggestions?
>>>>
>>>> Cheers,
>>>>
>>>> Alexandre Santos
>>>>
>>>> Neuro-MPI, Martinsried, Germany


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


