[R] non-intuitive behaviour after type conversion
Don MacQueen
macq at llnl.gov
Mon Nov 23 18:41:50 CET 2009
When you attach() a data frame, R puts a copy of its columns on the
search path, and there it stays. It is not a link, reference, or
pointer to the original.
Changing the original (the version in the dataframe), which is what
you did, does not change the attached copy in memory. In essence, you
did a type conversion on one copy, but afterwards started looking at
the other copy.
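
A minimal sketch of the effect, assuming a small made-up data frame (the name birth is borrowed from the question below; the values are invented for illustration):

```r
# Hypothetical stand-in for the poster's data frame; values are invented.
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))

attach(birth)   # a copy of the columns goes onto the search path

# Convert the column inside the data frame; the attached copy is untouched.
birth$t1stvisit <- as.factor(birth$t1stvisit)

attached_is_factor <- is.factor(t1stvisit)        # looks at the attached copy
original_is_factor <- is.factor(birth$t1stvisit)  # looks at the data frame

print(attached_is_factor)   # FALSE
print(original_is_factor)   # TRUE

detach(birth)
```

The two checks disagree because they look at two different objects.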
See also my interjected comments below.
-Don
At 8:54 AM +0000 11/23/09, Alan Kelly wrote:
>Dear list,
>I have a data frame (birth) with mixed variables (numeric and
>alphanumeric). One variable "t1stvisit" was originally coded as
>numeric with values 1, 2, and 3. After attaching the data frame, this
>is what I see when I use str(t1stvisit)
>
>$ t1stvisit: int 1 1 1 1 1 1 1 1 2 2 ...
>
>This is as expected.
>I then convert t1stvisit to a factor and to avoid creating a second
>copy of this variable independent of the data frame I use:
>birth$t1stvisit = as.factor(birth$t1stvisit)
>if I check that the conversion has worked:
>is.factor(t1stvisit)
>[1] FALSE
>Now the only object present in the workspace is the data frame
>"birth" and, as noted, I have not created any new variables. So why
>does R still treat t1stvisit as numeric?
>is.factor(t1stvisit)
>[1] FALSE
>
>Yet when I try the following:
>> is.factor(birth$t1stvisit)
>[1] TRUE
>So, there appear to be two versions of "t1stvisit" - the original
>numeric version and the correct factor version although ls() only
>shows "birth" as present in the workspace.
Right.
find('t1stvisit')
will show you there are two of them, and where in memory they are located.
If you type
t1stvisit
at the prompt, you always get the first one. The one in the attached
dataframe is the second one. Use the
search()
function to show you the different locations in memory where objects
can be found.
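
As a rough illustration of find() and search(), here is a sketch that assumes a data frame tmp and a workspace variable x of the same name as one of its columns (both names are made up for the example):

```r
# Hypothetical example data; 'tmp' and 'x' are invented names.
tmp <- data.frame(x = 1:3)
x <- c(10, 20, 30)        # a second 'x', this one in the workspace

attach(tmp)               # R reports that 'x' is masked by .GlobalEnv

locations <- find("x")    # every place on the search path that holds an 'x'
print(locations)          # ".GlobalEnv" is listed before "tmp"
print(search())           # the whole search path, in lookup order

found_at_prompt <- x      # plain 'x' resolves to the first match
print(found_at_prompt)    # the workspace copy: 10 20 30

detach(tmp)
```

If no variable of that name exists in the workspace, find() lists only the attached data frame, and typing the name at the prompt returns the attached copy.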
When you did the attach(), did you get a message like:
> attach(tmp)
The following object(s) are masked _by_ .GlobalEnv :
x
(yours would have referred to your variables, not the "x" in my example).
That message tells you you have two variables of the same name,
stored in two different locations in the search path.
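
If the message scrolled past unnoticed, the same information can be recovered afterwards. The sketch below uses conflicts() from base R; the names tmp and x are again invented for the example, and the exact wording of the attach() message differs between R versions:

```r
# Hypothetical names: 'tmp' and 'x' are invented for this sketch.
tmp <- data.frame(x = 1:3)
x <- c(10, 20, 30)          # same name as the column, but in the workspace

attach(tmp)                 # triggers the "masked _by_ .GlobalEnv" message

# conflicts() returns names that occur more than once on the search path.
x_is_masked <- "x" %in% conflicts()
print(x_is_masked)          # TRUE

detach(tmp)
rm(x)                       # removing the workspace copy ends the conflict
```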
As a general rule, it's just plain confusing to have more than one
object of the same name in more than one location. In your situation,
I would get rid of the one that's not in the dataframe. But even
then, if you change it in the dataframe you'll still need to detach
and re-attach the dataframe, so using attach() is probably not the
best choice in the long run. Maybe the with() function would meet
your needs.
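
A small sketch of the with() approach, assuming the same made-up stand-in for the birth data frame (values invented for illustration):

```r
# Hypothetical stand-in for the poster's data frame; values are invented.
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))
birth$t1stvisit <- as.factor(birth$t1stvisit)

# with() evaluates the expression against the data frame as it is now,
# so there is no separate attached copy that can go out of date.
is_factor_now <- with(birth, is.factor(t1stvisit))
print(is_factor_now)                    # TRUE
print(with(birth, summary(t1stvisit)))  # counts per level, plus NA's
```

Nothing is added to the search path here, so there is nothing to detach afterwards.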
>If I type:
>> summary(t1stvisit)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>  1.000   1.000   2.000   1.574   2.000   3.000  29.000
>I get the numeric version, but if I try
>summary(birth$t1stvisit)
>    1    2    3 NA's
>  180  169   22   29
>I get the factor version.
>
>Frankly I feel that this behaviour is non-intuitive and potentially
>problematic. Nor have I seen warnings about this in the various text
>books on R.
>Can anyone comment on why this should occur?
>Many thanks,
>Alan Kelly
>
>Dr. Alan Kelly
>Department of Public Health & Primary Care
>Trinity College Dublin
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062