[R] non-intuitive behaviour after type conversion
Alan Kelly
akelly at tcd.ie
Mon Nov 23 09:54:19 CET 2009
Dear list,
I have a data frame (birth) with mixed variables (numeric and
alphanumeric). One variable, "t1stvisit", was originally coded as
numeric with values 1, 2, and 3. After attaching the data frame, this
is what I see when I use str(t1stvisit):
$ t1stvisit: int 1 1 1 1 1 1 1 1 2 2 ...
This is as expected.
I then convert t1stvisit to a factor and, to avoid creating a second
copy of this variable independent of the data frame, I use:
birth$t1stvisit = as.factor(birth$t1stvisit)
If I check that the conversion has worked:
is.factor(t1stvisit)
[1] FALSE
Now the only object present in the workspace is the data frame "birth"
and, as noted, I have not created any new variables. So why does R
still treat t1stvisit as numeric?
Yet when I try the following:
> is.factor(birth$t1stvisit)
[1] TRUE
So there appear to be two versions of "t1stvisit": the original
numeric version and the correct factor version, although ls() shows
only "birth" as present in the workspace.
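One way to check where the bare name is coming from is sketched below. It uses a toy data frame with made-up values in place of the real "birth" data, and the helper names on_path and where are purely illustrative. It relies on the documented behaviour that attach() copies the columns into a new environment on the search path, which ls() does not list by default.

```r
# Toy stand-in for the real data (values are made up for illustration)
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))
attach(birth)
birth$t1stvisit <- as.factor(birth$t1stvisit)

ls()        # lists objects in the workspace (global environment) only
search()    # lists every entry on the search path

# Is there a "birth" entry on the search path, separate from the workspace?
on_path <- "birth" %in% search()

# Which search-path entry supplies the bare name t1stvisit?
where <- find("t1stvisit")
print(where)

detach(birth)
```

If the bare name is reported as coming from the attached entry rather than from the workspace, that would account for the two versions seen above.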
If I type:
> summary(t1stvisit)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.000   1.000   2.000   1.574   2.000   3.000  29.000
I get the numeric version, but if I try
summary(birth$t1stvisit)
   1    2    3 NA's
 180  169   22   29
I get the factor version.
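A minimal sketch that reproduces the behaviour with a toy data frame (the values are made up and are not the real data; bare_is_factor and df_is_factor are illustrative names only):

```r
# Toy stand-in for the real data (values are made up for illustration)
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))
attach(birth)

# Convert the column inside the data frame, as in the post
birth$t1stvisit <- as.factor(birth$t1stvisit)

# Bare name: resolved through the attached copy made before the conversion
bare_is_factor <- is.factor(t1stvisit)

# Explicit reference: the column of the data frame in the workspace
df_is_factor <- is.factor(birth$t1stvisit)

print(bare_is_factor)  # FALSE
print(df_is_factor)    # TRUE

detach(birth)
```

Referring to the column as birth$t1stvisit throughout, or detaching and re-attaching after the conversion, would be one way to keep the two in step.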
Frankly, I feel that this behaviour is non-intuitive and potentially
problematic. Nor have I seen warnings about it in the various
textbooks on R.
Can anyone comment on why this should occur?
Many thanks,
Alan Kelly
Dr. Alan Kelly
Department of Public Health & Primary Care
Trinity College Dublin