[R] non-intuitive behaviour after type conversion
Peter Ehlers
ehlers at ucalgary.ca
Mon Nov 23 13:34:04 CET 2009
Alan Kelly wrote:
> Dear list,
> I have a data frame (birth) with mixed variables (numeric and
> alphanumeric). One variable "t1stvisit" was originally coded as numeric
> with values 1,2, and 3. After attaching the data frame, this is what I
> see when I use str(t1stvisit)
Actually, str(birth), I suspect, but that's not important.
>
> $ t1stvisit: int 1 1 1 1 1 1 1 1 2 2 ...
>
> This is as expected.
> I then convert t1stvisit to a factor and, to avoid creating a second copy
> of this variable independent of the data frame, I use:
> birth$t1stvisit = as.factor(birth$t1stvisit)
> if I check that the conversion has worked:
> is.factor(t1stvisit)
> [1] FALSE
> Now the only object present in the workspace is the data frame "birth"
> and, as noted, I have not created any new variables. So why does R
> still treat t1stvisit as numeric?
> is.factor(t1stvisit)
> [1] FALSE
>
> Yet when I try the following:
> > is.factor(birth$t1stvisit)
> [1] TRUE
> So, there appear to be two versions of "t1stvisit" - the original
> numeric version and the correct factor version although ls() only shows
> "birth" as present in the workspace.
> If I type:
> > summary(t1stvisit)
> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
> 1.000 1.000 2.000 1.574 2.000 3.000 29.000
> I get the numeric version, but if I try
> summary(birth$t1stvisit)
> 1 2 3 NA's
> 180 169 22 29
> I get the factor version.
>
> Frankly, I feel that this behaviour is non-intuitive and potentially
> problematic. Nor have I seen warnings about this in the various textbooks
> on R.
> Can anyone comment on why this should occur?
I haven't looked at discussions of 'attach()' for a while,
since I rarely use it nowadays (I find with() more convenient
most of the time), but Chapter 6 in 'An Introduction to R'
does discuss it.
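A minimal sketch of the with() alternative, assuming a small made-up data frame standing in for 'birth' (the values are invented, not Alan's data). Because with() evaluates its expression inside whatever 'birth' currently is, there is no separate attached copy that can fall out of sync:

```r
# Toy stand-in for 'birth'; the values are invented for illustration
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))

# Convert the column inside the data frame itself
birth$t1stvisit <- factor(birth$t1stvisit)

# with() looks the column up in the *current* 'birth', so the change is visible
with(birth, is.factor(t1stvisit))
# [1] TRUE
with(birth, summary(t1stvisit))   # counts per level, plus an NA's count
```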
There are indeed two versions of 'birth'.
Your basic problem is which version of 'birth' is being modified.
Hint: it's NOT the attached version.
Small example:
dat <- data.frame(x=1:3)
attach(dat)
dat$y <- 4:6
y
#Error: object 'y' not found
dat$y
#[1] 4 5 6
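One way out, sketched with the same toy example: detach the stale copy and attach the modified data frame again. ls("dat") lists what the attached copy holds, which is also how to see the "second version" that a plain ls() of the workspace does not show. The names 'before' and 'after' are just illustrative.

```r
dat <- data.frame(x = 1:3)
attach(dat)            # attach() puts a *copy* of 'dat' on the search path
dat$y <- 4:6           # changes the global 'dat' only, not the attached copy
before <- ls("dat")    # objects held by the attached copy
before
# [1] "x"

detach(dat)            # remove the stale copy from the search path ...
attach(dat)            # ... and attach the updated data frame
after <- ls("dat")
after
# [1] "x" "y"
y
# [1] 4 5 6

detach(dat)            # tidy up the search path when finished
```

Every later change to 'dat' would need the same detach/attach cycle, which is one reason with() is often the less error-prone choice.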
BTW, you don't need as.factor(); use factor().
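To illustrate the factor() suggestion, a sketch using invented values with the same 1/2/3 coding. For plain integer codes factor() and as.factor() give the same result, but factor() also accepts 'levels' and 'labels' directly; the labels below are hypothetical placeholders, since the thread doesn't say what the codes mean.

```r
# Invented values using the same 1/2/3 coding, plus one missing value
v <- c(1L, 1L, 2L, 3L, NA)

f1 <- factor(v)      # levels "1", "2", "3"; NA stays NA
f2 <- as.factor(v)   # coercion helper; same result for plain integers
identical(f1, f2)
# [1] TRUE

# factor() can also recode in one step (labels are hypothetical placeholders)
f3 <- factor(v, levels = 1:3, labels = c("code1", "code2", "code3"))
summary(f3)          # counts per level, plus an NA's count
```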
-Peter Ehlers
> Many thanks,
> Alan Kelly
>
> Dr. Alan Kelly
> Department of Public Health & Primary Care
> Trinity College Dublin
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>