[R] non-intuitive behaviour after type conversion

David Winsemius dwinsemius at comcast.net
Mon Nov 23 13:47:21 CET 2009


On Nov 23, 2009, at 7:34 AM, Peter Ehlers wrote:

>
> Alan Kelly wrote:
>> Dear list,
>> I have a data frame (birth) with mixed variables (numeric and
>> alphanumeric). One variable, "t1stvisit", was originally coded as
>> numeric with values 1, 2, and 3. After attaching the data frame,
>> this is what I see when I use str(t1stvisit):
> Actually, str(birth), I suspect, but not important.
>> $ t1stvisit: int  1 1 1 1 1 1 1 1 2 2 ...
>> This is as expected.
>> I then convert t1stvisit to a factor and, to avoid creating a second
>> copy of this variable independent of the data frame, I use:
>> birth$t1stvisit = as.factor(birth$t1stvisit)
>> If I check that the conversion has worked:
>> is.factor(t1stvisit)
>> [1] FALSE
>> Now the only object present in the workspace is the data frame
>> "birth" and, as noted, I have not created any new variables. So
>> why does R still treat t1stvisit as numeric?
>> is.factor(t1stvisit)
>> [1] FALSE
>> Yet when I try the following:
>> > is.factor(birth$t1stvisit)
>> [1] TRUE
>> So there appear to be two versions of "t1stvisit": the original
>> numeric version and the correct factor version, although ls() only
>> shows "birth" as present in the workspace.
>> If I type:
>> > summary(t1stvisit)
>>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>>  1.000   1.000   2.000   1.574   2.000   3.000  29.000
>> I get the numeric version, but if I try
>> summary(birth$t1stvisit)
>>   1    2    3 NA's
>> 180  169   22   29
>> I get the factor version.
>> Frankly, I feel that this behaviour is non-intuitive and potentially
>> problematic. Nor have I seen warnings about this in the various
>> textbooks on R.
>> Can anyone comment on why this should occur?
>
> I haven't looked at discussions of 'attach()' for a while,
> since I rarely use it nowadays (I find with() more convenient
> most of the time), but Chapter 6 in 'An Introduction to R'
> does discuss it.
>
> There are indeed two versions of 'birth'.
> Your basic problem is which version of 'birth' is being modified.
> Hint: it's NOT the attached version.
> Small example:
>
> dat <- data.frame(x=1:3)
> attach(dat)
> dat$y <- 4:6
> y
> #Error: object 'y' not found
> dat$y
> #[1] 4 5 6
>
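Peter's small example can be turned around to reproduce Alan's situation directly. The sketch below assumes a hypothetical six-row stand-in for Alan's 'birth' data frame (his actual data are not shown in the thread). Because attach() copies the columns into a new environment on the search path, converting the column in the workspace copy leaves the attached copy untouched; detaching and re-attaching refreshes it, and with() sidesteps the problem by looking inside the current data frame.

```r
# Hypothetical stand-in for Alan's data: codes 1, 2, 3 plus one missing value
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 2L, 3L, NA))

attach(birth)                               # copies the columns onto the search path
birth$t1stvisit <- factor(birth$t1stvisit)  # changes only the data frame in the workspace

is.factor(t1stvisit)        # FALSE: lookup finds the stale attached copy
is.factor(birth$t1stvisit)  # TRUE: the workspace data frame was converted

# One fix: detach the stale copy and attach the updated data frame
detach(birth)
attach(birth)
is.factor(t1stvisit)        # TRUE
detach(birth)

# Alternative: skip attach() and evaluate against the current data frame
with(birth, summary(t1stvisit))   # counts per level plus NA's
```

Any later change to 'birth' would again require detaching and re-attaching, which is one reason with() tends to be the less error-prone habit.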
> BTW, you don't need as.factor(); use factor().
>
> -Peter Ehlers

Alan;

Let me second Peter's advice. 'attach()' creates more problems than it
solves. When I ran his code above, y did print something, but it was not
the 4:6 vector; it was an object left in my workspace from a
prior project. You should also be wary, however, of behavior with
'with()' that is unexpected (to some of us newbies, anyway):

 > with(dat, z <- x + y)
 > dat
   x y
 1 1 4
 2 2 5
 3 3 6
Since with() is a function, the assignment to z was made in a temporary
environment built from dat and was discarded when with() returned.
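A quick check of this point, assuming the same three-row 'dat' with columns x and y: the assignment inside with() leaves no 'z' behind in either the data frame or the workspace, yet with() still returns the value of the expression it evaluated. That returned value is what the assignment form that follows relies on.

```r
dat <- data.frame(x = 1:3, y = 4:6)

res <- with(dat, z <- x + y)   # z lives only in with()'s temporary environment
res                            # [1] 5 7 9: the value is still returned
"z" %in% names(dat)            # FALSE: dat gained no column
exists("z", inherits = FALSE)  # FALSE: no z in the workspace either
```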

It is more effective to assign the value that with() returns:
 > dat$z <- with(dat, x + y)
 > dat
   x y z
 1 1 4 5
 2 2 5 7
 3 3 6 9
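When several columns need to be added, base R's within() is another option. Unlike with(), it evaluates the block in an environment built from the data frame's columns and returns a new data frame that includes any variables created there, so the result has to be assigned back. A minimal sketch, assuming the same 'dat'; the new columns are selected by name for display rather than relying on the order within() adds them.

```r
dat <- data.frame(x = 1:3, y = 4:6)

# within() returns a modified data frame; dat only changes because the
# result is assigned back to it
dat <- within(dat, {
  z <- x + y   # 5 7 9
  w <- x * y   # 4 10 18
})
dat[c("x", "y", "z", "w")]   # select by name to fix the display order
```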

-- 
David

>
>
>> Many thanks,
>> Alan Kelly
>> Dr. Alan Kelly
>> Department of Public Health & Primary Care
>> Trinity College Dublin
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT