[R] length() misbehaving?

Marc Schwartz mschwartz at medanalytics.com
Fri Mar 14 17:33:40 CET 2003

>-----Original Message-----
>From: Marc Schwartz [mailto:mschwartz at medanalytics.com] 
>Sent: Friday, March 14, 2003 10:23 AM
>To: 'David Parkhurst'; 'r-help at stat.math.ethz.ch'
>Subject: RE: [R] length() misbehaving?
>>-----Original Message-----
>>From: r-help-bounces at stat.math.ethz.ch
>>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David
>>Sent: Friday, March 14, 2003 9:35 AM
>>To: r-help at stat.math.ethz.ch
>>Subject: [R] length() misbehaving?
>>I'm having a weird problem with length(), in R1.6.1 under
>>windows2000.  I have a dataframe called byyr, with ten 
>>columns, the first of which is named cnd95.
>>summary(byyr) shows that byyr$cnd95 contains the factor level 
>>"tr" 66 times.  Also, when I enter byyr$cnd95 at the command 
>>line, I can count 66 "tr" elements in the resulting vector.  
>>However, when I enter
>>n95trt <- length(byyr$cnd95[byyr$cnd95=="tr"])
>>the result is 68!  Any ideas why this is happening, and how I
>>can fix the miscount? (That column also contains 69 entries of 
>>"c", and (relevantly?) two NA's.)
>>Thanks for any help.
>>Dave Parkhurst
>It is expected.
>Since NA represents a true unknown, comparing an NA with "tr" 
>yields NA rather than TRUE or FALSE.  Indexing with an NA then 
>returns an NA element, so length() counts the two NA's as well.
>Instead of length(), you might want to use:
>sum(byyr$cnd95[byyr$cnd95 == "tr"], na.rm = TRUE)
>which will remove the two NA's.
>See ?sum
>Marc Schwartz

Correction.  I mis-copied the code.  It should be:

sum(byyr$cnd95 == "tr", na.rm = TRUE)
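
A minimal sketch of why the two counts differ.  The data frame itself was not posted, so the factor below is a made-up stand-in for byyr$cnd95, built only from the counts David reported (66 "tr", 69 "c", 2 NA); the variable names are hypothetical as well.

```r
# Hypothetical stand-in for byyr$cnd95: 66 "tr", 69 "c" and 2 NA
cnd95 <- factor(c(rep("tr", 66), rep("c", 69), NA, NA))

# The comparison is NA (not TRUE or FALSE) wherever cnd95 is NA
is_tr <- cnd95 == "tr"

# Indexing with an NA returns an NA element, so both NAs are counted
n_subset <- length(cnd95[is_tr])     # 68

# Summing the logical vector counts only the TRUE values
n_sum <- sum(is_tr, na.rm = TRUE)    # 66

# which() keeps only the TRUE positions, so it gives the same count
n_which <- length(which(is_tr))      # 66
```

Either of the last two forms should give the count that summary() reports, since neither carries the NA's through the subsetting step.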


