[R] length() misbehaving?
mschwartz at medanalytics.com
Fri Mar 14 17:33:40 CET 2003
>From: Marc Schwartz [mailto:mschwartz at medanalytics.com]
>Sent: Friday, March 14, 2003 10:23 AM
>To: 'David Parkhurst'; 'r-help at stat.math.ethz.ch'
>Subject: RE: [R] length() misbehaving?
>>From: r-help-bounces at stat.math.ethz.ch
>>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David
>>Sent: Friday, March 14, 2003 9:35 AM
>>To: r-help at stat.math.ethz.ch
>>Subject: [R] length() misbehaving?
>>I'm having a weird problem with length(), in R 1.6.1 under
>>Windows 2000. I have a data frame called byyr, with ten
>>columns, the first of which is named cnd95.
>>summary(byyr) shows that byyr$cnd95 contains the factor level
>>"tr" 66 times. Also, when I enter byyr$cnd95 at the command
>>line, I can count 66 "tr" elements in the resulting vector.
>>However, when I enter
>>n95trt <- length(byyr$cnd95[byyr$cnd95=="tr"])
>>the result is 68! Any ideas why this is happening, and how I
>>can fix the miscount? (That column also contains 69 entries of
>>"c", and (relevantly?) two NA's.)
>>Thanks for any help.
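>It is expected.
>Since NA represents a true unknown, the two NA's in your
>vector 'may be' a "tr". Thus the comparison returns NA, not
>FALSE, for those two elements, and indexing with an NA
>returns an NA element. The subset therefore holds the 66
>"tr" values plus the two NA's, which is why length() gives 68.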
>Instead of length(), you might want to use:
>sum(byyr$cnd95[byyr$cnd95 == "tr"], na.rm = TRUE)
>which will remove the two NA's.
Correction. I mis-copied the code. It should be:
sum(byyr$cnd95 == "tr", na.rm = TRUE)
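
For anyone who wants to reproduce the behaviour, here is a minimal sketch. The vector cnd95 is a made-up stand-in for byyr$cnd95, built only from the counts given in the original post (66 "tr", 69 "c", 2 NA); the real data frame is not available, and the variable name is_tr is just for illustration.

```r
# Hypothetical stand-in for byyr$cnd95: 66 "tr", 69 "c", and 2 NA values.
cnd95 <- factor(c(rep("tr", 66), rep("c", 69), NA, NA))

# The comparison yields a logical vector whose entries are TRUE, FALSE, or NA.
is_tr <- cnd95 == "tr"
table(is_tr, useNA = "ifany")

# Indexing with an NA logical returns an NA element, so the subset
# holds the 66 "tr" values plus the 2 NA values.
length(cnd95[is_tr])         # 68

# Counting the TRUE values while skipping NA gives the intended answer.
sum(is_tr, na.rm = TRUE)     # 66

# One alternative: which() keeps only the positions that are TRUE,
# so the NA entries never reach the index.
length(cnd95[which(is_tr)])  # 66
```

Either of the last two forms should give the count that summary() reports; sum() with na.rm = TRUE is the shorter of the two when only the count is needed.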