[R] length() misbehaving?
Marc Schwartz
mschwartz at medanalytics.com
Fri Mar 14 17:22:48 CET 2003
>-----Original Message-----
>From: r-help-bounces at stat.math.ethz.ch
>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David Parkhurst
>Sent: Friday, March 14, 2003 9:35 AM
>To: r-help at stat.math.ethz.ch
>Subject: [R] length() misbehaving?
>
>
>I'm having a weird problem with length(), in R1.6.1 under
>windows2000. I have a dataframe called byyr, with ten
>columns, the first of which is named cnd95.
>summary(byyr) shows that byyr$cnd95 contains the factor level
>"tr" 66 times. Also, when I enter byyr$cnd95 at the command
>line, I can count 66 "tr" elements in the resulting vector.
>However, when I enter
>
>n95trt <- length(byyr$cnd95[byyr$cnd95=="tr"])
>n95trt
>
>the result is 68! Any ideas why this is happening, and how I
>can fix the miscount? (That column also contains 69 entries of
>"c", and (relevantly?) two NA's.)
>
>Thanks for any help.
>
>Dave Parkhurst
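This is expected behavior.
Since NA represents a true unknown, the two NA's in your vector 'may
be' a "tr". The comparison byyr$cnd95 == "tr" therefore returns NA
(neither TRUE nor FALSE) for those two elements, and indexing with an
NA subscript returns an NA element. The subset thus holds the 66 "tr"
values plus 2 NA's, which is why length() reports 68.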
Instead of length() on the subset, you might want to use:
sum(byyr$cnd95 == "tr", na.rm = TRUE)
which counts the TRUE values of the comparison and drops the two NA's.
See ?sum
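A small sketch of the behavior, using a made-up factor x rather than
the poster's byyr$cnd95 (the data and the names x, cmp, sub, n_len,
n_sum and n_which are assumptions for illustration only). It shows the
NA's passing through the subscript, plus two ways of counting that
ignore them:

```r
# Toy factor standing in for byyr$cnd95 (hypothetical data):
# three "tr", two "c", two NA
x <- factor(c("tr", "c", NA, "tr", "c", NA, "tr"))

cmp <- x == "tr"   # logical vector: NA wherever x is NA
sub <- x[cmp]      # an NA subscript returns an NA element

n_len   <- length(sub)              # 5: the three "tr" plus the two NA's
n_sum   <- sum(cmp, na.rm = TRUE)   # 3: counts only the TRUE values
n_which <- length(x[which(cmp)])    # 3: which() keeps only TRUE positions
```

Either sum(..., na.rm = TRUE) or which() should give the intended
count; which() is handy if the matching elements themselves are needed
and not just their number.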
HTH,
Marc Schwartz