[R] Subsetting on multiple criteria (AND condition) in R
Marc Schwartz
marc_schwartz at me.com
Tue Jan 14 22:05:19 CET 2014
On Jan 14, 2014, at 1:38 PM, Jeff Johnson <mrjefftoyou at gmail.com> wrote:
> I'm running the following to get what I would expect is a subset of
> countries that are not equal to "US" AND COUNTRY is not in one of my
> validcountries values.
>
> non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
> select = COUNTRY, na.rm=TRUE)
>
> however, when I then do table(non_us) I get:
>> table(non_us)
> non_us
> AE AN AR AT AU BB BD BE BH BM BN BO BR BS CA CH CM CN CO CR CY DE DK DO
> EC ES
> 0 3 0 2 1 31 4 1 1 1 45 1 1 4 5 86 3 1 8 1 2 1 8 2 1
> 2 4
> FI FR GB GR GU HK ID IE IL IN IO IT JM JP KH KR KY LU LV MO MX MY NG NL NO
> NZ PA
> 2 4 35 3 3 14 3 5 2 5 1 2 1 15 1 11 2 2 1 1 23 7 1 6 1
> 3 1
> PE PG PH PR PT RO RU SA SE SG TC TH TT TW TZ US ZA
> 2 1 1 8 1 1 1 1 1 18 1 1 2 11 1 0 3
>>
>
> Notice US appears as the second to last. I expected it to NOT appear.
>
> Do you know if I'm using incorrect syntax? Is the & symbol equivalent to
> AND (notice I have 2 criteria for subsetting)? Also, is COUNTRY != "US"
> valid syntax? I don't get errors, but then again I don't get what I expect
> back.
>
> Thanks in advance!
>
>
>
> --
> Jeff
Review the Details section of ?subset, where you will find the following:
"Factors may have empty levels after subsetting; unused levels are not automatically removed. See droplevels for a way to drop all unused levels from a data frame."
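Your syntax is fine and the behavior is as expected. The "US" rows have been removed, which is why the count for US is 0, but "US" is still a level of the COUNTRY factor, so table() continues to list it.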
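A minimal sketch of the behavior, assuming COUNTRY is a factor. The data below are made-up stand-ins, since the original mydf and validcountries were not posted:

```r
# Hypothetical stand-in data; not the poster's actual mydf or validcountries.
mydf <- data.frame(COUNTRY = factor(c("US", "CA", "GB", "US", "CA", "ZZ")))
validcountries <- c("US", "CA", "GB")

# Same AND condition as in the question. na.rm is left out here: it is not a
# documented argument of subset(), and rows where the condition is NA are
# dropped in any case.
non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
                 select = COUNTRY)

# The "US" and "ZZ" rows are gone, but both remain as levels with a count of 0.
table(non_us$COUNTRY)

# droplevels() removes the unused levels from every factor in the data frame.
non_us <- droplevels(non_us)
table(non_us$COUNTRY)
```

After droplevels(), only the levels that still occur in the subset (here CA and GB) should appear in the table. Calling factor() on the single column would be another way to drop its unused levels.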
Regards,
Marc Schwartz