[R] Selecting rows from a DF where the value in a selected column matches any element of a vector.

Sarah Goslee sarah.goslee at gmail.com
Sat Apr 12 15:04:59 CEST 2014


You need %in% instead of any(x == y): it tests each element of the
column against the whole vector of codes.
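
As a quick illustration on made-up values (the vectors `sector` and `wanted` below are hypothetical, not taken from the Census file), `==` pairs elements up by position and recycles the shorter vector, while `%in%` checks each element for membership:

```r
# Toy vectors; names and values are made up for illustration.
sector <- c("32", "11", "81", "42", "99")
wanted <- c("32", "42", "81")

# == compares position by position, recycling `wanted`. R warns here
# because 5 is not a multiple of 3; the warning is suppressed for the demo.
eq_result <- suppressWarnings(sector == wanted)

# %in% asks, for each element of `sector`, whether it occurs anywhere
# in `wanted`. The result always has the length of `sector`.
in_result <- sector %in% wanted

print(eq_result)
print(in_result)
```

The two results differ at the fourth element: "42" is in `wanted`, but the recycled comparison happens to line it up against "32".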

This is untested, but something like this should work:


ECwork  <-  EC07_A1[ EC07_A1$GEO_ID %in% c("01000US", "04000US06", "33000US488",
"31000US41860", "31400US4186036084", "05000US06001", "E6000US0600153000") &
      EC07_A1$SECTOR %in% c("32", "33", "42", "44", "45", "51", "54", "61", "71",
"81"), ]

(Note that your original code snippet had a shortage of ) and didn't
specify the data frame from which to take the columns.)
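
For anyone who wants to try the pattern before running it on the full file, here is a minimal sketch on a toy data frame. The data frame `toy` and the vectors `geo_wanted` and `sector_wanted` are hypothetical stand-ins, not the real Census data:

```r
# Hypothetical stand-in for EC07_A1, small enough to check by eye.
toy <- data.frame(
  GEO_ID = c("01000US", "04000US06", "04000US48", "01000US"),
  SECTOR = c("32", "11", "42", "81"),
  stringsAsFactors = FALSE
)

geo_wanted    <- c("01000US", "04000US06")
sector_wanted <- c("32", "42", "81")

# Each %in% yields a logical vector with one entry per row,
# so & combines the two conditions row by row.
keep <- toy$GEO_ID %in% geo_wanted & toy$SECTOR %in% sector_wanted

# Logical indexing in the row position of [ , ] keeps the TRUE rows.
toy_sub <- toy[keep, ]
print(toy_sub)
```

Wrapping a comparison in any() collapses it to a single TRUE or FALSE, so it cannot pick out individual rows; the row index needs a logical vector as long as the data frame has rows.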

Sarah

On Sat, Apr 12, 2014 at 8:36 AM, Andrew Hoerner <ahoerner at rprogress.org> wrote:
> Dear Folks--
> I have a file with 3 million-odd rows of data from the 2007 U.S. Economic
> Census. I am trying to pare it down to a subset of rows that both (1) has
> any one of a vector of NAICS economic sector codes, and (2) also has any
> one of a vector of geographic ID codes.
>
> Here is the code I am trying to use.
>
> ECwork  <-  EC07_A1[ any(GEO_ID == c("01000US", "04000US06", "33000US488",
> "31000US41860", "31400US4186036084" "05000US06001", "E6000US0600153000") &
>       any(SECTOR == c("32", "33", "42", 44", 45", 51", 54", 61", "71",
> "81"), ]
>
> I get back the following error:
>
> Warning message:
> In EC07_A1$SECTOR == c("32", "33", "42", "44", "45", "51", "54",  :
>   longer object length is not a multiple of shorter object length
>
> I see what R is doing.  Instead of comparing each element of the column
> SECTOR to the row vector of codes, and returning a logical vector of the
> length of SECTOR with rows marked as TRUE that match any of the codes, it
> is lining my code list up with SECTOR as a column vector and doing
> element-by-element testing, and then recycling the code list over three
> million rows. But I am not sure how to make it do what I want -- test the
> sector code in each row against the vector of codes I am looking for. I
> would be grateful if anyone could suggest an alternative that would achieve
> my ends.
>
> Oh, and I would add, if there is a way of correctly doing this with
> the extract function [], I would like to know what it is. If not, I guess
> I'd like to know that too.
>
> Sincerely, Andrew Hoerner
>
> --
> J. Andrew Hoerner
> Director, Sustainable Economics Program
> Redefining Progress
> (510) 507-4820
>
-- 
Sarah Goslee
http://www.functionaldiversity.org



