[R] subset data.frame with value != in all columns

Marc Schwartz MSchwartz at MedAnalytics.com
Thu Feb 3 21:24:18 CET 2005


On Thu, 2005-02-03 at 14:57 -0500, Tim Howard wrote: 
> I am trying to extract rows from a data.frame based on the
> presence/absence of a single value in any column.  I've figured out how
> to do get the positive matches, but the remainder (rows without this
> value) eludes me.  Mining the help pages and archives brought me,
> frustratingly,  very close, as you'll see below. 
> 
> My goal: two data frames, one with -99 in at least one column in each
> row, one with no occurrences of -99. I want to preserve rownames in
> each.
> 
> My questions: 
> Is there a cleaner way to extract all rows containing a specified
> value?
> How can I extract all rows that don't have this value in any col?
> 
> #create dummy dataset
> x <- data.frame(
> c1=c(-99,-99,-99,4:10),
> c2=1:10,
> c3=c(1:3,-99,5:10),
> c4=c(10:1),
> c5=c(1:9,-99))
> 
> #extract data.frame of rows with -99 in them
> for(i in 1:ncol(x))
> {
> y<-subset(x, x[,i]==-99, drop=FALSE);
> ifelse(i==1, z<-y, z <- rbind(z,y));
> }
> 
> #various attempts to get rows not containing "-99":
> 
> # this attempt was to create, in "list", the exclusion formula for each
> column.
> # Here, I couldn't get subset to recognize "list" as the correct type.
> # e.g. it works if I paste the value of list in the subset command
> {
> for(i in 1:ncol(x)){
> if(i==1)
> list<-paste("x[",i,"]!=-99", sep="")
> else
> list<-paste(list," ", " & x[",i,"]!=-99", sep="")
> }
> y<-subset(x, list, drop=FALSE);
> }
> 
> # this will do it for one col, but if I index more
> # it returns all rows
> y <- x[!(x[,3] %in% -99),]
> 
> # this also works for one col
> y<-x[x[,1]!=-99,]
> 
> # but if I index more, I get extra rows of NAs
> y<-x[x[,1:5]!=-99,]
> 
> Thanks in advance.
> Tim Howard


How about this, presuming that your data frame is all numeric:

For rows containing -99:

> x[unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
    c1 c2  c3 c4  c5
1  -99  1   1 10   1
2  -99  2   2  9   2
3  -99  3   3  8   3
4    4  4 -99  7   4
10  10 10  10  1 -99


For rows not containing -99:

> x[-unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
  c1 c2 c3 c4 c5
5  5  5  5  6  5
6  6  6  6  5  6
7  7  7  7  4  7
8  8  8  8  3  8
9  9  9  9  2  9


What I have done here is to use which(), setting arr.ind = TRUE. This
returns the row, column indices for the matches to the boolean
statement. The first column returned by which() in this case are the row
numbers matching the statement, so I take the first column only.

Since it is possible that more than one element in a row can match the
boolean, I then use unique() to get the singular row values.

Thus, I can use the returned row indices above to subset the data frame.

HTH,

Marc Schwartz




More information about the R-help mailing list