[R] Random sample from a data frame where ID column values don't match the values in an ID column in a second data frame

inkhorn matt.dubins at gmail.com
Fri Mar 30 14:17:01 CEST 2012


Okay, here's some sample code:

ID = c(1,2,3,"A1",5,6,"A2",8,9,"A3")
fakedata = rnorm(10, 5, .5)
main.df = data.frame(ID,fakedata)

Results for my data frame:
> main.df
   ID fakedata
1   1 5.024332
2   2 4.752943
3   3 5.408618
4  A1 5.362838
5   5 5.158660
6   6 4.658235
7  A2 5.389601
8   8 4.998249
9   9 5.248517
10 A3 4.159490

sample1.df = main.df[sample(nrow(main.df), 4), ]
> sample1.df
  ID fakedata
5  5 5.158660
9  9 5.248517
4 A1 5.362838
8  8 4.998249

Here's what happens when I put a comma before the variable ID:

> sample2.df = main.df[sample(nrow(main.df[! main.df[,"ID"] %in%
+ sample1.df[,"ID"]]), 5),]
Error in `[.data.frame`(main.df, !main.df[, "ID"] %in% sample1.df[, "ID"]) : 
  undefined columns selected

Here's what happens when I exclude the comma:

sample2.df = main.df[sample(nrow(main.df[! main.df["ID"] %in%
sample1.df["ID"]]), 5),]
> sample2.df
   ID fakedata
8   8 4.998249
1   1 5.024332
3   3 5.408618
5   5 5.158660
10 A3 4.159490

As you can see, the version with the comma gives me nothing but an error,
and the version without it runs but returns a sample that doesn't exclude
rows already included in the 1st sample (rows 5 and 8 show up in both).
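For comparison, here is a minimal sketch of what the second sample is meant to do, written as two separate steps: first drop the rows whose ID is already in sample1.df, then sample from what is left. The names not.in.sample1 and remaining.df, and the set.seed() call, are not in my code above; they are assumptions added only for illustration and reproducibility. The comments also note what appears to go wrong in the two attempts, as far as I can tell from how `[` behaves on a data frame.

```r
# Rebuild the example data (set.seed is an addition, for reproducibility only)
set.seed(42)
ID <- c(1, 2, 3, "A1", 5, 6, "A2", 8, 9, "A3")
main.df <- data.frame(ID, fakedata = rnorm(10, 5, .5))

sample1.df <- main.df[sample(nrow(main.df), 4), ]

# main.df[, "ID"] is a plain vector, so %in% gives one TRUE/FALSE per row.
# (main.df["ID"] is a one-column data frame instead, and %in% on it seems
# to collapse to a single value, so nothing gets filtered.)
not.in.sample1 <- !(main.df[, "ID"] %in% sample1.df[, "ID"])

# The filter goes in the ROW position, i.e. before a trailing comma.
# main.df[not.in.sample1] without the comma would select COLUMNS, and a
# length-10 logical against 2 columns gives "undefined columns selected".
remaining.df <- main.df[not.in.sample1, ]

# Sample 5 of the remaining rows
sample2.df <- remaining.df[sample(nrow(remaining.df), 5), ]

# Sanity check: no ID should appear in both samples
stopifnot(!any(sample2.df$ID %in% sample1.df$ID))
```

Splitting the filter from the sampling also makes it easy to check nrow(remaining.df) before drawing from it, since sample() will complain if asked for more rows than remain.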

Thanks,
Matt Dubins
