[R] Random sample from a data frame where ID column values don't match the values in an ID column in a second data frame
inkhorn
matt.dubins at gmail.com
Fri Mar 30 14:17:01 CEST 2012
Okay, here's some sample code:
ID = c(1,2,3,"A1",5,6,"A2",8,9,"A3")
fakedata = rnorm(10, 5, .5)
main.df = data.frame(ID,fakedata)
Results for my data frame:
> main.df
   ID fakedata
1   1 5.024332
2   2 4.752943
3   3 5.408618
4  A1 5.362838
5   5 5.158660
6   6 4.658235
7  A2 5.389601
8   8 4.998249
9   9 5.248517
10 A3 4.159490
sample1.df = main.df[sample(nrow(main.df), 4), ]
> sample1.df
  ID fakedata
5  5 5.158660
9  9 5.248517
4 A1 5.362838
8  8 4.998249
Here's what happens when I put a comma before the variable ID:
> sample2.df = main.df[sample(nrow(main.df[! main.df[,"ID"] %in%
+ sample1.df[,"ID"]]), 5),]
Error in `[.data.frame`(main.df, !main.df[, "ID"] %in% sample1.df[, "ID"]) :
undefined columns selected
Here's what happens when I exclude the comma:
sample2.df = main.df[sample(nrow(main.df[! main.df["ID"] %in%
sample1.df["ID"]]), 5),]
> sample2.df
   ID fakedata
8   8 4.998249
1   1 5.024332
3   3 5.408618
5   5 5.158660
10 A3 4.159490
As you can see, the version with the comma gives me nothing but an error,
and the version without it gives me a sample that doesn't exclude the rows
already included in the first sample.
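
For reference, here is a minimal sketch of one way the intended result could be obtained. It rests on a reading of the two attempts that is a best guess, not something confirmed in this thread. In the first attempt, the inner main.df[...] is indexed without a comma, so the 10-element logical vector seems to be treated as a column selector on a 2-column data frame, which would explain "undefined columns selected". In the second, main.df["ID"] is a one-column data frame rather than a vector, so %in% appears to return a single value and every row survives. The names unused and remaining.df, and the set.seed call, are assumptions added for illustration.

```r
# Rebuild the example data. The fakedata values will differ from the post
# because rnorm() is random; the seed is an assumption added for repeatability.
set.seed(1)
ID <- c(1, 2, 3, "A1", 5, 6, "A2", 8, 9, "A3")
main.df <- data.frame(ID, fakedata = rnorm(10, 5, .5))
sample1.df <- main.df[sample(nrow(main.df), 4), ]

# Compare the ID vectors (not one-column data frames): one TRUE/FALSE per row.
unused <- !(main.df$ID %in% sample1.df$ID)

# Subset rows first (note the trailing comma), then sample from what is left.
remaining.df <- main.df[unused, ]
sample2.df <- remaining.df[sample(nrow(remaining.df), 5), ]

# Sanity check: no ID should appear in both samples.
stopifnot(!any(sample2.df$ID %in% sample1.df$ID))
```

Doing the row filter as its own step keeps the comma placement easy to see: main.df[unused, ] selects rows, whereas main.df[unused] would try to select columns.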
Thanks,
Matt Dubins