[R] sampling dataframe based upon number of record occurrences
Curtis Burkhalter
curtisburkhalter at gmail.com
Tue Mar 3 22:22:33 CET 2015
Hello everyone,
I'm having trouble performing a task that is probably very simple, but I
can't seem to figure out how to get my code to work. What I want to do is
use the sample function to pick records within a dataframe, but only if
a column attribute value is repeated more than 3 times. If you look at
the data below, I have created a unique attribute value that corresponds to
every site-by-year combination (i.e. IDxYear, stored in the IDbyYear
column). You can see that the site called "A-Airport" was sampled 6 times
in 2006, while "A-Bark Corral East" was sampled twice in 2008. So what I
want to do is randomly select 3 records for "A-Airport" in 2006 from the
existing 6 records, but for "A-Bark Corral East" in 2008 I just want to
leave the records as they currently are.
I've used the following code to try to accomplish this but, like I said, I
can't get it to work, so I'm clearly doing something wrong. If you could
check out the code and provide any suggestions, that would be great. Note
that there are 5589 unique IDxYear combinations, which is why that number
appears in the code. If any further clarification is needed, please let me
know.
boom = data.frame()
for (i in 1:5589) {
  boom[i, ] = ifelse(length(fitting_set$IDbyYear[i] > 3),
                     fitting_set[sample(nrow(fitting_set), 3), ],
                     fitting_set)
}
boom
IDbyYear  SiteID              Year  (6 other column attributes)
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
45.32     A-Bark Corral East  2008
45.32     A-Bark Corral East  2008
45.36     A-Bark Corral East  2009
45.40     A-Bark Corral East  2010
45.40     A-Bark Corral East  2010
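
For reference, here is a minimal sketch of one way the task described above
could be approached. It is only an illustration under stated assumptions. The
toy fitting_set below is rebuilt from the rows shown here, with the 6 other
columns left out, and the names max_per_group, idx_by_group and keep are
made up for the example. The idea is to group row indices by IDbyYear and
draw 3 rows only from groups that have more than 3. Two likely reasons the
loop above fails: length(x[i] > 3) is always 1, and ifelse() returns a
vector shaped like its test, so it cannot hand back data frame rows.

```r
# Toy data mirroring the rows shown above (other columns omitted).
fitting_set <- data.frame(
  IDbyYear = c(rep(42.24, 6), 45.32, 45.32, 45.36, 45.40, 45.40),
  SiteID   = c(rep("A-Airport", 6), rep("A-Bark Corral East", 5)),
  Year     = c(rep(2006, 6), 2008, 2008, 2009, 2010, 2010)
)

# Hypothetical name for the maximum number of rows to keep per group.
max_per_group <- 3

# Row indices of fitting_set, grouped by the IDbyYear value.
idx_by_group <- split(seq_len(nrow(fitting_set)), fitting_set$IDbyYear)

# Groups with more than 3 rows: draw 3 row indices at random.
# Groups with 3 or fewer rows: keep every row index unchanged.
keep <- unlist(lapply(idx_by_group, function(idx) {
  if (length(idx) > max_per_group) {
    idx[sample.int(length(idx), max_per_group)]
  } else {
    idx
  }
}), use.names = FALSE)

# Subset in the original row order.
boom <- fitting_set[sort(keep), ]
table(boom$IDbyYear)
```

With this toy data, the "A-Airport" 2006 group is cut down to 3 randomly
chosen rows and every other group is left as it is. Indexing with
sample.int() rather than calling sample(idx, 3) directly avoids the known
surprise where sample() treats a single number as the range 1:n.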
Thanks
--
Curtis Burkhalter
https://sites.google.com/site/curtisburkhalter/