[R] random sampling but with caveats!
jcbouette at gmail.com
Fri Sep 9 02:43:22 CEST 2011
It seems you got no answer. Maybe providing a reproducible example
would help, as well as expressing your problem in more general terms.
I am not an expert in sampling, but I would suggest (as does the help
for sample) that you take a look at the sampling package, available on
CRAN, and the strata function in this package, which allows for stratified sampling.
2011/9/8 Rebecca Ross <rebecca.ross at plymouth.ac.uk>:
> I wonder if someone can help me. I have built a GAM (generalized additive model) to predict the presence of cold-water corals and am now trying to evaluate it by splitting my dataset into training and test datasets.
> In an ideal world I would use the sample() function to randomly select rows of data for me. For example, with 936 rows of data in my HH dataset I might write
> ss <- sample(nrow(HH), size = nrow(HH) - 312, replace = FALSE)
> to create a random training subsample of 624 rows (roughly two-thirds, about 67%, of my data) and a test set of the remaining 312 rows (about 33%). I would use a for() loop to automate building the datasets and running the prediction, e.g. 1000 times.
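A minimal sketch of that loop in base R, assuming `HH` is a data frame. The data below are a simulated stand-in (the 0/1 presence column `pa` and its values are invented for illustration). Note `drop = FALSE`, which keeps a one-column data frame from collapsing to a plain vector when rows are subset.

```r
# Hypothetical stand-in for the HH data frame: 936 rows with a 0/1 column `pa`
set.seed(1)
HH <- data.frame(pa = rbinom(936, size = 1, prob = 0.125))

n_test <- 312    # rows held out in each iteration (one third of 936)
n_iter <- 1000   # number of random train/test splits

for (i in seq_len(n_iter)) {
  # Draw training rows without replacement; the remaining rows form the test set
  ss <- sample(nrow(HH), size = nrow(HH) - n_test, replace = FALSE)
  train <- HH[ss, , drop = FALSE]
  test  <- HH[-ss, , drop = FALSE]
  # Fit the model on `train` and score its predictions on `test` here
}
```

In practice the model fit and whatever evaluation statistic you compute would replace the placeholder comment, with results stored per iteration (e.g. in a pre-allocated vector of length `n_iter`).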
> The problem is that I have two caveats for the subsampling:
> a) I need control over the prevalence (the proportion of observed presences within the dataset) in both my training (build) and test datasets.
> I realise I could do this by sorting my column of presences and absences, taking a subsample of the required size from the rows containing presences, then from the rows containing absences, and combining them, e.g. (with 117 presences, keeping 75 for training)
> presence_records <- sample(1:117, size = 75, replace = FALSE)
> b) My samples lie within video transects, and because of the risk of autocorrelation within each transect, whole transects (clusters) should ideally be what is randomly selected: a point within a transect cannot be allocated to the training dataset when another point from that same transect is already allocated to the test dataset.
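For caveat (a) on its own, the approach described in the question (sampling presences and absences separately) amounts to stratified sampling with presence/absence as the two strata. A sketch in base R, assuming a 0/1 presence vector `pa`; the 117 presences and 75 training presences come from the example in the question, while the 819 absences, the 549 training absences (chosen so the training set has 624 rows), the helper name `stratified_split` and the simulated data are assumptions for illustration.

```r
# Hypothetical toy data: 117 presences and 819 absences (936 rows in total)
set.seed(2)
pa <- sample(rep(c(1, 0), times = c(117, 819)))

# Draw a fixed number of training rows from each class, so training (and
# hence test) prevalence is set by design rather than left to chance
stratified_split <- function(pa, n_train_pres, n_train_abs) {
  pres <- which(pa == 1)
  abs_ <- which(pa == 0)
  # Sampling index positions with sample.int() avoids sample()'s special case
  # for a length-one vector (sample(x) then draws from 1:x)
  train_idx <- c(pres[sample.int(length(pres), n_train_pres)],
                 abs_[sample.int(length(abs_), n_train_abs)])
  sort(train_idx)
}

train_idx <- stratified_split(pa, n_train_pres = 75, n_train_abs = 549)
mean(pa[train_idx])   # training prevalence: 75 / 624
mean(pa[-train_idx])  # test prevalence: 42 / 312
```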
> Is there a way I can satisfy both of these caveats and still come out with my (slightly less) random subsamples?
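One possible way to meet both caveats is to sample whole transects (cluster sampling) and reject any split whose training or test prevalence falls outside a tolerance around a target. Because transects move as whole units, prevalence can only be controlled approximately this way. The sketch below assumes hypothetical columns `transect` (transect ID) and `pa` (0/1 presence), a hypothetical helper `transect_split`, simulated data, and arbitrary defaults for the test fraction, tolerance and number of attempts.

```r
# Hypothetical toy data: 936 points spread over 40 transects, 0/1 presence `pa`
set.seed(42)
HH <- data.frame(transect = sample(1:40, 936, replace = TRUE),
                 pa = rbinom(936, size = 1, prob = 0.125))

# Allocate whole transects to the test set; retry until both sets have a
# prevalence within `tol` of `target_prev` (rejection sampling)
transect_split <- function(data, test_frac = 1/3, target_prev = mean(data$pa),
                           tol = 0.02, max_tries = 1000) {
  ids <- unique(data$transect)
  for (attempt in seq_len(max_tries)) {
    # Shuffle the transect IDs and look up how many rows each one has
    shuffled <- ids[sample.int(length(ids))]
    sizes <- as.vector(table(data$transect)[as.character(shuffled)])
    # Take transects in shuffled order while the test set stays within test_frac
    test_ids <- shuffled[cumsum(sizes) <= test_frac * nrow(data)]
    in_test <- data$transect %in% test_ids
    # isTRUE() guards against NaN prevalences if one set were ever empty
    ok <- isTRUE(abs(mean(data$pa[!in_test]) - target_prev) <= tol &&
                 abs(mean(data$pa[in_test]) - target_prev) <= tol)
    if (ok) return(list(train = data[!in_test, ], test = data[in_test, ]))
  }
  stop("no split met the prevalence tolerance; widen `tol` or raise `max_tries`")
}

s <- transect_split(HH)
c(train_rows = nrow(s$train), test_rows = nrow(s$test))
c(train_prev = mean(s$train$pa), test_prev = mean(s$test$pa))
```

Calling `transect_split()` inside the 1000-iteration loop would give one transect-level split per iteration. If splits are rejected too often (for example with few, large transects), `tol` will need widening.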
> Many thanks for your time!
> All the best,
> R-help at r-project.org mailing list