[R] Sampling with Constraints for testing and training data
Petr Savicky
savicky at cs.cas.cz
Wed Jan 25 16:17:48 CET 2012
On Wed, Jan 25, 2012 at 04:00:27AM -0800, Eliano wrote:
> Hi People,
>
> Does anyone have a good solution for this problem:
>
> a database called DB.
>
>
> index <- sample(1:nrow(DB), size=0.2*nrow(DB))
> test <- DB[index,]
> train <- DB[-index,]
>
> One of the variables in this database is a target variable with two
> values, 0 and 1.
>
> Imagine now that I want to constrain the test data frame so that "test",
> which is 20% of the size of DB, is split 50/50 between the two values of
> DB$target.
>
> Imagine: n=100
> DB$target = { 0: 80 rows, 1: 20 rows }
>
> Then test has 20 rows: 10 random rows with DB$target=1 and 10 random rows
> with DB$target=0.
Hi.
One way is as follows.
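# Row indices of each class
t0 <- which(DB$target==0)
t1 <- which(DB$target==1)
# Half of the 20% test set comes from each class
m <- round(0.1*nrow(DB))
# Each class must have at least m rows to sample from
stopifnot(length(t0) >= m, length(t1) >= m)
index <- c(sample(t0, size=m), sample(t1, size=m))
test <- DB[index,]
train <- DB[-index,]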
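The same idea can be wrapped in a small function and tried on simulated data. This is only a sketch: the name balanced_split, its test_frac argument, and the simulated DB are made up for illustration and are not part of the original exchange. It indexes t0 and t1 through sample.int(), because sample(x, ...) treats a single number x >= 1 as 1:x, which would matter if a class had only one row.

```r
# Hypothetical helper (an assumption, not from the original post): return row
# indices for a test set of about test_frac of the data, half from each class.
balanced_split <- function(target, test_frac = 0.2) {
  m <- round(test_frac * length(target) / 2)  # rows per class in the test set
  t0 <- which(target == 0)
  t1 <- which(target == 1)
  stopifnot(length(t0) >= m, length(t1) >= m)
  # Sample positions within t0/t1 rather than calling sample(t0, m) directly,
  # to avoid sample()'s special handling of a length-one numeric argument.
  c(t0[sample.int(length(t0), m)], t1[sample.int(length(t1), m)])
}

set.seed(1)  # arbitrary seed, only for reproducibility
# Simulated data matching the example in the question: 80 zeros, 20 ones.
DB <- data.frame(x = rnorm(100), target = rep(c(0, 1), times = c(80, 20)))

index <- balanced_split(DB$target)
test  <- DB[index, ]
train <- DB[-index, ]

table(test$target)   # 10 rows with target 0, 10 rows with target 1
table(train$target)  # 70 rows with target 0, 10 rows with target 1
```

Note that train keeps the remaining 70/10 split, so only the test set is balanced. If m comes out as 0 (very small data), index is empty and DB[-index, ] returns no rows, so that case would need separate handling.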
Hope this helps.
Petr Savicky.