[R] how to randomly select the samples with different probabilities for different classes?

Rui Barradas ruipbarradas at sapo.pt
Wed Dec 7 14:23:10 CET 2016


Hello,

If 60% of the 14 samples must come from group C, then 0.6 * 14 = 8.4, 
i.e. about 8 samples, would have to come from a group with only 6 
elements, which is impossible without replacement. Do you want sampling 
with replacement? If so, maybe the following will do.


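perc <- c(0.4, 0.6)  # shares for groups A+B (FALSE) and group C (TRUE)
n_total <- 14
# Split the row numbers: FALSE = groups A and B, TRUE = group C
tmp <- split(seq_len(nrow(dat)), dat$group == "C")
# Draw row numbers with replacement (group C has 6 rows but needs 8).
# Indexing tmp[[i]] directly avoids a hard-coded row offset.
idx <- lapply(seq_along(tmp), function(i) {
    k <- round(perc[i] * n_total)
    tmp[[i]][sample.int(length(tmp[[i]]), k, replace = TRUE)]
})
idx <- unlist(idx)
dat[idx, ]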
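
In case it is useful, here is a self-contained sketch of the same idea 
wrapped in a function. The name sample_by_share and its arguments are 
made up for illustration, they are not part of base R. Unlike the code 
above, it assumes you only want replacement where a group has fewer rows 
than requested, so groups A and B are drawn without replacement.

```r
# Rebuild the data from the original post (same values, compact form).
dat <- data.frame(
  sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L, 31L, 70L, 45L, 24L,
               80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L, 65L, 88L),
  group = factor(rep(c("A", "B", "C"), times = c(8, 8, 6)))
)

# Hypothetical helper: draw n_total rows so that a share `share_target`
# of them comes from the rows flagged by the logical vector `in_target`.
sample_by_share <- function(dat, in_target, share_target, n_total) {
  n_target <- round(share_target * n_total)  # 0.6 * 14 = 8.4 -> 8
  n_other  <- n_total - n_target             # remaining 6 rows
  pick <- function(rows, n) {
    # Assumption: use replacement only when a group is too small.
    rows[sample.int(length(rows), n, replace = n > length(rows))]
  }
  dat[c(pick(which(!in_target), n_other),
        pick(which(in_target), n_target)), ]
}

set.seed(1)  # arbitrary seed, only to make the draw repeatable
res <- sample_by_share(dat, dat$group == "C", 0.6, 14)
table(res$group == "C")
```

With these inputs the result has 6 rows from groups A and B and 8 rows 
from group C, so some group C rows necessarily appear more than once.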

Hope this helps,

Rui Barradas

On 07-12-2016 11:58, Marna Wagley wrote:
> Hi R users,
> I have samples with covariates for different classes, and I want to select
> samples from the different groups with different probabilities. For example,
> I have 22 samples in 3 classes:
> groupA has 8 samples
> groupB has 8 samples
> groupC has 6 samples
>
> I want to select a total of 14 samples from the 22, in which 40% of the
> 14 samples should come from groups A and B, and 60% of the 14 samples
> should come from group C.
>
> Would you mind helping me select the samples under these
> conditions? I have attached sample data:
>
> dat<-structure(list(sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L,
> 31L, 70L, 45L, 24L, 80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L,
> 65L, 88L), group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A",
> "B", "C"), class = "factor")), .Names = c("sampleID", "group"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> thanks,
>    MW
>


