[R] random sample from arrays

David Winsemius dwinsemius at comcast.net
Thu Jul 8 15:46:04 CEST 2010


On Jul 8, 2010, at 2:04 AM, Assa Yeroslaviz wrote:

> Hello R users,
>
> I'm trying to extract random samples from a big array I have.
>
> I have a data frame of over 40k lines and would like to produce
> around 50 random samples of around 200 lines each from this array.
>
> this is the matrix
>          ID xxx_1c xxx__2c xxx__3c xxx__4c xxx__5T xxx__6T xxx__7T  
> xxx__8T
> yyy_1c yyy_1c _2c
> 1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047
> 2.115634  2.189372  1.643328  1.563523
> 2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779
> 11.650336 11.995504 13.024494 12.776322
> 3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775
> 2.033982  2.111937  1.606340  1.548940
> 4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279
> 9.160038  9.104121 10.079177  9.828757
> 5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356
> 3.559434  3.156093  4.308437  4.045098
> 6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068
> 7.410085  7.022582  7.668616  7.953706
> ...
>
> I tried to do it with a for loop:
>
> genelist <- read.delim("/user/R/raw_data.txt")
> rownames(genelist) <- genelist[,1]
> genes <- rownames(genelist)
>

One method:

totsize  <- 50 * 200
# create matrix of indices
smatrix <- matrix(sample(1:length(genelist$ID), totsize),
                  nrow=200, ncol=50)

# Then any one sample would be:

  genelist[ smatrix[,i], ]   # for i in 1:50
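
To keep all 50 samples around after the loop, one option is to collect them in a list. The following is only a sketch: the toy genelist below is a made-up stand-in for the real file (same ID column, one invented numeric column), and the name "samples" is mine, not from the original post.

```r
# Toy stand-in for the poster's data (hypothetical; the real file is not available here)
set.seed(1)   # only so the sketch is reproducible
genelist <- data.frame(ID = paste0("A_", 1:40000),
                       xxx_1c = rnorm(40000))

totsize <- 50 * 200
# 200 x 50 matrix of row indices; each column is one sample
smatrix <- matrix(sample(seq_len(nrow(genelist)), totsize),
                  nrow = 200, ncol = 50)

# Collect the 50 sub-data-frames in a list so they persist outside any loop
samples <- lapply(seq_len(ncol(smatrix)),
                  function(i) genelist[smatrix[, i], ])

length(samples)      # 50
dim(samples[[1]])    # 200 rows, 2 columns in this toy data
```

samples[[i]] is then the i-th sample as a data frame, which avoids the need for subset() and %in%.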

You do need to decide whether this approach, which creates 50 mutually
exclusive samples (if the IDs are unique), is really what you want,
since the samples are not truly independent draws. I think this could be
an issue with a universe:sample ratio of about 4:1 (roughly 40k rows
against 10,000 sampled). It's not a bootstrap sample. You could add
replace=TRUE to the sample call to fix that.
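
A rough sketch of the two variants, for comparison. Here n is an assumed row count standing in for length(genelist$ID), and the replicate() version is my own addition, on the guess that what is wanted is 50 independent samples with no repeated rows inside any one sample.

```r
set.seed(2)   # only so the sketch is reproducible
n <- 40000    # assumed number of rows, standing in for length(genelist$ID)

# Variant 1: replace=TRUE in the single sample() call.
# Rows may now repeat across samples and also within a sample.
smatrix_rep <- matrix(sample(seq_len(n), 50 * 200, replace = TRUE),
                      nrow = 200, ncol = 50)

# Variant 2 (an assumption about the goal, not from the original post):
# 50 separate draws, each without replacement within itself.
# Columns are independent of each other, so they may overlap.
smatrix_ind <- replicate(50, sample(seq_len(n), 200))

dim(smatrix_rep)   # 200 50
dim(smatrix_ind)   # 200 50
```

Either matrix can be used exactly like smatrix, one column per sample.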


-- 
David

> x <- 1:40000
> set <- matrix(nrow = 50, ncol = 11)
>
> for(i in c(1:50)){
>    set[i] <-sample(x,50)
>    print(c(i,"->", set), quote = FALSE)
>    }
>
> which basically do the trick, but I just can't save the results  
> outside the
> loop.
> After having the random sets of lines it wasn't a problem to extract  
> the
> line from the arrays using subset.
>
> genSet1 <-sample(x,50)
> random1 <- genes %in% genSet1
> subsetGenelist <- subset(genelist, random1)
>
>
> Is there a different way of creating these random vectors, or of saving
> the loop results outside the loop so I can work with them?
>
> Thanks a lot
>
> Assa
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


