[R] Fwd: rarefy a matrix of counts

Tony Plate tplate at acm.org
Wed Oct 11 20:54:44 CEST 2006


Two things to note:

(1) rep() is vectorized over its second argument, so each element
can be given its own repeat count:
 > rep(1:3, 2:4)
[1] 1 1 2 2 2 3 3 3 3
 >

(2) you will likely get much better performance if you work with 
integers and convert to strings after sampling (or use factors), e.g.:

 > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
[1] "red"  "blue" "red"  "red"  "red"
 >
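
Putting (1) and (2) together, here is a minimal sketch of rarefying a
whole matrix of counts, assuming the counts sit in a numeric matrix with
one column per sample and that every column total is at least the
requested size. The name rarefy_counts and its arguments are made up for
illustration; they do not come from any package.

```r
# Hypothetical helper (not from the thread or any package): rarefy each
# column of a count matrix to `size` individuals, without replacement.
rarefy_counts <- function(counts, size) {
  counts <- as.matrix(counts)
  out <- apply(counts, 2, function(n) {
    # Enumerate the individuals as integer row codes,
    # e.g. rep(1:3, c(400, 100, 300)) for the first sample.
    pool <- rep(seq_along(n), times = n)
    # Draw `size` positions without replacement; indexing by position
    # avoids sample()'s special case for a length-one numeric argument.
    drawn <- pool[sample.int(length(pool), size)]
    # Count how many drawn individuals fall in each row code.
    tabulate(drawn, nbins = length(n))
  })
  dimnames(out) <- dimnames(counts)
  out
}

counts <- cbind(sample1 = c(red = 400, green = 100, black = 300),
                sample2 = c(300, 0, 1000),
                sample3 = c(2500, 200, 500))
set.seed(1)
rarefied <- rarefy_counts(counts, 100)
rarefied
```

Each column of the result sums to 100 and no cell exceeds the original
count. Working with integer codes and tabulate() means no strings are
created at all; the row names are simply copied over at the end.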

-- Tony Plate

Brian Frappier wrote:
> I tried all of the approaches below. 
> 
> the problem with:
> 
>  > x <- data.frame(matrix(NA,100,3))
>  > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
>  > if you want result in data frame
>  > or
>  > x<-vector("list", 3)
>  > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> 
> is that this code still samples the rows, not the elements, i.e. returns 
> 100 or 300 in the matrix cells instead of "red" or a matrix of counts by 
> color (object type) like:
>        x1    x2    x3
> red    32     5    60
> gr     68    95    40
> sum   100   100   100
> 
>  It looks like Tony is right: sampling without replacement requires 
> listing of all elements to be sampled.  But, the code Petr provided
> 
> x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)
> 
> did give me a clue of how to quickly make such a list using the 'rep' 
> command.  I will loop a rep() call over the columns of my original 
> matrix to create a list of elements for each sample.
> 
> Thanks Petr and Tony for your help!
> 
> On 10/11/06, Tony Plate <tplate at acm.org> wrote:
> 
>     Here's a way using apply(), and the prob= argument of sample():
> 
>      > df <- data.frame(sample1=c(red=400,green=100,black=300),
>     sample2=c(300,0,1000), sample3=c(2500,200,500))
>      > df
>            sample1 sample2 sample3
>     red       400     300    2500
>     green     100       0     200
>     black     300    1000     500
>      > set.seed(1)
>      > apply(df, 2, function(counts) sample(seq_along(counts),
>     replace=TRUE, size=7, prob=counts))
>           sample1 sample2 sample3
>     [1,]       1       3       1
>     [2,]       1       3       1
>     [3,]       3       3       1
>     [4,]       2       3       2
>     [5,]       1       3       1
>     [6,]       2       3       1
>     [7,]       2       3       3
>      >
> 
>     Note that this does sampling WITH replacement.
>     AFAIK, sampling without replacement requires enumerating the entire
>     population to be sampled from.  I.e., you cannot do
>      > sample(1:3, prob=1:3, replace=FALSE, size=4)
>     instead of
>      > sample(c(1,2,2,3,3,3), replace=FALSE, size=4)
> 
>     -- Tony Plate
> 
>      From reading ?sample, I was a little unclear on whether sampling
>     without replacement could work
> 
>     Petr Pikal wrote:
>      > Hi
>      >
>      > a little bit different story. But
>      >
>      > x1 <- sample(c(rep("red",400),rep("green", 100),
>      > rep("black",300)),100)
>      >
>      > is maybe close. With data frame (if it is not big)
>      >
>      >
>      >>DF
>      >
>      >   color sample1 sample2 sample3
>      > 1   red     400     300    2500
>      > 2 green     100       0     200
>      > 3 black     300    1000     500
>      >
>      > x <- data.frame(matrix(NA,100,3))
>      > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
>      > if you want result in data frame
>      > or
>      > x<-vector("list", 3)
>      > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
>      >
>      > if you want it in a list. Maybe somebody is clever enough to get rid
>      > of the for loop, but you said you have 80 columns, which should be
>      > no problem.
>      >
>      > HTH
>      > Petr
>      >
>      > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
>      >
>      > Date sent:            Wed, 11 Oct 2006 10:11:33 -0400
>      > From:                 "Brian Frappier" <brian.frappier at gmail.com>
>      > To:                   "Petr Pikal" <petr.pikal at precheza.cz>
>      > Subject:              Fwd: [R] rarefy a matrix of counts
>      >
>      >
>      >>---------- Forwarded message ----------
>      >>From: Brian Frappier <brian.frappier at gmail.com>
>      >>Date: Oct 11, 2006 10:10 AM
>      >>Subject: Re: [R] rarefy a matrix of counts
>      >>To: r-help at stat.math.ethz.ch
>      >>
>      >>Hi Petr,
>      >>
>      >>Thanks for your response.  I have data that looks like the following:
>      >>
>      >>               sample 1   sample 2   sample 3  ....
>      >>red candy           400        300       2500
>      >>green candy         100          0        200
>      >>black candy         300       1000        500
>      >>
>      >>I don't want to randomly select either the samples (columns) or the
>      >>"candy" types (rows), which sample(), as you state, would allow me to do.
>      >>Instead, I want to randomly sample 100 candies from each sample and
>      >>retain info on their associated type.  I could make a list of all the
>      >>candies in each sample:
>      >>
>      >>sample 1
>      >>red
>      >>red
>      >>red
>      >>red
>      >>green
>      >>green
>      >>black
>      >>red
>      >>black
>      >>...
>      >>
>      >>and then randomly sample those rows.  Repeat for each sample.  But, I
>      >>am not sure how to do that without a lot of loops, and am wondering if
>      >>there is an easier way in R.  Thanks!  I should have laid this out in
>      >>the first email...sorry.
>      >>
>      >>
>      >>On 10/11/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
>      >>
>      >>>Hi
>      >>>
>      >>>I am not experienced in Matlab and from your explanation I do not
>      >>>understand what exactly you want. It seems that you want to randomly
>      >>>choose a sample of 100 rows from your matrix, which can be achieved
>      >>>with sample().
>      >>>
>      >>>DF<- data.frame(rnorm(100), 1:100, 101:200, 201:300)
>      >>>DF[sample(1:100, 10),]
>      >>>
>      >>>If you want to do this several times, you need to save your result,
>      >>>and then it depends on what you want to do next. One suitable form
>      >>>is a list of matrices, another is an array, and you can use a for
>      >>>loop to fill it.
>      >>>
>      >>>HTH
>      >>>Petr
>      >>>
>      >>>
>      >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
>      >>>
>      >>>Date sent:              Tue, 10 Oct 2006 17:40:47 -0400
>      >>>From:                   "Brian Frappier" <brian.frappier at gmail.com>
>      >>>To:                     r-help at stat.math.ethz.ch
>      >>>Subject:                [R] rarefy a matrix of counts
>      >>>
>      >>>
>      >>>>Hi all,
>      >>>>
>      >>>>I have a matrix of counts for objects (rows) by samples (columns).
>      >>>> I aimed for about 500 counts in each sample (I have about 80
>      >>>>samples) and would now like to rarefy these down to 100 counts in
>      >>>>each sample using simple random sampling without replacement.  I
>      >>>>plan on rarefying several times for each sample.  I could do the
>      >>>>tedious looping task of making a list of all objects (with its
>      >>>>associated identifier) in each sample and then use the wonderful
>      >>>>"sampling" package to select a sub-sample of 100 for each sample
>      >>>>and thereby get a logical vector of inclusions.  I would then
>      >>>>regroup the resulting logical vector into a vector of counts by
>      >>>>object, rinse and repeat several times for each sample.
>      >>>>
>      >>>>Alternately, using the same list, I could create a random index of
>      >>>>integers between 1 and the number of objects for a sample (without
>      >>>>repeats) and then select those objects from the list.  Again,
>      >>>>rinse and repeat several times for each sample.
>      >>>>
>      >>>>Is there a way to directly rarefy a matrix of counts without
>      >>>>having to create a list of objects first?  I am trying to switch
>      >>>>to R from Matlab and am trying to pick up good programming habits
>      >>>>from the start.
>      >>>>
>      >>>>Much appreciation!
>      >>>>
>      >>>> [[alternative HTML version deleted]]
>      >>>>
>      >>>>______________________________________________
>      >>>>R-help at stat.math.ethz.ch mailing list
>      >>>>https://stat.ethz.ch/mailman/listinfo/r-help
>      >>>>PLEASE do read the posting guide
>      >>>>http://www.R-project.org/posting-guide.html and provide commented,
>      >>>>minimal, self-contained, reproducible code.
>      >>>
>      >>>Petr Pikal
>      >>>petr.pikal at precheza.cz
>      >>>
>      >>>
>      >>
>      >
>      > Petr Pikal
>      > petr.pikal at precheza.cz
>      >
>      >
> 
>


