[R] Fwd: rarefy a matrix of counts

Tony Plate tplate at acm.org
Wed Oct 11 17:12:48 CEST 2006


Here's a way using apply(), and the prob= argument of sample():

 > df <- data.frame(sample1=c(red=400,green=100,black=300),
 +                  sample2=c(300,0,1000), sample3=c(2500,200,500))
 > df
       sample1 sample2 sample3
red       400     300    2500
green     100       0     200
black     300    1000     500
 > set.seed(1)
 > apply(df, 2, function(counts) sample(seq_along(counts), replace=TRUE,
 +                                      size=7, prob=counts))
      sample1 sample2 sample3
[1,]       1       3       1
[2,]       1       3       1
[3,]       3       3       1
[4,]       2       3       2
[5,]       1       3       1
[6,]       2       3       1
[7,]       2       3       3
 >
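If the goal is rarefied counts per candy type rather than the individual draws, then sampling WITH replacement and counting by type amounts to one multinomial draw per column. The sketch below makes two assumptions. First, it uses base R's rmultinom() as a shortcut in place of sample() followed by tabulation. Second, it recreates df so the block runs on its own. The name rare_wr is illustrative.

```r
# Counts of each candy type (rows) in each sample (columns)
df <- data.frame(sample1 = c(red = 400, green = 100, black = 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500))

set.seed(1)
# One multinomial draw of size 7 per column: same distribution as drawing
# 7 candies WITH replacement, weighted by the column counts, then counting
# how many of each type were drawn
rare_wr <- apply(df, 2, function(counts) rmultinom(1, size = 7, prob = counts)[, 1])
rownames(rare_wr) <- rownames(df)
rare_wr
```

Each column sums to the requested size (7 here). A type with a zero count, such as green in sample2, can never be drawn.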

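Note that this samples WITH replacement.
AFAIK, sampling without replacement requires enumerating the entire
population to be sampled from: with replace=FALSE, each element of x can
be drawn at most once, and prob= only weights the successive draws among
the elements not yet drawn.  I.e., you cannot do
 > sample(1:3, prob=1:3, replace=FALSE, size=4)
instead of
 > sample(c(1,2,2,3,3,3), replace=FALSE, size=4)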
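The point can be checked in a few lines. In this sketch the variable names are illustrative. The prob= version is wrapped in tryCatch() so that the error message is returned instead of stopping the script. The expanded population is built with rep() rather than typed out.

```r
counts <- c(1, 2, 3)  # population: one 1, two 2s, three 3s

# prob= with replace = FALSE: each of the 3 values can be drawn at most once,
# so a sample of size 4 is an error (caught here and returned as a message)
res <- tryCatch(sample(1:3, size = 4, replace = FALSE, prob = counts),
                error = function(e) conditionMessage(e))
res

# Enumerating the population with rep() makes size = 4 valid
population <- rep(seq_along(counts), counts)  # 1 2 2 3 3 3
set.seed(1)
draw <- sample(population, size = 4, replace = FALSE)
tabulate(draw, nbins = length(counts))  # how many of each value were drawn
```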

-- Tony Plate

From reading ?sample, I was a little unclear on whether sampling
without replacement could work.

Petr Pikal wrote:
> Hi
> 
> A little bit different story, but
> 
> x1 <- sample(c(rep("red",400), rep("green",100), rep("black",300)), 100)
> 
> may be close. With a data frame (if it is not big):
> 
> DF
>   color sample1 sample2 sample3
> 1   red     400     300    2500
> 2 green     100       0     200
> 3 black     300    1000     500
> 
> x <- data.frame(matrix(NA, 100, 3))
> for (i in 2:ncol(DF)) x[, i-1] <- sample(rep(DF[,1], DF[,i]), 100)
> 
> if you want the result in a data frame, or
> 
> x <- vector("list", 3)
> for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]), 100)
> 
> if you want it in a list. Maybe somebody is clever enough to get rid of
> the for loop, but you said you have 80 columns, which should be no problem.
> 
> HTH
> Petr
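The for loop can be replaced by lapply() over the count columns. This minimal sketch assumes DF has the layout shown above: a color column followed by one count column per sample. The final table() step is not in the message. It turns each draw of 100 colors back into counts per color.

```r
DF <- data.frame(color   = c("red", "green", "black"),
                 sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500),
                 stringsAsFactors = FALSE)

set.seed(1)
# One list element per sample column: 100 colors drawn WITHOUT replacement
# from that column's expanded population
draws <- lapply(DF[-1], function(n) sample(rep(DF$color, n), 100))

# Back to counts per color; factor() keeps colors that were never drawn
rarefied <- sapply(draws, function(d) table(factor(d, levels = DF$color)))
rarefied
```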
> 
> On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> 
> Date sent:      	Wed, 11 Oct 2006 10:11:33 -0400
> From:           	"Brian Frappier" <brian.frappier at gmail.com>
> To:             	"Petr Pikal" <petr.pikal at precheza.cz>
> Subject:        	Fwd: [R] rarefy a matrix of counts
> 
> 
>>---------- Forwarded message ----------
>>From: Brian Frappier <brian.frappier at gmail.com>
>>Date: Oct 11, 2006 10:10 AM
>>Subject: Re: [R] rarefy a matrix of counts
>>To: r-help at stat.math.ethz.ch
>>
>>Hi Petr,
>>
>>Thanks for your response.  I have data that looks like the following:
>>
>>               sample 1  sample 2  sample 3  ....
>>red candy          400       300      2500
>>green candy        100         0       200
>>black candy        300      1000       500
>>
>>I don't want to randomly select either the samples (columns) or the
>>"candy" types (rows), which, as you state, sample() would allow me to do.
>>Instead, I want to randomly sample 100 candies from each sample and
>>retain info on their associated type.  I could make a list of all the
>>candies in each sample:
>>
>>sample 1
>>red
>>red
>>red
>>red
>>green
>>green
>>black
>>red
>>black
>>...
>>
>>and then randomly sample those rows.  Repeat for each sample.  But I
>>am not sure how to do that without a lot of loops, and am wondering if
>>there is an easier way in R.  Thanks!  I should have laid this out in
>>the first email...sorry.
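The procedure described above can be wrapped in a small function and repeated with replicate(), so that the loops do not have to be written by hand. The procedure has three steps: expand each sample into one entry per candy, sample 100 of those entries, and count the results by type. This is a sketch only. rarefy_once is a hypothetical helper, the 5 repetitions are arbitrary, and the counts are the three samples from the table above.

```r
# Counts of each candy type (rows) in each sample (columns)
counts <- matrix(c(400, 100, 300,
                   300,   0, 1000,
                   2500, 200, 500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

# Hypothetical helper: rarefy every column of a count matrix to `depth`
# objects by simple random sampling WITHOUT replacement
rarefy_once <- function(counts, depth) {
  apply(counts, 2, function(n) {
    population <- rep(seq_along(n), n)   # one entry per object, coded by row
    picked <- sample(population, depth)  # replace = FALSE is the default
    tabulate(picked, nbins = length(n))  # back to counts per row
  })
}

set.seed(1)
# 5 independent rarefactions, stacked into a rows x samples x reps array
reps <- replicate(5, rarefy_once(counts, depth = 100))
dimnames(reps) <- list(rownames(counts), colnames(counts), paste0("rep", 1:5))
dim(reps)
```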
>>
>>
>>On 10/11/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
>>
>>>Hi
>>>
>>>I am not experienced in Matlab, and from your explanation I do not
>>>understand exactly what you want. It seems that you want to randomly
>>>choose a sample of 100 rows from your matrix, which can be achieved
>>>with sample().
>>>
>>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
>>>DF[sample(1:100, 10),]
>>>
>>>If you want to do this several times, you need to save your results,
>>>and then it depends on what you want to do next. One suitable form
>>>is a list of matrices, another is an array, and you can use a for loop
>>>to fill it.
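Both storage forms mentioned here can be filled with replicate() instead of an explicit for loop. This sketch reuses the DF from the example above. The 3 repetitions and the names as_list and as_array are illustrative.

```r
# Data frame from the example above: 100 rows, 4 columns
DF <- data.frame(rnorm(100), 1:100, 101:200, 201:300)

set.seed(1)
# List form: 3 data frames of 10 randomly chosen rows each
as_list <- replicate(3, DF[sample(nrow(DF), 10), ], simplify = FALSE)

# Array form: each draw as a 10 x 4 matrix, stacked into a 10 x 4 x 3 array
as_array <- replicate(3, as.matrix(DF[sample(nrow(DF), 10), ]))
dim(as_array)
```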
>>>
>>>HTH
>>>Petr
>>>
>>>
>>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
>>>
>>>Date sent:              Tue, 10 Oct 2006 17:40:47 -0400
>>>From:                   "Brian Frappier" <brian.frappier at gmail.com>
>>>To:                     r-help at stat.math.ethz.ch
>>>Subject:                [R] rarefy a matrix of counts
>>>
>>>
>>>>Hi all,
>>>>
>>>>I have a matrix of counts for objects (rows) by samples (columns).
>>>> I aimed for about 500 counts in each sample (I have about 80
>>>>samples) and would now like to rarefy these down to 100 counts in
>>>>each sample using simple random sampling without replacement.  I
>>>>plan on rarefying several times for each sample.  I could do the
>>>>tedious looping task of making a list of all objects (each with its
>>>>associated identifier) in each sample and then use the wonderful
>>>>"sampling" package to select a sub-sample of 100 for each sample
>>>>and thereby get a logical vector of inclusions.  I would then
>>>>regroup the resulting logical vector into a vector of counts by
>>>>object, rinse and repeat several times for each sample.
>>>>
>>>>Alternately, using the same list, I could create a random index of
>>>>integers between 1 and the number of objects for a sample (without
>>>>repeats) and then select those objects from the list.  Again,
>>>>rinse and repeat several times for each sample.
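Both approaches fit in a few lines of base R for a single sample. In this sketch, sample.int() stands in for the sampling package when building the logical inclusion vector, so no functions from that package are assumed. The sample 1 counts from the candy table serve as data.

```r
n <- c(red = 400, green = 100, black = 300)  # counts for one sample
ids <- rep(seq_along(n), n)  # list of all objects, coded 1 = red, 2 = green, 3 = black
depth <- 100

set.seed(1)
# Approach 1: logical vector of inclusions, regrouped into counts by object type
included <- seq_along(ids) %in% sample.int(length(ids), depth)
counts1 <- tabulate(ids[included], nbins = length(n))

# Approach 2: random index of distinct integers, then select those objects
idx <- sample.int(length(ids), depth)  # replace = FALSE by default: no repeats
counts2 <- tabulate(ids[idx], nbins = length(n))

names(counts1) <- names(n)
names(counts2) <- names(n)
counts1
counts2
```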
>>>>
>>>>Is there a way to directly rarefy a matrix of counts without
>>>>having to create a list of objects first?  I am trying to switch
>>>>to R from Matlab and am trying to pick up good programming habits
>>>>from the start.
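One way to rarefy directly from the counts, without first building the list of objects, is to draw from the multivariate hypergeometric distribution one type at a time with base R's rhyper(). Once the earlier types have been processed, the number of remaining draws that fall in the next type is hypergeometric. This technique was not proposed in the thread; it is a substitute approach. rarefy_counts is a hypothetical helper name. The loop runs over types (rows), not over individual objects.

```r
# Counts of each candy type (rows) in each sample (columns)
counts <- matrix(c(400, 100, 300,
                   300,   0, 1000,
                   2500, 200, 500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

# Hypothetical helper: rarefy one vector of counts to `depth` objects.
# Same distribution as simple random sampling without replacement,
# but the counts are never expanded into one entry per object.
rarefy_counts <- function(n, depth) {
  stopifnot(depth <= sum(n))
  out <- integer(length(n))
  left_total <- sum(n)  # objects in the types not yet processed
  left_draw  <- depth   # draws not yet assigned to a type
  for (i in seq_along(n)) {
    if (left_draw == 0) break
    others <- left_total - n[i]
    # How many of the remaining draws fall in type i
    out[i] <- rhyper(1, m = n[i], n = others, k = left_draw)
    left_draw  <- left_draw - out[i]
    left_total <- others
  }
  names(out) <- names(n)
  out
}

set.seed(1)
rarefied <- apply(counts, 2, rarefy_counts, depth = 100)
rarefied
```

The apply() call gives one rarefied column per sample. Wrapping that call in replicate() would repeat the rarefaction.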
>>>>
>>>>Much appreciation!
>>>>
>>>>______________________________________________
>>>>R-help at stat.math.ethz.ch mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide
>>>>http://www.R-project.org/posting-guide.html and provide commented,
>>>>minimal, self-contained, reproducible code.
>>>
>>>Petr Pikal
>>>petr.pikal at precheza.cz
>>>
>>>
>>
> 
> Petr Pikal
> petr.pikal at precheza.cz
> 