[R] rarefy a matrix of counts

Alex Brown alex at transitive.com
Fri Oct 13 11:56:12 CEST 2006


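I thought at first that you could use a weighted sample (the sample
function with the counts as prob), but you can't: it doesn't take proper
account of replacement. Without replacement each candy type can be drawn
only once, and with replacement the counts are never used up.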
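To make that limitation concrete, here is a small sketch. The `counts` vector is simply the sample.1 column of the example data further down, typed in by hand; nothing else here is from the thread.

```r
# Illustration only: use the counts of one sample as sampling weights.
counts <- c(red.candy = 400, green.candy = 100, black.candy = 300)

# Without replacement each *name* can be drawn at most once, so no more
# than length(counts) items can come back, however many candies there are.
s <- sample(names(counts), size = 3, prob = counts)
print(s)

# Asking for 100 draws fails: there are only 3 names to draw from.
res <- try(sample(names(counts), size = 100, prob = counts), silent = TRUE)
print(inherits(res, "try-error"))

# With replacement it runs, but every draw uses the same fixed proportions
# (independent draws), not the depleting pool that rarefaction needs.
s2 <- sample(names(counts), size = 100, replace = TRUE, prob = counts)
print(table(s2))
```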

You can use the list approach, but through the power of R, you don't  
need a lot of loops to do it...

I can't speak for the efficiency of this approach in terms of CPU cycles.
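If expanding every column into one element per candy is a concern (sample.3 in the example expands to 3,200 individuals), the counts can also be rarefied directly. The sketch below is an alternative not proposed in the thread: it draws from a multivariate hypergeometric distribution as a chain of univariate hypergeometric draws with base R's `rhyper()`. The function name `rarefy_counts` is made up for this example.

```r
# Rarefy a vector of counts to `size` individuals without replacement,
# without building one element per individual. Each type's share is a
# hypergeometric draw given the individuals and draws still remaining.
rarefy_counts <- function(x, size) {
  if (size > sum(x)) stop("size is larger than the number of individuals")
  out <- numeric(length(x))
  names(out) <- names(x)
  remaining <- sum(x)  # individuals in this type and all later types
  to_draw <- size      # draws not yet allocated to a type
  for (i in seq_along(x)) {
    if (to_draw == 0) break
    if (i == length(x)) {
      out[i] <- to_draw  # whatever is left must come from the last type
    } else {
      # draws landing in type i: x[i] "successes" among `remaining`
      out[i] <- rhyper(1, x[i], remaining - x[i], to_draw)
    }
    remaining <- remaining - x[i]
    to_draw <- to_draw - out[i]
  }
  out
}

# The example table from this thread, typed in directly.
z2 <- data.frame(
  sample.1 = c(400, 100, 300),
  sample.2 = c(300, 0, 1000),
  sample.3 = c(2500, 200, 500),
  row.names = c("red.candy", "green.candy", "black.candy")
)

set.seed(1)  # only so the run is reproducible
r <- apply(z2, 2, rarefy_counts, size = 100)
print(r)
print(colSums(r))  # 100 in every column
```

The result is already a matrix of counts by type, which is what the original question asked for. If a contributed package is acceptable, the vegan package's `rrarefy()` is also worth a look; it expects samples in rows, so the table here would need transposing first (check its documentation).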

In short:

apply(z2, 2, function(x) sample(rep(names(x), x), 100))

In long:

# Let's load the data (paste the lines below at the prompt, then enter
# an empty line to end the input):

z <- scan(, "", sep = "\n")
            sample.1 sample.2 sample.3
red.candy        400      300     2500
green.candy      100        0      200
black.candy      300     1000      500

# ...and turn it into a data frame:

z2 <- read.table(textConnection(z), header = TRUE, row.names = 1)
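When running this as a script rather than pasting at the console, the same table can be built without `scan()`. This sketch assumes a newer R than the one in this thread, since the `text =` argument of `read.table()` was added later.

```r
# Build the example table non-interactively.
lines <- c(
  "            sample.1 sample.2 sample.3",
  "red.candy        400      300     2500",
  "green.candy      100        0      200",
  "black.candy      300     1000      500"
)
# The header has one field fewer than the data rows; row.names = 1 makes
# the first column (the candy types) the row names.
z2 <- read.table(text = lines, header = TRUE, row.names = 1)
print(z2)
```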

# Let's create a function to expand a sample column into individuals:

expand <- function(x) rep(names(x), x)

# test it on a smaller set:

ex <- expand( c( red = 2, blue = 3) )

ex
[1] "red"  "red"  "blue" "blue" "blue"

# and sample 2 things from that:

sample( ex, 2 )

# Combine the two into one function:

samplex <- function( x, size ) sample(expand(x), size )

samplex( c( red = 2, blue = 3), size = 2 )

# OK, now use apply() to run this on each column:

apply(z2, 2, samplex, size = 2 )

# you wanted 100?

apply(z2, 2, samplex, size = 100 )

# all done.
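The original question asked for rarefied counts per candy type, repeated several times per sample, rather than a list of individual candies. A sketch of one way to get that, under those assumptions: tabulate each draw back into counts, and collect the repeats with `replicate()` into a types x samples x replicates array (one of the forms Petr suggests below). The names `rarefy_column` and `n_rep` are made up for this example.

```r
# The example table from this thread, typed in directly.
z2 <- data.frame(
  sample.1 = c(400, 100, 300),
  sample.2 = c(300, 0, 1000),
  sample.3 = c(2500, 200, 500),
  row.names = c("red.candy", "green.candy", "black.candy")
)

# Expand a column into individuals, draw `size` without replacement, and
# count how many of each type were drawn. factor(..., levels =) keeps
# types that were not drawn as explicit zeros.
rarefy_column <- function(x, size) {
  drawn <- sample(rep(names(x), x), size)
  table(factor(drawn, levels = names(x)))
}

set.seed(42)  # only so the run is reproducible
one <- apply(z2, 2, rarefy_column, size = 100)
print(one)  # a types x samples matrix of counts

# Repeat the rarefaction several times.
n_rep <- 5
reps <- replicate(n_rep, apply(z2, 2, rarefy_column, size = 100))
print(dim(reps))  # 3 types x 3 samples x 5 replicates
```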

# Note that if any sample contains fewer candies than the requested
# size (here 100), this will fail, e.g.:

apply(z2, 2, samplex, size = 2000 )

Error in sample(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'
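One way to avoid that error, assuming it is acceptable to skip (and report) samples that are too small, is to check the column totals before sampling. This is a sketch, not something proposed in the thread; `size`, `enough` and `rarefied` are illustrative names.

```r
# The example table from this thread, typed in directly.
z2 <- data.frame(
  sample.1 = c(400, 100, 300),
  sample.2 = c(300, 0, 1000),
  sample.3 = c(2500, 200, 500),
  row.names = c("red.candy", "green.candy", "black.candy")
)

size <- 1000
enough <- colSums(z2) >= size
print(enough)  # sample.1 has only 800 candies

# Report the skipped samples instead of dropping them silently.
if (any(!enough)) {
  message("Skipping samples with fewer than ", size, " individuals: ",
          paste(names(z2)[!enough], collapse = ", "))
}
rarefied <- apply(z2[, enough, drop = FALSE], 2,
                  function(x) sample(rep(names(x), x), size))
print(dim(rarefied))  # 1000 draws for each of the 2 remaining samples
```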

-Alex

On 11 Oct 2006, at 15:10, Brian Frappier wrote:

> Hi Petr,
>
> Thanks for your response.  I have data that looks like the following:
>
>               sample 1   sample 2   sample 3  ....
> red candy          400        300       2500
> green candy        100          0        200
> black candy        300       1000        500
>
> I don't want to randomly select either the samples (columns) or the
> "candy" types (rows), which, as you state, sample would allow me to do.
> Instead, I want to randomly sample 100 candies from each sample and
> retain info on their associated type. I could make a list of all the
> candies in each sample:
> sample 1
> red
> red
> red
> red
> green
> green
> black
> red
> black
> ...
>
> and then randomly sample those rows, and repeat for each sample. But I
> am not sure how to do that without a lot of loops, and am wondering if
> there is an easier way in R. Thanks! I should have laid this out in the
> first email... sorry.
>
>
> On 10/11/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
>>
>> Hi
>>
>> I am not experienced in Matlab, and from your explanation I do not
>> understand exactly what you want. It seems that you want to randomly
>> choose a sample of 100 rows from your matrix, which can be achieved
>> with sample.
>>
>> DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
>> DF[sample(1:100, 10),]
>>
>> If you want to do this several times, you need to save your results,
>> and then it depends on what you want to do next. One suitable form is
>> a list of matrices, another is an array, and you can use a for loop
>> to fill it.
>>
>> HTH
>> Petr
>>
>>
>> On 10 Oct 2006 at 17:40, Brian Frappier wrote:
>>
>> Date sent:              Tue, 10 Oct 2006 17:40:47 -0400
>> From:                   "Brian Frappier" <brian.frappier at gmail.com>
>> To:                     r-help at stat.math.ethz.ch
>> Subject:                [R] rarefy a matrix of counts
>>
>>> Hi all,
>>>
>>> I have a matrix of counts for objects (rows) by samples (columns). I
>>> aimed for about 500 counts in each sample (I have about 80 samples)
>>> and would now like to rarefy these down to 100 counts in each sample
>>> using simple random sampling without replacement. I plan on rarefying
>>> several times for each sample. I could do the tedious looping task of
>>> making a list of all objects (with its associated identifier) in each
>>> sample and then use the wonderful "sampling" package to select a
>>> sub-sample of 100 for each sample and thereby get a logical vector of
>>> inclusions. I would then regroup the resulting logical vector into a
>>> vector of counts by object, rinse and repeat several times for each
>>> sample.
>>>
>>> Alternatively, using the same list, I could create a random index of
>>> integers between 1 and the number of objects for a sample (without
>>> repeats) and then select those objects from the list. Again, rinse
>>> and repeat several times for each sample.
>>>
>>> Is there a way to directly rarefy a matrix of counts without having
>>> to create a list of objects first? I am trying to switch to R from
>>> Matlab and am trying to pick up good programming habits from the
>>> start.
>>>
>>> Much appreciation!
>>>
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html and provide commented,
>>> minimal, self-contained, reproducible code.
>>
>> Petr Pikal
>> petr.pikal at precheza.cz
>>
>>
>


