[R] Fwd: rarefy a matrix of counts

Wed Oct 11 22:20:18 CEST 2006

On Wed, 2006-10-11 at 14:25 -0400, Brian Frappier wrote:
> I tried all of the approaches below.
> 
> the problem with:
> 
> > x <- data.frame(matrix(NA,100,3))
> > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > if you want the result in a data frame
> > or
> > x<-vector("list", 3)
> > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> 
> is that this code still samples the rows, not the elements, i.e. returns 100
> or 300 in the matrix cells instead of "red" or a matrix of counts by color
> (object type) like:
>        x1   x2   x3
> red    32    5   60
> gr     68   95   40
> sum   100  100  100
> 
> It looks like Tony is right: sampling without replacement requires listing
> all the elements to be sampled.
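One possible reason for the numbers: the loop above takes the labels from DF[,1], which assumes the colours are stored in the first column of DF. If they are row names instead, DF[,1] holds the first sample's counts, so rep(DF[,1], DF[,i]) repeats values such as 100 and 300. The sketch below takes the labels from the row names instead. It rarefies each column by enumeration and tabulates the result into a colour-by-sample count matrix like the one above. The object name `candy` is illustrative, and the numbers are the example counts from this thread.

```r
# Example counts from this thread: colours (rows) by samples (columns)
candy <- matrix(c(400,  300, 2500,
                  100,    0,  200,
                  300, 1000,  500),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("red", "green", "black"),
                                c("sample1", "sample2", "sample3")))

set.seed(1)
# For each sample (column): expand the counts into one label per object,
# draw 100 labels without replacement, then count them per colour.
rarefied <- apply(candy, 2, function(n) {
  labels <- rep(rownames(candy), n)
  drawn  <- sample(labels, 100)
  # fixed levels keep colours that were never drawn as zero counts
  table(factor(drawn, levels = rownames(candy)))
})
rarefied
colSums(rarefied)  # every column sums to 100
```

The factor(..., levels = rownames(candy)) step keeps colours that were not drawn, such as green in sample2, in the table as zeros.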

<snip>

How about the following approach, which generates a new sample using the
rMultinom() function from the Hmisc package? Note that it draws from a
multinomial distribution, so it samples WITH replacement.

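library(Hmisc)   # provides rMultinom()

# counts: objects (rows) by samples (columns)
data <- matrix(c(400, 300, 2500, 100, 25, 200, 300, 1000, 500),
               nrow=3, byrow=TRUE)

col.sums <- colSums(data)       # total count in each sample

# one row per sample, holding each object's share of that sample
probs <- t(data)/col.sums

# 100 multinomial draws per row; each entry is an object (row) index
w <- rMultinom(probs, 100)

# count how often each object was drawn in each sample
apply(w, 1, table)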

Note that I replaced the zero in your example data set with 25 because
table() only lists values that actually occur. With a zero count, the
per-sample tables have different lengths, and apply() then returns a list
instead of a matrix.
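A base-R variant of the same idea uses stats::rmultinom() in place of Hmisc's rMultinom(). Because rmultinom() returns counts per category directly, the original zero can stay: categories that are never drawn simply come out as 0. This is a sketch of that substitution under the same assumptions. It still samples WITH replacement, so it only approximates rarefaction. For example, a colour with a small count can be drawn more often than it occurs in the original sample.

```r
# Counts from the original question, zero included:
# objects (rows) by samples (columns)
data <- matrix(c(400,  300, 2500,
                 100,    0,  200,
                 300, 1000,  500),
               nrow = 3, byrow = TRUE)

set.seed(1)
# One multinomial draw of 100 objects per sample (column).
# rmultinom() normalises prob internally, so raw counts work as weights.
w <- apply(data, 2, function(n) rmultinom(1, size = 100, prob = n))
w  # objects (rows) by samples (columns); each column sums to 100
```

With the Hmisc version, apply(w, 1, tabulate, nbins = ncol(probs)) should likewise keep zero counts, because tabulate() always returns nbins entries.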

HTH,

Manuel

> On 10/11/06, Tony Plate <tplate at acm.org> wrote:
> >
> > Here's a way using apply(), and the prob= argument of sample():
> >
> > > df <- data.frame(sample1=c(red=400,green=100,black=300),
> > +                  sample2=c(300,0,1000), sample3=c(2500,200,500))
> > > df
> >        sample1 sample2 sample3
> > red       400     300    2500
> > green     100       0     200
> > black     300    1000     500
> > > set.seed(1)
> > > apply(df, 2, function(counts) sample(seq_along(counts), replace=TRUE,
> > +       size=7, prob=counts))
> >       sample1 sample2 sample3
> > [1,]       1       3       1
> > [2,]       1       3       1
> > [3,]       3       3       1
> > [4,]       2       3       2
> > [5,]       1       3       1
> > [6,]       2       3       1
> > [7,]       2       3       3
> > >
> >
> > Note that this does sampling WITH replacement.
> > AFAIK, sampling without replacement requires enumerating the entire
> > population to be sampled from.  I.e., you cannot do
> > > sample(1:3, prob=1:3, replace=FALSE, size=4)
> > instead of
> > > sample(c(1,2,2,3,3,3), replace=FALSE, size=4)
> >
> > -- Tony Plate
> >
> > From reading ?sample, I was a little unclear on whether sampling
> > without replacement could work.
> >
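On the prob= question: ?sample says that when replace is FALSE, the probabilities are applied sequentially. Each next item is picked with probability proportional to the weights of the items that remain. Because every element of x can then be drawn only once, sample(1:3, prob = counts, size = 100) samples colours rather than candies, and it fails for any size larger than 3.

If only the rarefied counts are needed, not the individual objects, enumeration can still be avoided. The counts from a draw without replacement follow a multivariate hypergeometric distribution, which can be generated one category at a time with stats::rhyper(). This is a minimal sketch of that swapped-in technique; the helper name rarefy_counts is hypothetical.

```r
# Hypothetical helper: subsample `size` objects without replacement from
# a vector of per-category counts, without listing the objects.
# Category k's count is hypergeometric given the objects still to be
# drawn and the objects left in categories k, k+1, ...
rarefy_counts <- function(counts, size) {
  stopifnot(size <= sum(counts))
  out <- numeric(length(counts))
  left_to_draw <- size
  later <- sum(counts)                # objects in categories k, k+1, ...
  for (k in seq_along(counts)) {
    later <- later - counts[k]        # now: objects in categories after k
    if (left_to_draw == 0) break
    # rhyper(nn, m, n, k): m = objects of category k, n = objects in
    # later categories, k = number still to draw
    out[k] <- rhyper(1, counts[k], later, left_to_draw)
    left_to_draw <- left_to_draw - out[k]
  }
  names(out) <- names(counts)
  out
}

# Example counts from this thread: colours (rows) by samples (columns)
candy <- matrix(c(400,  300, 2500,
                  100,    0,  200,
                  300, 1000,  500),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("red", "green", "black"),
                                c("sample1", "sample2", "sample3")))

set.seed(1)
rarefied <- apply(candy, 2, rarefy_counts, size = 100)
rarefied
```

For several rarefactions of one sample, replicate(10, rarefy_counts(candy[, "sample1"], 100)) returns one column per repetition.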
> > Petr Pikal wrote:
> > > Hi
> > >
> > > a little bit different story. But
> > >
> > > x1 <- sample(c(rep("red",400),rep("green", 100),
> > > rep("black",300)),100)
> > >
> > > is maybe close. With a data frame (if it is not too big)
> > >
> > >
> > >>DF
> > >
> > >   color sample1 sample2 sample3
> > > 1   red     400     300    2500
> > > 2 green     100       0     200
> > > 3 black     300    1000     500
> > >
> > > x <- data.frame(matrix(NA,100,3))
> > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > if you want the result in a data frame
> > > or
> > > x<-vector("list", 3)
> > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> > >
> > > if you want it in a list. Maybe somebody is clever enough to get rid of
> > > the for loop, but you said you have 80 columns, so that should be no problem.
> > >
> > > HTH
> > > Petr
> > >
> > > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> > >
> > > Date sent:            Wed, 11 Oct 2006 10:11:33 -0400
> > > From:                 "Brian Frappier" <brian.frappier at gmail.com>
> > > To:                   "Petr Pikal" <petr.pikal at precheza.cz>
> > > Subject:              Fwd: [R] rarefy a matrix of counts
> > >
> > >
> > >>---------- Forwarded message ----------
> > >>From: Brian Frappier <brian.frappier at gmail.com>
> > >>Date: Oct 11, 2006 10:10 AM
> > >>Subject: Re: [R] rarefy a matrix of counts
> > >>To: r-help at stat.math.ethz.ch
> > >>
> > >>Hi Petr,
> > >>
> > >>Thanks for your response.  I have data that looks like the following:
> > >>
> > >>                sample 1  sample 2  sample 3  ....
> > >>red candy            400       300      2500
> > >>green candy          100         0       200
> > >>black candy          300      1000       500
> > >>
> > >>I don't want to randomly select either the samples (columns) or the
> > >>"candy" types (rows), which, as you say, sample() would let me do.
> > >>Instead, I want to randomly sample 100 candies from each sample and
> > >>retain info on their associated type.  I could make a list of all the
> > >>candies in each sample:
> > >>
> > >>sample 1
> > >>red
> > >>red
> > >>red
> > >>red
> > >>green
> > >>green
> > >>black
> > >>red
> > >>black
> > >>...
> > >>
> > >>and then randomly sample those rows.  Repeat for each sample.  But, I
> > >>am not sure how to do that without a lot of loops, and am wondering if
> > >>there is an easier way in R.  Thanks!  I should have laid this out in
> > >>the first email...sorry.
> > >>
> > >>
> > >>On 10/11/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
> > >>
> > >>>Hi
> > >>>
> > >>>I am not experienced in Matlab and from your explanation I do not
> > >>>understand exactly what you want. It seems that you want to randomly
> > >>>choose a sample of 100 rows from your matrix, which can be achieved
> > >>>with sample().
> > >>>
> > >>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> > >>>DF[sample(1:100, 10),]
> > >>>
> > >>>If you want to do this several times, you need to save your result,
> > >>>and then it depends on what you want to do next. One suitable form
> > >>>is a list of matrices, another is an array, and you can use a for
> > >>>loop to fill it.
> > >>>
> > >>>HTH
> > >>>Petr
> > >>>
> > >>>
> > >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> > >>>
> > >>>Date sent:              Tue, 10 Oct 2006 17:40:47 -0400
> > >>>From:                   "Brian Frappier" <brian.frappier at gmail.com>
> > >>>To:                     r-help at stat.math.ethz.ch
> > >>>Subject:                [R] rarefy a matrix of counts
> > >>>
> > >>>
> > >>>>Hi all,
> > >>>>
> > >>>>I have a matrix of counts for objects (rows) by samples (columns).
> > >>>> I aimed for about 500 counts in each sample (I have about 80
> > >>>>samples) and would now like to rarefy these down to 100 counts in
> > >>>>each sample using simple random sampling without replacement.  I
> > >>>>plan on rarefying several times for each sample.  I could do the
> > >>>>tedious looping task of making a list of all objects (with its
> > >>>>associated identifier) in each sample and then use the wonderful
> > >>>>"sampling" package to select a sub-sample of 100 for each sample
> > >>>>and thereby get a logical vector of inclusions.  I would then
> > >>>>regroup the resulting logical vector into a vector of counts by
> > >>>>object, rinse and repeat several times for each sample.
> > >>>>
> > >>>>Alternately, using the same list, I could create a random index of
> > >>>>integers between 1 and the number of objects for a sample (without
> > >>>>repeats) and then select those objects from the list.  Again,
> > >>>>rinse and repeat several times for each sample.
> > >>>>
> > >>>>Is there a way to directly rarefy a matrix of counts without
> > >>>>having to create a list of objects first?  I am trying to switch
> > >>>>to R from Matlab and am trying to pick up good programming habits
> > >>>>from the start.
> > >>>>
> > >>>>Much appreciation!
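The second idea, a random set of object indices, can also be carried out without building the list. If the objects are numbered category by category, object i belongs to category k when cumsum(counts)[k-1] < i <= cumsum(counts)[k]. findInterval() can therefore map sampled indices straight back to categories. This is a minimal sketch, assuming the counts are non-negative whole numbers; the helper name rarefy_by_index is hypothetical.

```r
# Hypothetical helper: draw `size` distinct object indices from
# 1..sum(counts) and map each index back to its category through the
# cumulative counts, so the objects are never listed explicitly.
rarefy_by_index <- function(counts, size) {
  idx <- sample.int(sum(counts), size)            # without replacement
  # (number of cumulative counts strictly below idx) + 1 = category of idx
  category <- findInterval(idx - 1, cumsum(counts)) + 1
  # tabulate() returns nbins entries, so zero-count categories stay as 0
  setNames(tabulate(category, nbins = length(counts)), names(counts))
}

# Example counts from this thread: colours (rows) by samples (columns)
candy <- matrix(c(400,  300, 2500,
                  100,    0,  200,
                  300, 1000,  500),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("red", "green", "black"),
                                c("sample1", "sample2", "sample3")))

set.seed(1)
# one rarefaction of every sample (column)
rarefied <- apply(candy, 2, rarefy_by_index, size = 100)
# ten rarefactions of the first sample, one column per repetition
reps <- replicate(10, rarefy_by_index(candy[, "sample1"], 100))
rarefied
```

Zero counts need no special handling. For example, sample2 has cumulative counts 300, 300, 1300, so no index ever falls into the green interval.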
> > >>>>
> > >>>>______________________________________________
> > >>>>R-help at stat.math.ethz.ch mailing list
> > >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>>PLEASE do read the posting guide
> > >>>>http://www.R-project.org/posting-guide.html and provide commented,
> > >>>>minimal, self-contained, reproducible code.
> > >>>
> > >>>Petr Pikal
> > >>>petr.pikal at precheza.cz
> > >>>
> > >>>
> > >>
> > >
> > > Petr Pikal
> > > petr.pikal at precheza.cz
> > >
> >
> >
> 
-- 
Manuel A. Morales
http://mutualism.williams.edu