[R] Fwd: rarefy a matrix of counts

Petr Pikal petr.pikal at precheza.cz
Thu Oct 12 13:19:53 CEST 2006


Hi

On 11 Oct 2006 at 12:54, Tony Plate wrote:

Date sent:      	Wed, 11 Oct 2006 12:54:44 -0600
From:           	Tony Plate <tplate at acm.org>
To:             	Brian Frappier <brian.frappier at gmail.com>
Copies to:      	Petr Pikal <petr.pikal at precheza.cz>, r-help at stat.math.ethz.ch
Subject:        	Re: [R] Fwd: rarefy a matrix of counts

> Two things to note:
> 
> (1) rep() can be vectorized:
>  > rep(1:3, 2:4)
> [1] 1 1 2 2 2 3 3 3 3
>  >
> 
> (2) you will likely get much better performance if you work with
> integers and convert to strings after sampling (or use factors), e.g.:

that is what I actually used in my suggestion (I hope).

> DF
  color sample1 sample2 sample3
1   red     400     300    2500
2 green     100       0     200
3 black     300    1000     500

Notice that red, green and black are not **row names** but a column in
the data frame.
That is why the following code gives red, green, etc.

x <- data.frame(matrix(NA,100,3))
for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)

if you want the result in a data frame, or

x <- vector("list", 3)
for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)

if you want it in a list.

> 
>  > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
> [1] "red"  "blue" "red"  "red"  "red"
>  >
> 
> -- Tony Plate
> 
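
A minimal sketch of Tony's point (2), assuming a single sample with the counts used in this thread: sample integer codes instead of strings and only attach the colour names at the end. The variable names (counts, pool, drawn, rarefied) are made up for illustration.

```r
# Counts for one sample, named by colour (values taken from the thread).
counts <- c(red = 400, green = 100, black = 300)

set.seed(1)  # only so that the draw is repeatable

# Enumerate the population as integer codes: 1 x 400, 2 x 100, 3 x 300.
pool <- rep(seq_along(counts), counts)

# Draw 100 items without replacement (the default for sample()).
drawn <- sample(pool, 100)

# Count how often each code was drawn; nbins keeps codes drawn zero times.
rarefied <- tabulate(drawn, nbins = length(counts))
names(rarefied) <- names(counts)
rarefied
```

The result is a vector of counts by colour that sums to 100, which is the shape Brian asked for.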

<snip>

> > is that this code still samples the rows, not the elements, i.e.

No, see above.

> > returns 100 or 300 in the matrix cells instead of "red" or a matrix
> > of counts by color (object type) like:
> >        x1    x2    x3
> > red    32     5    60
> > gr     68    95    40
> > sum   100   100   100

something like

sapply(x,table)
       X1 X2 X3
 black 36 79 15
 green 14  0  9
 red   50 21 76
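
Put together, the whole thing could look roughly like the sketch below. It assumes the DF shown above and a recent R (4.0 or later), where the color column is no longer converted to a factor automatically, so the factor is made explicit; otherwise a colour with zero draws would be dropped by table() and sapply() would not return a matrix. The names colors and rarefied are only for illustration.

```r
# Data from the thread; colour is an ordinary column, not row names.
DF <- data.frame(color   = c("red", "green", "black"),
                 sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500))

# Explicit factor so that colours with zero draws still get a row.
colors <- factor(DF$color)

set.seed(1)  # repeatable draws only

# For each count column: enumerate the items, draw 100 without
# replacement, and tabulate by colour.
# Result: colours in rows, samples in columns.
rarefied <- sapply(DF[-1], function(n) table(sample(rep(colors, n), 100)))
rarefied
```

Every column of the result sums to 100, and green in sample2 stays at 0 because there is nothing to draw from.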

HTH
Petr

> > 
> >  It looks like Tony is right: sampling without replacement requires 
> > listing of all elements to be sampled.  But, the code Petr provided
> > 
> > x1 <- sample(c(rep("red",400),rep("green",
> > 100),rep("black",300)),100)
> > 
> > did give me a clue of how to quickly make such a list using the
> > 'rep' command.  I will for-loop a rep statement using my original
> > matrix to create a list of elements for each sample:
> > 
> > Thanks Petr and Tony for your help!
> > 
> > On 10/11/06, *Tony Plate* <tplate at acm.org> wrote:
> > 
> >     Here's a way using apply(), and the prob= argument of sample():
> > 
> >      > df <- data.frame(sample1=c(red=400,green=100,black=300),
> >     sample2=c(300,0,1000), sample3=c(2500,200,500))
> >      > df
> >            sample1 sample2 sample3
> >     red       400     300    2500
> >     green     100       0     200
> >     black     300    1000     500
> >      > set.seed(1)
> >      > apply(df, 2, function(counts) sample(seq(along=counts),
> >     rep=T, size=7, prob=counts))
> >           sample1 sample2 sample3
> >     [1,]       1       3       1
> >     [2,]       1       3       1
> >     [3,]       3       3       1
> >     [4,]       2       3       2
> >     [5,]       1       3       1
> >     [6,]       2       3       1
> >     [7,]       2       3       3
> >      >
> > 
> >     Note that this does sampling WITH replacement.
> >     AFAIK, sampling without replacement requires enumerating the
> >     entire population to be sampled from.  I.e., you cannot do
> >      > sample(1:3, prob=1:3, rep=F, size=4)
> >     instead of
> >      > sample(c(1,2,2,3,3,3), rep=F, size=4)
> > 
> >     -- Tony Plate
> > 
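
One possible way to avoid the enumeration, offered as a sketch and not as something proposed in this thread: the counts of a sample drawn without replacement follow a multivariate hypergeometric distribution, which can be drawn one category at a time with rhyper() from the stats package. The helper name rarefy_counts is hypothetical.

```r
# Hypothetical helper (not from the thread): rarefy one vector of counts
# down to `size` items without replacement, without listing the items.
rarefy_counts <- function(counts, size) {
  stopifnot(size <= sum(counts))
  out <- numeric(length(counts))
  remaining <- sum(counts)  # items in categories not yet processed
  left <- size              # draws still to be allocated
  for (i in seq_along(counts)) {
    remaining <- remaining - counts[[i]]
    # Of the `left` draws, how many fall into category i rather than
    # into one of the later categories?
    out[i] <- rhyper(1, m = counts[[i]], n = remaining, k = left)
    left <- left - out[i]
  }
  names(out) <- names(counts)
  out
}

set.seed(1)  # repeatable draws only
rarefy_counts(c(red = 400, green = 100, black = 300), 100)
```

For a whole matrix of counts this could be applied column by column, e.g. with apply(m, 2, rarefy_counts, size = 100), where m is assumed to be a numeric matrix with objects in rows.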
> >      From reading ?sample, I was a little unclear on whether
> >     sampling without replacement could work
> > 
> >     Petr Pikal wrote:
> >      > Hi
> >      >
> >      > a little bit different story. But
> >      >
> >      > x1 <- sample(c(rep("red",400),rep("green", 100),
> >      > rep("black",300)),100)
> >      >
> >      > is maybe close. With data frame (if it is not big)
> >      >
> >      >
> >      >>DF
> >      >
> >      >   color sample1 sample2 sample3
> >      > 1   red     400     300    2500
> >      > 2 green     100       0     200
> >      > 3 black     300    1000     500
> >      >
> >      > x <- data.frame(matrix(NA,100,3))
> >      > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> >      >
> >      > if you want result in data frame or
> >      >
> >      > x<-vector("list", 3)
> >      > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >      >
> >      > if you want it in list. Maybe somebody is clever enough to
> >      > discard for loop but you said you have 80 columns which shall
> >      > be no problem.
> >      >
> >      > HTH
> >      > Petr
> >      >
> >      >
> >      >
> >      >
> >      >
> >      >
> >      >
> >      > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> >      >
> >      > Date sent:            Wed, 11 Oct 2006 10:11:33 -0400
> >      > From:                 "Brian Frappier" <brian.frappier at gmail.com>
> >      > To:                   "Petr Pikal" <petr.pikal at precheza.cz>
> >      > Subject:              Fwd: [R] rarefy a matrix of counts
> >      >
> >      >
> >      >>---------- Forwarded message ----------
> >      >>From: Brian Frappier <brian.frappier at gmail.com>
> >      >>Date: Oct 11, 2006 10:10 AM
> >      >>Subject: Re: [R] rarefy a matrix of counts
> >      >>To: r-help at stat.math.ethz.ch
> >      >>
> >      >>Hi Petr,
> >      >>
> >      >>Thanks for your response.  I have data that looks like the
> >      >>following:
> >      >>
> >      >>               sample 1   sample 2   sample 3   ....
> >      >>red candy         400        300       2500
> >      >>green candy       100          0        200
> >      >>black candy       300       1000        500
> >      >>
> >      >>I don't want to randomly select either the samples (columns)
> >      >>or the "candy" types (rows), which sample, as you state, would
> >      >>allow me to do. Instead, I want to randomly sample 100 candies
> >      >>from each sample and retain info on their associated type.  I
> >      >>could make a list of all the candies in each sample:
> >      >>
> >      >>sample 1
> >      >>red
> >      >>red
> >      >>red
> >      >>red
> >      >>green
> >      >>green
> >      >>black
> >      >>red
> >      >>black
> >      >>...
> >      >>
> >      >>and then randomly sample those rows.  Repeat for each sample.
> >      >>But, I am not sure how to do that without a lot of loops, and am
> >      >>wondering if there is an easier way in R.  Thanks!  I should
> >      >>have laid this out in the first email...sorry.
> >      >>
> >      >>
> >      >>On 10/11/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
> >      >>
> >      >>>Hi
> >      >>>
> >      >>>I am not experienced in Matlab and from your explanation I
> >      >>>do not understand what exactly you want. It seems that
> >      >>>you want to randomly choose a sample of 100 rows from your
> >      >>>matrix, which can be achieved with sample.
> >      >>>
> >      >>>DF<- data.frame(rnorm(100), 1:100, 101:200, 201:300)
> >      >>>DF[sample(1:100, 10),]
> >      >>>
> >      >>>If you want to do this several times, you need to save your
> >      >>>result, and then it depends on what you want to do next. One
> >      >>>suitable form is a list of matrices, the other is an array, and
> >      >>>you can use a for loop to fill it.
> >      >>>
> >      >>>HTH
> >      >>>Petr
> >      >>>
> >      >>>
> >      >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> >      >>>
> >      >>>Date sent:              Tue, 10 Oct 2006 17:40:47 -0400
> >      >>>From:                   "Brian Frappier" <brian.frappier at gmail.com>
> >      >>>To:                     r-help at stat.math.ethz.ch
> >      >>>Subject:                [R] rarefy a matrix of counts
> >      >>>
> >      >>>
> >      >>>>Hi all,
> >      >>>>
> >      >>>>I have a matrix of counts for objects (rows) by samples
> >      >>>>(columns).  I aimed for about 500 counts in each sample (I
> >      >>>>have about 80 samples) and would now like to rarefy these
> >      >>>>down to 100 counts in each sample using simple random
> >      >>>>sampling without replacement.  I plan on rarefying several
> >      >>>>times for each sample.  I could do the tedious looping task
> >      >>>>of making a list of all objects (with its associated
> >      >>>>identifier) in each sample and then use the wonderful
> >      >>>>"sampling" package to select a sub-sample of 100 for each
> >      >>>>sample and thereby get a logical vector of inclusions.  I
> >      >>>>would then regroup the resulting logical vector into a
> >      >>>>vector of counts by object, rinse and repeat several times
> >      >>>>for each sample.
> >      >>>>
> >      >>>>Alternately, using the same list, I could create a random
> >      >>>>index of integers between 1 and the number of objects for a
> >      >>>>sample (without repeats) and then select those objects from
> >      >>>>the list.  Again, rinse and repeat several times for each
> >      >>>>sample.
> >      >>>>
> >      >>>>Is there a way to directly rarefy a matrix of counts
> >      >>>>without having to create a list of objects first?  I am
> >      >>>>trying to switch to R from Matlab and am trying to pick up
> >      >>>>good programming habits from the start.
> >      >>>>
> >      >>>>Much appreciation!
> >      >>>>
> >      >>>> [[alternative HTML version deleted]]
> >      >>>>
> >      >>>>______________________________________________
> >      >>>>R-help at stat.math.ethz.ch mailing list
> >      >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >      >>>>PLEASE do read the posting guide
> >      >>>>http://www.R-project.org/posting-guide.html and provide
> >      >>>>commented, minimal, self-contained, reproducible code.
> >      >>>
> >      >>>Petr Pikal
> >      >>>petr.pikal at precheza.cz
> >      >>>
> >      >>>
> >      >>
> >      >
> >      > Petr Pikal
> >      > petr.pikal at precheza.cz
> >      >
> >      >
> > 
> >
> 

Petr Pikal
petr.pikal at precheza.cz


