[R] Fwd: rarefy a matrix of counts
Petr Pikal
petr.pikal at precheza.cz
Thu Oct 12 13:19:53 CEST 2006
Hi
On 11 Oct 2006 at 12:54, Tony Plate wrote:
Date sent: Wed, 11 Oct 2006 12:54:44 -0600
From: Tony Plate <tplate at acm.org>
To: Brian Frappier <brian.frappier at gmail.com>
Copies to: Petr Pikal <petr.pikal at precheza.cz>, r-help at stat.math.ethz.ch
Subject: Re: [R] Fwd: rarefy a matrix of counts
> Two things to note:
>
> (1) rep() can be vectorized:
> > rep(1:3, 2:4)
> [1] 1 1 2 2 2 3 3 3 3
> >
>
> (2) you will likely get much better performance if you work with
> integers and convert to strings after sampling (or use factors), e.g.:
that is what I actually used in my suggestion (I hope).
> DF
color sample1 sample2 sample3
1 red 400 300 2500
2 green 100 0 200
3 black 300 1000 500
Notice that red, green, black are not **row names** but a column in
the data frame.
That is why the following code gives red, green, etc.
x <- data.frame(matrix(NA,100,3))
for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
if you want result in data frame
or
x <- vector("list", 3)
for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
if you want it in a list.
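
For anyone who wants to run this directly, here is a minimal self-contained sketch of the same idea using lapply() instead of the for loop. The construction of DF is reconstructed from the printout above, and the explicit factor() call and the seed are assumptions added only for illustration.

```r
# Assumed reconstruction of DF from the printout above; color is made a
# factor explicitly so that all three levels are carried along.
DF <- data.frame(color   = factor(c("red", "green", "black")),
                 sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500))

set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# For each count column: expand the counts into individual items with
# rep(), then draw 100 of them without replacement.
x <- lapply(DF[-1], function(counts) sample(rep(DF$color, counts), 100))

str(x)  # a named list of three factors, each of length 100
```

Because sample() is called with its default replace = FALSE on the fully expanded vector, no color can be drawn more often than it occurs in the original column.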
>
> > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
> [1] "red" "blue" "red" "red" "red"
> >
>
> -- Tony Plate
>
<snip>
> > is that this code still samples the rows, not the elements, i.e.
No, see above.
> > returns 100 or 300 in the matrix cells instead of "red" or a matrix
> > of counts by color (object type) like:
> > x1 x2 x3
> > red 32 5 60
> > gr 68 95 40
> > sum 100 100 100
something like
sapply(x,table)
X1 X2 X3
black 36 79 15
green 14 0 9
red 50 21 76
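
One caveat, as far as I can tell: sapply(x, table) gives a matrix like the one above only when every column carries the same set of levels. If color is a character column rather than a factor, a color with count 0 in some sample (green in sample2) drops out of that table and sapply() returns a list instead. A sketch that fixes the levels explicitly and samples integer codes as Tony suggested (the helper name rarefy_counts and its size argument are hypothetical, not from any package):

```r
DF <- data.frame(color   = c("red", "green", "black"),
                 sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500),
                 stringsAsFactors = FALSE)

# Hypothetical helper: draw `size` items without replacement from one
# count column and tabulate them against a fixed set of levels.
rarefy_counts <- function(counts, labels, size = 100) {
  idx <- sample(rep(seq_along(counts), counts), size)  # sample integer codes
  table(factor(labels[idx], levels = labels))          # zero counts are kept
}

set.seed(1)  # arbitrary seed, only to make the sketch repeatable
res <- sapply(DF[-1], rarefy_counts, labels = DF$color)
res           # 3 x 3 matrix of counts, rows red/green/black
colSums(res)  # each column sums to 100
```

Calling the sapply() line repeatedly (for example inside replicate()) gives the several rarefied versions of each sample that were asked for.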
HTH
Petr
> >
> > It looks like Tony is right: sampling without replacement requires
> > listing of all elements to be sampled. But, the code Petr provided
> >
> > x1 <- sample(c(rep("red",400),rep("green",
> > 100),rep("black",300)),100)
> >
> > did give me a clue of how to quickly make such a list using the
> > 'rep' command. I will for-loop a rep statement using my original
> > matrix to create a list of elements for each sample:
> >
> > Thanks Petr and Tony for your help!
> >
> > On 10/11/06, Tony Plate <tplate at acm.org> wrote:
> >
> > Here's a way using apply(), and the prob= argument of sample():
> >
> > > df <- data.frame(sample1=c(red=400,green=100,black=300),
> > sample2=c(300,0,1000), sample3=c(2500,200,500))
> > > df
> > sample1 sample2 sample3
> > red 400 300 2500
> > green 100 0 200
> > black 300 1000 500
> > > set.seed(1)
> > > apply(df, 2, function(counts) sample(seq(along=counts), rep=T,
> > size=7, prob=counts))
> > sample1 sample2 sample3
> > [1,] 1 3 1
> > [2,] 1 3 1
> > [3,] 3 3 1
> > [4,] 2 3 2
> > [5,] 1 3 1
> > [6,] 2 3 1
> > [7,] 2 3 3
> > >
> >
> > Note that this does sampling WITH replacement.
> > AFAIK, sampling without replacement requires enumerating the
> > entire population to be sampled from. I.e., you cannot do
> > > sample(1:3, prob=1:3, rep=F, size=4)
> > instead of
> > > sample(c(1,2,2,3,3,3), rep=F, size=4)
> >
> > -- Tony Plate
> >
> > From reading ?sample, I was a little unclear on whether
> > sampling
> > without replacement could work
> >
> > Petr Pikal wrote:
> > > Hi
> > >
> > > a little bit different story. But
> > >
> > > x1 <- sample(c(rep("red",400),rep("green", 100),
> > > rep("black",300)),100)
> > >
> > > is maybe close. With data frame (if it is not big)
> > >
> > >
> > >>DF
> > >
> > > color sample1 sample2 sample3
> > > 1 red 400 300 2500
> > > 2 green 100 0 200
> > > 3 black 300 1000 500
> > >
> > > x <- data.frame(matrix(NA,100,3))
> > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > >
> > > if you want result in data frame or
> > >
> > > x<-vector("list", 3)
> > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> > >
> > > if you want it in list. Maybe somebody is clever enough to
> > > discard for loop but you said you have 80 columns which shall
> > > be no problem.
> > >
> > > HTH
> > > Petr
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> > >
> > > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > > From: "Brian Frappier" <brian.frappier at gmail.com>
> > > To: "Petr Pikal" <petr.pikal at precheza.cz>
> > > Subject: Fwd: [R] rarefy a matrix of counts
> > >
> > >
> > >>---------- Forwarded message ----------
> > >>From: Brian Frappier <brian.frappier at gmail.com>
> > >>Date: Oct 11, 2006 10:10 AM
> > >>Subject: Re: [R] rarefy a matrix of counts
> > >>To: r-help at stat.math.ethz.ch
> > >>
> > >>Hi Petr,
> > >>
> > >>Thanks for your response. I have data that looks like the
> > >>following:
> > >>
> > >>              sample 1  sample 2  sample 3
> > >> ....
> > >>red candy          400       300      2500
> > >>green candy        100         0       200
> > >>black candy        300      1000       500
> > >>
> > >>I don't want to randomly select either the samples (columns)
> > >>or the "candy" types (rows), which sample as you state would
> > >>allow me. Instead, I want to randomly sample 100 candies from
> > >>each sample and retain info on their associated type. I
> > >>could make a list of all the candies in each sample:
> > >>
> > >>sample 1
> > >>red
> > >>red
> > >>red
> > >>red
> > >>green
> > >>green
> > >>black
> > >>red
> > >>black
> > >>...
> > >>
> > >>and then randomly sample those rows. Repeat for each sample.
> > >>But, I am not sure how to do that without a lot of loops, and
> > >>am wondering if there is an easier way in R. Thanks! I should
> > >>have laid this out in the first email...sorry.
> > >>
> > >>
> > >>On 10/11/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
> > >>
> > >>>Hi
> > >>>
> > >>>I am not experienced in Matlab and from your explanation I
> > >>>do not understand exactly what you want. It seems that you
> > >>>want to randomly choose a sample of 100 rows from your
> > >>>matrix, which can be achieved with sample().
> > >>>
> > >>>DF<- data.frame(rnorm(100), 1:100, 101:200, 201:300)
> > >>>DF[sample(1:100, 10),]
> > >>>
> > >>>If you want to do this several times, you need to save your
> > >>>result, and then it depends on what you want to do next. One
> > >>>suitable form is a list of matrices, another is an array, and
> > >>>you can use a for loop to fill it.
> > >>>
> > >>>HTH
> > >>>Petr
> > >>>
> > >>>
> > >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> > >>>
> > >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> > >>>From: "Brian Frappier" <brian.frappier at gmail.com>
> > >>>To: r-help at stat.math.ethz.ch
> > >>>Subject: [R] rarefy a matrix of counts
> > >>>
> > >>>
> > >>>>Hi all,
> > >>>>
> > >>>>I have a matrix of counts for objects (rows) by samples
> > >>>>(columns). I aimed for about 500 counts in each sample (I have
> > >>>>about 80 samples) and would now like to rarefy these down to 100
> > >>>>counts in each sample using simple random sampling without
> > >>>>replacement. I plan on rarefying several times for each
> > >>>>sample. I could do the tedious looping task of making a
> > >>>>list of all objects (with its associated identifier) in
> > >>>>each sample and then use the wonderful "sampling" package
> > >>>>to select a sub-sample of 100 for each sample and thereby
> > >>>>get a logical vector of inclusions. I would then regroup
> > >>>>the resulting logical vector into a vector of counts by
> > >>>>object, rinse and repeat several times for each sample.
> > >>>>
> > >>>>Alternately, using the same list, I could create a random
> > >>>>index of integers between 1 and the number of objects for a
> > >>>>sample (without repeats) and then select those objects from
> > >>>>the list. Again, rinse and repeat several times for each
> > >>>>sample.
> > >>>>
> > >>>>Is there a way to directly rarefy a matrix of counts
> > >>>>without having to create a list of objects first? I am
> > >>>>trying to switch to R from Matlab and am trying to pick up
> > >>>>good programming habits from the start.
> > >>>>
> > >>>>Much appreciation!
> > >>>>
> > >>>> [[alternative HTML version deleted]]
> > >>>>
> > >>>>______________________________________________
> > >>>>R-help at stat.math.ethz.ch <mailto:R-help at stat.math.ethz.ch>
> > mailing list
> > >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> > <https://stat.ethz.ch/mailman/listinfo/r-help>
> > >>>>PLEASE do read the posting guide
> > >>>>http://www.R-project.org/posting-guide.html and provide
> > >>>>commented, minimal, self-contained, reproducible code.
> > >>>
> > >>>Petr Pikal
> > >>>petr.pikal at precheza.cz
> > >>>
> > >>>
> > >>
> > >
> > > Petr Pikal
> > > petr.pikal at precheza.cz
> > >
> > >
> >
> >
>
Petr Pikal
petr.pikal at precheza.cz