[R] Fwd: rarefy a matrix of counts
Tony Plate
tplate at acm.org
Wed Oct 11 20:54:44 CEST 2006
Two things to note:
(1) rep() can be vectorized:
> rep(1:3, 2:4)
[1] 1 1 2 2 2 3 3 3 3
>
(2) you will likely get much better performance if you work with
integers and convert to strings after sampling (or use factors), e.g.:
> c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
[1] "red" "blue" "red" "red" "red"
>
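For the original question, the two points above can be combined into something like the following. This is only a sketch under assumptions: the counts are taken to be a matrix with one row per object type and one column per sample, and the function name rarefy_counts and the example matrix m are made up for illustration (the numbers are the ones quoted further down in this thread). Only rep(), sample.int() and tabulate() from base R are used.

```r
# Hypothetical helper (not from any package): rarefy every column of a
# count matrix down to `size` individuals, sampling without replacement.
rarefy_counts <- function(counts, size) {
  stopifnot(all(colSums(counts) >= size))
  out <- counts
  for (j in seq_len(ncol(counts))) {
    n <- counts[, j]
    # One integer code per individual: type i appears n[i] times
    pool <- rep(seq_along(n), n)
    # Draw positions rather than values, so a pool of length 1 is safe
    picked <- pool[sample.int(length(pool), size)]
    # Regroup the draws into counts per type (zeros are kept)
    out[, j] <- tabulate(picked, nbins = length(n))
  }
  out
}

# Illustrative data only
m <- matrix(c(400, 100, 300,
              300,   0, 1000,
              2500, 200, 500),
            nrow = 3,
            dimnames = list(c("red", "green", "black"),
                            c("sample1", "sample2", "sample3")))
set.seed(1)
r <- rarefy_counts(m, 100)
```

Each column of r sums to 100 and no cell exceeds the original count. To rarefy several times, the call can simply be repeated, e.g. with replicate(10, rarefy_counts(m, 100), simplify = FALSE).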
-- Tony Plate
Brian Frappier wrote:
> I tried all of the approaches below.
>
> the problem with:
>
> > x <- data.frame(matrix(NA,100,3))
> > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > if you want result in data frame
> > or
> > x<-vector("list", 3)
> > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
>
> is that this code still samples the rows, not the elements, i.e. returns
> 100 or 300 in the matrix cells instead of "red" or a matrix of counts by
> color (object type) like:
>        x1   x2   x3
> red    32    5   60
> gr     68   95   40
> sum   100  100  100
>
> It looks like Tony is right: sampling without replacement requires
> listing of all elements to be sampled. But, the code Petr provided
>
> x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)
>
> did give me a clue of how to quickly make such a list using the 'rep'
> command. I will for-loop a rep statement using my original matrix to
> create a list of elements for each sample:
>
> Thanks Petr and Tony for your help!
>
> On 10/11/06, Tony Plate <tplate at acm.org> wrote:
>
> Here's a way using apply(), and the prob= argument of sample():
>
> > df <- data.frame(sample1=c(red=400,green=100,black=300),
> sample2=c(300,0,1000), sample3=c(2500,200,500))
> > df
>       sample1 sample2 sample3
> red       400     300    2500
> green     100       0     200
> black     300    1000     500
> > set.seed(1)
> > apply(df, 2, function(counts) sample(seq(along=counts), rep=T,
> size=7, prob=counts))
>      sample1 sample2 sample3
> [1,]       1       3       1
> [2,]       1       3       1
> [3,]       3       3       1
> [4,]       2       3       2
> [5,]       1       3       1
> [6,]       2       3       1
> [7,]       2       3       3
> >
>
> Note that this does sampling WITH replacement.
> AFAIK, sampling without replacement requires enumerating the entire
> population to be sampled from. I.e., you cannot do
> > sample(1:3, prob=1:3, rep=F, size=4)
> instead of
> > sample(c(1,2,2,3,3,3), rep=F, size=4)
>
> -- Tony Plate
>
> From reading ?sample, I was a little unclear on whether sampling
> without replacement could work
>
> Petr Pikal wrote:
> > Hi
> >
> > a little bit different story. But
> >
> > x1 <- sample(c(rep("red",400),rep("green", 100),
> > rep("black",300)),100)
> >
> > is maybe close. With data frame (if it is not big)
> >
> >
> >>DF
> >
> >   color sample1 sample2 sample3
> > 1   red     400     300    2500
> > 2 green     100       0     200
> > 3 black     300    1000     500
> >
> > x <- data.frame(matrix(NA,100,3))
> > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > if you want result in data frame
> > or
> > x<-vector("list", 3)
> > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >
> > if you want it in a list. Maybe somebody is clever enough to get rid of
> > the for loop, but you said you have 80 columns, which should be no problem.
> >
> > HTH
> > Petr
> >
> > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> >
> > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > From: "Brian Frappier" <brian.frappier at gmail.com>
> > To: "Petr Pikal" <petr.pikal at precheza.cz>
> > Subject: Fwd: [R] rarefy a matrix of counts
> >
> >
> >>---------- Forwarded message ----------
> >>From: Brian Frappier <brian.frappier at gmail.com>
> >>Date: Oct 11, 2006 10:10 AM
> >>Subject: Re: [R] rarefy a matrix of counts
> >>To: r-help at stat.math.ethz.ch
> >>
> >>Hi Petr,
> >>
> >>Thanks for your response. I have data that looks like the following:
> >>
> >>             sample 1  sample 2  sample 3 ....
> >>red candy         400       300      2500
> >>green candy       100         0       200
> >>black candy       300      1000       500
> >>
> >>I don't want to randomly select either the samples (columns) or the
> >>"candy" types (rows), which is what sample(), as you state, would allow
> >>me to do.
> >>Instead, I want to randomly sample 100 candies from each sample and
> >>retain info on their associated type. I could make a list of all the
> >>candies in each sample:
> >>
> >>sample 1
> >>red
> >>red
> >>red
> >>red
> >>green
> >>green
> >>black
> >>red
> >>black
> >>...
> >>
> >>and then randomly sample those rows. Repeat for each sample. But I
> >>am not sure how to do that without a lot of loops, and am wondering if
> >>there is an easier way in R. Thanks! I should have laid this out in
> >>the first email...sorry.
> >>
> >>
> >>On 10/11/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
> >>
> >>>Hi
> >>>
> >>>I am not experienced in Matlab and from your explanation I do not
> >>>understand what exactly you want. It seems that you want to randomly
> >>>choose a sample of 100 rows from your matrix, which can be achieved
> >>>with sample().
> >>>
> >>>DF<- data.frame(rnorm(100), 1:100, 101:200, 201:300)
> >>>DF[sample(1:100, 10),]
> >>>
> >>>If you want to do this several times, you need to save your result,
> >>>and then it depends on what you want to do next. One suitable form
> >>>is a list of matrices, another is an array, and you can use a for
> >>>loop to fill it.
> >>>
> >>>HTH
> >>>Petr
> >>>
> >>>
> >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> >>>
> >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> >>>From: "Brian Frappier" <brian.frappier at gmail.com>
> >>>To: r-help at stat.math.ethz.ch
> >>>Subject: [R] rarefy a matrix of counts
> >>>
> >>>
> >>>>Hi all,
> >>>>
> >>>>I have a matrix of counts for objects (rows) by samples (columns).
> >>>> I aimed for about 500 counts in each sample (I have about 80
> >>>>samples) and would now like to rarefy these down to 100 counts in
> >>>>each sample using simple random sampling without replacement. I
> >>>>plan on rarefying several times for each sample. I could do the
> >>>>tedious looping task of making a list of all objects (with its
> >>>>associated identifier) in each sample and then use the wonderful
> >>>>"sampling" package to select a sub-sample of 100 for each sample
> >>>>and thereby get a logical vector of inclusions. I would then
> >>>>regroup the resulting logical vector into a vector of counts by
> >>>>object, rinse and repeat several times for each sample.
> >>>>
> >>>>Alternately, using the same list, I could create a random index of
> >>>>integers between 1 and the number of objects for a sample (without
> >>>>repeats) and then select those objects from the list. Again,
> >>>>rinse and repeat several times for each sample.
> >>>>
> >>>>Is there a way to directly rarefy a matrix of counts without
> >>>>having to create a list of objects first? I am trying to switch
> >>>>to R from Matlab and am trying to pick up good programming habits
> >>>>from the start.
> >>>>
> >>>>Much appreciation!
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>>______________________________________________
> >>>>R-help at stat.math.ethz.ch mailing list
> >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>PLEASE do read the posting guide
> >>>>http://www.R-project.org/posting-guide.html and provide commented,
> >>>>minimal, self-contained, reproducible code.
> >>>
> >>>Petr Pikal
> >>>petr.pikal at precheza.cz
> >>>
> >>>
> >>
> >
> > Petr Pikal
> > petr.pikal at precheza.cz
> >
>
>