[R] group bunch of lines in a data.frame, an additional requirement
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Sep 15 16:47:25 CEST 2006
Good idea. You could write it compactly like this:
> transform(aggregate(cbind(DF[2:3], o = 1), DF[c(1,4)], sum, na.rm = TRUE),
+ V2 = V2/o, V3 = V3/o)
V1 V4 V2 V3 o
1 A ID1 2.0 400 3
2 B ID2 0.7 35 2
3 C ID3 5.0 70 1
On 9/15/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> (re)-Hello
> I actually thought about another possibility with a "1" column, a sum
> (instead of a mean), and a division of the columns for which I want
> the mean:
>
> > DF = data.frame( V1=c("A","A","A","B","B","C") , V2=c(1,3,2,0.5,0.9,5.0), V3=c(200,800,200,20,50,70), V4=c("ID1","ID1","ID1","ID2","ID2","ID3"))
> > DF2 = cbind(DF,o=rep(1,length(DF[,1])))
> > DF3 = aggregate(DF2[,c(2,3,5)], data.frame(code=DF2[,1],id=DF2[,4]), sum, na.rm=T)
> > DF3[,3:4]=DF3[,3:4]/DF3[,5]
> > DF3
> code id V2 V3 o
> 1 A ID1 2.0 400 3
> 2 B ID2 0.7 35 2
> 3 C ID3 5.0 70 1
>
> Thanks again,
>
> Best,
>
> Emmanuel
>
> On 9/15/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > Dear Mark, dear Gabor,
> >
> > Thanks again for your help! It is great to be able to get answers when
> > no-one (can you believe this?) uses R in your lab.
> >
> > The package doBy looks really convenient for many puposes.
> >
> > Best,
> >
> > Emmanuel
> >
> > On 9/15/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > > Here are three different ways to do it:
> > >
> > > # base R
> > > fb <- function(x)
> > > c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
> > > V3.mean = mean(x$V3), n = length(x$V1))
> > > do.call(rbind, by(DF, DF[c(1,4)], fb))
> > >
> > > # package doBy
> > > library(doBy)
> > > summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
> > >
> > > # package reshape
> > > library(reshape)
> > > f <- function(x) c(mean = mean(x), n = length(x))
> > > cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate = f)[,-6]
> > >
> > > ----
> > >
> > > > # base R
> > > > fb <- function(x)
> > > + c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
> > > + V3.mean = mean(x$V3), n = length(x$V1))
> > > > do.call(rbind, by(DF, DF[c(1,4)], fb))
> > > V1 V4 V2.mean V3.mean n
> > > [1,] 1 1 2.0 400 3
> > > [2,] 3 1 5.0 70 1
> > > [3,] 2 2 0.7 35 2
> > > >
> > > > # package doBy
> > > > library(doBy)
> > > > summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
> > > V1 V4 mean.V2 mean.V3 length.V3
> > > 1 A ID1 2.0 400 3
> > > 2 C ID1 5.0 70 1
> > > 3 B ID2 0.7 35 2
> > > >
> > > > # package reshape
> > > > library(reshape)
> > > > f <- function(x) c(mean = mean(x), n = length(x))
> > > > cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate = f)[,-6]
> > > V1 V4 V2_mean V2_n V3_mean
> > > 1 A ID1 2.0 3 400
> > > 2 B ID2 0.7 2 35
> > > 3 C ID1 5.0 1 70
> > >
> > >
> > >
> > >
> > >
> > >
> > > ---
> > >
> > > > library(doBy)
> > > > summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
> > > V1 V4 mean.V2 mean.V3 length.V3
> > > 1 A ID1 2.0 400 3
> > > 2 C ID1 5.0 70 1
> > > 3 B ID2 0.7 35 2
> > > >
> > > > library(reshape)
> > > > f <- function(x) c(mean = mean(x), n = length(x))
> > > > cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate = f)[,-6]
> > > V1 V4 V2_mean V2_n V3_mean
> > > 1 A ID1 2.0 3 400
> > > 2 B ID2 0.7 2 35
> > > 3 C ID1 5.0 1 70
> > >
> > >
> > >
> > > On 9/14/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > > > Thanks Gabor, that is much faster than using a loop!
> > > >
> > > > I've got a last question:
> > > >
> > > > Can you think of a fast way of keeping track of the number of
> > > > observations collapsed for each entry?
> > > >
> > > > i.e. I'd like to end up with:
> > > >
> > > > A 2.0 400 ID1 3 (3obs in the first matrix)
> > > > B 0.7 35 ID2 2 (2obs in the first matrix)
> > > > C 5.0 70 ID1 1 (1obs in the first matrix)
> > > >
> > > > Or is it required to use an temporary matrix that is merged later? (As
> > > > examplified by Mark in a previous email?)
> > > >
> > > > Thanks a lot for your help,
> > > >
> > > > Emmanuel
> > > >
> > > > On 9/13/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > > > > See below.
> > > > >
> > > > > On 9/13/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > > > > > Thanks for pointing me out "aggregate", that works fine!
> > > > > >
> > > > > > There is one complication though: I have mixed types (numerical and character),
> > > > > >
> > > > > > So the matrix is of the form:
> > > > > >
> > > > > > A 1.0 200 ID1
> > > > > > A 3.0 800 ID1
> > > > > > A 2.0 200 ID1
> > > > > > B 0.5 20 ID2
> > > > > > B 0.9 50 ID2
> > > > > > C 5.0 70 ID1
> > > > > >
> > > > > > One letter always has the same ID but one ID can be shared by many
> > > > > > letters (like ID1)
> > > > > >
> > > > > > I just want to keep track of the ID, and get a matrix like:
> > > > > >
> > > > > > A 2.0 400 ID1
> > > > > > B 0.7 35 ID2
> > > > > > C 5.0 70 ID1
> > > > > >
> > > > > > Any idea on how to do that without a loop?
> > > > >
> > > > > If V4 is a function of V1 then you can aggregate by it too and it will
> > > > > appear but have no effect on the classification:
> > > > >
> > > > > > aggregate(DF[2:3], DF[c(1,4)], mean)
> > > > > V1 V4 V2 V3
> > > > > 1 A ID1 2.0 400
> > > > > 2 C ID1 5.0 70
> > > > > 3 B ID2 0.7 35
> > > > >
> > > > >
> > > > > >
> > > > > > Many thanks,
> > > > > >
> > > > > > Emmanuel
> > > > > >
> > > > > > On 9/12/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'd like to group the lines of a matrix so that:
> > > > > > > A 1.0 200
> > > > > > > A 3.0 800
> > > > > > > A 2.0 200
> > > > > > > B 0.5 20
> > > > > > > B 0.9 50
> > > > > > > C 5.0 70
> > > > > > >
> > > > > > > Would give:
> > > > > > > A 2.0 400
> > > > > > > B 0.7 35
> > > > > > > C 5.0 70
> > > > > > >
> > > > > > > So all lines corresponding to a letter (level), become a single line
> > > > > > > where all the values of each column are averaged.
> > > > > > >
> > > > > > > I've done that with a loop but it doesn't sound right (it is very
> > > > > > > slow). I imagine there is a
> > > > > > > sort of "apply" shortcut but I can't figure it out.
> > > > > > >
> > > > > > > Please note that it is not exactly a matrix I'm using, the function
> > > > > > > "typeof" tells me it's a list, however I access to it like it was a
> > > > > > > matrix.
> > > > > > >
> > > > > > > Could someone help me with the right function to use, a help topic or
> > > > > > > a piece of code?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Emmanuel
> > > > > > >
> > > > > >
> > > > > > ______________________________________________
> > > > > > R-help at stat.math.ethz.ch mailing list
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > > >
> > > > >
> > > >
> > >
> >
>
More information about the R-help
mailing list