[BioC] Calculating mean values corresponding to duplicate row names
Sean Davis
seandavi at gmail.com
Thu Sep 24 23:24:43 CEST 2009
On Thu, Sep 24, 2009 at 2:49 PM, Hari Easwaran <hariharan.pe at gmail.com> wrote:
> Hi all,
> I have a table (t) of the following format (first row is the header):
> A x1 x2
> c 1 NA
> c 2 1002
> c 3 NA
> a 4 1004
> b 5 NA
> c 6 1006
> c 7 1007
> c 8 1008
> b 9 1009
> a 10 1010
> a 11 1011
> c 12 1012
> c 13 1013
> a 14 1014
> c NA 1015
>
>
> I want to find the mean of all the values corresponding to the row names
> "a", "b", "c" (which are duplicated).
> I tried the following which works:
> U <- unique(t$A)
> tt <- t(sapply(U, FUN=function(u) {mean(na.omit(t[t$A==u, ]))}))
Take a look at the aggregate() function.
Sean
> However, in reality the table t is really huge (almost 44K rows and 100
> columns). The above approach takes too long. Is there an alternative
> that anyone can think of?
>
> Thanks a lot for any help/suggestions.
>
> Sincerely,
> Hari
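
For readers of the archive, the aggregate() suggestion above might be sketched as follows. This is only an illustration under stated assumptions, not necessarily what was intended in the reply: the example table from the question is rebuilt by hand, it is named `tab` (an assumed name, chosen so it does not mask base R's t() function), and NAs are dropped per column via na.rm = TRUE rather than by removing whole rows as na.omit() does.

```r
# Rebuild the example table from the question. The name `tab` is an
# assumption for this sketch; the original post calls the table `t`.
tab <- data.frame(
  A  = c("c", "c", "c", "a", "b", "c", "c", "c", "b", "a", "a", "c", "c", "a", "c"),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, NA),
  x2 = c(NA, 1002, NA, 1004, NA, 1006, 1007, 1008, 1009, 1010, 1011,
         1012, 1013, 1014, 1015)
)

# One mean per group and per numeric column. The data frame method of
# aggregate() passes na.rm = TRUE through to mean(), so NAs are ignored
# column by column instead of discarding the whole row.
res <- aggregate(tab[, -1], by = list(A = tab$A), FUN = mean, na.rm = TRUE)
print(res)
# Group means: a = 9.75 / 1009.75, b = 7 / 1009, c = 6.5 / 1009
```

The call makes a single pass over the grouping column instead of subsetting the table once per unique name, which is where the sapply() approach spends its time. Whether that is fast enough for 44K rows by 100 columns would need to be checked on the real data.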