[BioC] sum the values with same ID

Hervé Pagès hpages at fhcrc.org
Thu Mar 6 23:17:22 CET 2014


Hi anonymous guest,

On 03/06/2014 01:43 PM, guest [guest] wrote:
>
> Dear R user,

Note that this is the Bioconductor mailing list. It looks like your
question is a general R question, not a Bioconductor-specific one.

>
> I have a matrix like:
>
> ID  group1  group2  group3
> s1  0       2       3
> s2  1       0       4
> s1  3       4       1
> s4  2       2       0
>
> I would like to sum the values with same ID to have the matrix as below:
> ID  group1  group2  group3
> s1  3       6       4
> s2  1       0       4
> s4  2       2       0
>
> I found that aggregate() may help with this, but unfortunately I get an error message when I try it.
>
>> all.data <- read.csv("test.csv")

Note that 'all.data' is a data.frame, not a matrix.

>> aggregate(group1 ~ ID, data=all.data, FUN=sum)
> Error in eval(expr, envir, enclos) : object 'ID' not found
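For comparison, here is a minimal sketch of the data.frame route. The error above means aggregate() found no column named ID in all.data (and no ID object in the workspace), so names(all.data) is the first thing to check. The sketch assumes the header's first field is exactly ID, and it inlines the posted table with read.csv(text = ...) instead of reading test.csv. The `. ~ ID` formula then sums every other column within each ID:

```r
# The posted table, inlined instead of read from "test.csv".
# Assumption: the real file's header row starts with a field named exactly "ID".
all.data <- read.csv(text = "ID,group1,group2,group3
s1,0,2,3
s2,1,0,4
s1,3,4,1
s4,2,2,0")

# aggregate() can only find ID if it is one of these names.
print(names(all.data))

# '. ~ ID' means: every column except ID, summed within each ID.
sums <- aggregate(. ~ ID, data = all.data, FUN = sum)
print(sums)
```

With the posted table this gives s1: 3 6 4, s2: 1 0 4 and s4: 2 2 0, matching the desired output.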

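Trying with a matrix (here ID is a separate vector in the workspace,
not a column of m, and aggregate() picks it up from there):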

   m <- matrix(sample(12L), ncol=3)
   ID <- c("s1", "s2", "s1", "s4")
   rownames(m) <- ID
   colnames(m) <- paste0("group", 1:3)

Then:

   > m
      group1 group2 group3
   s1      1      9      7
   s2     11     12     10
   s1      2      5      6
   s4      8      3      4

   > aggregate(group1 ~ ID, data=m, FUN=sum)
     ID group1
   1 s1      3
   2 s2     11
   3 s4      8
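The call above sums only group1. A minimal sketch that sums all three columns in one call puts cbind() on the left-hand side of the formula. The matrix is rebuilt from the printed values (sample() gives different numbers on each run), and row names are left out because the grouping comes from the separate ID vector:

```r
# Rebuild the example matrix from the printed values (column-major order).
# Row names are omitted: the grouping comes from the ID vector below.
m <- matrix(c(1L, 11L, 2L, 8L,
              9L, 12L, 5L, 3L,
              7L, 10L, 6L, 4L), ncol = 3,
            dimnames = list(NULL, paste0("group", 1:3)))
ID <- c("s1", "s2", "s1", "s4")

# cbind() on the left-hand side aggregates several columns at once.
# ID is not a column of m; it is taken from the workspace.
res <- aggregate(cbind(group1, group2, group3) ~ ID, data = m, FUN = sum)
print(res)
```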

aggregate() will probably be too slow anyway on a matrix with many
rows (hundreds of thousands or more). Here is a faster solution that
leverages the IRanges infrastructure:

   library(IRanges)
   ## For each column, split the values by ID and sum each group.
   m2 <- apply(m, 2, function(x) sum(splitAsList(x, ID)))
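Base R also has rowsum(), which sums the rows of a matrix within groups in compiled code and needs no extra package. This is a different technique from the IRanges one above, offered only as a sketch on the example values printed earlier. With its default reorder = TRUE the groups come out sorted, so the result should have the same layout as m2:

```r
# Example values from the printed matrix above (column-major order).
m <- matrix(c(1L, 11L, 2L, 8L,
              9L, 12L, 5L, 3L,
              7L, 10L, 6L, 4L), ncol = 3,
            dimnames = list(NULL, paste0("group", 1:3)))
ID <- c("s1", "s2", "s1", "s4")

# rowsum(x, group) adds up the rows of x that share the same group value;
# result rows are named by group and sorted (reorder = TRUE by default).
m2 <- rowsum(m, ID)
print(m2)
```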

Cheers,
H.

PS: IRanges is a Bioconductor package.

>
> Please help me to generate the sums for the matrix. Any help would be appreciated.
>
> Thanks a lot
>
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] RColorBrewer_1.0-5 vegan_2.0-10       lattice_0.20-24    permute_0.8-0      Heatplus_2.6.0     gplots_2.12.1
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6       caTools_1.16       gdata_2.13.2       grid_3.0.2         gtools_3.1.1       KernSmooth_2.23-10 tools_3.0.2
>
> --
> Sent via the guest posting facility at bioconductor.org.
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
