[R] Help on averaging sets of rows defined by row name

Booman, M m.booman at path.umcg.nl
Fri Apr 20 15:26:53 CEST 2007

Dear all,

This is my problem: I have a table of gene expression data, where the 1st column is the gene name and the 2nd to 39th columns each hold the expression data for one of 38 samples. There are multiple measurements per sample for each gene, so there are multiple rows for each gene name. I want to average these measurements so I end up with one value per sample for each gene name. The output data frame (table.averaged) is then used in another R script. The code I use now (see below) takes 20 secs for each loop, so it takes 45 minutes to average my files of 13500 unique genes. Can anyone help me do this faster?

Cheers, marije

Code I use: 

# table.imputed is a data.frame: 1st column = gene name (class factor),
# rest of columns = expression data (class numeric)
table.imputed[,1] <- as.character(table.imputed[,1])

genesunique <- unique(table.imputed[,1])    # list of unique genes in the set

# Returns one row: the gene name plus the average for each sample (column)
givemean <- function (gene, table.imputed) {
   thisgene <- table.imputed[table.imputed[,1]==gene,]    # subset containing only the rows for one gene name
   data.frame(gene, t(sapply(thisgene[,2:ncol(thisgene)], mean, na.rm=TRUE)))
}

table.averaged <- NULL
for (j in seq_along(genesunique)) {
   if (j%%100 == 0) {    # report progress
      cat(j, "genes finished", sep=" ", fill=TRUE)
   }
   # collect all rows of average values and bind them back into one data frame
   table.averaged <- rbind(table.averaged, givemean(genesunique[j], table.imputed))
}
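
For comparison, here is a minimal sketch of how the same per-gene averaging might be done without a loop, using base R's rowsum() to compute grouped column sums in one pass. It is only a sketch under stated assumptions: the small table.imputed built here is a made-up stand-in for the real data, and the column names gene, s1 and s2 are hypothetical. Missing values are skipped, as with mean(..., na.rm=TRUE).

```r
# Toy stand-in for table.imputed (hypothetical values and column names)
table.imputed <- data.frame(
  gene = c("A", "A", "B", "B", "B"),
  s1   = c(1, 3, 2, NA, 4),
  s2   = c(10, 20, 5, 5, 5),
  stringsAsFactors = FALSE
)

genes <- as.character(table.imputed[, 1])
expr  <- as.matrix(table.imputed[, -1])

# Sum of the non-missing values per gene and sample
expr0 <- expr
expr0[is.na(expr0)] <- 0
sums <- rowsum(expr0, genes)

# Number of non-missing values per gene and sample
counts <- rowsum((!is.na(expr)) * 1, genes)

# Mean = sum / count; a gene with only NAs in a sample gives NaN,
# which is also what mean(x, na.rm=TRUE) returns in that case
means <- sums / counts

# rowsum() sorts the groups; restore the order of first appearance
means <- means[unique(genes), , drop = FALSE]

table.averaged <- data.frame(gene = rownames(means), means,
                             row.names = NULL, stringsAsFactors = FALSE)
print(table.averaged)
```

The loop version is slow mainly because rbind() copies the growing data frame on every iteration and each call subsets the whole table again. A shorter, though usually slower, alternative to the sketch above would be aggregate(table.imputed[,-1], by=list(gene=table.imputed[,1]), FUN=mean, na.rm=TRUE).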
