[R] Help on averaging sets of rows defined by row name
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Fri Apr 20 15:53:42 CEST 2007
Dear Marije,
I think that aggregate() would make your life a lot easier.
aggregate(table.imputed[, -1], by = list(gene = table.imputed[, 1]), FUN = mean, na.rm = TRUE)
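A minimal sketch of this call on a small, made-up table. The gene names and values are hypothetical; only the column layout (gene name first, one column per sample, several rows per gene) follows the question quoted below. aggregate() requires `by` to be a list, and the gene-name column is left out of the columns being averaged so that mean() only sees numbers.

```r
# Hypothetical toy data: column 1 = gene name, other columns = samples,
# with several rows (measurements) per gene.
table.imputed <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneB", "geneB"),
  s1   = c(1, 3, 2, 4, NA),
  s2   = c(10, 20, 5, 5, 8),
  stringsAsFactors = FALSE
)

# 'by' must be a list; every sample column is averaged per gene,
# and na.rm = TRUE is passed on to mean() to drop missing values.
table.averaged <- aggregate(table.imputed[, -1],
                            by = list(gene = table.imputed[, 1]),
                            FUN = mean, na.rm = TRUE)
print(table.averaged)
```

The result has one row per gene, sorted by gene name, with a column named `gene` followed by the averaged sample columns.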
Cheers,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
Do not put your faith in what statistics say until you have carefully
considered what they do not say. ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney
> -----Original message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On behalf of Booman, M
> Sent: Friday 20 April 2007 15:27
> To: r-help at stat.math.ethz.ch
> Subject: [R] Help on averaging sets of rows defined by row name
>
> Dear all,
>
> This is my problem: I have a table of gene expression data,
> where the 1st column is the gene name and columns 2 to 39 hold
> expression data for 38 samples. There are multiple
> measurements per sample for each gene, so there are multiple
> rows for each gene name. I want to average these measurements
> so I end up with one value per sample for each gene name. The
> output data frame (table.averaged) is used further in another R
> script. The code I use now (see below) takes about 20 seconds for
> every 100 genes, so it takes 45 minutes to average my files of
> 13500 unique genes. Can anyone help me do this faster?
>
> Cheers, Marije
>
> Code I use:
>
> # table.imputed is a data.frame: 1st column = gene name (class
> # factor), remaining columns = expression data (class numeric)
> table.imputed[,1] <- as.character(table.imputed[,1])
>
> # Make a list of the unique genes in the set
> genesunique <- unique(table.imputed[,1])
>
> # Average all rows for one gene: take the subset of rows for this
> # gene name, then return one row holding the gene name and the
> # average of each sample (column)
> givemean <- function (gene, table.imputed) {
>   thisgene <- table.imputed[table.imputed[,1] == gene, ]
>   data.frame(gene, t(sapply(thisgene[, 2:ncol(thisgene)], mean,
>                             na.rm = TRUE)))
> }
>
> table.averaged <- NULL
> for (j in seq_along(genesunique)) {
>   if (j %% 100 == 0) {
>     # Report progress
>     cat(j, "genes finished", sep = " ", fill = TRUE)
>   }
>   # Collect the rows of average values and bind them back into
>   # one data frame
>   table.averaged <- rbind(table.averaged,
>                           givemean(genesunique[j], table.imputed))
> }
>
>
> The contents of this message are confidential and intended only
> for the addressee(s). Persons other than the addressee(s) are not
> allowed to use this message, make it public, or distribute or
> copy it in any way. The UMCG cannot be held responsible for
> incomplete reception or delay of this message.
>
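A loop-free alternative, not part of the original reply: the loop in the quoted code is most likely slow because rbind() copies all of table.averaged on every iteration, and each call to givemean() scans every row of table.imputed. The sketch below instead uses base R's rowsum(), which sums the rows of a matrix within groups in one call, and divides by the per-gene count of non-missing values to reproduce mean(..., na.rm = TRUE). The helper name average_by_gene and the toy data are hypothetical.

```r
# Hypothetical toy data: column 1 = gene name, other columns = samples.
table.imputed <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneB", "geneB"),
  s1   = c(1, 3, 2, 4, NA),
  s2   = c(10, 20, 5, 5, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-gene means of every sample column.
average_by_gene <- function(tab) {
  genes <- tab[, 1]
  x <- as.matrix(tab[, -1, drop = FALSE])
  # Per-gene column sums, skipping NAs (groups sorted by gene name)
  sums <- rowsum(x, genes, na.rm = TRUE)
  # Per-gene number of non-missing values, in the same row order
  counts <- rowsum((!is.na(x)) * 1, genes)
  # Element-wise division gives the means; a gene with only NAs in a
  # sample yields NaN, as mean(..., na.rm = TRUE) would
  means <- sums / counts
  data.frame(gene = rownames(means), means,
             row.names = NULL, stringsAsFactors = FALSE)
}

table.averaged <- average_by_gene(table.imputed)
print(table.averaged)
```

On this toy table the result matches what aggregate() gives; the difference is that all genes are handled in a few vectorised calls instead of one subset and one rbind() per gene.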