[BioC] aggregate_summarizing expression values over entrez gene ids
Wolfgang Huber
huber at ebi.ac.uk
Thu Nov 13 14:23:01 CET 2008
Hi Vanessa,
Have a look at "tapply" and "by".
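
A minimal sketch of the tapply route, on a toy data frame. The names expr and ids are invented here and stand in for av.data and entrezids. It also shows one likely cause of the aggregate error in the quoted code: the "by" argument must be a list whose elements are each as long as the data has rows, whereas as.list(entrezids) yields 54675 elements of length one. list(entrezids) has the expected shape.

```r
# Toy stand-in for av.data: 5 probesets x 3 samples (values are made up).
expr <- data.frame(
  s1 = c(1, 3, 5, 7, 9),
  s2 = c(2, 4, 6, 8, 10),
  s3 = c(0, 0, 3, 3, 6),
  row.names = c("p1", "p2", "p3", "p4", "p5")
)
# Toy stand-in for entrezids: one Entrez ID per probeset, in row order.
ids <- c("100", "100", "200", "200", "300")

# tapply works on one vector at a time, so apply it column by column.
by_tapply <- sapply(expr, function(col) tapply(col, ids, mean))

# aggregate wants 'by' to be a list holding ONE grouping vector of length
# nrow(expr); as.list(ids) would give five elements of length one instead.
by_aggregate <- aggregate(expr, by = list(entrez = ids), FUN = mean)
```

Both results have one row per Entrez ID. by_tapply is a matrix with the IDs as row names, while by_aggregate is a data frame with the IDs in a column named entrez.
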
But you could also think a bit more about the rationale for summarizing.
The different probesets for the same Entrez gene ID are not replicates,
and they are not equivalent. Some may be more valid or useful than others.
An approach that I find useful is to determine the probeset that shows
the most variability across samples, and then believe that one. Of course, one can also
look at the actual mapping of the probes to the transcript and to the
gene structure, and make a decision based on that. For important results,
this is what I would recommend (besides, of course, wet-lab follow-up.)
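
Picking the most variable probeset per gene could be sketched as follows. This is only an illustration under assumptions: the variability measure is taken to be the per-probeset variance across samples (an IQR or similar would work the same way), and expr, ids, probe_var and keep are made-up names on toy data.

```r
# Toy expression matrix: 5 probesets x 4 samples (values are made up).
expr <- matrix(
  c(5.0, 5.1, 5.0, 5.1,   # p1: nearly flat
    4.0, 6.0, 3.0, 7.0,   # p2: varies a lot
    8.0, 8.0, 8.5, 8.5,   # p3
    2.0, 2.1, 2.0, 2.1,   # p4
    6.0, 7.0, 6.0, 7.0),  # p5: only probeset for its gene
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("p", 1:5), paste0("s", 1:4))
)
ids <- c("100", "100", "200", "200", "300")  # Entrez ID per row (assumed)

# Assumption: variability = variance of each probeset across samples.
probe_var <- apply(expr, 1, var)

# For each Entrez ID, keep the name of the probeset with the largest variance.
keep <- sapply(split(names(probe_var), ids),
               function(p) p[which.max(probe_var[p])])

# One row per gene, labelled by Entrez ID.
expr_by_gene <- expr[keep, , drop = FALSE]
rownames(expr_by_gene) <- names(keep)
```

Unlike averaging, this keeps the values of a single real probeset for each gene, and keep records which probeset was chosen, so the choice can later be checked against the probe-to-transcript mapping.
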
Best wishes
Wolfgang
--
----------------------------------------------------
Wolfgang Huber EMBL-EBI http://www.ebi.ac.uk/huber
Vanessa Vermeirssen wrote:
> Hi,
>
> I have a dataframe containing RMA normalized and summarized expression
> values for affymetrix probesets, av.data.
> I have looked up the Entrez gene ids for the probesets in the annotation
> package, entrezids.
> Multiple probesets map of course to the same entrez id and I would like
> to combine these data into one row,
> by averaging the expression values for the same entrez ids over the
> different experiments.
> I tried the function "aggregate" to do this, but somehow it gives an
> error that the arguments are not of the same length, but they are...???
> How can I solve this or is there any other way to do this?
>
> See my code below...
>
> av.data <- read.table("humanGPL570avdata.txt", row.names = 1, sep =
> "\t", header = T, na.strings = "NA", fill = T)
> av.data[1:5,1:5]
> X1_Schwann_p1 X1_Schwann_p3 X2_accumbens X2_adipose
> 1007_s_at 9.281857 9.340795 9.151775 8.319741
> 1053_at 7.000684 6.867318 4.633061 5.101534
> 117_at 6.007608 6.124562 5.425565 5.692270
> 121_at 6.543294 6.728119 7.651856 7.692947
> 1255_g_at 3.077289 2.989938 4.622865 2.955812
> X2_adipose_omental
> 1007_s_at 7.909480
> 1053_at 4.509407
> 117_at 6.298798
> 121_at 7.598834
> 1255_g_at 3.040816
>
> probes <- ls(hgu133plus2ENTREZID)
> entrezids <- unlist(mget(probes,hgu133plus2ENTREZID))
> newdata <- data.frame(entrezids,av.data)
>
> sum <- aggregate(av.data,as.list(entrezids),mean)
> Error in FUN(X[[1L]], ...) : arguments must have same length
>
> > length(as.list(entrezids))
> [1] 54675
> > dim(av.data)
> [1] 54675 69
>
> sumdata <- aggregate(newdata,as.list(newdata$entrezids),mean)
> Error in FUN(X[[1L]], ...) : arguments must have same length
> > length(as.list(newdata$entrezids))
> [1] 54675
> > dim(newdata)
> [1] 54675 70
>
>
> Thank you so much!
> Vanessa
>