[BioC] finding and averaging replicate gene records

Wed Mar 16 12:32:33 CET 2005

Try aggregate() or tapply(). See the example below, where "A" appears
twice.

m <- data.frame(ID = c("A", "B", "A", "C"), array1 = 1:4, array2 = 5:8)

m
  ID array1 array2
1  A      1      5
2  B      2      6
3  A      3      7
4  C      4      8

aggregate(m[ ,-1], list(GENE=m$ID), mean, na.rm=TRUE)
  GENE array1 array2
1     A      2      6
2     B      2      6
3     C      4      8
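tapply() does the same averaging one column at a time. A minimal sketch on the same toy data frame m, using sapply() to repeat the tapply() call over every array column:

```r
m <- data.frame(ID = c("A", "B", "A", "C"), array1 = 1:4, array2 = 5:8)

# tapply() splits one column by ID and applies mean() to each group;
# the result is named by the sorted unique IDs (A, B, C)
a1 <- tapply(m$array1, m$ID, mean, na.rm = TRUE)
a1

# sapply() repeats this for every expression column, giving a matrix
# with one row per ID and one column per array
avg <- sapply(m[, -1], function(x) tapply(x, m$ID, mean, na.rm = TRUE))
avg
```

aggregate() returns a data frame with the IDs as a column; the tapply() route returns a numeric matrix with the IDs as row names, which can be handier for further matrix work.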

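Applied to the layout described in the original question (accession number, UniGene ID, then one column per condition), the same aggregate() call simply groups by the second column. A minimal sketch: the data frame dataf, its column names and its values are made up for illustration, and records with an empty or missing UniGene ID are dropped first, as in Jan's routine quoted below.

```r
# Hypothetical data in the layout from the question:
# accession number, UniGene ID, then one column per condition
dataf <- data.frame(
  acc     = c("AA001", "AA002", "AA003", "AA004", "AA005"),
  unigene = c("Hs.1", "Hs.2", "Hs.1", "", "Hs.3"),
  cond1   = c(1, 2, NA, 4, 5),
  cond2   = c(5, 6, 7, 8, 9)
)

# Keep only records with a usable (non-missing, non-empty) UniGene ID
keep <- !is.na(dataf$unigene) & dataf$unigene != ""
d <- dataf[keep, ]

# Average every condition column within each UniGene ID;
# drop = FALSE keeps a data frame even if there is a single condition,
# and na.rm = TRUE is passed on to mean() to skip missing values
avg <- aggregate(d[, -(1:2), drop = FALSE],
                 by = list(unigene = d$unigene),
                 FUN = mean, na.rm = TRUE)
avg
```

Note that if every value of a UniGene ID is NA in some condition, mean(..., na.rm = TRUE) returns NaN for that cell.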
On Wed, 2005-03-16 at 09:33 +0100, Oosting, J. (PATH) wrote:
> I'm not entirely sure this will work in its current form. I've adapted it from a routine I use to do this with expression sets, so some type conversion to the proper classes may be needed. Your data is in the dataf variable.
> 
> 
>   # subset the expression values (drop the accession and unigene columns)
>   ex<-dataf[,c(-1,-2)]
>   # average a set of rows; drop=FALSE keeps single-row groups as a data frame
>   mean.row<-function(rows) colMeans(ex[rows,,drop=FALSE],na.rm=TRUE)
>   # Named vector of unigene ids (names are the rownames) with a correct (non-empty) mapping
>   geneIds<-setNames(as.character(dataf[,2]),rownames(dataf))
>   geneIds<-geneIds[!is.na(geneIds) & geneIds!=""]
>   # make a list that contains combined rownames for each unigene id
>   newrows<-split(names(geneIds),geneIds)
>   # sapply() returns one column per unigene id, so t() gives one row per id
>   exn<-t(sapply(newrows,mean.row))
>   # Put the unigene Ids in the result (sapply already uses them as row names)
>   data.frame(UnigeneID=rownames(exn),exn,row.names=NULL)
> 
> Jan Oosting
> 
> 
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch 
> > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of zhihua li
> > Sent: Wednesday 16 March 2005 08:33
> > To: bioconductor at stat.math.ethz.ch
> > Subject: [BioC] finding and averaging replicate gene records
> > 
> > 
> > Hi netters!
> > 
> > On most microarray slides a single gene is represented by multiple
> > items. Sometimes this is unforeseeable, because they have different
> > GenBank accession numbers and you will not find them until you get a
> > UniGene list for all your gene items.
> > 
> > Now I have a data frame. The rows are gene records (accession number,
> > UniGene ID and expression values in different conditions): the 1st
> > column is GenBank accession numbers, the 2nd column is UniGene IDs, and
> > from the 3rd column on are the different conditions. All the accession
> > numbers are unique, but through the UniGene IDs I can find that some
> > items, though with different accession numbers, in fact share the same
> > UniGene ID. I would like to find the gene records containing replicate
> > UniGene IDs and merge them into one record by averaging the expression
> > values for the same condition.
> > 
> > Could anyone give me a clue about how to write the code? Or are there
> > any contributed functions that can do this?
> > 
> > Thanks a lot!
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >