[BioC] finding and averaging replicate gene records

Wed Mar 16 09:33:38 CET 2005

I'm not entirely sure this will work in its current form. I've adapted it from a routine I use to do this with expression sets, so some conversion to the proper classes may still be needed. The code assumes your data is in the variable dataf.

  # Unigene ids as a character vector (column 2 may be stored as a factor)
  geneIds<-as.character(dataf[,2])
  # Indices of the rows that have a correct (non-empty) mapping
  ok<-which(!is.na(geneIds) & geneIds!="")
  # subset the expression values, as a numeric matrix
  ex<-as.matrix(dataf[,c(-1,-2)])
  # average the given rows per condition; drop=FALSE keeps a single row as a matrix
  mean.row<-function(rows) colMeans(ex[rows,,drop=FALSE],na.rm=TRUE)
  # make a list that contains the row indices for each unigene id
  newrows<-split(ok,geneIds[ok])
  # sapply returns one column per unigene id, so t() puts the ids back in rows
  exn<-t(sapply(newrows,mean.row))
  # the unigene ids are already rownames(exn); this also puts them in a column
  data.frame(unigene=rownames(exn),exn,row.names=NULL)
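
Alternatively, base R's aggregate() can probably do the same averaging in one call. Below is a minimal sketch on a made-up data frame; the column names (acc, unigene, cond1, cond2) and the values are only assumptions for the example, so adapt them to your own data.

```r
# Toy data, invented for illustration: accession, unigene id, two conditions
dataf <- data.frame(
  acc     = c("A1", "A2", "A3", "A4"),
  unigene = c("Hs.1", "Hs.2", "Hs.1", ""),
  cond1   = c(1, 10, 3, 7),
  cond2   = c(2, 20, NA, 8),
  stringsAsFactors = FALSE
)

# Keep only rows with a usable (non-missing, non-empty) unigene id
keep <- !is.na(dataf$unigene) & dataf$unigene != ""

# Average every expression column within each unigene id.
# na.rm = TRUE is passed on to mean(), so NAs are skipped per column.
avg <- aggregate(dataf[keep, -(1:2)],
                 by  = list(unigene = dataf$unigene[keep]),
                 FUN = mean, na.rm = TRUE)
print(avg)
```

The result has one row per unigene id, with the id in the first column and the per-condition means in the remaining columns. Rows without a unigene id (A4 here) are dropped, as in the code above.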

Jan Oosting

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch 
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of zhihua li
> Sent: Wednesday 16 March 2005 08:33
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] finding and averaging replicate gene records
> 
> 
> Hi netter!
> 
> In most microarray slides a single gene will be represented by multiple
> items. Sometimes this is unforeseeable because they have different GenBank
> accession numbers, and you will not find them until you get a unigene list
> for all your gene items.
> 
> Now I have a data frame. The rows are gene records (accession number,
> unigene ID and expression values in different conditions): the 1st column
> is GenBank accession numbers, the 2nd column is unigene IDs, and from the
> 3rd column on are the different conditions. All the accession numbers are
> unique, but through the unigene IDs I can find that some items, though with
> different accession numbers, in fact share the same unigene ID. I would
> like to find the gene records containing replicate unigene IDs and merge
> them into one record by averaging the expression values in the same
> condition.
> 
> Could anyone give me a clue about how to write the code? Or are there any
> contributed functions that can do this?
> 
> Thanks a lot!