[BioC] finding and averaging replicate gene records
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Wed Mar 16 12:32:33 CET 2005
Try aggregate() or tapply(). See example below where "A" is repeated
twice.
m <- cbind.data.frame( ID=c("A", "B", "A", "C"), array1=1:4,
                       array2=5:8 )
m
  ID array1 array2
1  A      1      5
2  B      2      6
3  A      3      7
4  C      4      8
aggregate(m[ ,-1], list(GENE=m$ID), mean, na.rm=TRUE)
  GENE array1 array2
1    A      2      6
2    B      2      6
3    C      4      8
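
For the layout described in the original question (accession in column 1, unigene ID in column 2, conditions from column 3 on), the same idea might look roughly like the sketch below. The contents of dataf and its column names are made up for illustration, and dropping records with an empty unigene ID is an assumption about what is wanted.

```r
# Hypothetical data in the layout described in the original question:
# column 1 = accession, column 2 = unigene ID, remaining columns = conditions.
dataf <- data.frame(
  accession = c("AA001", "AA002", "AA003", "AA004", "AA005"),
  unigene   = c("Hs.1", "Hs.2", "Hs.1", "", "Hs.3"),
  cond1     = c(1, 2, 3, 10, 4),
  cond2     = c(5, 6, NA, 20, 8),
  stringsAsFactors = FALSE
)

# Assumption: records without a unigene mapping are dropped before averaging
keep <- !is.na(dataf$unigene) & dataf$unigene != ""
d <- dataf[keep, ]

# aggregate(): one row per unigene ID, averaging each condition column
avg <- aggregate(d[, -(1:2)], list(unigene = d$unigene), mean, na.rm = TRUE)

# tapply(): the same averaging done one condition column at a time;
# the result is a matrix with unigene IDs as row names
avg2 <- sapply(d[, -(1:2)],
               function(x) tapply(x, d$unigene, mean, na.rm = TRUE))
```

aggregate() returns a data frame with the IDs in a column, which is convenient for merging back with annotation. The tapply() version returns a numeric matrix, which may be handier if the result goes straight into further numeric work.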
On Wed, 2005-03-16 at 09:33 +0100, Oosting, J. (PATH) wrote:
> I'm not entirely sure this will work in its current form. I've adapted it from a routine I use to do this with expression sets, so some type casting or conversion to the proper classes may be needed. Your data is in the dataf variable.
>
>
> # Average the rows of ex named in 'rows', column by column (also works for a single row)
> mean.row <- function(rows) colMeans(ex[rows, , drop=FALSE], na.rm=TRUE)
> # Named vector of unigene ids (names = row names of dataf), keeping only non-empty mappings
> geneIds <- setNames(as.character(dataf[, 2]), rownames(dataf))
> geneIds <- geneIds[!is.na(geneIds) & geneIds != ""]
> # subset the expression values
> ex <- dataf[, c(-1, -2), drop=FALSE]
> # make a list that contains combined rownames for each unigene id
> newrows <- split(names(geneIds), geneIds)
> # the t() is needed because sapply returns one column per unigene id
> exn <- t(sapply(newrows, mean.row))
> # Put the unigene Ids in the result
> data.frame(unigene=names(newrows), exn) # or rownames(exn)<-names(newrows)
>
> Jan Oosting
>
>
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of zhihua li
> > Sent: Wednesday 16 March 2005 08:33
> > To: bioconductor at stat.math.ethz.ch
> > Subject: [BioC] finding and averaging replicate gene records
> >
> >
> > Hi netter!
> >
> > In most microarray slides a single gene will be represented by multiple
> > items. Sometimes this is unforeseeable because they have different GenBank
> > accession numbers, and you will not find them until you get a unigene list
> > for all your gene items.
> >
> > Now I have a data frame. The rows are gene records (accession number,
> > unigene ID and expression values in different conditions): the 1st column
> > is GenBank accession numbers, the 2nd column is unigene IDs, and from the 3rd
> > column on are the different conditions. All the accession numbers are unique,
> > but through the unigene IDs I can find that some items, though with different
> > accession numbers, in fact share the same unigene ID. I would like to
> > find the gene records containing replicate unigene IDs and merge them into
> > one record by averaging the expression values within the same condition.
> >
> > Could anyone give me a clue about how to write the code? Or are there any
> > contributed functions that can do this?
> >
> > Thanks a lot!
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
>