[BioC] Assigning gene symbols to Affymetrix data and averaging probes
James W. MacDonald
jmacdon at uw.edu
Wed Oct 3 20:48:51 CEST 2012
Hi Lesley,
On 10/3/2012 2:29 PM, Hoyles, Lesley wrote:
> Hi Jim
>
> Thanks, the reannotation worked a treat. I've been able to export the normalized data in annotated format.
>
> I am averse to removing probes that have no Entrez ID associated with them, as I want to put the whole set of data through limma. I can't use the annotated expr.loess in lmFit, but is there a way I can get the symbol information into the output of lmFit (for instance, as fit$symbol)?
There is a 'genes' component in an MArrayLM object (the output from, e.g.,
lmFit), accessed as fit$genes, into which you can stuff a data.frame
containing gene symbols, etc.
Another option is to use the annaffy package to do the annotation. And
if you are going to use annaffy and limma, then I should make a
shameless plug for the affycoretools package, which contains a function
designed to go from an MArrayLM object to annotated output in a single
function call (outputting HTML or text files).
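For the first option, here is a minimal sketch with toy data (not run against your object). The matrix, sample names, symbols and design below are all made up for illustration; in your case the annotation would be the ID/Symbol data.frame you already built, and it has to be in the same row order as the data you gave to lmFit.

```r
library(limma)

## Toy stand-ins (assumptions): 6 probesets x 4 arrays, two groups.
set.seed(1)
exprs.mat <- matrix(rnorm(24, mean = 8), nrow = 6,
                    dimnames = list(paste0("probe", 1:6), paste0("s", 1:4)))

## Hypothetical annotation; NA marks a probeset without a symbol.
annot <- data.frame(ID = rownames(exprs.mat),
                    Symbol = c("GeneA", "GeneA", "GeneB", NA, "GeneC", "GeneD"),
                    stringsAsFactors = FALSE)

design <- model.matrix(~ factor(c("ctl", "ctl", "trt", "trt")))
fit <- lmFit(exprs.mat, design)

## Attach the annotation; rows must be in the same order as the fit.
stopifnot(identical(annot$ID, rownames(fit$coefficients)))
fit$genes <- annot

fit <- eBayes(fit)
tab <- topTable(fit, coef = 2, number = Inf)
head(tab)
```

topTable carries the ID and Symbol columns through to its output, and probesets without a symbol show NA rather than being dropped.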
Best,
Jim
>
> Best wishes
> Lesley
>
> ________________________________________
> From: James W. MacDonald [jmacdon at uw.edu]
> Sent: 03 October 2012 16:30
> To: Hoyles, Lesley
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Assigning gene symbols to Affymetrix data and averaging probes
>
> Hi Lesley,
>
> On 10/3/2012 10:55 AM, Hoyles, Lesley wrote:
>> Hi
>>
>> I have processed my affy data and am able to annotate the object
>> mice.loess using the following:
>>
>> ID <- featureNames(mice.loess)
>> Symbol <- getSYMBOL(ID, 'mouse4302.db')
>> fData(mice.loess) <- data.frame(ID=ID, Symbol=Symbol)
>>
>>
>> However, when I convert my object as follows - expr.loess<-
>> exprs(mice.loess) - I lose the annotation and have been unable to
>> find a way to annotate expr.loess. Please could anybody suggest how I
>> can annotate expr.loess?
> expr.loess<- data.frame(ID = ID, Symbol = Symbol, exprs(mice.loess))
>
>>
>> Is there a way of averaging probes for each gene with Affymetrix
>> data? I've been able to do this with single-channel Agilent data
>> using the example given in the limma guide.
> There are probably two reasonable ways to do this. First, the easiest.
>
> dat<- ReadAffy(cdfname = "mouse4302mmentrezcdf")
>
> and proceed from there. This will use the MBNI re-mapped CDF package
> based on Entrez Gene IDs, and you will have a single value per gene
> after summarization. There are other ways to map the probes; see
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
> at the bottom of the page for more info.
>
> Alternatively if you want to stick with the original probesets, the
> problem arises that some probesets are not well annotated, so what to do
> with those? In addition, gene symbols are not guaranteed to be unique,
> so you can't just assume that they are. Entrez Gene and UniGene IDs are
> supposed to be unique, so you could go with them, doing something like
> (untested)
>
> gns <- toTable(mouse4302ENTREZID)
> ## expr.loess is the data.frame I suggest above
> alldat <- merge(gns, expr.loess, by = 1)
> ## one data.frame of probesets per Entrez Gene ID
> alldatlst <- tapply(1:nrow(alldat), alldat$gene_id, function(x) alldat[x,])
> ## keep the first row's annotation and average the expression columns
> combined.data <- do.call("rbind", lapply(alldatlst, function(x)
>     data.frame(x[1, 1:3], t(colMeans(x[, -(1:3), drop = FALSE])))))
>
> Here I am assuming that after the merge() step the first three columns
> are the probeset ID, gene_id, symbol, and the remaining columns are the
> expression values. You will lose all data for which there isn't an
> Entrez Gene ID, but the same is true of the MBNI method I outline above.
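If the rbind step above gets awkward, the same per-gene averaging can be sketched with aggregate() from base R. The toy table below is made up and only mimics the layout assumed after merge() (probeset ID, gene_id, symbol, then expression columns); treat it as an alternative to the tapply approach, not something tested on real data.

```r
## Toy merged table (assumption): probeset ID, Entrez Gene ID, symbol, then samples.
alldat <- data.frame(
    probe_id = c("p1", "p2", "p3", "p4"),
    gene_id  = c("101", "101", "202", "303"),
    Symbol   = c("GeneA", "GeneA", "GeneB", "GeneC"),
    s1 = c(2, 4, 6, 8),
    s2 = c(1, 3, 5, 7),
    stringsAsFactors = FALSE
)

## Average the expression columns within each Entrez Gene ID.
avg <- aggregate(alldat[, -(1:3)], by = list(gene_id = alldat$gene_id), FUN = mean)

## Carry over the symbol of the first probeset found for each gene.
avg$Symbol <- alldat$Symbol[match(avg$gene_id, alldat$gene_id)]
avg
```

Here the two probesets for gene 101 collapse into one row, while genes with a single probeset pass through unchanged.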
>
> Best,
>
> Jim
>
>
>>
>> Thanks in advance for your help.
>>
>> Best wishes
>> Lesley
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099