[BioC] rma and SNOMAD

Lizhe Xu lxu at chnola-research.org
Fri Mar 19 20:40:58 MET 2004


I just read the SNOMAD from the book "The Analysis of Gene Expression Data". It makes the three steps of normalizations after global scale much meaningful to me:
local mean normalization and local variance correction across the intensities and local mean across the spatial surface.

Since now I try to use Bioconductor to analyze my Affy data (rma and gcrma), I wonder which of the above steps the quantile normalization in RMA corresponds to .

Is it possible to combine RMA(gcRMA) with SNOMAD? If yes, how?

Thanks.

Lizhe 

-----Original Message-----
From: bioconductor-request at stat.math.ethz.ch [mailto:bioconductor-request at stat.math.ethz.ch]
Sent: Friday, March 19, 2004 5:04 AM
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 13, Issue 45


Send Bioconductor mailing list submissions to
	bioconductor at stat.math.ethz.ch

To subscribe or unsubscribe via the World Wide Web, visit
	https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
	bioconductor-request at stat.math.ethz.ch

You can reach the person managing the list at
	bioconductor-owner at stat.math.ethz.ch

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."


Today's Topics:

   1. Re: limma: topTable (Julia Engelmann)
   2. Subsetting Affybatch Objects by Gene list. (Horswell, Stuart)


----------------------------------------------------------------------

Message: 1
Date: Fri, 19 Mar 2004 08:50:19 +0100
From: Julia Engelmann <julia.engelmann at biozentrum.uni-wuerzburg.de>
Subject: Re: [BioC] limma: topTable
To: bioconductor at stat.math.ethz.ch
Message-ID:
	<200403190850.19084.julia.engelmann at biozentrum.uni-wuerzburg.de>
Content-Type: text/plain;  charset="iso-8859-1"


Sorry, here comes my code:

library(affy)
library(limma)
library(vsn)
data  	<- ReadAffy()
normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
Set		<- expresso(data, normalize.method="vsn", 
bgcorrect.method="none",pmcorrect.method="pmonly", 
summary.method="medianpolish")

# R and T stand for different treatment, R1 and R2 as well as T1 and T2 are 
biological reps.
design      <- model.matrix(~ -1+factor(c(1,1,2,2,3,3,4,4))) 
colnames(design) <- c("R1","R2","T1","T2")
fit              <- lmFit(Set, design)
contrast.matrix  <- makeContrasts(((R1+R2)-(T1
+T2))/2,R1-T1,R2-T2,levels=design);
fit2             <- contrasts.fit(fit, contrast.matrix)
fit2             <- eBayes(fit2)
data          <- read.table("ATH1-121501_annot.csv", header=TRUE, sep=",")
res              <- topTable(fit2, number=10,coef=1, 
adjust="fdr",genelist=data,sort.by="P",resort.by="M")

Thanks a lot again,
Julia

> >Hi Bioconductor folks,
> >
> >I have really been enjoying using the limma package, but I have just come
> >across a problem.
> >When I use the topTable-command, the slot "result$Probe.Set.ID" does not
> > seem to match the other entries I am interested in, namely M, P, t.
> >I am using R 1.8.0, limma 1.5.5 on Affymetrix ATH1-121501 chips.
> >
> >An example would be:
> >Output of the topTable-result:
> >
> >                 Probe.Set.ID            M               t
> > P.Value         B
> >5598    259302_at       4.593339 38.9793 1.126260e-05 11.88238
> >
> >If I check on gene number 5598 it says
> >geneNames(myExprSet)[5598]
> >[1] "250498_at"
> >
> >Am I misinterpreting 5598 as the index of my ExpressionSet?
>
> I think you've done something non-standard, anyway you can't have simply
> used lmFit() and eBayes() and topTable() on our exprSet object. Please show
> us enough of your code so that we can see how topTable is getting the probe
> set ID's.
>
> Gordon
>
> >Thanks a lot for any suggestions!
> >Julia



------------------------------

Message: 2
Date: Fri, 19 Mar 2004 10:33:31 -0000
From: "Horswell, Stuart" <stuart.horswell at csc.mrc.ac.uk>
Subject: [BioC] Subsetting Affybatch Objects by Gene list.
To: <bioconductor at stat.math.ethz.ch>
Message-ID:
	<A09F613372B70C4CBCE35A9095F434F009FFEE at icex34.cc.ic.ac.uk>
Content-Type: text/plain;	charset="iso-8859-1"


Hi again,

	First, I'd like to say thanks to everyone who's sent comments or suggestions on this. I've now managed to sort the problems out but just wanted to reply to a few of the questions raised.

>From: "Matthew  Hannah" <Hannah at mpimp-golm.mpg.de>
>I'm abit confused as to why you want to go to so much trouble to subset your >>data.
>RMA or any of the expresso functions can be called on the entire affybatch and >then 
>written to file.
>

Yes that's true but I want to examine what sort of a difference it makes if I normalize at the probe level rather than at the "expression" level (i.e. the value obtained by combining all of the probe pairs in a probe set). I can normalize the probes, but computing the expression values, well, I'd rather use the R functions already written if I can!

>I don't see how this would be any different to the MAS5 analysis
>you presumably want consistency with as MAS5 scales/normalises on a whole chip >basis anyway.

I've performed all of my previous analyses in Excel starting with only the MAS5 P/A calls (filtered almost as you suggest below!) and the raw expression levels (most definitely not already quantile normalized).

>If you >want to 
>use BioC/R for more analysis you could save the result to a txt file and then >just read it back into R.

Again, very true but normalizing in Excel is quite laborious (particularly over 24x5 individual chips) and I wanted to use R to automate the process.

>A better thing to look into may be whether you really want to filter based on >the P/A 
>calls as with RMA you might find that including A genes only has a very small >effect on
>your final list of genes (if based on fold change).

I'm using SAM (amongst other things) since it's less biased towards selecting low expressors than fold change and I want to compare the results with previous results based on P/A filtering. Ironically, I want to use RMA since it's supposed to be less biased towards *high* expressors than the MAS5 expressions!

> And thats before considering if
>the P/A call is useful due to the 1/3 of MM>PM.

That's something we plan to look at post hoc (and probably post haste too) once we've got an idea of what results we get from the absolute filter.

Professor Gentleman;
>Well, first, R and Bioconductor are *open source* and that means you
> really can get at the source code for all functions and methods. 

I'm sorry, I didn't mean to imply that the code was hidden, just that I couldn't figure out how to get at it! (although I still can't convince getMethods to recognise "normalize" as an argument but that's a question for a different mailing list!)

>I can only suggest that if you want to do reasonably sophisticated
> things in any language that spending some time learning how to
> program in it will be rewarded.

I hadn't realised that affybatch was an S4 object. I thought, naively, that it was a structure specific to affybatch (or at least bioconductor), and I had also hoped that there might be some pre-programed way of getting expresso to do what I wanted which I had missed. That's why I queried it here - I'm sorry if I wandered off topic for this list :)

Once again I'd like to thank everyone for their comments and advice. If anyone else is interested in comparing the effects of filtering before normalizing I'd be happy to send you a copy of my (rather ugly and inefficient) code.

best wishes

 Stu



------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor


End of Bioconductor Digest, Vol 13, Issue 45



More information about the Bioconductor mailing list