[BioC] Subsetting Affybatch Objects by Gene list.

Fri Mar 19 11:33:31 MET 2004

Hi again,

	First, I'd like to say thanks to everyone who's sent comments or suggestions on this. I've now managed to sort the problems out but just wanted to reply to a few of the questions raised.

>From: "Matthew  Hannah" <Hannah at mpimp-golm.mpg.de>
>I'm a bit confused as to why you want to go to so much trouble to subset your data.
>RMA or any of the expresso functions can be called on the entire affybatch and then
>written to file.
>

Yes that's true but I want to examine what sort of a difference it makes if I normalize at the probe level rather than at the "expression" level (i.e. the value obtained by combining all of the probe pairs in a probe set). I can normalize the probes, but computing the expression values, well, I'd rather use the R functions already written if I can!
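To make the two orderings concrete, here is a minimal, language-agnostic sketch in plain Python (not the R/Bioconductor code discussed in this thread). It assumes quantile normalization as the normalization step and a simple median as a stand-in for the probe-set summary; RMA and MAS5 use more elaborate summaries. The toy intensities and the probe-set layout are hypothetical.

```python
from statistics import mean, median

def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per chip).

    Each value is replaced by the mean, across chips, of the values that
    hold the same rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result

def summarize(chip, probe_sets):
    """Collapse probe intensities to one value per probe set.

    The median is an assumed stand-in for a real summary method.
    """
    return [median(chip[i] for i in idxs) for idxs in probe_sets]

# Hypothetical toy data: 2 chips, 6 probes, 2 probe sets of 3 probes each.
chips = [[10.0, 200.0, 30.0, 40.0, 50.0, 60.0],
         [100.0, 40.0, 600.0, 80.0, 20.0, 120.0]]
probe_sets = [[0, 1, 2], [3, 4, 5]]

# Order A: normalize at the probe level, then summarize.
probe_first = [summarize(c, probe_sets) for c in quantile_normalize(chips)]

# Order B: summarize first, then normalize the expression values.
expr_first = quantile_normalize([summarize(c, probe_sets) for c in chips])

print(probe_first)
print(expr_first)
```

On this toy input the two orderings give different expression values for both chips, which is the kind of difference the comparison is meant to measure.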

>I don't see how this would be any different to the MAS5 analysis
>you presumably want consistency with as MAS5 scales/normalises on a whole chip basis anyway.

I've performed all of my previous analyses in Excel starting with only the MAS5 P/A calls (filtered almost as you suggest below!) and the raw expression levels (most definitely not already quantile normalized).

>If you want to use BioC/R for more analysis you could save the result to a txt file and then
>just read it back into R.

Again, very true but normalizing in Excel is quite laborious (particularly over 24x5 individual chips) and I wanted to use R to automate the process.

>A better thing to look into may be whether you really want to filter based on the P/A
>calls as with RMA you might find that including A genes only has a very small effect on
>your final list of genes (if based on fold change).

I'm using SAM (amongst other things) since it's less biased towards selecting low expressors than fold change and I want to compare the results with previous results based on P/A filtering. Ironically, I want to use RMA since it's supposed to be less biased towards *high* expressors than the MAS5 expressions!
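The point about fold change favouring low expressors can be illustrated with a small sketch. It compares a plain fold change with a SAM-style relative difference, d = (mean difference) / (s + s0), where s is the pooled standard error and s0 is a small positive constant. In SAM itself s0 is chosen from the data; here it is simply fixed at an assumed value, the data are hypothetical, and raw rather than log-scale intensities are used for simplicity.

```python
from math import sqrt
from statistics import mean

def fold_change(a, b):
    """Ratio of group means (group b over group a)."""
    return mean(b) / mean(a)

def sam_d(a, b, s0):
    """SAM-style relative difference: (mean(b) - mean(a)) / (s + s0).

    s is the pooled standard error of the difference in means; s0 is a
    fudge constant (fixed by hand here, which is an assumption).
    """
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s = sqrt((1 / na + 1 / nb) * ss / (na + nb - 2))
    return (mb - ma) / (s + s0)

# Hypothetical genes: a noisy low expressor and a stable high expressor.
low_a, low_b = [4.0, 12.0, 8.0], [20.0, 44.0, 32.0]
high_a, high_b = [1000.0, 1010.0, 990.0], [2000.0, 2010.0, 1990.0]

S0 = 5.0  # assumed value, for illustration only

print(fold_change(low_a, low_b), fold_change(high_a, high_b))
print(sam_d(low_a, low_b, S0), sam_d(high_a, high_b, S0))
```

With these numbers the low expressor has the larger fold change, while the SAM-style statistic ranks the high expressor first because the low expressor's difference is small relative to its noise plus s0.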

> And that's before considering if
>the P/A call is useful due to the 1/3 of MM>PM.

That's something we plan to look at post hoc (and probably post haste too) once we've got an idea of what results we get from the absolute filter.

Professor Gentleman;
>Well, first, R and Bioconductor are *open source* and that means you
> really can get at the source code for all functions and methods. 

I'm sorry, I didn't mean to imply that the code was hidden, just that I couldn't figure out how to get at it! (although I still can't convince getMethods to recognise "normalize" as an argument but that's a question for a different mailing list!)

>I can only suggest that if you want to do reasonably sophisticated
> things in any language that spending some time learning how to
> program in it will be rewarded.

I hadn't realised that affybatch was an S4 object. I thought, naively, that it was a structure specific to the affy package (or at least to Bioconductor), and I had also hoped that there might be some pre-programmed way of getting expresso to do what I wanted which I had missed. That's why I queried it here - I'm sorry if I wandered off topic for this list :)

Once again I'd like to thank everyone for their comments and advice. If anyone else is interested in comparing the effects of filtering before normalizing I'd be happy to send you a copy of my (rather ugly and inefficient) code.

best wishes

 Stu