[BioC] Subsetting Affybatch objects by gene list.

Tue Mar 16 14:52:42 MET 2004

On Tue, Mar 16, 2004 at 11:26:43AM -0000, Horswell, Stuart wrote:
> 
> 
> Many thanks to those of you who replied to my earlier query about subsetting Affybatch objects. However, I fear I didn't explain what I wanted to do sufficiently well.
> 
> I have a set of 24 arrays in a single affybatch object. Ultimately I would like to perform a quantile normalization on this, using expresso or rma, for which I will need probe pair/set level data in affybatch format (and since the bg.correct and normalize functions won't display their source code as easily as, say, expresso, I can't side-step this by altering the code). However, I want to remove all genes which are called "Absent" (in the sense of MAS5.0) across all 24 arrays before I normalize (for continuity with previous analyses performed in Excel).

 Well, first, R and Bioconductor are *open source*, and that means you
 really can get at the source code for all functions and methods.

 The output of getMethods("bg.correct") seems to be pretty simple (you
 can find out about getMethods by typing ?getMethods).
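
 A minimal sketch of how S4 method source can be inspected, using only
 the methods package. The class ToyBatch and the generic toyCorrect are
 hypothetical stand-ins so that the example runs without affy; they are
 not part of any Bioconductor package. With affy loaded, the same calls
 would presumably be pointed at "bg.correct" instead.

```r
library(methods)

# Toy class and generic (hypothetical names, not part of affy).
setClass("ToyBatch", representation(values = "numeric"))
setGeneric("toyCorrect", function(object, ...) standardGeneric("toyCorrect"))
setMethod("toyCorrect", "ToyBatch", function(object, ...) {
  # Subtract the minimum as a stand-in for a background correction.
  object@values <- pmax(object@values - min(object@values), 0)
  object
})

# List every method defined for the generic, with its source.
showMethods("toyCorrect", includeDefs = TRUE)

# Retrieve one method as a function object and look at its body.
m <- getMethod("toyCorrect", "ToyBatch")
body(m)
```

 Because the method comes back as an ordinary function object, it can be
 copied, edited and registered again with setMethod.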

 If I understand what you are trying to do, you might want to look at
 the matchprobes package where we do something similar (although there
 we combine chips by matching on probe sequence but conceptually it is
 not different from what you are doing). 

> 
> I use the mas5calls function to obtain a list of affy id tags which will tell me which probesets to remove. However, since expresso and rma require affybatch objects as arguments, I need to produce an affybatch object containing probe data, *not* one of the arrays which one obtains after using the exprs function. (Previously I used exprs purely to get a list of affy id's I could export to Excel).
> 
> So, I guess I should phrase my question like this - how does one replace objects in the cdf and exprs slots of an affybatch object? This would enable me to use the methods kindly suggested previously (and of course >?AffyBatch only tells me how to replace pm/mm values, rather than how to remove them altogether and simply setting their values identically equal to zero will obviously detrimentally affect the quantile normalization procedure). I can obtain an array of probe level data which only contains the data I want to normalize and a list of gene id's which should be excluded from the cdf list but I can't push them into expresso!
> 

 I can only suggest that if you want to do reasonably sophisticated
 things in any language, spending some time learning how to program
 in it will be rewarded. A bit of time with John Chambers' book
 on Programming with Data would explain much of what you are asking
 (as would some time with some of the documents on the Developer Page
 at Bioconductor, under the heading Programmers Reference Library).
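
 As one illustration of the kind of material those references cover,
 here is a minimal sketch of reading and replacing slots of an S4
 object. The class ToyChip and its slots are hypothetical; this is not
 the AffyBatch class definition, and it says nothing about what
 consistency AffyBatch expects between its probe data and its CDF.

```r
library(methods)

# Hypothetical S4 class with a probe-level matrix and a CDF name.
setClass("ToyChip", representation(exprs = "matrix", cdfName = "character"))

tb <- new("ToyChip",
          exprs = matrix(1:12, nrow = 4,
                         dimnames = list(paste0("p", 1:4), paste0("a", 1:3))),
          cdfName = "toy_cdf")

slotNames(tb)                        # which slots the object has

# Replace a slot with a row subset using the @ replacement form.
keep <- c("p1", "p3")
tb@exprs <- tb@exprs[keep, , drop = FALSE]

# The same kind of replacement, written with slot()<-.
slot(tb, "cdfName") <- "toy_cdf_filtered"

validObject(tb)                      # check the object is still valid
dim(tb@exprs)
```

 For a real class it is usually safer to go through the accessor and
 replacement methods the package provides, where they exist, than to
 assign to slots directly.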

 Robert

> 
> As a final note, I'm aware that I could just get the Absent list, get the (un-normalized) expression values and then write some code to normalize at expression level but I have in fact already done this and I now want to compare the results with what happens when one uses expresso, which, since "normalize" accepts and produces affybatch objects and is called before "computeExprsSet", presumably normalizes at the level of probe pairs, rather than expression level.
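
The probe-level variant described above (drop probesets that are Absent on every array, then quantile normalize the remaining probes) can be sketched in base R on a plain matrix. All data and names below are made up for illustration, the calls matrix only imitates "P"/"M"/"A" labels, and the tie handling is simplified, so results may differ from what the affy normalization routines produce. The sketch does not build an AffyBatch.

```r
# Toy probe-level data: 6 probes on 3 arrays (hypothetical values).
probeset  <- c("g1", "g1", "g2", "g2", "g3", "g3")
intensity <- cbind(a1 = c(5, 2, 3, 4, 9, 8),
                   a2 = c(4, 1, 4, 2, 7, 9),
                   a3 = c(3, 4, 6, 8, 8, 7))
rownames(intensity) <- paste0(probeset, "_", rep(1:2, 3))

# Detection calls per probeset and array (P = present, M = marginal, A = absent).
calls <- rbind(g1 = c("P", "A", "P"),
               g2 = c("A", "A", "A"),
               g3 = c("A", "P", "M"))
colnames(calls) <- colnames(intensity)

# Probesets called Absent on every array.
absent_everywhere <- rownames(calls)[apply(calls == "A", 1, all)]

# Keep only the probes that belong to the remaining probesets.
filtered <- intensity[!(probeset %in% absent_everywhere), , drop = FALSE]

# Simple quantile normalization: every column is mapped onto the mean of
# the sorted columns. Ties are broken by order of appearance.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

normalized <- quantile_normalize(filtered)
normalized
```

After this step every column of normalized holds the same set of values, in an order that follows the original ranking of the probes on that array.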
> 
> 
> thanks again for your time
> 
>     Stu
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

-- 
+---------------------------------------------------------------------------+
| Robert Gentleman                 phone : (617) 632-5250                   |
| Associate Professor              fax:   (617)  632-2444                   |
| Department of Biostatistics      office: M1B20                            |
| Harvard School of Public Health  email: rgentlem at jimmy.harvard.edu        |
+---------------------------------------------------------------------------+