[BioC] Oligo package

Benilton Carvalho bcarvalh at jhsph.edu
Mon Oct 5 16:30:42 CEST 2009

For the record, as I just replied to this very same message when it was sent privately:


Dear Thibault,

below are some notes that refer back to our previous communications:

1) This array is 1050 x 1050, therefore 1,102,500 features. The ~700K
distinct probes you refer to are properly annotated in the
pd.hugene.1.0.st.v1 (2.4.1 and 3.0.0) packages;

2) Summarization to the gene-level is possible using the devel-version  
of the packages;

3) The summaries you're getting are at the probeset-level, as defined  
by the PGF file;


I failed to mention that, on the chip, there are "things" other than
the experimental probes most people are interested in. Hence the
difference (700K vs 1M). The oligo package reads them all, but that
doesn't mean that all of them are used when preprocessing.
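
As a quick sanity check on the numbers in this thread, the arithmetic can be sketched in plain R. The variable names are made up for illustration, and the figures are only those quoted in the messages here (array dimensions, the probe and gene counts attributed to the Affymetrix documentation, and the number of rows reported after rma()).

```r
# Figures quoted in this thread (variable names are illustrative only)
features_on_array  <- 1050 * 1050   # all features read from a CEL file
probesets_reported <- 253002        # rows returned by rma() in the report
documented_probes  <- 764885        # distinct probes per the Affymetrix docs
documented_genes   <- 28869         # genes per the Affymetrix docs

# Features beyond the documented distinct probes
extra_features <- features_on_array - documented_probes

# Ratio computed in the original report: all features per probeset
features_per_probeset <- features_on_array / probesets_reported

# Ratio implied by the documentation: distinct probes per gene
probes_per_gene <- documented_probes / documented_genes

print(features_on_array)
print(extra_features)
print(round(features_per_probeset, 2))
print(round(probes_per_gene, 2))
```

Note that the first ratio divides every feature on the chip by the number of probesets, so it is only a rough figure: not all features are used in preprocessing. The second ratio is per gene, not per PGF-defined probeset, so the two are not directly comparable.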

So, Thibault, from what you report, I understand you really want to  
use the packages that are, right now, on the devel-branch (oligo and
friends, plus the annotation package)... or just wait for the release,
which is coming up soon. By the way, when working on the code  
currently available under devel, I checked the results against those  
provided by the Affymetrix tool, and they were very consistent; so  
please let me know if you find something that does not agree with  
their tool (assuming the use of RMA) and I'll address that promptly.
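
For illustration, here is a minimal sketch of how the two summarization levels might be requested. It assumes an oligo version in which rma() accepts a `target` argument for Gene ST arrays ("probeset" or "core"), which is how I recall later releases exposing this; the devel version discussed here may differ. The function name `summarize_both_levels` and the `cel_dir` argument are hypothetical.

```r
# Sketch only: assumes the oligo package and the matching
# pd.hugene.1.0.st.v1 annotation package are installed, and that
# `cel_dir` (hypothetical) contains Human Gene 1.0 ST CEL files.
summarize_both_levels <- function(cel_dir) {
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("the oligo package is required")
  }

  # Collect the CEL files with base R (case-insensitive match on .cel)
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)

  # Probeset-level summaries, as defined by the PGF file
  probeset_level <- oligo::rma(raw, target = "probeset")

  # Gene-level summaries (assumed `target = "core"`)
  gene_level <- oligo::rma(raw, target = "core")

  list(probeset = probeset_level, gene = gene_level)
}
```

Calling dim() on each element of the returned list should show whether the row counts line up with the probeset and gene counts discussed in this thread.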


On Oct 5, 2009, at 10:50 AM, Thibault Helleputte wrote:

> Hello,
> I use R version 2.9.2 under MacOSX Tiger, and the oligo (1.8.3),
> oligoclasses (1.6.0) and pd.hugene.1.0.st.v1 (2.4.1) packages. I
> imported 20 human gene 1.0 st CEL files into R, and I summarized them
> with rma(). I have then several concerns:
> Once the CEL files read, I have an oligo object with 1,102,500 features,
> and not 764,885 distinct probes mentioned in Affymetrix documentation.
> Once this R object summarized via the rma() function, I get 253,002
> features, instead of the 28,869 genes mentioned by Affymetrix. That
> suggests that only 4 probes on average are included in each probesets
> (roughly a 4:1 ratio between probes and summarized probesets). The
> median number of probes by probesets is supposed to be 26 with that
> specific technology.
> Does someone have an explanation or a comment on this issue?
> Many thanks.
