[BioC] Analysing Human Gene ST 1.0 Arrays with oligo and oneChannelGUI yield different number of probesets

Javier Pérez Florido jpflorido at gmail.com
Fri Oct 30 13:41:35 CET 2009


Dear Benilton,
Thanks for your help. I have more questions. What is summarization 
at the gene level? I thought that a probeset = gene.
Are the "new probesets" defined in the MPS file related to the 
experiment, or are they controls?

Two more things:
- I would like to perform an analysis without the control genes. How can 
I find out which genes are controls, so that I can remove them from the 
analysis?
- What is the best package available for annotation? 
hugene10stprobeset.db? I suppose that, using the featureNames of the 
expression set, I can get the ENTREZID of the probesets through this 
annotation package.

Thanks again,
Javier


Benilton Carvalho wrote:
> That makes me think that I forgot one 'svn commit' sometime in the 
> past... Apologies for that.
>
> In the meantime, please use the following description.
>
> Until BioC 2.4, oligo summarized only to the probeset level (as 
> defined in the PGF file). Affymetrix made available meta-probeset 
> files (MPS) that define "new probesets", which allow summarization to 
> the gene-level. For exon arrays, there are 3 MPSs (depending on the 
> quality): core (best), extended and full. For gene arrays, there's 
> only "core" MPS.
>
> Therefore, summaries to the gene level should use this additional 
> annotation.
>
> So, using the 'target' argument, you can set the level at which the 
> summarization is done: "probeset", "core", "extended" and "full" are 
> the possible values (this is available starting with BioC 2.5).
>
> I'll make sure the documentation is updated soon to reflect this change.
>
> Once again, apologies.
>
> b
>
> On Oct 29, 2009, at 8:21 PM, Javier Pérez Florido wrote:
>
>> Dear Benilton,
>> Thanks for your quick reply. Now, it works with the target argument.
>> However, I searched on the web for the meaning of this argument and
>> couldn't find anything. What is "target" for?
>> Why does oligo's manual say: "The ExpressionSet returned when either
>> Exon/Gene-FeatureSet objects are passed contain extra annotation on the
>> featureData slot that the user should take into account for
>> exon/gene-level analyses"?
>> I haven't worked with Human Gene ST arrays before, so I am quite new to
>> this topic.
>> Thanks again,
>> Javier
>>
>>
>>
>>
>>
>> Benilton Carvalho wrote:
>>> Dear Javier,
>>>
>>> You have not provided the exact call to RMA you used nor your
>>> sessionInfo() information.
>>>
>>> If you're using the latest oligo (BioC 2.5), you can call:
>>>
>>> results = rma(object, target="core")
>>>
>>> to get the 33297 "probesets" you refer to...
>>>
>>> Note that building the package yourself is a nice exercise, but you
>>> could just download it via biocLite().
>>>
>>> Cheers,
>>>
>>> b
>>>
>>> On Oct 29, 2009, at 5:42 PM, Javier Pérez Florido wrote:
>>>
>>>> Dear list,
>>>> Some time ago I analysed a set of Human Gene ST Arrays with
>>>> oneChannelGUI. Now I'm trying to reproduce the results using the oligo
>>>> package, but I am quite surprised by the results obtained. With the
>>>> oligo package, after preprocessing with rma, the number of probesets is
>>>> 253002, while with oneChannelGUI the number of probesets is 33297, and
>>>> the CEL files are the same!!!
>>>>
>>>> For the oligo package, prior to reading the CEL files, I had to build
>>>> the annotation package using pdInfoBuilder, since the CDF file is not
>>>> supported by Affymetrix. For this purpose, I first had to download the
>>>> library files "Human Gene 1.0 ST Array, Analysis" from the Affymetrix
>>>> website. The necessary files for building the package are:
>>>> HuGene-1_0-st-v1.r4.pgf
>>>> HuGene-1_0-st-v1.r4.clf
>>>> HuGene-1_0-st-v1.na29.hg18.probeset (CSV file)
>>>>
>>>> Then, I executed the following commands:
>>>> library(pdInfoBuilder)
>>>> baseDir <- "pathWhereTheFilesAre"
>>>> (pgf <- list.files(baseDir, pattern = "\\.pgf$", full.names = TRUE))
>>>> (clf <- list.files(baseDir, pattern = "\\.clf$", full.names = TRUE))
>>>> (prob <- list.files(baseDir, pattern = "\\.probeset\\.csv$",
>>>>                     full.names = TRUE))
>>>> seed <- new("AffyGenePDInfoPkgSeed", pgfFile = pgf, clfFile = clf,
>>>>             probeFile = prob, author = "Javier", email = "email",
>>>>             biocViews = "AnnotationData", genomebuild = "NCBI Build 36",
>>>>             organism = "Human", species = "Homo Sapiens", url = "")
>>>> makePdInfoPackage(seed, destDir = ".")
>>>>
>>>> And I installed the package:
>>>> R CMD INSTALL pd.hugene.1.0.st.v1\
>>>>
>>>> The package installed OK, and I read and preprocessed the CEL files
>>>> using RMA, but the number of probesets is 253002! That is far more
>>>> probesets than the number given by oneChannelGUI.
>>>>
>>>> Any comments on such a big difference?
>>>> Thanks,
>>>> Javier
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>
>
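
As an illustration of the 'target' argument discussed in the thread, here is a minimal sketch in R. The rma(object, target = ...) call and the four level names come from Benilton's messages. The wrapper summarize_at() and the path "celDir" are hypothetical names introduced only for this example and are not part of oligo.

```r
# Hypothetical helper (not part of oligo): checks the level name, then
# delegates to oligo::rma(), the call quoted in the thread.
summarize_at <- function(raw,
                         target = c("probeset", "core", "extended", "full")) {
  target <- match.arg(target)  # stops with an error on an unknown level
  oligo::rma(raw, target = target)
}

# Usage, assuming a directory of CEL files and the pd.hugene.1.0.st.v1
# package built earlier in the thread ("celDir" is a placeholder path):
#   library(oligo)
#   raw <- read.celfiles(list.celfiles("celDir", full.names = TRUE))
#   nrow(exprs(summarize_at(raw, "probeset")))  # probeset level (PGF)
#   nrow(exprs(summarize_at(raw, "core")))      # gene level (core MPS)
```

On the Human Gene ST arrays in this thread, the two row counts would be expected to correspond to the 253002 and 33297 figures reported above. According to the thread, gene arrays ship only the "core" MPS, so "extended" and "full" apply to exon arrays.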


