[BioC] package pair "hugene10stv1cdf"/"hugene10stprobeset.db"
James W. MacDonald
jmacdon at med.umich.edu
Mon May 3 17:05:09 CEST 2010
Hi Laurent,
Laurent Gautier wrote:
> Dear List,
>
> I am noting potential issues in the package pair
> "hugene10stv1cdf"/"hugene10stprobeset.db", as the respective sets of
> probe set IDs are not overlapping:
>
> > library(hugene10stv1cdf)
> > library(hugene10stprobeset.db)
> > summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10stprobesetSYMBOL))
>    Mode   FALSE    TRUE    NA's
> logical   28026    4295       0
> > summary(Lkeys(hugene10stprobesetSYMBOL) %in% ls(hugene10stv1cdf))
>    Mode   FALSE    TRUE    NA's
> logical  252727    4295       0
>
> Reading closely, one can observe that "hugene10stprobeset.db" refers to
> a "revision 5" while the "v1" in "hugene10stv1cdf" suggests a revision
> 1. It is unclear to me whether this is linked to the problem, but if so
> then there is no hugene10stv5cdf, nor any annotation for v1.

It's hard to say what the 'revision 5' refers to. There is only one
HuGene chip, and it is version 1. Affy _has_ released nine versions of
the annotation file (Releases 22-30), but nothing obviously ties
'revision 5' to any of them. It certainly doesn't refer to a
HuGene-1_0-st-v5 chip, as no such thing exists.

I have a personal theory that the Exon and Gene chips contain all manner
of extra sequences that Affy threw on there so they wouldn't have the
same problem they had with their 3'-biased chips, namely that the chips
were out of date the minute they finished the first production run
because the annotations are so fluid. Now they can simply take the
original 32K probesets and slice-n-dice them at will to make things that
match up with the genome as we know it now.

But back to the point at hand. The problem with hugene10stv1cdf is that
it is based on the _unsupported_ cdf file that Affy makes available. We
make it available as well, for those who insist on using the
makecdfenv/affy pipeline rather than the pdInfoBuilder/oligo pipeline,
which is arguably what one should be using. Given that the data used to
create the cdf package are specifically unsupported, caveat emptor.
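
For anyone who hasn't tried the oligo route, here is a minimal sketch.
It assumes the oligo package and the matching platform design package
(pd.hugene.1.0.st.v1) are installed, and that rma() accepts
target = "core" or "probeset" for Gene ST data, as it does in the oligo
releases I know of. The function name summarize_hugene and the idea of
passing a directory of CEL files are made up for illustration.

```r
# Sketch only: read HuGene CEL files with oligo and summarize with RMA.
# 'summarize_hugene' and 'cel_dir' are hypothetical names.
summarize_hugene <- function(cel_dir, target = c("core", "probeset")) {
  target <- match.arg(target)
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("the oligo package is required for this sketch")
  }
  # Base R file listing, so nothing beyond oligo itself is assumed.
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0) {
    stop("no CEL files found in ", cel_dir)
  }
  # read.celfiles() looks up the platform design package from the CEL header.
  raw <- oligo::read.celfiles(cel_files)
  # target = "core" summarizes per transcript cluster,
  # target = "probeset" summarizes per probe set.
  oligo::rma(raw, target = target)
}
```

Something like summarize_hugene("path/to/cel/files") should then return
an ExpressionSet summarized at the chosen level, if the assumptions
above hold for your installation.
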

I note that the supported library files do contain an 'r4' in the file
name, so I assume, without any backing data, that these library files
would actually hew more closely to the annotation data Affy supplies.
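
A quick way to put numbers on how closely two sets of IDs agree, in the
spirit of Laurent's summary() calls, is a small set comparison. The
sketch below is base R only. The name id_overlap and the toy IDs are
made up for illustration and are not real probe set IDs.

```r
# Count IDs found only in the first set, in both, and only in the second.
id_overlap <- function(a, b) {
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  c(only_first  = length(setdiff(a, b)),
    shared      = length(intersect(a, b)),
    only_second = length(setdiff(b, a)))
}

# Made-up IDs, purely for illustration.
feature_ids    <- c("id1", "id2", "id3", "id4")
annotation_ids <- c("id3", "id4", "id5", "id6", "id7")

id_overlap(feature_ids, annotation_ids)
```

With the real packages loaded, the two arguments could be
ls(hugene10stv1cdf) and Lkeys(hugene10stprobesetSYMBOL), as in
Laurent's session.
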
Best,
Jim
>
> The obligatory sessionInfo() is:
>
> > sessionInfo()
> R version 2.11.0 Patched (2010-04-24 r51813)
> i686-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
> [7] LC_PAPER=en_GB.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] oligo_1.12.0 AffyCompatible_1.8.0
> [3] RCurl_1.4-1 bitops_1.0-4.1
> [5] XML_2.8-1 oligoClasses_1.10.0
> [7] limma_3.4.0 hugene10stv1cdf_2.6.0
> [9] hugene10stprobeset.db_5.0.1 org.Hs.eg.db_2.4.1
> [11] RSQLite_0.8-4 DBI_0.2-5
> [13] AnnotationDbi_1.10.0 affxparser_1.20.0
> [15] affy_1.26.0 Biobase_2.8.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0 Biostrings_2.16.0 IRanges_1.6.0
> [4] preprocessCore_1.10.0 splines_2.11.0 tcltk_2.11.0
> [7] tools_2.11.0
> >
>
> Best,
>
>
> Laurent
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826