[BioC] package pair "hugene10stv1cdf"/"hugene10stprobeset.db"

Mon May 3 17:05:09 CEST 2010

Hi Laurent,

Laurent Gautier wrote:
> Dear List,
> 
> I am noting potential issues in the package pair  
> "hugene10stv1cdf"/"hugene10stprobeset.db", as the respective sets of 
> probe set IDs are not overlapping:
> 
>  > library(hugene10stv1cdf)
>  > library(hugene10stprobeset.db)
>  > summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10stprobesetSYMBOL))
>    Mode   FALSE    TRUE    NA's
> logical   28026    4295       0
>  > summary(Lkeys(hugene10stprobesetSYMBOL) %in% ls(hugene10stv1cdf))
>    Mode   FALSE    TRUE    NA's
> logical  252727    4295       0
> 
> Reading closely, one can observe that "hugene10stprobeset.db" refers to 
> a "revision 5" while the "v1" in "hugene10stv1cdf" suggests a revision 
> 1. It is unclear to me whether this is linked to the problem, but if so 
> then there is no hugene10stv5cdf, nor any annotation for v1.

It's hard to say what 'revision 5' refers to. There is only one 
HuGene chip, and it is version 1. There _have_ been nine versions of 
the annotation file released by Affy (Releases 22-30), none of which 
is an obvious match for a 'revision 5'. It certainly doesn't refer to a 
HuGene-1_0-st-v5 chip, as no such thing exists.

I have a personal thesis that the Exon and Gene chips contain all manner 
of extra sequences that Affy threw on there so they wouldn't have the 
same problem they had with their 3'-biased chips, namely that the chips 
were out of date the minute they finished the first production run 
because the annotations are so fluid. Now they can simply take the 
original 32K probesets and slice-n-dice them at will to make things that 
match up with the genome as we know it now.

But back to the point at hand. The problem with hugene10stv1cdf is 
that it is based on the _unsupported_ cdf file that Affy makes 
available. We make it available as well, for those who insist on using 
the makecdfenv/affy pipeline rather than the pdInfoBuilder/oligo 
pipeline, which is what one should arguably be using. Given that the 
data used to create the cdf package are specifically unsupported, 
caveat emptor.

I note that the supported library files do contain an 'r4' in the file 
name, so I assume, without any backing data, that those library files 
would hew more closely to the annotation data Affy supplies.
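
If you want to see which probe set IDs fall on each side, rather than 
just the counts, something along these lines works in base R. This is 
only a sketch: the IDs below are made-up placeholders, and in practice 
cdf_ids would come from ls(hugene10stv1cdf) and annot_ids from 
Lkeys(hugene10stprobesetSYMBOL), as in your example.

```r
# Placeholder IDs for illustration only (not real package contents).
# In a real session:
#   cdf_ids   <- ls(hugene10stv1cdf)
#   annot_ids <- Lkeys(hugene10stprobesetSYMBOL)
cdf_ids   <- c("7892501", "7892502", "7896736", "7896738")
annot_ids <- c("7896736", "7896738", "2315101", "2315102", "2315103")

# IDs present in both packages
shared <- intersect(cdf_ids, annot_ids)

# IDs present in only one of the two packages
only_cdf   <- setdiff(cdf_ids, annot_ids)
only_annot <- setdiff(annot_ids, cdf_ids)

# Counts equivalent to the two summary(... %in% ...) calls
overlap <- c(shared     = length(shared),
             cdf_only   = length(only_cdf),
             annot_only = length(only_annot))
print(overlap)
```

Looking at head(only_cdf) and head(only_annot) on the real data should 
make it easier to tell whether the two packages are using different 
kinds of identifiers altogether.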

Best,

Jim

> 
> The obligatory sessionInfo() is:
> 
>  > sessionInfo()
> R version 2.11.0 Patched (2010-04-24 r51813)
> i686-pc-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
>  [1] oligo_1.12.0                AffyCompatible_1.8.0
>  [3] RCurl_1.4-1                 bitops_1.0-4.1
>  [5] XML_2.8-1                   oligoClasses_1.10.0
>  [7] limma_3.4.0                 hugene10stv1cdf_2.6.0
>  [9] hugene10stprobeset.db_5.0.1 org.Hs.eg.db_2.4.1
> [11] RSQLite_0.8-4               DBI_0.2-5
> [13] AnnotationDbi_1.10.0        affxparser_1.20.0
> [15] affy_1.26.0                 Biobase_2.8.0
> 
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0         Biostrings_2.16.0     IRanges_1.6.0
> [4] preprocessCore_1.10.0 splines_2.11.0        tcltk_2.11.0
> [7] tools_2.11.0
>  >
> 
> Best,
> 
> 
> Laurent
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826