[BioC] 'ArrayExpress' creates invalid AffyBatch objects

Wolfgang Huber whuber at embl.de
Mon Feb 1 23:59:45 CET 2010


Dear Vlad

thank you for pointing this out. This seems to be a problem with the 
"ArrayExpress" function in the eponymous package when applied to an 
Affymetrix dataset where CEL files are available. assayData(ab) contains 
the 1,004,004 x 4 matrix of probe intensities from the 4 arrays, whereas 
featureData(ab) is the information on the 45,101 *probe sets* that 
ArrayExpress provides for the Mouse430_2 chip, i.e. A-AFFY-45.adf.txt.
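The two row counts can be reconciled with a little arithmetic. This is only a sketch in base R using the numbers from the session quoted below; the helper `same_feature_count` is a hypothetical name for illustration and is not part of Biobase or ArrayExpress.

```r
# Numbers taken from the quoted session output.
array_rows  <- 1002    # "size of arrays=1002x1002 features"
array_cols  <- 1002
n_probesets <- 45101   # rows reported for featureData(ab)

# One row of assayData per cell on the chip, not per probe set.
n_cells <- array_rows * array_cols
n_cells == 1004004     # TRUE: matches length(featureNames(assayData(ab)))

# Hypothetical helper: does an annotation table have one row per
# row of the intensity matrix?
same_feature_count <- function(intensities, annotation) {
  nrow(intensities) == nrow(annotation)
}
```

So the assayData rows are individual probe cells, while the featureData rows are probe sets, which is why the two counts cannot agree.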

As far as I can tell (and I think you found this too), this does not 
have an effect on downstream analysis (e.g. with affy::rma), since rma 
simply ignores the featureData.

Still, I think it would be nice for the "ArrayExpress" function to 
produce Biobase-compliant objects, so we'll see what can be done about that.
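Until then, one possible user-side workaround is to replace the mismatched featureData with an empty one derived from the assayData. The following is a sketch under assumptions, not the fix planned for the package: it uses a small toy ExpressionSet in place of the real AffyBatch, recreates the mismatch by assigning to the slot directly, and assumes that Biobase's `annotatedDataFrameFrom` is an acceptable way to rebuild the feature table. Note that it discards the probe-set annotation from the ADF file.

```r
library(Biobase)

# Toy stand-in for the AffyBatch: 6 "probes" x 2 arrays.
m <- matrix(rnorm(12), nrow = 6,
            dimnames = list(as.character(1:6), c("a1", "a2")))
es <- ExpressionSet(assayData = m)

# Recreate the mismatch: a feature table with 3 rows for 6 assay rows.
# Direct slot assignment is used here only to bypass validity checks.
es@featureData <- AnnotatedDataFrame(
  data.frame(id = c("ps1", "ps2", "ps3"),
             row.names = c("ps1", "ps2", "ps3")))

# validObject(test = TRUE) returns TRUE or a character vector of problems.
was_valid <- isTRUE(validObject(es, test = TRUE))

# Workaround: rebuild an empty featureData with one row per assay row.
featureData(es) <- annotatedDataFrameFrom(assayData(es), byrow = TRUE)
now_valid <- isTRUE(validObject(es, test = TRUE))
```

On the real object the analogous call would be `featureData(ab) <- annotatedDataFrameFrom(assayData(ab), byrow = TRUE)`, followed by `validObject(ab)` to confirm; I have not tested this against E-GEOD-2873.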

	Thank you and best wishes
	Wolfgang



Vladimir Morozov wrote:
> Hi,
> 
> These are the errors from the object check:
>>  validObject(ab)
> Error in validObject(ab) :
>   invalid class "AffyBatch" object: 1: feature numbers differ between assayData and featureData
> invalid class "AffyBatch" object: 2: featureNames differ between assayData and featureData
>> length(featureNames(assayData(ab)) )
> [1] 1004004
>> length(featureNames(featureData(ab)) )
> [1] 45101
> I got the same error messages with another experiment. I don't know how this affects downstream analysis. The ExpressionSet object looks OK.
> 
> best
> Vlad
> 
> 
> 
>> package.version('ArrayExpress')
> [1] "1.6.1"
>> ab = ArrayExpress(input = "E-GEOD-2873")
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/index.html'
> Content type 'text/html;charset=ISO-8859-1' length unknown
> opened URL
> .......
> downloaded 7951 bytes
> 
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/E-GEOD-2873.raw.1.zip'
> Content type 'application/zip' length 19444999 bytes (18.5 Mb)
> opened URL
> ==================================================
> downloaded 18.5 Mb
> 
> Read 1 item
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/E-GEOD-2873.sdrf.txt'
> Content type 'text/plain' length 4505 bytes
> opened URL
> ==================================================
> downloaded 4505 bytes
> 
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/A-AFFY-45/A-AFFY-45.adf.txt'
> Content type 'text/plain' length 6430799 bytes (6.1 Mb)
> opened URL
> ==================================================
> downloaded 6.1 Mb
> 
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/E-GEOD-2873.idf.txt'
> Content type 'text/plain' length 4942 bytes
> opened URL
> ==================================================
> downloaded 4942 bytes
> 
> Read 50 items
> 
>  The object containing experiment  E-GEOD-2873  has been built.
> 
>>  validObject(ab)
> Error in validObject(ab) :
>   invalid class "AffyBatch" object: 1: feature numbers differ between assayData and featureData
> invalid class "AffyBatch" object: 2: featureNames differ between assayData and featureData
>> length(featureNames(assayData(ab)) )
> [1] 1004004
>> length(featureNames(featureData(ab)) )
> [1] 45101
>> ab
> AffyBatch object
> size of arrays=1002x1002 features (21998 kb)
> cdf=Mouse430_2 (45101 affyids)
> number of samples=4
> number of genes=45101
> annotation=mouse4302
> notes=E-GEOD-2873
>  E-GEOD-2873
>  organism_part
>  c("organism_part_comparison_design", "transcription profiling")
>  NULL
> 
>> eset=mas5(ab)
>>  validObject(eset)
> [1] TRUE
> 
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-unknown-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] tools     stats     graphics  grDevices datasets  utils     methods
> [8] base
> 
> other attached packages:
>  [1] mouse4302cdf_2.5.0  ArrayExpress_1.6.1  hgu133aprobe_2.5.0
>  [4] AnnotationDbi_1.8.1 hgu133acdf_2.5.0    simpleaffy_2.22.0
>  [7] gcrma_2.18.1        genefilter_1.28.2   reshape_0.8.3
> [10] plyr_0.1.9          affy_1.24.2         Biobase_2.6.1
> 
> loaded via a namespace (and not attached):
>  [1] affyio_1.14.0        annotate_1.24.1      Biostrings_2.14.10
>  [4] DBI_0.2-5            IRanges_1.4.10       limma_3.2.1
>  [7] preprocessCore_1.8.0 RSQLite_0.8-1        splines_2.10.1
> [10] survival_2.35-7      XML_2.6-0            xtable_1.5-6
> 
> 
> 
> Vladimir Morozov
> Sr. Computational Biologist
> ALS Therapy Development Institute
> 215 First Street, Cambridge MA, 02142
> Phone: 617-441-7242
> http://www.als.net/
> Want to help stop ALS? Become an ALS Ambassador and take action. Learn more online at http://www.als.net/ambassador
> 
> 
> 


