[BioC] 'ArrayExpress' creates invalid AffyBatch objects
Wolfgang Huber
whuber at embl.de
Mon Feb 1 23:59:45 CET 2010
Dear Vlad
thank you for pointing this out. This seems to be a problem with the
"ArrayExpress" function in the eponymous package when applied to an
Affymetrix dataset where CEL files are available. assayData(ab) contains
the 1,004,004 x 4 matrix of probe intensities from the 4 arrays, whereas
featureData(ab) is the information on the 45,101 *probe sets* that
ArrayExpress provides for the Mouse430_2 chip, i.e. A-AFFY-45.adf.txt.
As far as I can tell (and I think you found this too), this does not
have an effect on downstream analysis (e.g. with affy::rma), since rma
simply ignores the featureData.
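
To make the mismatch concrete, here is a minimal base-R sketch of the two consistency checks that fail. The toy matrix, the toy annotation table and the helper check_features() are made up for illustration; they are not part of Biobase or ArrayExpress and only mimic the wording of the validity messages.

```r
# Toy stand-ins, NOT the real data: 6 probes on 2 arrays,
# but annotation for only 3 probe sets.
intensities <- matrix(seq_len(12), nrow = 6,
                      dimnames = list(as.character(1:6), c("array1", "array2")))
annotation <- data.frame(symbol = c("g1", "g2", "g3"),
                         row.names = c("ps1", "ps2", "ps3"))

# Hypothetical helper: returns character(0) when rows and row names agree.
check_features <- function(assay, fdata) {
  problems <- character(0)
  if (nrow(assay) != nrow(fdata))
    problems <- c(problems,
                  "feature numbers differ between assayData and featureData")
  if (!identical(rownames(assay), rownames(fdata)))
    problems <- c(problems,
                  "featureNames differ between assayData and featureData")
  problems
}

problems <- check_features(intensities, annotation)
print(problems)

# The numbers from this thread: one row per probe on a 1002 x 1002 array,
# versus one row per probe set in the ADF file.
n_probes    <- 1002 * 1002
n_probesets <- 45101
```

The same situation arises in 'ab': one table is indexed by probe, the other by probe set, so neither the row counts nor the row names can agree.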
Still, I think it would be nice for the "ArrayExpress" function to
produce Biobase-compliant objects, so we'll see what can be done about that.
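
Until then, one possible workaround is to replace the featureData with an empty one that has a row per assay row. The sketch below shows the idea on a small ExpressionSet, which is only a stand-in for the AffyBatch (an assumption: both go through the same eSet validity checks). The toy data and the helper is_invalid() are made up for illustration, and I have not tried this on the object from E-GEOD-2873.

```r
library(Biobase)

# Toy ExpressionSet standing in for the AffyBatch 'ab': 6 features, 2 samples.
m <- matrix(as.numeric(1:12), nrow = 6,
            dimnames = list(as.character(1:6), c("array1", "array2")))
es <- ExpressionSet(assayData = m)

# Hypothetical helper: validObject(test = TRUE) returns a character vector
# of complaints instead of throwing an error when the object is invalid.
is_invalid <- function(x) is.character(validObject(x, test = TRUE))

# Reproduce the inconsistency: 3 rows of probe-set annotation next to
# 6 assay rows. Direct slot assignment skips the object-level validity check.
es@featureData <- AnnotatedDataFrame(
  data.frame(symbol = c("g1", "g2", "g3"),
             row.names = c("ps1", "ps2", "ps3")))
invalid_before <- is_invalid(es)

# Workaround: an empty featureData with one row per row of the assay matrix.
featureData(es) <- annotatedDataFrameFrom(exprs(es), byrow = TRUE)
invalid_after <- is_invalid(es)
```

For the object in this thread the analogous call would be featureData(ab) <- annotatedDataFrameFrom(exprs(ab), byrow = TRUE). Note that this discards the probe-set annotation from the ADF file, so keep a copy of the original featureData if you need it.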
Thank you and best wishes
Wolfgang
Vladimir Morozov wrote:
> Hi,
>
> These are the errors from the object check:
>> validObject(ab)
> Error in validObject(ab) :
> invalid class "AffyBatch" object: 1: feature numbers differ between assayData and featureData
> invalid class "AffyBatch" object: 2: featureNames differ between assayData and featureData
>> length(featureNames(assayData(ab)) )
> [1] 1004004
>> length(featureNames(featureData(ab)) )
> [1] 45101
> I got the same error messages with another experiment. I don't know how this affects downstream analysis. The ExpressionSet object looks OK.
>
> best
> Vlad
>
>
>
>> package.version('ArrayExpress')
> [1] "1.6.1"
>> ab = ArrayExpress(input = "E-GEOD-2873")
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/index.html'
> Content type 'text/html;charset=ISO-8859-1' length unknown
> opened URL
> .......
> downloaded 7951 bytes
>
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/E-GEOD-2873.raw.1.zip'
> Content type 'application/zip' length 19444999 bytes (18.5 Mb)
> opened URL
> ==================================================
> downloaded 18.5 Mb
>
> Read 1 item
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/E-GEOD-2873.sdrf.txt'
> Content type 'text/plain' length 4505 bytes
> opened URL
> ==================================================
> downloaded 4505 bytes
>
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/A-AFFY-45/A-AFFY-45.adf.txt'
> Content type 'text/plain' length 6430799 bytes (6.1 Mb)
> opened URL
> ==================================================
> downloaded 6.1 Mb
>
> trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-GEOD-2873/E-GEOD-2873.idf.txt'
> Content type 'text/plain' length 4942 bytes
> opened URL
> ==================================================
> downloaded 4942 bytes
>
> Read 50 items
>
> The object containing experiment E-GEOD-2873 has been built.
>
>> validObject(ab)
> Error in validObject(ab) :
> invalid class "AffyBatch" object: 1: feature numbers differ between assayData and featureData
> invalid class "AffyBatch" object: 2: featureNames differ between assayData and featureData
>> length(featureNames(assayData(ab)) )
> [1] 1004004
>> length(featureNames(featureData(ab)) )
> [1] 45101
>> ab
> AffyBatch object
> size of arrays=1002x1002 features (21998 kb)
> cdf=Mouse430_2 (45101 affyids)
> number of samples=4
> number of genes=45101
> annotation=mouse4302
> notes=E-GEOD-2873
> E-GEOD-2873
> organism_part
> c("organism_part_comparison_design", "transcription profiling")
> NULL
>
>> eset=mas5(ab)
>> validObject(eset)
> [1] TRUE
>
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] tools stats graphics grDevices datasets utils methods
> [8] base
>
> other attached packages:
> [1] mouse4302cdf_2.5.0 ArrayExpress_1.6.1 hgu133aprobe_2.5.0
> [4] AnnotationDbi_1.8.1 hgu133acdf_2.5.0 simpleaffy_2.22.0
> [7] gcrma_2.18.1 genefilter_1.28.2 reshape_0.8.3
> [10] plyr_0.1.9 affy_1.24.2 Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] affyio_1.14.0 annotate_1.24.1 Biostrings_2.14.10
> [4] DBI_0.2-5 IRanges_1.4.10 limma_3.2.1
> [7] preprocessCore_1.8.0 RSQLite_0.8-1 splines_2.10.1
> [10] survival_2.35-7 XML_2.6-0 xtable_1.5-6
>
>
>
> Vladimir Morozov
> Sr. Computational Biologist
> ALS Therapy Development Institute
> 215 First Street, Cambridge MA, 02142
> Phone: 617-441-7242
> www.als.net