[BioC] Create affyBatch from mouse exon array data using ReadAffy or extractAffyBatch() from aroma.affymetrix

cstrato cstrato at aon.at
Thu Jul 17 20:41:48 CEST 2008


Dear An

Mark has already kindly answered your question about the URL.
Here is another link, specifically for mouse exon arrays:
http://www.affymetrix.com/support/technical/byproduct.affx?product=moexon-st
This page contains all the files you need to download, as well as links
to the Affymetrix exon array dataset.

Besides the file "script4xps.R", the examples directory contains another
file, "script4exon.R", which uses the Affymetrix human exon array dataset
as an example of how to use xps with gene arrays and exon arrays.

Regarding your question whether xps works with AffyBatches:
No, it has its own S4 class "DataTreeSet", which can be considered
a substitute for AffyBatch, since it provides many of the methods
also found in AffyBatch, e.g. exprs, se.exprs, pm, mm.

For exon arrays, it is not possible to create an AffyBatch, since
AffyBatch requires a cdfName, whereas xps uses PGF files instead.
Furthermore, AffyBatch has the problem that ReadAffy() imports the data
from all CEL files into memory, which requires a lot of RAM when
importing many exon arrays. Class DataTreeSet does not suffer from this
problem, so you can import many exon arrays even on PCs with only 1 GB of RAM.
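To put "a lot of RAM" into numbers, here is a rough back-of-the-envelope sketch in R. It assumes a 2560 x 2560 cell grid per exon array (please check the rows/cols reported in your own CLF file or CEL header) and one 8-byte double per cell; the 100 arrays are a hypothetical count. It only covers the raw intensity matrix itself, so treat it as a lower bound; temporary copies made while reading may add to it.

```r
## Back-of-the-envelope RAM estimate for holding the raw intensities of
## many arrays in memory at once.
## ASSUMPTIONS: 2560 x 2560 cells per array, one 8-byte double per cell,
## and a hypothetical number of arrays.
cells_per_array <- 2560 * 2560          # cells on one array (assumed grid)
bytes_per_array <- cells_per_array * 8  # doubles take 8 bytes each
n_arrays        <- 100L                 # hypothetical number of CEL files

mb_per_array <- bytes_per_array / 1024^2
gb_total     <- n_arrays * bytes_per_array / 1024^3

cat(sprintf("%.0f MiB per array, %.2f GiB for %d arrays\n",
            mb_per_array, gb_total, n_arrays))
# 50 MiB per array, 4.88 GiB for 100 arrays
```

Scaling n_arrays shows why importing everything at once quickly becomes the bottleneck on a small PC.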

It is not quite clear to me why you want to create an AffyBatch,
since you usually use the normalized data for further processing.
Normalized expression data are often stored as an ExpressionSet,
which you can easily create as described in Appendix A.3 of the
vignette "xps.pdf".
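For illustration only, the generic Biobase route from a normalized matrix to an ExpressionSet looks roughly like this; it is not necessarily the exact code in Appendix A.3 of xps.pdf. The matrix is simulated and its row/column names are hypothetical; in practice it would hold your normalized values. The sketch assumes a recent Biobase release that provides the ExpressionSet() constructor.

```r
library(Biobase)

## Hypothetical stand-in for normalized (e.g. log2) expression values:
## 5 transcript clusters (rows) x 3 arrays (columns). In practice this
## matrix would come from your normalization step.
set.seed(1)
m <- matrix(rnorm(15, mean = 8), nrow = 5,
            dimnames = list(paste0("tc", 1:5), paste0("sample", 1:3)))

## Wrap the matrix in an ExpressionSet. phenoData and featureData default
## to empty annotated data frames whose row names match the column and
## row names of the matrix.
eset <- ExpressionSet(assayData = m)

dim(eset)          # Features = 5, Samples = 3
sampleNames(eset)  # "sample1" "sample2" "sample3"
```

Sample annotation can then be attached via the phenoData argument instead of relying on the empty default.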

Please let me know if you have further questions.

Best regards
Christian


De Bondt, An-7114 [PRDBE] wrote:
> Thanks, Christian and Henrik, for your feedback!
>
>
> With respect to setting up the AffymetrixCelSet in aroma.affymetrix, I used checkChipType=FALSE because of the use of the alternative chipType (MmEx10stv1_Mm_ENSE instead of MoEx-1_0-st-v1). If I use checkChipType=TRUE in this setup, I get the following:
>
> Error in list("AffymetrixCelSet$byName(projectName, chipType = chipType, checkChipType = T" = <environment>,  : 
>   
> [2008-07-17 08:25:21] Exception: Invalid name of directory containing CEL files. The name of the directory (MmEx10stv1_Mm_ENSE) must be the same as the chip type used for the CEL files (MoEx-1_0-st-v1) unless using argument 'checkChipType=FALSE': rawData/myDataSet/MmEx10stv1_Mm_ENSE
>   at throw(Exception(...))
>   at throw.default("Invalid name of directory containing CEL files. The name of 
>   at throw("Invalid name of directory containing CEL files. The name of the dire
>   at fromFiles.AffymetrixCelSet(static, path = path, cdf = cdf, ...)
>   at fromFiles(static, path = path, cdf = cdf, ...)
>   at withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarnin
>   at suppressWarnings({
>   at method(static, ...)
>   at AffymetrixCelSet$byName(projectName, chipType = chipType, checkChipType = T
>
>
>
> With respect to xps, your reference to script4xps.R is really helpful.  Do you have a URL from which the files "MoEx-1_0-st-v1.r2.clf", "MoEx-1_0-st-v1.r2.pgf", "MoEx-1_0-st-v1.na25.mm9.probeset.csv", "MoEx-1_0-st-v1.na25.mm9.transcript.csv" can be downloaded?  I searched the Affy site but did not find them, sorry.
> Does xps work with AffyBatches?  If not, is it possible to create an AffyBatch with the raw data?
>
>
>
> Best,
> An
>
>
>
> -----Original Message-----
> From: henrik.bengtsson at gmail.com [mailto:henrik.bengtsson at gmail.com]On
> Behalf Of Henrik Bengtsson
> Sent: Wednesday, 16 July 2008 21:48
> To: cstrato
> Cc: De Bondt, An-7114 [PRDBE]; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Create affyBatch from mouse exon array data using
> ReadAffy or extractAffyBatch() from aroma.affymetrix
>
>
> Hi,
>
> I never received the original message for this one - was it posted to
> BioC?  Anyway, my comments below.
>
> On Wed, Jul 16, 2008 at 11:59 AM, cstrato <cstrato at aon.at> wrote:
>   
>> Dear An
>>
>> I cannot answer your question regarding aroma.affymetrix,
>> but since you also mention "xps":
>>
>> Please note that xps can handle mouse exon arrays;
>> see the file "script4xps.R" in the examples directory for how
>> to import the necessary CLF, PGF and annotation files.
>>
>> Please let me know if you experience any problems.
>>
>> Best regards
>> Christian
>> _._._._._._._._._._._._._._._._
>> C.h.i.s.t.i.a.n S.t.r.a.t.o.w.a
>> V.i.e.n.n.a       A.u.s.t.r.i.a
>> e.m.a.i.l:    cstrato at aon.at
>> _._._._._._._._._._._._._._._._
>>
>> De Bondt, An-7114 [PRDBE] wrote:
>>     
>>> Dear UseRs,
>>>
>>> I am analysing a dataset from mouse exon arrays using aroma.affymetrix.  I
>>> can read the raw data using the following code.
>>>
>>>      chipType <- "MmEx10stv1_Mm_ENSE"
>>>      cdf <- AffymetrixCdfFile$fromChipType(chipType = chipType)
>>>      # setup the CEL set; read the raw data
>>>      #==============
>>>      projectName <- "myDataSet"
>>>      cs <- AffymetrixCelSet$byName(projectName, chipType = chipType,
>>>                                    checkChipType = FALSE, cdf = cdf)
>>>       
>
> Actually, that is not doing anything but setting up the
> AffymetrixCelSet.  It does not read in the data; it only validates
> that the CEL files are consistent with each other (and with the CDF).
> BTW, you should only use 'checkChipType=FALSE' if you really know what
> you are doing; if it gives an error otherwise, there is often a good
> reason for it.
>
>   
>>> Next, I would like to make an AffyBatch from these raw data, but I run
>>> into a memory error (see below).  I get the same error when calling
>>> ReadAffy() directly:
>>> affyBatchRaw <- ReadAffy(filenames =
>>>     paste("./rawData/myDataSet/MmEx10stv1_Mm_ENSE/", celfiles, sep = ""))
>>>       
>
> I see that you previously/below tried:
>
>   affyBatchRaw <- extractAffyBatch(cs)
>
> which is pretty much the same as the above.  In aroma.affymetrix we
> use the prefix "extract..." in method names to make it explicit that
> all data are loaded into memory and that any changes made to the
> obtained object will *not* be reflected in the underlying data files.
>
> Having said all this, what you do above is not really utilizing the
> aroma.affymetrix package at all.  All your problems are unrelated to
> that package and have to do with the 'affy' package.
>
> Your alternatives are to do your exon analysis in 'aroma.affymetrix'
> (see the online vignettes) or to use 'xps', as Christian suggests.
>
> Cheers
>
> Henrik
>
>   
>>> I am working on a Linux machine with 70 GB of memory.
>>>
>>>
>>> Has anyone experienced this before?  Is this typical for mouse exon arrays?
>>> I tried using exonmap as well as xps, but as far as I can tell, they
>>> have not yet been adapted for mouse exon arrays.
>>>
>>> Thanks in advance for your help!
>>>
>>> Kind regards,
>>> An
>>>
>>>
>>>
>>>
>>>       
>>>> affyBatchRaw <- extractAffyBatch(cs)
>>>>
>>>>         
>>> Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,
>>>  :        Calloc could not allocate (-1889533886 of 48) memory
>>>
>>>
>>>       
>>>> traceback()
>>>>
>>>>         
>>> 5: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra,
>>>     ref.cdfName, dim.intensity, verbose, PACKAGE = "affyio")
>>> 4: read.affybatch(filenames = l$filenames, phenoData = l$phenoData,
>>>     description = l$description, notes = notes, compress = compress,
>>>     rm.mask = rm.mask, rm.outliers = rm.outliers, rm.extra = rm.extra,
>>>     verbose = verbose, sd = sd, cdfname = cdfname)
>>> 3: ReadAffy(filenames = filenames, sampleNames = sampleNames, ...,
>>>     verbose = as.logical(verbose))
>>> 2: extractAffyBatch.AffymetrixCelSet(cs)
>>> 1: extractAffyBatch(cs)
>>>
>>>
>>>       
>>>> sessionInfo()
>>>>
>>>>         
>>> R version 2.6.2 (2008-02-08) x86_64-unknown-linux-gnu
>>> locale:
>>>
>>>  LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>> other attached packages:
>>>  [1] mmex10stv1mmensecdf_10.0.0 farms_1.3.1
>>>  [3] MASS_7.2-42                preprocessCore_1.0.0
>>>  [5] affyio_1.6.1               Biobase_1.16.3
>>>  [7] aroma.affymetrix_0.9.3     aroma.apd_0.1.3
>>>  [9] R.huge_0.1.5               affy_1.16.0
>>> [11] affxparser_1.10.2          aroma.core_0.9.3
>>> [13] sfit_0.1.5                 aroma.light_1.8.1
>>> [15] digest_0.3.1               matrixStats_0.1.2
>>> [17] R.rsp_0.3.4                R.cache_0.1.7
>>> [19] R.utils_1.0.2              R.oo_1.4.3
>>> [21] R.methodsS3_1.0.1
>>> loaded via a namespace (and not attached):
>>>    [1] rcompgen_0.1-17
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>>
>>>       
>
>
>
>


