[BioC] frmaTools library error :: makeVectorsAffyBatch() function
Matthew McCall
mccallm at gmail.com
Mon Jun 3 15:33:57 CEST 2013
It depends a lot on what you plan to do with the preprocessed data. If
you want to convert gene expression values to gene expression
barcodes (see McCall et al. NAR 2011) or do anything else where you
compare your data to other data sets, then you should just stick with
the default fRMA. If this is a pilot study, and you'll eventually have
a much larger number of samples, then making your own fRMA vectors
might make sense -- here you would want the batch variable to be some
combination of lab technician, reagent batch, scan date, etc. Did one
person really run 141 arrays on one day? If not, then you don't need
to create artificial batches in your data; there are real batch
variables that you could consider. Finally, if you are really only
interested in looking at these 141 arrays and none of the things I
mentioned previously apply, then why not just use RMA?
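
As a rough illustration of building a batch variable from real covariates, here is a minimal base-R sketch. The annotation table, its column names (technician, scan_date) and the file names are hypothetical and do not come from this thread; substitute whatever metadata is actually recorded for the arrays.

```r
# Hypothetical sample annotation; none of these values come from the thread.
pheno <- data.frame(
  file       = sprintf("sample%02d.CEL", 1:12),
  technician = rep(c("A", "B"), each = 6),
  scan_date  = rep(c("2012-03-01", "2012-03-08", "2012-03-15"), times = 4),
  stringsAsFactors = FALSE
)

# One batch per observed technician x scan-date combination;
# drop = TRUE discards combinations that never occur.
batch <- interaction(pheno$technician, pheno$scan_date, drop = TRUE)

# Integer batch labels, one per array.
batch_id <- as.integer(batch)

# Tabulate batch sizes to check whether they are equal.
batch_sizes <- table(batch)
print(batch_sizes)
```

If the tabulated sizes differ, the batches would need to be brought to a common size before creating fRMA vectors.
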
As the error message says, you need to create batches of equal size
when creating fRMA vectors. This can actually provide a nice test of
how stable your frozen parameters are. Say you have batch sizes of
12, 20, 15, and 19; you could then randomly choose 10 arrays from
each batch and create your fRMA vectors. You could then repeat
this process many times and see how much your frozen parameters
change. The following article has a more detailed discussion of the
issues in creating fRMA vectors:
McCall MN and Irizarry RA (2011). Thawing Frozen Robust Multi-array
Analysis (fRMA). BMC Bioinformatics, 12:369.
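
The equal-size subsampling could be sketched in base R roughly as follows. The helper sample_equal_batches() and the placeholder file names are made up for illustration; only the batch sizes (12, 20, 15, 19) and the draw of 10 arrays per batch come from the example in this message. The makeVectorsAffyBatch() call is left as a comment because it needs the actual CEL files.

```r
# Hypothetical helper: draw the same number of arrays from every batch.
# `batch` holds one batch label per array; returns the selected indices.
sample_equal_batches <- function(batch, n_per_batch) {
  idx_by_batch <- split(seq_along(batch), batch)
  stopifnot(all(lengths(idx_by_batch) >= n_per_batch))
  picked <- lapply(idx_by_batch, function(idx) {
    idx[sample.int(length(idx), n_per_batch)]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Batch sizes of 12, 20, 15 and 19 arrays, as in the example.
batch <- rep(1:4, times = c(12, 20, 15, 19))
files <- sprintf("array%02d.CEL", seq_along(batch))  # placeholder names

set.seed(1)
keep <- sample_equal_batches(batch, n_per_batch = 10)
print(table(batch[keep]))

# Several independent draws, one per repeat of the stability check.
draws <- replicate(5, sample_equal_batches(batch, 10), simplify = FALSE)

# Each draw would then be passed on, mirroring the call used in this thread
# (not run here):
# vecs <- makeVectorsAffyBatch(files[keep], batch[keep], file.dir = FILED)
```

Comparing the vectors obtained from the different draws then gives a sense of how much the frozen parameters move between repeats.
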
Best,
Matt
On Sun, Jun 2, 2013 at 11:00 PM, Tae-Hoon Chung <hoontaechung at gmail.com> wrote:
> Dear Matthew,
>
> Thanks for your reply.
>
> I know there's a general-purpose package, hgu133plus2frmavecs, that was
> produced using microarray data from diverse tissues and batches.
> However, I am interested in a specific tissue only, and so don't need to
> consider probes that may not be reliable in it.
> So I want to know whether I need my own normalisation parameters based
> solely on data from the tissue of interest, or whether I can simply rely
> on the general-purpose package.
> I've already processed the data using the general-purpose package.
> Now I am trying to derive my own normalisation parameters,
> and that is where I came across the error.
>
> In fact, I tried the procedure with two artificially designated batches.
> Interestingly, it also produced an error, this time indicating that the
> batches were not of the same size (I split the samples into two batches
> of sizes 71 and 70, respectively).
> Any suggestion?
>
> Regards,
> TH
>
> On Friday, May 31, 2013, Matthew McCall <mccallm at gmail.com> wrote:
>
>> The issue is that you are trying to estimate between-batch residual
>> variances with only 1 batch:
>> abatch.ref <- makeVectorsAffyBatch(files.ref,
>> rep(1,length(files.ref)), file.dir=FILED)
>>
>> I'm curious why you are making your own fRMA vectors. HGU133plus2 has
>> pre-made frozen parameter vectors in the hgu133plus2frmavecs package,
>> so you could use those to preprocess your data.
>>
>> Best,
>> Matt
>>
>> On Fri, May 31, 2013 at 2:40 AM, Tae-Hoon Chung <hoontaechung at gmail.com>
>> wrote:
>>> Hi, all;
>>>
>>> I encountered the following error while using the makeVectorsAffyBatch()
>>> function from the frmaTools package on 141 Affymetrix HG-U133 Plus 2 chips
>>> (single batch).
>>>
>>>
>>> abatch.ref <- makeVectorsAffyBatch(files.ref, rep(1,length(files.ref)),
>>> file.dir=FILED)
>>> 1 reading GSM493958.CEL ...instantiating an AffyBatch (intensity a
>>> 1354896x141 matrix)...done.
>>> Reading in : GSM493958.CEL
>>> Reading in : GSM493960.CEL
>>> Reading in : GSM493965.CEL
>>> Reading in : GSM493966.CEL
>>> Reading in : GSM493970.CEL
>>>
>>> …
>>>
>>> Reading in : GSM494235.CEL
>>> Reading in : GSM494237.CEL
>>> Reading in : GSM494240.CEL
>>> Data loaded
>>>
>>> Background Corrected
>>>
>>> Normalized
>>>
>>> Beginning Probe Effect Calculation ...
>>>
>>> Finished probeset: 1000
>>>
>>> Finished probeset: 2000
>>>
>>> Finished probeset: 3000
>>>
>>> Finished probeset: 4000
>>>
>>> …
>>>
>>> Finished probeset: 53000
>>>
>>> Finished probeset: 54000
>>>
>>> Probe Effects Calculated
>>>
>>> Probe Variances Calculated
>>>
>>> Probe Set SDs Calculated
>>>
>>> Beginning Median SE Calculation ...
>>>
>>> Error in if (any(w < 0)) { : missing value where TRUE/FALSE needed
>>>
>>>
>>> Any suggestion?
>>> I ran the code under the following environment:
>>>
>>>
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] hgu133plus2cdf_2.11.0 AnnotationDbi_1.20.7 frmaTools_1.10.0
>>> [4] frma_1.10.0 affy_1.36.1 Biobase_2.18.0
>>> [7] BiocGenerics_0.4.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BiocInstaller_1.8.3 Biostrings_2.26.3 DBI_0.2-5
>>> [4] GenomicRanges_1.10.7 IRanges_1.16.6 MASS_7.3-23
>>> [7] RSQLite_0.11.2 affxparser_1.30.2 affyio_1.26.0
>>> [10] bit_1.1-10 codetools_0.2-8 ff_2.2-11
>>> [13] foreach_1.4.0 iterators_1.0.6 oligo_1.22.0
>>> [16] oligoClasses_1.20.0 parallel_2.15.2 preprocessCore_1.20.0
>>> [19] splines_2.15.2 stats4_2.15.2 tools_2.15.2
>>> [22] zlibbioc_1.4.0
>>>
>>> Thanks in advance,
>>>
>>> TH
>>>
>>>
>>> --
>>> Tae-Hoon Chung, PhD
>>>
>>
>>
>>
>> --
>> Matthew N McCall, PhD
>> 112 Arvine Heights
>> Rochester, NY 14611
>> Cell: 202-222-5880
>>
>
> --
> Tae-Hoon Chung, PhD
>
--
Matthew N McCall, PhD
112 Arvine Heights
Rochester, NY 14611
Cell: 202-222-5880