[BioC] barcode with custom CDF

Steve Piccolo stephen.piccolo at hsc.utah.edu
Wed Mar 27 20:40:49 CET 2013

Hi Dario and Guido,

The UPC function in our SCAN.UPC package addresses this need. We use a
"single-sample" approach to estimating barcodes. Essentially this means
that we use the probe values within a given microarray sample to estimate
a background distribution and then use that information to estimate
whether each gene is "active" or "inactive" in that array. This is similar
in concept to the barcode function (fRMA package) except that it does not
require a large collection of reference samples, so it can easily be
applied to Affy arrays from any platform. We have performed a comparison
using the Affy Latin Square data, and our approach compares favorably to
the barcode function (manuscript in revision, we can send more details
offline if you're
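
As a rough illustration of the single-sample idea described above, here is a minimal sketch. It is not the UPC algorithm and not the barcode algorithm. The function name `call_active`, the median-based background estimate, and the z-score cutoff are all assumptions made for this toy example. It uses only the values from one array to estimate a background distribution, then flags values that sit well above it.

```python
import random
import statistics


def call_active(values, z_cutoff=3.0):
    """Toy single-sample call: 1 if a value is well above this array's own background, else 0."""
    # Assumption: most probes on the array are unexpressed, so the median
    # approximates the centre of the background distribution.
    center = statistics.median(values)
    # Assumption: values at or below the centre are pure background. Their
    # distances to the centre give a spread estimate that is not inflated
    # by expressed probes in the upper tail.
    below = [center - v for v in values if v <= center]
    spread = (sum(d * d for d in below) / len(below)) ** 0.5
    if spread == 0:
        raise ValueError("background spread is zero; cannot standardize")
    # Standardize every value against the estimated background and threshold it.
    return [1 if (v - center) / spread > z_cutoff else 0 for v in values]


# Simulated log-scale intensities for one array (hypothetical numbers):
# 900 background probes and 100 clearly expressed probes.
random.seed(0)
background = [random.gauss(6.0, 0.5) for _ in range(900)]
expressed = [random.gauss(11.0, 0.5) for _ in range(100)]
calls = call_active(background + expressed)
print("active among background probes:", sum(calls[:900]))
print("active among expressed probes:", sum(calls[900:]))
```

Because nothing outside the array itself is consulted, a call like this needs no reference collection, which is the property that makes a single-sample approach portable across platforms. A real method would model the background far more carefully than this sketch does.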

It's also straightforward to use alternative CDFs, such as from
BrainArray. This functionality is described in the package's documentation.

One caveat: the UPC function is currently available only in the
"development" version of Bioconductor (it will be released to the main
version in a couple of weeks). So if you want to try it out, you'll need to
install the development version of R and then the development version of

Please let us know if you have any questions!


>Message: 13
>Date: Tue, 26 Mar 2013 15:15:25 +0000
>From: "Hooiveld, Guido" <Guido.Hooiveld at wur.nl>
>To: "'Matthew McCall'" <mccallm at gmail.com>, "'Dario Greco'"
>	<dario.greco at ki.se>
>Cc: "'Bioconductor at r-project.org'" <bioconductor at r-project.org>
>Subject: Re: [BioC] barcode with custom CDF
>Hi Matt,
>Sorry to interfere with this specific discussion, but I would also be
>interested in your suggestions on potential alternative approaches.
>The reason I am interested is because ideally I would like to apply a
>(your) barcoding approach for platforms that are less used compared to
>the HGU133 or MOE430 platforms, such as the HuGene and MoGene ST v1.x
>-----Original Message-----
>From: bioconductor-bounces at r-project.org
>[mailto:bioconductor-bounces at r-project.org] On Behalf Of Matthew McCall
>Sent: Tuesday, March 26, 2013 15:59
>To: Dario Greco
>Cc: Bioconductor at r-project.org
>Subject: Re: [BioC] barcode with custom CDF
>For the barcode implementations in BioC, I used > 10,000 arrays from each
>platform. I doubt this amount of data is available for all 8 Affy
>platforms you're using. If you don't mind giving me a brief overview of
>your research goals for this project (not cc'ing the BioC mailing list if
>you're more comfortable with that), I might be able to provide some
>alternatives to a full barcode implementation.
>On Tue, Mar 26, 2013 at 10:47 AM, Dario Greco <dario.greco at ki.se> wrote:
>> Dear Matt,
>> thanks a lot for the quick reply!
>> i'm working on data from 8 homo sapiens affymetrix platforms
>>re-annotated with brainarray cdf (ensembl gene).
>> i can have access to relatively large computer clusters, so that is not
>>worrying me.
>> the most obvious question is probably concerning what volume of data
>>from chipsets other than 133a and 133p2 i would need in order to
>>generate meaningful estimations.
>> thanks
>> d
>> On Mar 26, 2013, at 2:43 PM, Matthew McCall <mccallm at gmail.com> wrote:
>>> Dario,
>>> Generating the barcode vectors (estimating the null distribution for
>>> each probeset) typically isn't something one can run on a laptop. It
>>> takes about 1-2 days running in parallel on about 20 nodes of a
>>> computing cluster. If you have access to such resources, I'm happy to
>>> help you create your own implementation. Is the custom CDF you're
>>> using one of the Brain Array CDFs or something of your own design?
>>> Best,
>>> Matt
>>> On Tue, Mar 26, 2013 at 7:03 AM, Dario Greco [guest]
>>> <guest at bioconductor.org> wrote:
>>>> Dear BioC-ers,
>>>> I would like to run the function 'barcode' on a set of CEL files
>>>>preprocessed with a custom CDF.
>>>> I am wondering if there is a quick way to generate the needed vectors
>>>>(mu and tau for the unexpressed distribution) in the same way as the
>>>>package frmaTools allows for the fRMA necessary vectors.
>>>> I hope I am not posting about an issue already treated in this
>>>>mailing list, but searching it produced no obvious hints.
>>>> thanks a lot for your help and suggestions.
>>>> cheers
>>>> dario
>>>> -- output of sessionInfo():
>>>> sessionInfo()
>>>> R version 2.15.3 (2013-03-01)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>> other attached packages:
>>>> [1] hgu133plus2barcodevecs_1.0.5 hgu133plus2frmavecs_1.1.12
>>>> [3] hgu133abarcodevecs_1.0.5     hthgu133acdf_2.11.0
>>>> [5] AnnotationDbi_1.20.7         affy_1.36.1
>>>> [7] frma_1.10.0                  Biobase_2.18.0
>>>> [9] BiocGenerics_0.4.0           BiocInstaller_1.8.3
>>>> loaded via a namespace (and not attached):
>>>> [1] affxparser_1.30.2     affyio_1.26.0         Biostrings_2.26.3
>>>> [4] bit_1.1-10            codetools_0.2-8       DBI_0.2-5
>>>> [7] ff_2.2-11             foreach_1.4.0         GenomicRanges_1.10.7
>>>> [10] IRanges_1.16.6        iterators_1.0.6       MASS_7.3-23
>>>> [13] oligo_1.22.0          oligoClasses_1.20.0   parallel_2.15.3
>>>> [16] preprocessCore_1.20.0 RSQLite_0.11.2        splines_2.15.3
>>>> [19] stats4_2.15.3         tools_2.15.3          zlibbioc_1.4.0
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>> --
>>> Matthew N McCall, PhD
>>> 112 Arvine Heights
>>> Rochester, NY 14611
>>> Cell: 202-222-5880