[BioC] barcode with custom CDF

Matthew McCall mccallm at gmail.com
Tue Mar 26 16:39:35 CET 2013


No worries. It depends on what you plan to do with the data. One
option is to go back to the method described in the original barcode
paper (Zilliox and Irizarry, Nat Methods 2007), which discards any
genes that don't show a bimodal distribution. Another option is to
define the null distribution based on your specific data set -- e.g.,
you estimate the null distribution for each gene using, say, 50
untreated samples and then use that distribution to "barcode" the
treated samples (this is similar to the POE algorithm --
http://astor.som.jhmi.edu/poe/). There are other options as well. The
reason my barcode implementations require so many arrays is that we
are trying to perform well regardless of what the researcher is
interested in -- we give a number of examples of how to use the
barcode algorithm for various tasks in the NAR 2011 paper.
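
To make the second option concrete, here is a rough sketch of the idea in
plain Python (standard library only). It is an illustration under
simplifying assumptions, not the code behind the BioC barcode packages:
the null distribution for each gene is summarized by just the mean and
standard deviation of the untreated samples, and the function names
(estimate_null, barcode_sample) and the z-score cutoff of 3 are made up
for this example.

```python
import random
from statistics import mean, stdev


def estimate_null(untreated):
    """Estimate a per-gene null distribution from untreated samples.

    untreated: dict mapping gene -> list of log2 expression values,
    one value per untreated sample.
    Returns a dict mapping gene -> (mu, tau), the sample mean and SD.
    Assumption: a plain mean/SD summary stands in for the null here.
    """
    return {gene: (mean(vals), stdev(vals)) for gene, vals in untreated.items()}


def barcode_sample(sample, null, z_cutoff=3.0):
    """Call each gene 1 (above the null) or 0 for a single treated sample.

    sample: dict mapping gene -> log2 expression in the treated sample.
    z_cutoff is a hypothetical threshold chosen for illustration.
    """
    calls = {}
    for gene, x in sample.items():
        mu, tau = null[gene]
        # Guard against a zero SD (constant gene in the untreated set).
        z = (x - mu) / tau if tau > 0 else 0.0
        calls[gene] = int(z > z_cutoff)
    return calls


# Simulated data: 20 genes, 50 untreated samples, one treated sample.
random.seed(1)
genes = ["gene%d" % i for i in range(20)]
untreated = {g: [random.gauss(6.0, 0.5) for _ in range(50)] for g in genes}
null = estimate_null(untreated)

treated = {g: random.gauss(6.0, 0.5) for g in genes}
treated["gene0"] = 12.0  # one gene pushed far above its untreated level

calls = barcode_sample(treated, null)
```

With real data, the untreated dict would hold the preprocessed log2
expression values for your control arrays, and each treated array would
be passed through barcode_sample against the same null estimates.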

As for a HuGene and MoGene ST barcode implementation -- I'm working on
this and hope to have something by the fall BioC release.


On Tue, Mar 26, 2013 at 11:15 AM, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
> Hi Matt,
> Sorry to interfere with this specific discussion, but I would also be interested in your suggestions on potential alternative approaches.
> The reason I am interested is that ideally I would like to apply a (your) barcoding approach to platforms that are less widely used than the HGU133 or MOE430 platforms, such as the HuGene and MoGene ST v1.x arrays.
> Regards,
> Guido
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Matthew McCall
> Sent: Tuesday, March 26, 2013 15:59
> To: Dario Greco
> Cc: Bioconductor at r-project.org
> Subject: Re: [BioC] barcode with custom CDF
> Dario,
> For the barcode implementations in BioC, I used more than 10,000 arrays from each platform. I doubt this amount of data is available for all 8 Affy platforms you're using. If you don't mind giving me a brief overview of your research goals for this project (not cc'ing the BioC mailing list if you're more comfortable with that), I might be able to provide some alternatives to a full barcode implementation.
> Best,
> Matt
> On Tue, Mar 26, 2013 at 10:47 AM, Dario Greco <dario.greco at ki.se> wrote:
>> Dear Matt,
>> thanks a lot for the quick reply!
>> I'm working on data from 8 Homo sapiens Affymetrix platforms re-annotated with the BrainArray CDF (Ensembl gene).
>> I have access to relatively large computer clusters, so that is not a concern.
>> The most obvious question is probably how much data from chipsets other than 133a and 133p2 I would need in order to generate meaningful estimates.
>> thanks
>> d
>> On Mar 26, 2013, at 2:43 PM, Matthew McCall <mccallm at gmail.com> wrote:
>>> Dario,
>>> Generating the barcode vectors (estimating the null distribution for
>>> each probeset) typically isn't something one can run on a laptop. It
>>> takes about 1-2 days running in parallel on about 20 nodes of a
>>> computing cluster. If you have access to such resources, I'm happy to
>>> help you create your own implementation. Is the custom CDF you're
>>> using one of the Brain Array CDFs or something of your own design?
>>> Best,
>>> Matt
>>> On Tue, Mar 26, 2013 at 7:03 AM, Dario Greco [guest]
>>> <guest at bioconductor.org> wrote:
>>>> Dear BioC-ers,
>>>> I would like to run the function 'barcode' on a set of CEL files preprocessed with a custom CDF.
>>>> I am wondering if there is a quick way to generate the needed vectors (mu and tau for the unexpressed distribution) in the same way as the package frmaTools allows for the fRMA necessary vectors.
>>>> I hope I am not posting about an issue already treated in this mailing list, but searching it produced no obvious hints.
>>>> thanks a lot for your help and suggestions.
>>>> cheers
>>>> dario
>>>> -- output of sessionInfo():
>>>> sessionInfo()
>>>> R version 2.15.3 (2013-03-01)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>> other attached packages:
>>>> [1] hgu133plus2barcodevecs_1.0.5 hgu133plus2frmavecs_1.1.12
>>>> [3] hgu133abarcodevecs_1.0.5     hthgu133acdf_2.11.0
>>>> [5] AnnotationDbi_1.20.7         affy_1.36.1
>>>> [7] frma_1.10.0                  Biobase_2.18.0
>>>> [9] BiocGenerics_0.4.0           BiocInstaller_1.8.3
>>>> loaded via a namespace (and not attached):
>>>> [1] affxparser_1.30.2     affyio_1.26.0         Biostrings_2.26.3
>>>> [4] bit_1.1-10            codetools_0.2-8       DBI_0.2-5
>>>> [7] ff_2.2-11             foreach_1.4.0         GenomicRanges_1.10.7
>>>> [10] IRanges_1.16.6        iterators_1.0.6       MASS_7.3-23
>>>> [13] oligo_1.22.0          oligoClasses_1.20.0   parallel_2.15.3
>>>> [16] preprocessCore_1.20.0 RSQLite_0.11.2        splines_2.15.3
>>>> [19] stats4_2.15.3         tools_2.15.3          zlibbioc_1.4.0
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>> --
>>> Matthew N McCall, PhD
>>> 112 Arvine Heights
>>> Rochester, NY 14611
>>> Cell: 202-222-5880

Matthew N McCall, PhD
112 Arvine Heights
Rochester, NY 14611
Cell: 202-222-5880
