[Bioc-devel] scanTabix coercion to data.frame

Martin Morgan mtmorgan at fhcrc.org
Mon Apr 16 22:17:53 CEST 2012


On 04/16/2012 12:31 AM, Hahne, Florian wrote:
> My bad, I updated all packages before trying this and never checked what
> actually happened.
> The odd thing is that I am running R-devel and have the latest
> BiocInstaller 1.5.6 installed, but I still only get the Bioc release
> packages:
>  > sessionInfo()
> R Under development (unstable) (2012-04-16 r59045)
> Platform: x86_64-unknown-linux-gnu/x86_64 (64-bit)

R switched to an annual release cycle, whereas Bioc kept its
semi-annual release. From April to October, Bioc uses 'release' R for
both release and devel Bioc. So BiocInstaller was expecting you to have
R-2-15 regardless of whether you were using 'release' or 'devel' Bioc.
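To make the .isDevel() check quoted further down concrete, here is a small standalone restatement of it in R. It is a sketch, not BiocInstaller code: the function name is_devel() and its arguments are hypothetical, taking the BiocInstaller minor version and R.version$status as inputs instead of looking them up.

```r
# Hypothetical restatement of the BiocInstaller:::.isDevel() logic quoted in
# this thread; 'minor' stands in for packageVersion("BiocInstaller")$minor and
# 'status' for R.version$status.
is_devel <- function(minor, status) {
  isOdd <- (minor %% 2L) == 1L          # odd minor version => devel Bioc
  isOdd && (status == "" || status == "Patched")
}

# BiocInstaller 1.5.x on a released R-2-15:
is_devel(5L, "")                               # TRUE
# Same BiocInstaller on an R-devel build from svn:
is_devel(5L, "Under development (unstable)")   # FALSE
```

With an odd BiocInstaller minor version, only a released or patched R passes the check, so an R-devel build gets the release repositories.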

You can manage the two versions either with duplicate copies of R-2-15 
installed in different locations, or using the R_LIBS_USER (for example) 
environment variable to point to a user library that is different for 
Bioc release and for Bioc devel.
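A minimal sketch of the separate-library approach from within a running R session, assuming a hypothetical directory name. .libPaths() is base R; R_LIBS_USER itself is read when R starts, so in practice it would be set in the shell or in ~/.Renviron rather than from inside R.

```r
# Hypothetical per-version user library for Bioc devel; in a real setup this
# would be a stable path such as the one R_LIBS_USER points to.
devel_lib <- file.path(tempdir(), "R-2.15-bioc-devel")
dir.create(devel_lib, recursive = TRUE, showWarnings = FALSE)

# Put the devel library first: install.packages() installs into
# .libPaths()[1] by default, and library() searches the paths in order.
.libPaths(c(devel_lib, .libPaths()))
.libPaths()
```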

Martin


>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.16.1 GenomicRanges_1.8.3 IRanges_1.14.2
> [4] BiocGenerics_0.2.0 BiocInstaller_1.5.6
>
> loaded via a namespace (and not attached):
> [1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1
> [5] Rsamtools_1.9.2 stats4_2.16.0 tools_2.16.0 XML_3.9-4
> [9] zlibbioc_1.2.0
>
> I think I followed Dan's instructions carefully, any idea why this is
> not working for me?
>
> The little bit of debugging I tried revealed that biocinstallRepos() does
> not give me the right repository path:
>  > biocinstallRepos()
> BioCsoft
> "http://www.bioconductor.org/packages/2.10/bioc"
> CRAN
> "http://cran.fhcrc.org"
> BioCann
> "http://www.bioconductor.org/packages/2.10/data/annotation"
> BioCexp
> "http://www.bioconductor.org/packages/2.10/data/experiment"
> BioCextra
> "http://www.bioconductor.org/packages/2.10/extra"
>
> Now in there I find:
>  > BiocInstaller:::biocinstallRepos
> function (siteRepos = character())
> {
> .biocinstallRepos(siteRepos = siteRepos, devel = .isDevel())
> }
> <environment: namespace:BiocInstaller>
>
> And .isDevel is defined as
>
>  > BiocInstaller:::.isDevel
> function ()
> {
> isOdd <- (packageVersion("BiocInstaller")$minor%%2L) == 1L
> isOdd && (R.version$status == "" || R.version$status == "Patched")
> }
> <environment: namespace:BiocInstaller>
>
> I may be wrong here, but how can I ever get TRUE unless I am running R
> Patched or whatever R.version$status=="" refers to? Since I am running R
> devel built from svn I have
>  > R.version$status
> [1] "Under development (unstable)"
>
> So I will always and for all eternity get .isDevel()==FALSE…
>
> Florian
>
> Florian Hahne
> Novartis Institute For Biomedical Research
> Translational Sciences / Preclinical Safety / PCS Informatics
> Expert Data Integration and Modeling Bioinformatics
> CHBS, WKL-135.2.26
> Novartis Institute For Biomedical Research, Werk Klybeck
> Klybeckstrasse 141
> CH-4057 Basel
> Switzerland
> Phone: +41 61 6967127
> Email : florian.hahne at novartis.com
>
>
> From: Michael Lawrence <lawrence.michael at gene.com>
> Date: Fri, 13 Apr 2012 09:19:20 -0700
> To: NIBR <florian.hahne at novartis.com>
> Cc: Michael Lawrence <lawrence.michael at gene.com>, Sean Davis <sdavis2 at mail.nih.gov>,
> Martin Morgan <mtmorgan at fhcrc.org>, "bioc-devel at r-project.org" <bioc-devel at r-project.org>
> Subject: Re: [Bioc-devel] scanTabix coercion to data.frame
>
>
>
> On Fri, Apr 13, 2012 at 8:15 AM, Hahne, Florian
> <florian.hahne at novartis.com> wrote:
>
>     Yes, I tried this:
>     ff <-
>     TabixFile("/CHBS/apps/itox/data/project_data_repository/1/1/project.tbx")
>     foo <- import(ff)
>     Error: evaluation nested too deeply: infinite recursion /
>     options(expressions=)?
>
>     And this:
>
>
>     foo <- import(ff, which=GRanges(seqnames="chrX", ranges=IRanges(start=1,
>     end=1e8)))
>     Error: evaluation nested too deeply: infinite recursion /
>     options(expressions=)?
>
>     And then I gave up :-)
>
>
>
> Ok, well I said the devel version, i.e., 1.17.1, not 1.16.1.
>
>     >  sessionInfo()
>     R Under development (unstable) (2012-04-03 r58904)
>     Platform: x86_64-unknown-linux-gnu/x86_64 (64-bit)
>
>     locale:
>     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>     [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>     [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>     [7] LC_PAPER=C LC_NAME=C
>     [9] LC_ADDRESS=C LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
>
>     other attached packages:
>     [1] rtracklayer_1.16.1 Rsamtools_1.9.2 Biostrings_2.24.1
>     [4] GenomicRanges_1.8.3 IRanges_1.14.2 BiocGenerics_0.2.0
>     [7] BiocInstaller_1.4.3
>
>     loaded via a namespace (and not attached):
>     [1] bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1 stats4_2.16.0
>     [5] tools_2.16.0 XML_3.9-4 zlibbioc_1.2.0
>
>     From: Michael Lawrence <lawrence.michael at gene.com>
>     Date: Thu, 12 Apr 2012 10:07:31 -0700
>     To: NIBR <florian.hahne at novartis.com>
>     Cc: Michael Lawrence <lawrence.michael at gene.com>, Sean Davis
>     <sdavis2 at mail.nih.gov>, Martin Morgan <mtmorgan at fhcrc.org>,
>     "bioc-devel at r-project.org" <bioc-devel at r-project.org>
>     Subject: Re: [Bioc-devel] scanTabix coercion to data.frame
>
>
>     Did you try the latest devel version?
>
>     On Thu, Apr 12, 2012 at 9:29 AM, Hahne, Florian
>     <florian.hahne at novartis.com> wrote:
>
>     Thanks, I gave it a shot and got this:
>     Error: evaluation nested too deeply: infinite recursion /
>     options(expressions=)?
>
>
>     Guess I'll stick with scanTabix for now :-)
>     Florian
>
>     From: Michael Lawrence <lawrence.michael at gene.com>
>     Date: Thu, 12 Apr 2012 06:54:19 -0700
>     To: NIBR <florian.hahne at novartis.com>
>     Cc: Sean Davis <sdavis2 at mail.nih.gov>,
>     Martin Morgan <mtmorgan at fhcrc.org>,
>     "bioc-devel at r-project.org" <bioc-devel at r-project.org>
>     Subject: Re: [Bioc-devel] scanTabix coercion to data.frame
>
>
>     You can use rtracklayer to import tabix files directly. If it's GFF or
>     BED, you can just use import(). For arbitrary tabular files, first cast
>     the path to a TabixFile, then pass it to import(). That last one is not
>     well tested. It uses the header information to know the starts, ends,
>     etc.
>
>     Michael
>
>     On Thu, Apr 12, 2012 at 6:10 AM, Hahne, Florian
>     <florian.hahne at novartis.com> wrote:
>
>     Sean, Martin, thanks for the suggestions. I guess a combination of
>     the two would work well for me. I create my own tabix files and could
>     certainly stick the type information in the header. And I wasn't aware
>     of textConnection(), which seems to be performant enough to do what I
>     want. At least it is much better than my manual parsing...
>     One problem remains, though: the tabix files are being created from
>     within R, and I don't think there is any support yet for adding
>     arbitrary header lines. Or is there?
>
>     Florian
>
>
>
>     On 4/12/12 2:08 PM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:
>
>     >On Thu, Apr 12, 2012 at 7:57 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>     > > On 04/12/2012 01:19 AM, Hahne, Florian wrote:
>     > >>
>     > >> Hi all,
>     > >> I frequently get into the situation that I import data from a Tabix
>     > >> file using scanTabix and get a list of character vectors which I
>     > >> first need to split back into columns using strsplit, followed by
>     > >> some type coercion and lapply/sapply to actually get a list of
>     > >> data.frames, which is what I'd really want out in the first place.
>     > >> I may be missing something here, but wouldn't it be possible to ask
>     > >> scanTabix for a list of data.frames directly, and maybe even provide
>     > >> a vector of data types to coerce into, à la 'colClasses' in
>     > >> read.table? It just seems to me that these operations could be done
>     > >> much more efficiently on the C level.
>     > >
>     > >
>     > > It's definitely poorly developed, but one doesn't really want to
>     > > re-invent too much of the parsing wheel. Does
>     > >
>     > > res <- scanTabix("/foo.tbx")  # a list of character vectors, one per range
>     > > read.table(textConnection(unlist(res)), sep="\t")
>     > >
>     > > do the trick in a reasonably performant way? Obviously less than
>     > > ideal, with the data represented as character vectors and then as
>     > > data.frame. A better solution (colClasses ==> data.frame) wouldn't be
>     > > impossible, but guessing column types would be a lot of redundant
>     > > work.
>     >
>     >Since tabix allows arbitrary header lines, one could store metadata in
>     >the first few lines and use that to store column info and classes.
>     >One can get at the header using Rsamtools
>     >headerTabix(TabixFile('foo.tbx')). This is getting more toward
>     >developer-land than end-user, though, since the tabix file would need
>     >to be created with these uses in mind.
>     >
>     >Sean
>     >
>     >
>     > >> Thanks,
>     > >> Florian
>     > >>
>     > >>
>     > >>
>     > >> _______________________________________________
>     > >> Bioc-devel at r-project.org mailing list
>     > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>     > >
>     > >
>     > >
>     > > --
>     > > Computational Biology
>     > > Fred Hutchinson Cancer Research Center
>     > > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>     > >
>     > > Location: M1-B861
>     > > Telephone: 206 667-2793
>

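Combining Sean's header idea with the read.table() route, a minimal base-R sketch of turning scanTabix-style output into typed data.frames. The data are made-up stand-ins for the header lines from headerTabix() and the list returned by scanTabix(), and the '#types' header convention and column names are hypothetical, not something Rsamtools defines.

```r
# Made-up stand-ins: header lines as headerTabix() might report them, and a
# scanTabix()-style list with one character vector of records per range.
hdr <- c("#types\tcharacter\tinteger\tinteger\tnumeric")
res <- list(
  "chrX:1-100000000" = c("chrX\t100\t200\t0.5", "chrX\t300\t450\t1.25")
)

# Hypothetical convention: a header line starting with "#types" lists the
# class of each column, tab-separated.
types_line <- grep("^#types\t", hdr, value = TRUE)
col_classes <- strsplit(types_line, "\t", fixed = TRUE)[[1]][-1]

# Parse each range's records into a data.frame with the declared classes,
# so no per-column coercion is needed afterwards.
to_df <- function(lines) {
  read.table(text = lines, sep = "\t", colClasses = col_classes,
             col.names = c("seqname", "start", "end", "score"))
}
dfs <- lapply(res, to_df)
str(dfs[[1]])
```

Passing colClasses avoids read.table() guessing column types, which is the redundant work mentioned above; the parsing itself still happens at the R level.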

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


