[Bioc-devel] scanTabix coercion to data.frame
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed Apr 18 15:57:16 CEST 2012
On Wed, Apr 18, 2012 at 9:11 AM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Yea, sucks. I promise to switch to the R 2.15 series after the point
> release. Some annoying bugs in 2.15.0 right now. I'm just going to rely on
> the build servers to make sure everything works.
Yikes, really?
Do you recommend compiling from R-2-15-branch?
-steve
>
> On Tue, Apr 17, 2012 at 11:42 PM, Hahne, Florian
> <florian.hahne at novartis.com> wrote:
>
>> Ah, I see. I wasn't aware that we are supposed to develop on R-2.15,
>> although that makes perfect sense. How simple were the old days, when
>> biocDevel meant Rdevel. Sigh...
>>
>>
>>
>> Florian Hahne
>> Novartis Institute For Biomedical Research
>> Translational Sciences / Preclinical Safety / PCS Informatics
>> Expert Data Integration and Modeling Bioinformatics
>> CHBS, WKL-135.2.26
>> Novartis Institute For Biomedical Research, Werk Klybeck
>> Klybeckstrasse 141
>> CH-4057 Basel
>> Switzerland
>> Phone: +41 61 6967127
>> Email : florian.hahne at novartis.com
>>
>> On 4/16/12 10:17 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
>>
>> >On 04/16/2012 12:31 AM, Hahne, Florian wrote:
>> >> My bad, I updated all packages before trying this and never checked what
>> >> actually happened.
>> >> The odd thing is that I am running R-devel, I have the latest
>> >> BiocInstaller 1.5.6 installed but I still only get the bioc release
>> >> packages:
>> >> > sessionInfo()
>> >> R Under development (unstable) (2012-04-16 r59045)
>> >> Platform: x86_64-unknown-linux-gnu/x86_64 (64-bit)
>> >
>> >R switched to an annual release cycle, whereas Bioc kept its
>> >semi-annual release. Bioc during April - October uses 'release' R for
>> >both release and devel Bioc. So BiocInstaller was expecting you to have
>> >R-2-15 regardless of whether you were 'release' or 'devel' bioc.
>> >
>> >You can manage the two versions either with duplicate copies of R-2-15
>> >installed in different locations, or using the R_LIBS_USER (for example)
>> >environment variable to point to a user library that is different for
>> >Bioc release and for Bioc devel.
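
A minimal sketch of the second option (a separate user library per Bioc version), done from within an R session rather than through the R_LIBS_USER environment variable. The helper name bioc_lib_path, the tempdir()-based root, and the 2.11 devel version number are assumptions for illustration; 2.10 is the release version that appears in the repository URLs further down the thread.

```r
# Hypothetical helper: one library directory per Bioconductor version.
# The root defaults to a temporary directory so the sketch is harmless to run;
# a real setup would point it at a permanent location.
bioc_lib_path <- function(bioc_version,
                          root = file.path(tempdir(), "R-libs")) {
  file.path(root, paste0("bioc-", bioc_version))
}

release_lib <- bioc_lib_path("2.10")  # release at the time of the thread
devel_lib   <- bioc_lib_path("2.11")  # assumed devel version number

# .libPaths() silently drops directories that do not exist, so create it first.
dir.create(devel_lib, recursive = TRUE, showWarnings = FALSE)

# Put the devel library first so installs and loads for this session go there.
.libPaths(c(devel_lib, .libPaths()))
```

Setting R_LIBS_USER to the same directory before starting R should have the equivalent effect for new sessions.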
>> >
>> >Martin
>> >
>> >
>> >>
>> >> locale:
>> >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> >> [7] LC_PAPER=C LC_NAME=C
>> >> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> >>
>> >> attached base packages:
>> >> [1] stats graphics grDevices utils datasets methods base
>> >>
>> >> other attached packages:
>> >> [1] rtracklayer_1.16.1 GenomicRanges_1.8.3 IRanges_1.14.2
>> >> [4] BiocGenerics_0.2.0 BiocInstaller_1.5.6
>> >>
>> >> loaded via a namespace (and not attached):
>> >> [1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1
>> >> [5] Rsamtools_1.9.2 stats4_2.16.0 tools_2.16.0 XML_3.9-4
>> >> [9] zlibbioc_1.2.0
>> >>
>> >> I think I followed Dan's instructions carefully, any idea why this is
>> >> not working for me?
>> >>
>> >> The little bit of debugging I tried revealed that biocinstallRepos does
>> >> not give me the right repository path:
>> >> > biocinstallRepos()
>> >> BioCsoft
>> >> "http://www.bioconductor.org/packages/2.10/bioc"
>> >> CRAN
>> >> "http://cran.fhcrc.org"
>> >> BioCann
>> >> "http://www.bioconductor.org/packages/2.10/data/annotation"
>> >> BioCexp
>> >> "http://www.bioconductor.org/packages/2.10/data/experiment"
>> >> BioCextra
>> >> "http://www.bioconductor.org/packages/2.10/extra"
>> >>
>> >> Now in there I find:
>> >> > BiocInstaller:::biocinstallRepos
>> >> function (siteRepos = character())
>> >> {
>> >> .biocinstallRepos(siteRepos = siteRepos, devel = .isDevel())
>> >> }
>> >> <environment: namespace:BiocInstaller>
>> >>
>> >> And .isDevel is defined as
>> >>
>> >> > BiocInstaller:::.isDevel
>> >> function ()
>> >> {
>> >> isOdd <- (packageVersion("BiocInstaller")$minor%%2L) == 1L
>> >> isOdd && (R.version$status == "" || R.version$status == "Patched")
>> >> }
>> >> <environment: namespace:BiocInstaller>
>> >>
>> >> I may be wrong here, but how can I ever get TRUE unless I am running R
>> >> Patched or whatever R.version$status=="" refers to? Since I am running R
>> >> devel built from svn I have
>> >> > R.version$status
>> >> [1] "Under development (unstable)"
>> >>
>> >> So I will always and for all eternity get .isDevel()==FALSE...
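
The behaviour described above can be reproduced without BiocInstaller installed. This is a stand-alone copy of the quoted logic with the two inputs passed in explicitly; the function name is_devel and its arguments are made up for illustration and are not part of the package.

```r
# Same test as the quoted .isDevel(), but with the BiocInstaller minor version
# and the R status string supplied as arguments instead of looked up.
is_devel <- function(installer_minor, r_status) {
  is_odd <- (installer_minor %% 2L) == 1L
  is_odd && (r_status == "" || r_status == "Patched")
}

# BiocInstaller 1.5.6 has minor version 5 (odd).
is_devel(5L, "")                              # released R
is_devel(5L, "Patched")                       # R-patched
is_devel(5L, "Under development (unstable)")  # R-devel built from svn
is_devel(4L, "")                              # even minor, e.g. 1.4.3
```

Only the first two calls return TRUE. With an R-devel build the status string matches neither "" nor "Patched", so the result is FALSE whatever the installer version is.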
>> >>
>> >> Florian
>> >>
>> >>
>> >> From: Michael Lawrence <lawrence.michael at gene.com>
>> >> Date: Fri, 13 Apr 2012 09:19:20 -0700
>> >> To: NIBR <florian.hahne at novartis.com>
>> >> Cc: Michael Lawrence <lawrence.michael at gene.com>, Sean Davis
>> >> <sdavis2 at mail.nih.gov>, Martin Morgan <mtmorgan at fhcrc.org>,
>> >> "bioc-devel at r-project.org" <bioc-devel at r-project.org>
>> >> Subject: Re: [Bioc-devel] scanTabix coercion to data.frame
>> >>
>> >>
>> >>
>> >> On Fri, Apr 13, 2012 at 8:15 AM, Hahne, Florian
>> >> <florian.hahne at novartis.com> wrote:
>> >>
>> >> Yes, I tried this:
>> >> ff <- TabixFile("/CHBS/apps/itox/data/project_data_repository/1/1/project.tbx")
>> >> foo <- import(ff)
>> >> Error: evaluation nested too deeply: infinite recursion /
>> >> options(expressions=)?
>> >>
>> >> And this:
>> >>
>> >>
>> >> foo <- import(ff, which=GRanges(seqnames="chrX",
>> >>                                 ranges=IRanges(start=1, end=1e8)))
>> >> Error: evaluation nested too deeply: infinite recursion /
>> >> options(expressions=)?
>> >>
>> >> And then I gave up :-)
>> >>
>> >>
>> >>
>> >> Ok, well I said the devel version, i.e., 1.17.1, not 1.16.1.
>> >>
>> >> > sessionInfo()
>> >> R Under development (unstable) (2012-04-03 r58904)
>> >> Platform: x86_64-unknown-linux-gnu/x86_64 (64-bit)
>> >>
>> >> locale:
>> >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> >> [7] LC_PAPER=C LC_NAME=C
>> >> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> >>
>> >> attached base packages:
>> >> [1] stats graphics grDevices utils datasets methods base
>> >>
>> >> other attached packages:
>> >> [1] rtracklayer_1.16.1 Rsamtools_1.9.2 Biostrings_2.24.1
>> >> [4] GenomicRanges_1.8.3 IRanges_1.14.2 BiocGenerics_0.2.0
>> >> [7] BiocInstaller_1.4.3
>> >>
>> >> loaded via a namespace (and not attached):
>> >> [1] bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1 stats4_2.16.0
>> >> [5] tools_2.16.0 XML_3.9-4 zlibbioc_1.2.0
>> >>
>> >> From: Michael Lawrence <lawrence.michael at gene.com>
>> >> Date: Thu, 12 Apr 2012 10:07:31 -0700
>> >> To: NIBR <florian.hahne at novartis.com>
>> >> Cc: Michael Lawrence <lawrence.michael at gene.com>, Sean Davis
>> >> <sdavis2 at mail.nih.gov>, Martin Morgan <mtmorgan at fhcrc.org>,
>> >> "bioc-devel at r-project.org" <bioc-devel at r-project.org>
>> >> Subject: Re: [Bioc-devel] scanTabix coercion to data.frame
>> >>
>> >>
>> >> Did you try the latest devel version?
>> >>
>> >> On Thu, Apr 12, 2012 at 9:29 AM, Hahne, Florian
>> >> <florian.hahne at novartis.com> wrote:
>> >>
>> >> Thanks, I gave it a shot and got this:
>> >> Error: evaluation nested too deeply: infinite recursion /
>> >> options(expressions=)?
>> >>
>> >>
>> >> Guess I'll stick with scanTabix for now :-)
>> >> Florian
>> >>
>> >> From: Michael Lawrence <lawrence.michael at gene.com>
>> >> Date: Thu, 12 Apr 2012 06:54:19 -0700
>> >> To: NIBR <florian.hahne at novartis.com>
>> >> Cc: Sean Davis <sdavis2 at mail.nih.gov>, Martin Morgan
>> >> <mtmorgan at fhcrc.org>, "bioc-devel at r-project.org"
>> >> <bioc-devel at r-project.org>
>> >> Subject: Re: [Bioc-devel] scanTabix coercion to data.frame
>> >>
>> >>
>> >> You can use rtracklayer to import tabix files directly. If it's GFF or
>> >> BED, you can just use import(). For arbitrary tabular files, first cast
>> >> the path to a TabixFile, then pass it to import(). That last one is not
>> >> well tested. It uses the header information to know the starts, ends,
>> >> etc.
>> >>
>> >> Michael
>> >>
>> >> On Thu, Apr 12, 2012 at 6:10 AM, Hahne, Florian
>> >> <florian.hahne at novartis.com> wrote:
>> >>
>> >> Sean, Martin, thanks for the suggestions. I guess a combination of the
>> >> two would work well for me. I create my own tabix files and could
>> >> certainly stick the type information in the header. And I wasn't aware
>> >> of textConnection(), which seems to be performant enough to do what I
>> >> want. At least it is much better than my manual parsing...
>> >> One problem remains, though: the tabix files are being created from
>> >> within R, and I don't think there is any support to add arbitrary
>> >> header lines available yet. Or is there?
>> >>
>> >> Florian
>> >>
>> >> On 4/12/12 2:08 PM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:
>> >>
>> >> >On Thu, Apr 12, 2012 at 7:57 AM, Martin Morgan <mtmorgan at fhcrc.org>
>> >> >wrote:
>> >> > > On 04/12/2012 01:19 AM, Hahne, Florian wrote:
>> >> > >>
>> >> > >> Hi all,
>> >> > >> I frequently get into the situation that I import data from a Tabix
>> >> > >> file using scanTabix and get a list of character vectors which I
>> >> > >> first need to split back into columns using strsplit, followed by
>> >> > >> some type coercion and lapply/sapply to actually get a list of
>> >> > >> data.frames which is what I'd really want out in the first place. I
>> >> > >> may be missing something here, but wouldn't it be possible to ask
>> >> > >> scanTabix for a list of data.frames directly, and maybe even
>> >> > >> providing a vector of data types to coerce into, a la 'colClasses'
>> >> > >> in read.table? It just seems to me that these operations could be
>> >> > >> done much more efficiently on the C level.
>> >> > >
>> >> > >
>> >> > > It's definitely poorly developed but one doesn't really want to
>> >> > > re-invent too much of the parsing wheel. Does
>> >> > >
>> >> > > res <- scanTabix("/foo.tbx")
>> >> > > read.table(textConnection(res[[1]]), header=TRUE, sep="\t")
>> >> > >
>> >> > > do the trick in a reasonably performant way? Obviously less than
>> >> > > ideal, with the data represented as character vectors and then as
>> >> > > data.frame. A better solution (colClasses ==> data.frame) wouldn't be
>> >> > > impossible, but guessing column types would be a lot of redundant
>> >> > > work.
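
A rough sketch of the read.table route with explicit column classes, so nothing has to be guessed. The records, the column names, the range label, and the helper parse_records are all made up for illustration; the list stands in for the "list of character vectors" that scanTabix is described as returning, so the example runs without Rsamtools.

```r
# Made-up stand-in for a scanTabix() result: one character vector of
# tab-separated records per queried range, with no header line.
res <- list(
  "chr1:1-1000" = c("chr1\t100\t200\t0.5",
                    "chr1\t300\t400\t1.25")
)

# Hypothetical helper: turn one character vector of records into a data.frame.
parse_records <- function(records, col_names, col_classes) {
  read.table(text = records, sep = "\t", header = FALSE,
             col.names = col_names, colClasses = col_classes,
             stringsAsFactors = FALSE)
}

dfs <- lapply(res, parse_records,
              col_names   = c("chrom", "start", "end", "score"),
              col_classes = c("character", "integer", "integer", "numeric"))

str(dfs[[1]])
```

The text= argument of read.table plays the same role as wrapping the vector in textConnection(), and colClasses does the type coercion in the same pass.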
>> >> >
>> >> >Since tabix allows arbitrary header lines, one could store metadata in
>> >> >the first few lines and use that to store column info and classes.
>> >> >One can get at the header using Rsamtools
>> >> >headerTabix(TabixFile('foo.tbx')). This is getting more toward
>> >> >developer-land than end-user, though, since the tabix file would need
>> >> >to be created with these uses in mind.
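
One way the header idea could look, as a sketch only. The "#columns=" and "#colClasses=" header lines are an invented convention, not something tabix or Rsamtools defines, and the header vector below is typed in by hand rather than read from a file with headerTabix(). The helper parse_header_field is hypothetical as well.

```r
# Hand-written stand-ins for the header lines and records of a tabix file
# that was created with this (invented) metadata convention in mind.
header  <- c("#columns=chrom,start,end,score",
             "#colClasses=character,integer,integer,numeric")
records <- c("chr1\t100\t200\t0.5",
             "chr1\t300\t400\t1.25")

# Hypothetical helper: pull the comma-separated values out of "#key=..." .
parse_header_field <- function(header, key) {
  prefix <- paste0("#", key, "=")
  line <- header[startsWith(header, prefix)][1]
  strsplit(substring(line, nchar(prefix) + 1L), ",", fixed = TRUE)[[1]]
}

col_names   <- parse_header_field(header, "columns")
col_classes <- parse_header_field(header, "colClasses")

# Use the metadata to drive read.table instead of guessing column types.
df <- read.table(text = records, sep = "\t", header = FALSE,
                 col.names = col_names, colClasses = col_classes)
str(df)
```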
>> >> >
>> >> >Sean
>> >> >
>> >> >
>> >> > >> Thanks,
>> >> > >> Florian
>> >> > >>
>> >> > >>
>> >> > >> _______________________________________________
>> >> > >> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>> >> mailing list
>> >> > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Computational Biology
>> >> > > Fred Hutchinson Cancer Research Center
>> >> > > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>> >> > >
>> >> > > Location: M1-B861
>> >> > > Telephone: 206 667-2793
>> >> > >
>> >
>>
>>
>
>
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact