[Bioc-devel] scanTabix coercion to data.frame

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 12 13:57:12 CEST 2012


On 04/12/2012 01:19 AM, Hahne, Florian wrote:
> Hi all,
> I frequently get into the situation that I import data from a Tabix file
> using scanTabix and get a list of character vectors which I first need to
> split back into columns using strsplit, followed by some type coercion and
> lapply/sapply to actually get a list of data.frames which is what I'd
> really want out in the first place. I may be missing something here, but
> wouldn't it be possible to ask scanTabix for a list of data.frames
> directly, and maybe even providing a vector of data types to coerce into,
> a la 'colClasses' in read.table? It just seems to me that these operations
> could be done much more efficiently on the C level.

This part of scanTabix is definitely underdeveloped, but one doesn't
really want to re-invent too much of the parsing wheel. Does

   res <- scanTabix("/foo.tbx")
   ## res is a list with one character vector of records per range
   lapply(res, function(x)
       read.table(textConnection(x), header=TRUE, sep="\t"))

do the trick in a reasonably performant way? Obviously this is less than
ideal, since the data are held first as character vectors and then again
as a data.frame. A better solution (user-supplied colClasses ==>
data.frame) wouldn't be impossible, but guessing column types when none
are supplied would be a lot of redundant work.
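
As a rough illustration of the colClasses idea at the R level (not in C,
and not an existing Rsamtools interface), something like the sketch below
might do. The helper name tabixToDataFrames, its arguments, and the mock
input are all made up for the example. It assumes the records are
tab-delimited and carry no header line, so header=FALSE is used and the
column names are supplied by the caller.

```r
## Hypothetical helper, not part of Rsamtools: turn a scanTabix()-style
## result (a list of character vectors, one per range) into a list of
## data.frames with caller-supplied column types.
tabixToDataFrames <- function(res, colClasses, col.names) {
    lapply(res, function(lines) {
        if (length(lines) == 0L)
            return(NULL)              # range without any records
        read.table(text = lines, sep = "\t", header = FALSE,
                   colClasses = colClasses, col.names = col.names,
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)
    })
}

## Mock input in the shape scanTabix() returns; real code would use
## res <- scanTabix(file, param = ...) instead.
res <- list(
    "chr1:1-100" = c("chr1\t10\t20\t0.5", "chr1\t30\t40\t1.5"),
    "chr2:1-100" = "chr2\t5\t15\t2.0"
)

dfs <- tabixToDataFrames(
    res,
    colClasses = c("character", "integer", "integer", "numeric"),
    col.names = c("chrom", "start", "end", "score")
)
```

Supplying colClasses spares read.table the type guessing, but the records
are still held as character vectors before parsing, so this only addresses
part of the inefficiency discussed above.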

Martin

> Thanks,
> Florian
>
>
> Florian Hahne
> Novartis Institute For Biomedical Research
> Translational Sciences / Preclinical Safety / PCS Informatics
> Expert Data Integration and Modeling Bioinformatics
> CHBS, WKL-135.2.26
> Novartis Institute For Biomedical Research, Werk Klybeck
> Klybeckstrasse 141
> CH-4057 Basel
> Switzerland
> Phone: +41 61 6967127
> Email : florian.hahne at novartis.com
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
