[Bioc-devel] scanTabix coercion to data.frame
Martin Morgan
mtmorgan at fhcrc.org
Thu Apr 12 13:57:12 CEST 2012
On 04/12/2012 01:19 AM, Hahne, Florian wrote:
> Hi all,
> I frequently get into the situation that I import data from a Tabix file
> using scanTabix and get a list of character vectors which I first need to
> split back into columns using strsplit, followed by some type coercion and
> lapply/sapply to actually get a list of data.frames which is what I'd
> really want out in the first place. I may be missing something here, but
> wouldn't it be possible to ask scanTabix for a list of data.frames
> directly, and maybe even providing a vector of data types to coerce into,
> a la 'colClasses' in read.table? It just seems to me that these operations
> could be done much more efficiently on the C level.
It's definitely poorly developed but one doesn't really want to
re-invent too much of the parsing wheel. Does
    res <- scanTabix("/foo.tbx")
    ## scanTabix returns a list of character vectors, so parse one element
    read.table(textConnection(res[[1]]), header=TRUE, sep="\t")
do the trick in a reasonably performant way? Obviously less than ideal,
with the data represented as character vectors and then as data.frame. A
better solution (colClasses ==> data.frame) wouldn't be impossible, but
guessing column types would be a lot of redundant work.
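
As a rough sketch of the colClasses idea, the conversion can be done in plain R by applying read.table to each element of the list. This is only an illustration under stated assumptions: `res` below is a hand-made stand-in for scanTabix output (a named list of tab-delimited character vectors, one per range), the column names and types in `col_classes` are hypothetical, and `to_df` is a helper invented for this example, not part of Rsamtools.

```r
## Stand-in for scanTabix() output (assumption): one character vector
## of tab-delimited records per queried range, with no header line.
res <- list(
  "chr1:1-100" = c("chr1\t10\t0.5", "chr1\t20\t1.5"),
  "chr2:1-100" = c("chr2\t5\t2.0")
)

## Hypothetical column names and types, in the spirit of 'colClasses'.
col_classes <- c(chrom = "character", pos = "integer", score = "numeric")

## Hypothetical helper: parse one character vector into a data.frame.
to_df <- function(lines, col_classes) {
  if (length(lines) == 0L) {
    ## read.table() fails on empty input, so build a 0-row frame by hand.
    empty <- lapply(col_classes, function(cl) vector(cl, 0L))
    return(as.data.frame(empty, stringsAsFactors = FALSE))
  }
  read.table(text = lines, sep = "\t", header = FALSE,
             col.names = names(col_classes),
             colClasses = unname(col_classes),
             quote = "", comment.char = "",
             stringsAsFactors = FALSE)
}

## One data.frame per range, keeping the range names.
dfs <- lapply(res, to_df, col_classes = col_classes)
```

Supplying the column types up front avoids the type guessing mentioned above, although the records still pass through a character representation before becoming a data.frame.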
Martin
> Thanks,
> Florian
>
>
> Florian Hahne
> Novartis Institute For Biomedical Research
> Translational Sciences / Preclinical Safety / PCS Informatics
> Expert Data Integration and Modeling Bioinformatics
> CHBS, WKL-135.2.26
> Novartis Institute For Biomedical Research, Werk Klybeck
> Klybeckstrasse 141
> CH-4057 Basel
> Switzerland
> Phone: +41 61 6967127
> Email : florian.hahne at novartis.com
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793