[BioC] GEOquery and parsing SOFT files
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Mon May 25 16:13:21 CEST 2009
Hello,
The getGEO function from GEOquery parses GEO SOFT files. With a
particular GSE file (GSE13638), it took over 15 minutes on my
not-so-crappy machine to parse the file (a local file, download time
excluded). I've written a simple parser in Perl, and parsing the same
file and storing the data in a nested hash/array structure takes ca. 2
seconds. I'm pretty sure there is more essential processing done by
getGEO to organize the data into a GSE object, but still, there seems to
be an incredibly inefficient implementation underneath.
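To make the comparison concrete, here is a minimal sketch of the kind of line-oriented parsing meant above. It is written in Python, not Perl, and it is not the script I timed. The function name parse_soft and the dictionary keys are made up for illustration. It assumes the usual SOFT conventions: "^" starts an entity, "!" is an attribute, "#" describes a table column, and table rows are tab-separated between the *_table_begin and *_table_end markers.

```python
import io
from collections import defaultdict


def parse_soft(handle):
    """Parse SOFT-formatted lines into a list of nested dicts (one per entity).

    parse_soft and the key names below are illustrative, not part of any library.
    """
    entities = []      # one dict per "^ENTITY = id" block, in file order
    current = None     # entity currently being filled
    in_table = False   # True between *_table_begin and *_table_end

    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = GSM1"
            kind, _, name = line[1:].partition("=")
            current = {
                "type": kind.strip(),
                "id": name.strip(),
                "attributes": defaultdict(list),  # attribute keys may repeat
                "columns": {},                    # column name -> description
                "header": [],                     # first row of the data table
                "rows": [],                       # remaining rows, as string lists
            }
            entities.append(current)
            in_table = False
        elif current is None:
            continue  # ignore anything before the first entity
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            key = key.strip()
            if key.lower().endswith("_table_begin"):
                in_table = True
            elif key.lower().endswith("_table_end"):
                in_table = False
            else:
                current["attributes"][key].append(value.strip())
        elif line.startswith("#") and not in_table:
            # Column description line, e.g. "#VALUE = normalized signal"
            col, _, desc = line[1:].partition("=")
            current["columns"][col.strip()] = desc.strip()
        elif in_table:
            fields = line.split("\t")
            if not current["header"]:
                current["header"] = fields
            else:
                current["rows"].append(fields)
    return entities
```

Calling parse_soft(open(path)) on an uncompressed SOFT file, or parse_soft(io.StringIO(text)) on a string, returns the entities in file order. This is a single pass with no type conversion and no per-sample objects, which is roughly the amount of work behind the 2-second figure.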
I haven't looked at the source code yet, but here's a question: what is
the likely reason getGEO is so slow? Is it the parsing itself, or
rather wrapping the data into the appropriate structure? Where should I
start to look for code to be improved?
vQ