[R] R, PostgreSQL and poor performance
James Cloos
cloos at jhcloos.com
Wed Dec 14 00:24:43 CET 2011
>>>>> "BD" == Berry, David <dyb at noc.ac.uk> writes:
BD> All variables are reals other than id which is varchar(10) and date
BD> which is a timestamp, approximately 1.5 million rows are returned by
BD> the query and it takes on the order of 10 seconds to execute using psql (the
BD> command line client for Postgres) and a similar time using pgAdmin
BD> 3. In R it takes several minutes to run and I'm unsure where the
BD> bottleneck is occurring.
You may want to test progressively smaller chunks of the data to see how
quickly R slows down as compared to psql on that query.
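A minimal sketch of that test, assuming a hypothetical helper time_chunks() and a stand-in fetch function so that it runs without a database. With RPostgreSQL the fetch function would wrap dbGetQuery() with a LIMIT clause; the table name obs and the database name in the comments are placeholders, not taken from the original query.

```r
# Hypothetical helper: time a fetch function at several row counts.
# fetch_fn(n) is expected to return a data frame with up to n rows.
time_chunks <- function(fetch_fn, sizes) {
  elapsed <- vapply(sizes, function(n) {
    system.time(fetch_fn(n))[["elapsed"]]
  }, numeric(1))
  data.frame(rows = sizes,
             seconds = elapsed,
             us_per_row = 1e6 * elapsed / sizes)
}

# Stand-in fetch so the sketch runs without a database.
fake_fetch <- function(n) data.frame(id = seq_len(n), x = runif(n))

sizes <- c(1000, 10000, 100000)
timings <- time_chunks(fake_fetch, sizes)
print(timings)

# With RPostgreSQL one might pass something like this instead
# (assumed names: database "mydb", table "obs"):
#   library(RPostgreSQL)
#   con <- dbConnect(PostgreSQL(), dbname = "mydb")
#   db_fetch <- function(n) {
#     dbGetQuery(con, sprintf("SELECT * FROM obs LIMIT %d", as.integer(n)))
#   }
#   time_chunks(db_fetch, c(10000, 100000, 1000000))
```

If us_per_row stays roughly flat as the row count grows, the cost is linear and the slowdown is a constant factor; if it climbs with the row count, something is scaling worse than linearly.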
My first guess is that something is allocating and re-allocating RAM in a
quadratic (or worse) fashion.
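To illustrate what quadratic re-allocation looks like at the R level, here is a small self-contained comparison. It is only an analogy for the guess above and makes no claim about what the driver actually does; grow() and prealloc() are made-up names.

```r
# Grow a vector one element at a time: each c() call copies the whole
# vector, so the total work is roughly proportional to n^2.
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i)
  x
}

# Allocate once, then fill in place: total work is roughly proportional to n.
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i
  x
}

n <- 20000
t_grow <- system.time(a <- grow(n))[["elapsed"]]
t_pre  <- system.time(b <- prealloc(n))[["elapsed"]]

# Both approaches build the same vector; only the cost differs.
stopifnot(identical(a, b))
cat("grow:", t_grow, "s  prealloc:", t_pre, "s\n")
```

Doubling n should roughly double the prealloc() time but roughly quadruple the grow() time, which is the signature to look for when timing the query at different sizes.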
I don't know whether OS X has anything equivalent, but you could test on
the Linux box using oprofile (http://oprofile.sourceforge.net; SuSE
should have an RPM for it and kernel support compiled in) to confirm
where the time is spent.
It is /possible/ that the (sql)NULL->(r)NA logic in RS-PostgreSQL.c may
be slow (relatively speaking), but it is necessary. Nothing else jumps
out as a possible choke point.
Oprofile (or the equivalent) would best answer the question.
-JimC
--
James Cloos <cloos at jhcloos.com> OpenPGP: 1024D/ED7DAEA6