[R] Sampling from a Postgres database

Joe Conway mail at joeconway.com
Fri Jan 15 18:07:13 CET 2010


On 01/15/2010 01:49 AM, Bart Joosen wrote:
> 
> One way could be to first select only the unique IDs, sample them and then
> select only the relevant records:
> 
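> library(RODBC)
> strQuery <- "SELECT ID FROM tblFoo;"
> IDs <- sqlQuery(channel, strQuery)
> sample.IDs <- sample(IDs[[1]], 10)
> # collapse the sampled IDs into a comma-separated list for IN (...)
> strQuery <- paste0("SELECT * FROM tblFoo WHERE ID IN (",
>                    paste(sample.IDs, collapse = ", "), ");")
> sampled <- sqlQuery(channel, strQuery)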
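For comparison, the second step of the two-step approach quoted above can be sketched as a pure R helper that only builds the SQL string, so it can be checked without a database. `build_in_query` is a hypothetical name, not part of RODBC, and the table name is assumed to be trusted (it is pasted in unescaped). Note that it indexes with `sample.int()` rather than calling `sample(ids, n)` directly, because `sample()` on a single number `x` samples from `1:x`.

```r
# Hypothetical helper (not from RODBC): builds the second query of the
# two-step approach from a vector of already-fetched IDs.
build_in_query <- function(table, ids, n) {
  # Index via sample.int(): sample(ids, n) would misbehave if ids had
  # length 1, since sample(x) then draws from 1:x.
  picked <- ids[sample.int(length(ids), min(n, length(ids)))]
  # Turn the picked IDs into a comma-separated list inside IN (...)
  paste0("SELECT * FROM ", table, " WHERE ID IN (",
         paste(picked, collapse = ", "), ");")
}

set.seed(1)
q <- build_in_query("tblFoo", c(3, 8, 15, 42), 2)
cat(q, "\n")
```

The resulting string would then be passed to `sqlQuery(channel, q)`. This still needs one round trip to fetch every ID first, which is what the server-side approach avoids.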

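A better approach is to let Postgres do the sampling with its built-in
random() function, which returns a value in [0, 1) for each row. Keeping
rows where random() < 0.005 selects each row independently with
probability 0.005, i.e. roughly 0.5% of the table, in a single query: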

# select count(*) from visits;
  count
---------
 4846604
(1 row)

# select count(*) from visits where random() < 0.005;
 count
-------
 24391
(1 row)
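From R, the same server-side sampling can be wrapped in a small function. This is a sketch under assumptions: `build_random_query` and `sample_rows` are hypothetical names, `channel` is assumed to come from RODBC's `odbcConnect()`, and the table name is assumed to be trusted because it is inserted into the SQL unescaped.

```r
# Hypothetical helper: builds a Bernoulli-sampling query. Each row is kept
# independently with probability 'fraction', so the row count varies.
build_random_query <- function(table, fraction) {
  stopifnot(is.numeric(fraction), length(fraction) == 1,
            fraction > 0, fraction <= 1)
  sprintf("SELECT * FROM %s WHERE random() < %s;", table, format(fraction))
}

# Hypothetical wrapper: runs the query over an RODBC channel
# (assumed to be opened elsewhere with odbcConnect()).
sample_rows <- function(channel, table, fraction) {
  RODBC::sqlQuery(channel, build_random_query(table, fraction))
}

# Only builds the string; no database needed for this line.
cat(build_random_query("visits", 0.005), "\n")
```

If an exact sample size is needed (such as the 10 rows in the quoted code), `ORDER BY random() LIMIT 10` is a common alternative, at the cost of sorting the whole table.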

HTH,

Joe
