[R] sqldf for Very Large Tab Delimited Files

HC hcatbr at yahoo.co.in
Thu Feb 2 05:57:02 CET 2012


Hi All,

I have a very (very) large tab-delimited text file without headers. There
are only 8 columns and millions of rows. I want to split this file into
numerous pieces by subsetting it for individual stations. The station is
given in the first column. I am trying to learn and use the sqldf package
for this but am stuck in a couple of places.
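
For concreteness, what I ultimately want is something like the following
(an untested sketch: "stationTab" is a made-up table name, and I am only
guessing that header-less columns get named V1, V2, ..., with V1 holding
the station):

library(sqldf)
# list the distinct stations, then write one file per station
ids <- sqldf("select distinct V1 from stationTab", dbname = "mytestdb")$V1
for (id in ids) {
  piece <- fn$sqldf("select * from stationTab where V1 = '$id'",
                    dbname = "mytestdb")
  write.table(piece, paste("station_", id, ".txt", sep = ""),
              sep = "\t", quote = FALSE,
              col.names = FALSE, row.names = FALSE)
}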

To simulate my requirement, I have taken the iris dataset as an example and
have done the following:
(1) create a tab-delimited file without headers
(2) read it using the read.csv.sql command
(3) write the result of a query that gets the first 10 records

Here is the reproducible code that I am trying:

library(sqldf)

# text data file without headers
write.table(iris, "irisNoH.txt", sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)
# create an empty database (can skip this step if the database already exists)
sqldf("attach 'mytestdb' as new")
# read into a table called irisTab1 in the mytestdb SQLite database;
# header and sep can be passed directly to read.csv.sql
read.csv.sql("irisNoH.txt",
             sql = "create table main.irisTab1 as select * from file",
             dbname = "mytestdb", header = FALSE, sep = "\t")
res1 <- sqldf("select * from main.irisTab1 limit 10", dbname = "mytestdb")
write.table(res1, "iris10.txt", sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)

# For querying records of a particular species - unresolved problems.
# I tried assigning column names via a file connection, but that did not
# seem to carry over to the table created above:
#f1 <- file("irisNoH.txt")
#attr(f1, "names") <- c("A1","A2","A3","A4","A5")
#a1 <- "virginica"
#res2 <- fn$sqldf("select * from main.irisTab1 where A5 = '$a1'")

In the above, I am not able to:
(1) assign names to the various columns
(2) query for a particular value of a column; in this case for a particular
species, say virginica
(3) I guess fn$sqldf can do the job, but it requires the column names to be
assigned first; my best guess at this is sketched below
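
If the imported columns are in fact named V1, V2, ... (something I have
not been able to confirm), then perhaps this would work:

a1 <- "virginica"
# guess: the fifth column of irisTab1 is named V5
res2 <- fn$sqldf("select * from main.irisTab1 where V5 = '$a1'",
                 dbname = "mytestdb")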

Any help would be most appreciated.

Thanks
HC



