[R-SIG-Finance] How to input large datasets into R

Tue Jun 29 12:15:27 CEST 2010

On Tue, Jun 29, 2010 at 1:51 AM, Aaditya Nanduri
<aaditya.nanduri at gmail.com> wrote:
> Hello All.
>
> For my HW assignment, I was given 30 stocks with minute data (date,
> time, open, close, high, low, vol) over 7 years.
>
> So, each stock has about 610000 rows of data which makes it impossible to
> calculate z-scores for mean-reversion strategies (required for HW) for even
> one stock.
>
> Is there any way R can read only certain lines of data?
>

1. You can use read.csv(..., nrows = N) to read only the first N lines
of data (add skip = to start further into the file), or you can use the
colClasses = argument with "NULL" entries to exclude certain columns.
See help(read.table).

2. You can use read.csv.sql in the sqldf package to read a random
sample of rows.  It automatically creates a database, loads the file
into that database without going through R, reads a random set of rows
from the database into R, and then deletes the database.  Here is an
example.  Note that aside from the code to create the sample data file,
it's only one line of code.  Also see examples 6e and 6f on the sqldf
home page:  http://sqldf.googlecode.com

# create sample data
library(sqldf)
write.table(iris, "iris150.dat", sep = ",", quote = FALSE)

# read in a random set of 4 rows
DF <- read.csv.sql("iris150.dat",
    sql = "select * from file order by random(*) limit 4")
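
For the first suggestion, here is a minimal base-R sketch.  It assumes
a comma-separated file with a header row and the seven columns from the
question; the temporary file and the synthetic prices are made up purely
for illustration, and the z-score at the end is just a whole-sample
standardisation, not necessarily what the assignment calls for.

```r
# Hypothetical stand-in for one stock's minute data (synthetic values).
set.seed(1)
n    <- 1000
px   <- 100 + cumsum(rnorm(n, sd = 0.05))
path <- tempfile(fileext = ".csv")
bars <- data.frame(date = "2010-06-29", time = seq_len(n),
                   open = px, close = px, high = px, low = px, vol = 100L)
write.csv(bars, path, row.names = FALSE)

# Read only the first 200 data rows.
head_rows <- read.csv(path, nrows = 200)

# Read data rows 501-600: skip the header line plus 500 data rows,
# so the column names have to be supplied by hand.
col_names <- names(read.csv(path, nrows = 1))
mid_rows  <- read.csv(path, skip = 501, nrows = 100,
                      header = FALSE, col.names = col_names)

# Read only date, time and close: "NULL" drops a column,
# NA leaves the type to be guessed as usual.
keep   <- c(NA, NA, "NULL", NA, "NULL", "NULL", "NULL")
closes <- read.csv(path, colClasses = keep)

# Whole-sample z-score of the close, computed on the reduced data.
z <- (closes$close - mean(closes$close)) / sd(closes$close)
```

Dropping the unused columns with colClasses is usually the cheaper of
the two tricks, since every row is still available for the z-score.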