[R-SIG-Finance] How to input large datasets into R

Whit Armstrong armstrong.whit at gmail.com
Tue Jun 29 14:12:23 CEST 2010

warmstrong at research:~$ R
> stocks <- list()
> for(i in 1:30) { stocks[[i]] <- matrix(0.0,nrow=610000,ncol=7) }
> gc()
            used  (Mb) gc trigger   (Mb)  max used  (Mb)
Ncells    114397   6.2     350000   18.7    350000  18.7
Vcells 128222013 978.3  136994473 1045.2 128222477 978.3

1GB for the raw data doesn't seem so bad.
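
As a rough cross-check on that figure, the arithmetic can be reproduced by hand. This is only a sketch: it assumes every column is held as an 8-byte double, as in the matrix() call above. A data.frame with character date/time columns would likely need more.

```r
# Back-of-the-envelope estimate, assuming all values are 8-byte doubles
n_stocks <- 30
n_rows   <- 610000
n_cols   <- 7
bytes    <- n_stocks * n_rows * n_cols * 8
mb       <- bytes / 2^20          # gc() reports Mb as bytes / 1024^2
round(mb, 1)

# Measured size of a single stock's matrix, for comparison
one_stock <- matrix(0.0, nrow = n_rows, ncol = n_cols)
print(object.size(one_stock), units = "Mb")
```

The estimate lands just under the Vcells figure from gc(), which also counts whatever else the session has allocated.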

If you can't find a server somewhere that has a decent amount of RAM,
then you have a few options.

1) aggregate the data to 10-minute or 30-minute bars to get started
2) use only half or a quarter of your data (which you have tried already)
3) work with only one stock in memory at a time (if you are pooling
data, this obviously won't work)
4) use less memory-hungry methods (look at Armadillo, for instance)
5) also check out these packages: bigmemory
(http://cran.r-project.org/web/packages/bigmemory/index.html) and
biglm  (http://cran.r-project.org/package=biglm)
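
Options 1 and 3 can be sketched in base R as below. Treat it as an illustration under assumptions, not a tested pipeline: the column layout (date, time, open, close, high, low, vol) is taken from the question, a small synthetic file stands in for the real .csv, and read_stock, to_bars and roll_z are made-up helper names. The trailing z-score uses a 60-row window to match the 1:60, 2:61 example in the question; stats::filter slides the window over data already in memory, so nothing has to be re-read from disk for each window.

```r
# Synthetic stand-in for one stock's file (hypothetical data, layout from the question)
set.seed(1)
n     <- 600
stamp <- as.POSIXct("2010-06-01 09:30:00", tz = "UTC") + 60 * (0:(n - 1))
px    <- 100 + cumsum(rnorm(n, sd = 0.05))
demo  <- data.frame(date = format(stamp, "%Y-%m-%d"),
                    time = format(stamp, "%H:%M:%S"),
                    open = px, close = px, high = px + 0.05, low = px - 0.05,
                    vol  = 100)
csv_file <- tempfile(fileext = ".csv")
write.csv(demo, csv_file, row.names = FALSE)

# Declaring colClasses up front spares read.csv from guessing column types
read_stock <- function(path) {
  x <- read.csv(path, colClasses = c("character", "character", rep("numeric", 5)))
  x$stamp <- as.POSIXct(paste(x$date, x$time), tz = "UTC")
  x
}

# Collapse 1-minute rows into fixed-width bars (10 minutes by default)
to_bars <- function(x, minutes = 10) {
  width  <- minutes * 60
  bucket <- as.numeric(x$stamp) %/% width      # integer id of the bar each row falls in
  first  <- function(v) v[1]
  last   <- function(v) v[length(v)]
  data.frame(
    stamp = as.POSIXct(sort(unique(bucket)) * width, origin = "1970-01-01", tz = "UTC"),
    open  = as.vector(tapply(x$open,  bucket, first)),
    close = as.vector(tapply(x$close, bucket, last)),
    high  = as.vector(tapply(x$high,  bucket, max)),
    low   = as.vector(tapply(x$low,   bucket, min)),
    vol   = as.vector(tapply(x$vol,   bucket, sum))
  )
}

# Trailing z-score over the last k rows (rows 1:60, then 2:61, and so on)
roll_z <- function(v, k = 60) {
  m  <- as.numeric(stats::filter(v,   rep(1 / k, k), sides = 1))  # trailing mean
  m2 <- as.numeric(stats::filter(v^2, rep(1 / k, k), sides = 1))  # trailing mean of squares
  s  <- sqrt(pmax(m2 - m^2, 0) * k / (k - 1))                     # sample standard deviation
  (v - m) / s
}

# One stock in memory at a time: keep only the small results, drop the raw rows
files   <- c(DEMO = csv_file)    # in practice, one path per stock
results <- list()
for (nm in names(files)) {
  x    <- read_stock(files[[nm]])
  bars <- to_bars(x, minutes = 10)
  z    <- roll_z(x$close, k = 60)
  results[[nm]] <- list(bars = bars, last_z = z[length(z)])
  rm(x)                          # free the minute data before the next file
}
```

The first k - 1 z-scores come back as NA because the window is not yet full. If even a single file is too large to load, the skip and nrows arguments of read.csv read just a slice of rows.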

This list is a great resource. Keep posting here as you progress.


On Tue, Jun 29, 2010 at 1:51 AM, Aaditya Nanduri
<aaditya.nanduri at gmail.com> wrote:
> Hello All.
> For my HW assignment, I was given 30 stocks with minute data (date,
> time, open, close, high, low, vol) over 7 years.
> So, each stock has about 610000 rows of data which makes it impossible to
> calculate z-scores for mean-reversion strategies (required for HW) for even
> one stock.
> Is there any way R can read only certain lines of data?
> For example, in the OU process we use increments of 60. So can R read 1:60,
> then 2:61 and so on?
> I recently tried a simple regression on half the data (training set) on my
> school's computer only to watch it eat up the entire memory leaving me no
> option but to restart the computer.
> The data is in .csv format if it matters.
> I'm an undergrad learning about the basic methods in stat arb in an informal
> setting so you may assume I have absolutely no clue about pretty much
> anything and everything.
> And are there any tutorials online for using quantmod? That would be very
> helpful.
> Thank you very much.
> Sincerely,
> Aaditya Nanduri