[R-SIG-Finance] How to input large datasets into R

Whit Armstrong armstrong.whit at gmail.com
Tue Jun 29 14:12:23 CEST 2010


warmstrong at research:~$ R
> stocks <- list()
> for(i in 1:30) { stocks[[i]] <- matrix(0.0,nrow=610000,ncol=7) }
> gc()
            used  (Mb) gc trigger   (Mb)  max used  (Mb)
Ncells    114397   6.2     350000   18.7    350000  18.7
Vcells 128222013 978.3  136994473 1045.2 128222477 978.3
>

About 1GB for the raw data (30 stocks, 610000 rows by 7 columns each,
all stored as doubles) doesn't seem so bad.
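
For reference, the arithmetic behind that figure can be sketched as
follows. It assumes every value is held as an 8-byte double, which is
what the matrix(0.0, ...) test above does; date or time columns read in
as character or factor would change the total.

```r
# Back-of-the-envelope size of the raw data, assuming 8-byte doubles
# throughout (an assumption that matches the matrix(0.0, ...) test).
n_stocks <- 30
n_rows   <- 610000
n_cols   <- 7

bytes <- n_stocks * n_rows * n_cols * 8   # total bytes for all cells
mb    <- bytes / 2^20                     # gc() counts a Mb as 2^20 bytes

round(mb, 1)   # about 977, in line with the Vcells row of gc()
```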

If you can't find a server somewhere that has a decent amount of RAM,
then you have a few choices.

1) aggregate the data to 10min or 30min bars to get started
2) use only half or a quarter of your data (which you have already tried)
3) work with only one stock in memory at a time (if you are pooling
data, this obviously won't work)
4) use less memory-hungry methods (look at Armadillo, for instance:
http://dirk.eddelbuettel.com/code/rcpp.armadillo.html)
5) also check out these packages: bigmemory
(http://cran.r-project.org/web/packages/bigmemory/index.html) and
biglm (http://cran.r-project.org/package=biglm)
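
As a rough illustration of options 1 and 3 together, here is a minimal
base-R sketch that reads one stock at a time, collapses it to 10min
bars, and computes a trailing 60-bar z-score of the close. Everything
specific in it is an assumption made for illustration: the column order
(date, time, open, close, high, low, vol), the "YYYY-MM-DD" and "HH:MM"
formats, and the helper names rolling_z, to_bars and process_one. The
demo data is synthetic so that the sketch runs as-is.

```r
# Sketch only. Assumed CSV columns (not confirmed by the original post):
# date ("YYYY-MM-DD"), time ("HH:MM"), open, close, high, low, vol

# Trailing z-score over n observations. Uses moving averages rather than
# an n-column window matrix, so memory use stays proportional to length(x).
rolling_z <- function(x, n = 60) {
  xc <- x - mean(x)                                    # centre to limit round-off
  k  <- rep(1 / n, n)
  m  <- as.numeric(stats::filter(xc,   k, sides = 1))  # trailing mean
  m2 <- as.numeric(stats::filter(xc^2, k, sides = 1))  # trailing mean of squares
  s  <- sqrt(pmax(m2 - m^2, 0) * n / (n - 1))          # trailing sample sd
  (xc - m) / s                                         # first n - 1 values are NA
}

# Collapse minute rows into bars of `mins` minutes.
# Rows are assumed to be in time order already.
to_bars <- function(d, mins = 10) {
  ts  <- as.POSIXct(paste(d$date, d$time), format = "%Y-%m-%d %H:%M", tz = "UTC")
  bar <- floor(as.numeric(ts) / (60 * mins))           # bar index per row
  data.frame(
    start = as.POSIXct(sort(unique(bar)) * 60 * mins,
                       origin = "1970-01-01", tz = "UTC"),
    open  = as.vector(tapply(d$open,  bar, function(v) v[1])),
    close = as.vector(tapply(d$close, bar, function(v) v[length(v)])),
    high  = as.vector(tapply(d$high,  bar, max)),
    low   = as.vector(tapply(d$low,   bar, min)),
    vol   = as.vector(tapply(d$vol,   bar, sum))
  )
}

# Read one stock, reduce it, and return only the small result, so the
# full minute data can be garbage-collected before the next file is read.
process_one <- function(path) {
  d <- read.csv(path,
                colClasses = c("character", "character", rep("numeric", 5)))
  bars   <- to_bars(d, mins = 10)
  bars$z <- rolling_z(bars$close, n = 60)
  bars[, c("start", "close", "z")]
}

# Demo on a small synthetic file (1200 one-minute rows).
set.seed(1)
n     <- 1200
stamp <- as.POSIXct("2010-06-29 09:30", tz = "UTC") + 60 * (0:(n - 1))
px    <- 100 + cumsum(rnorm(n, sd = 0.05))
demo  <- data.frame(date = format(stamp, "%Y-%m-%d"),
                    time = format(stamp, "%H:%M"),
                    open = px, close = px,
                    high = px + 0.01, low = px - 0.01, vol = 100)
demo_path <- tempfile(fileext = ".csv")
write.csv(demo, demo_path, row.names = FALSE)

res <- process_one(demo_path)
```

With 30 real files, something like lapply(files, process_one) would
keep only one stock's minute data in memory at a time, where files is a
character vector of paths you would have to supply. If only a slice of
rows is wanted, read.csv() also accepts skip and nrows.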

This list is a great resource. Keep posting here as you progress.

-Whit


On Tue, Jun 29, 2010 at 1:51 AM, Aaditya Nanduri
<aaditya.nanduri at gmail.com> wrote:
> Hello All.
>
> For my HW assignment, I was given 30 stocks with minute data (date,
> time, open, close, high, low, vol) over 7 years.
>
> So, each stock has about 610000 rows of data which makes it impossible to
> calculate z-scores for mean-reversion strategies (required for HW) for even
> one stock.
>
> Is there any way R can read only certain lines of data?
>
> For example, in the OU process we use increments of 60. So can R read 1:60,
> then 2:61 and so on?
>
> I recently tried a simple regression on half the data (training set) on my
> school's computer only to watch it eat up the entire memory leaving me no
> option but to restart the computer.
>
> The data is in .csv format if it matters.
>
> I'm an undergrad learning about the basic methods in stat arb in an informal
> setting so you may assume I have absolutely no clue about pretty much
> anything and everything.
>
> And are there any tutorials online for using quantmod? That would be very
> helpful.
>
> Thank you very much.
>
> Sincerely,
> Aaditya Nanduri