[R] Slow reading multiple tick data files into list of dataframes
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Oct 11 23:48:35 CEST 2010
On Mon, Oct 11, 2010 at 5:39 PM, rivercode <aquanyc at gmail.com> wrote:
>
> Hi,
>
> I am trying to find the best way to read 85 tick data files of format:
>
>> head(nbbo)
> 1 bid CON 09:30:00.722 09:30:00.722 32.71 98
> 2 ask CON 09:30:00.782 09:30:00.810 33.14 300
> 3 ask CON 09:30:00.809 09:30:00.810 33.14 414
> 4 bid CON 09:30:00.783 09:30:00.810 33.06 200
>
> Each file has between 100,000 and 300,300 rows.
>
> Currently doing nbbo.list <- lapply(filePath, read.csv) to create a list
> with 85 data.frame objects, but it is taking minutes to read in the data,
> and afterwards I get the following message on the console when taking
> further actions (though it does then stop):
>
> The R Engine is busy. Please wait, and try your command again later.
>
> filePath in the above example is a vector of filenames:
>> head(filePath)
> [1] "C:/work/A/A_2010-10-07_nbbo.csv"
> [2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
> [3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
> [4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"
>
> Is there a better/quicker or more R way of doing this?
>
You could try (possibly with suitable additional arguments):
library(sqldf)
lapply(filePath, read.csv.sql)
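
For example, a rough sketch; the header, sep, and sql arguments here are
guesses and should be adjusted to match the actual layout of your files
(e.g. use header = FALSE if they have no header row):

library(sqldf)

# read.csv.sql imports each file into a temporary SQLite database and then
# pulls the result back into R, which is often faster than read.csv for
# large files
nbbo.list <- lapply(filePath, function(f)
    read.csv.sql(f, sql = "select * from file", header = TRUE, sep = ","))

# label each data frame with the file it came from
names(nbbo.list) <- basename(filePath)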
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com