[R] Another big data size problem

Federico Gherardini f.gherardini at pigrecodata.net
Wed Jul 28 17:26:10 CEST 2004


On Wed, 28 Jul 2004 13:28:20 +0100
Ernesto Jardim <ernesto at ipimar.pt> wrote:


> Hi,
> 
> When you're writing a table to MySQL you have to be careful if the
> table is created by RMySQL. The field definitions may not be the most
> appropriate, and there will be no indexes on your table, which makes
> queries _very_ slow.
> 
So, if I understood correctly: if you want to use SQL, you have to load the table into MySQL directly, without using R at all, and then use RMySQL to read the data back into R?
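
Something like this, I suppose? (The table, column and database names below -- bigtable, id, val, mydb -- are just placeholders I made up.)

# In MySQL itself, outside of R: create the table with sensible column
# types and an index, then bulk-load the file.
#
#   CREATE TABLE bigtable (id INT, val DOUBLE, INDEX (id));
#   LOAD DATA INFILE '/path/to/mytab' INTO TABLE bigtable;

# Then, back in R, pull out only what's needed through RMySQL:
library(RMySQL)
con <- dbConnect(MySQL(), dbname = "mydb")
res <- dbGetQuery(con, "SELECT * FROM bigtable WHERE id < 1000")
dbDisconnect(con)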

Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:

>Note that it is better to initialize the object to full size before
>inserting -- rather than using rbind() and friends, which is indeed slow
>since it needs to re-allocate a lot of memory at each step.

Do you mean something like this?

num.lines <- 1226  # assumed: one less than the number of rows to fill
tab <- matrix("", nrow = 1227, ncol = 20000)  # pre-allocate full size; character, to match scan()

for (i in 0:num.lines)
	tab[i + 1, ] <- scan("mytab", what = "", nlines = 1, skip = i)

The above doesn't get very far either... it seems that, once it has created the matrix, things become so slow that it's unusable. I'll have to try this with more RAM, by the way.
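
Maybe part of the problem is that each scan() call re-opens the file and skips over the first i lines again, so the work grows quadratically with the number of rows. Perhaps reading the whole file in a single pass would behave better? Just a guess:

# One scan() over the whole file, then reshape into the matrix;
# what = "" makes scan() read character fields, byrow = TRUE fills row by row.
tab <- matrix(scan("mytab", what = ""), nrow = 1227, ncol = 20000, byrow = TRUE)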

Cheers,

fede



