[R] large dataset

kMan kchamberln at gmail.com
Mon Mar 29 06:07:10 CEST 2010


>This was *very* useful for me when I dealt with a 1.5Gb text file
>http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_large_data/

Two hours is a *very* long time to transfer a csv file to a db. The author
of the linked article has not documented how to use scan() arguments
appropriately for the task. I take particular issue with the author's
statement that "R is said to be slow, memory hungry and only capable of
handling small datasets," which indicates that he/she has crummy informants
and has not challenged the notion him/herself.
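
For illustration only (this is not taken from the linked article), here is
a minimal sketch of passing scan() the column types up front instead of
letting it guess. The sample file, the column names and the column types are
all made up for the example; a real file would need its own 'what' list.

```r
# Hypothetical sample file, written here only so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value,label",
             "1,0.5,a",
             "2,1.5,b",
             "3,2.5,c"), csv_path)

# 'what' declares one type per column, so scan() does not have to guess.
# 'skip = 1' steps over the header line.
# 'nmax' caps the number of records read when 'what' is a list.
cols <- scan(csv_path,
             what = list(id = integer(), value = numeric(), label = character()),
             sep = ",", skip = 1, nmax = 100000, quiet = TRUE)

# scan() returns a named list of column vectors; turn it into a data frame.
dat <- as.data.frame(cols, stringsAsFactors = FALSE)
str(dat)
```

Whether this is faster than read.csv() for a given file is something to
measure with system.time() rather than assume.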

n.vialma, 100,000 records is likely not a lot of data. If it is taking more
than two or three minutes, something is wrong. Knowing the record limits in
R is a good starting point, but will only get you part of the way. How many
records does your file contain? Do you know how to find out? What are the
data types of the records? What is the call you are using to import the
records into R? What OS are you using? How much RAM does your system have?
What is the size of the R environment on your system? Do you have
resource-intensive applications running (such as MS Office)?
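
A rough sketch of how some of these questions could be answered from within
R follows. The file and its columns are invented for the example, and
guessing column types from a small sample is an assumption that can fail if
later rows look different from the first ones.

```r
# Hypothetical sample file, written here only so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value,label",
             "1,0.5,a",
             "2,1.5,b",
             "3,2.5,c"), csv_path)

# How many records? Count the lines and subtract one for the header.
n_records <- length(readLines(csv_path)) - 1L

# How many fields per line? More than one distinct count may point to a
# separator or quoting problem in the file.
field_counts <- table(count.fields(csv_path, sep = ","))

# What are the data types? Read a few rows and look at the column classes.
sample_rows <- read.csv(csv_path, nrows = 2, stringsAsFactors = FALSE)
col_types <- sapply(sample_rows, class)

# Import with the types and row count stated, so read.csv() need not guess.
dat <- read.csv(csv_path, colClasses = col_types,
                nrows = n_records, comment.char = "")

# How much memory does the imported object take?
print(object.size(dat), units = "Kb")
```

If nrow(dat) comes out larger than n_records suggests it should, the
separator and quote settings are a reasonable first place to look.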

A lot of folks on this list have been through what you are now dealing with,
so there is plenty of help. I find myself smiling inside & wanting to say
"welcome!"

Sincerely,
KeithC.

-----Original Message-----
From: Khanh Nguyen [mailto:nguyen.h.khanh at gmail.com] 
Sent: Saturday, March 27, 2010 8:59 AM
To: n.vialma at libero.it
Cc: r-help
Subject: Re: [R] large dataset

This was *very* useful for me when I dealt with a 1.5Gb text file

http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_large_data/


On Sat, Mar 27, 2010 at 5:19 AM, n.vialma at libero.it <n.vialma at libero.it> wrote:
> Hi, I have a question.
> As I'm not able to import a csv file which contains a big dataset (100.000
> records), does someone know how many records R can handle without giving
> problems?
> What I'm facing when I try to import the file is that R generates more than
> 100.000 records and is very slow...
> thanks a lot!!!
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


