[R-sig-DB] Working with Large sets of Data or even Big Data

Sean O'Riordain seanpor at acm.org
Thu Mar 7 19:31:19 CET 2013


Very nice, Eugene.

When I have large tables (say 30 million rows), I am often not interested
in most of the columns, so I set colClasses = c('character', rep('NULL', 30),
'integer', 'integer', 'character') etc. in the read.table() call. (Note it
is the string 'NULL', not the R value NULL, that tells read.table() to skip
a column.) This considerably speeds up ingesting the table, and specifying
the classes you want means you don't get an inappropriate auto-conversion
to factor, e.g. for addresses.
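
For instance, a minimal sketch of the pattern (the file name and column
layout below are made up for illustration):

    # Keep only 4 of 34 columns: the string "NULL" tells read.table()
    # to drop a column entirely, and giving explicit classes for the
    # kept columns avoids the automatic conversion to factor.
    dat <- read.table("big_table.csv", header = TRUE, sep = ",",
                      colClasses = c("character",      # e.g. an address field
                                     rep("NULL", 30),  # skip 30 unused columns
                                     "integer",
                                     "integer",
                                     "character"))

Skipped columns are never parsed or stored, which is where most of the
speed-up comes from on wide files.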

Kind regards,
Seán



On 6 March 2013 20:14, CIURANA EUGENE (R users list) <r.user using ciurana.eu> wrote:

>
>
> On 2013-03-06 12:07, CIURANA EUGENE (R users list) wrote:
>
> > On 2013-03-06 11:24, Paul Bernal wrote:
> >
> >> I managed to connect R to some database tables residing in Microsoft
> >> SQL Server, and I also got R to read the information in. My question
> >> is: since those tables can have 50 thousand records or more, and
> >> sometimes hundreds of thousands or even millions, what is the maximum
> >> number of rows and columns that R can read and manage (to perform
> >> data mining analysis)? What would I need to do for R to be able to
> >> handle large amounts of data?
>
> Sorry - I forgot to add: the space example at the end of my previous
> message: ~2.5 million records.
>
>
> Cheers!
>
> pr3d
>
> --
> http://summly.com | http://eugeneciurana.com
>
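On the quoted question of how many rows R can handle: a data frame lives
entirely in memory, so the practical ceiling is set by available RAM rather
than any fixed row count. For tables that are large on the database side,
one common approach is to fetch in chunks through DBI instead of pulling
everything at once. A minimal sketch, assuming the odbc package and an
existing DSN (the DSN name, query, and chunk size are all illustrative):

    # Stream a large SQL Server table in chunks via DBI/odbc.
    library(DBI)

    con <- dbConnect(odbc::odbc(), dsn = "my_sqlserver_dsn")
    res <- dbSendQuery(con, "SELECT id, amount, created FROM big_table")

    while (!dbHasCompleted(res)) {
      chunk <- dbFetch(res, n = 100000)  # pull 100k rows at a time
      # ... process or aggregate each chunk here ...
    }

    dbClearResult(res)
    dbDisconnect(con)

Because each chunk can be aggregated as it arrives, peak memory use stays
at roughly one chunk rather than the full table.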




