[R] memory management
bogdan romocea
br44114 at gmail.com
Mon Oct 30 18:00:09 CET 2006
This was asked before. Collapse each row of the data frame into a single
string, e.g.
v <- apply(DF, 1, function(x) paste(x, collapse = "_"))
then work with the values of that vector (table, unique, etc.). If your
data frame is really large, do this in a DBMS.
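A minimal sketch of the approach above, using a small made-up data frame
(DF and its columns are hypothetical, for illustration only). It builds one
key per row, then counts, de-duplicates and flags repeated rows with
table(), unique() and duplicated():

```r
# Hypothetical example data; DF and its columns are made up for illustration
DF <- data.frame(id   = c(1, 2, 1, 3, 2),
                 name = c("a", "b", "a", "c", "b"),
                 stringsAsFactors = FALSE)

# One string per row; apply() goes through as.matrix(), so every value
# is turned into its character form before paste() joins them
v <- apply(DF, 1, function(x) paste(x, collapse = "_"))

# How many times each distinct row occurs
counts <- table(v)

# The distinct rows, one key each
keys <- unique(v)

# TRUE for every row that repeats an earlier row
dup <- duplicated(v)
```

The separator should be a string that cannot occur in the data; otherwise
two different rows, say ("a_b", "c") and ("a", "b_c"), collapse to the same
key. The keys also compare the character form of each value, so with mixed
column types (where as.matrix() calls format()) numbers that differ only
beyond the printed precision can end up with the same key.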
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Federico Calboli
> Sent: Monday, October 30, 2006 11:35 AM
> To: r-help
> Subject: [R] memory management
>
> Hi All,
>
> just a quick (?) question while I wait for my code to run...
>
> I'm comparing the identity of the lines of a dataframe, doing all possible
> pairwise comparisons. In doing so I use identical(), but that's by the way.
> I'm doing a (not so) quick and dirty check, and subsetting the data as
>
> data[row.numb,]
>
> and
>
> data[a different row,]
>
> I suspect the problem there is that I load into memory the whole frame
> data[,] every time, making the biz quite slow and wasteful. As I'm idly
> waiting, I thought: had I put every line of data[,] as an item of a list,
> then done my pairwise comparisons using the list, would I have had
> better performance?
>
> (do I win the prize for the most convoluted sentence sent to R-help?)
>
> For the pedants, yes, I know I could kill the process and try myself, but
> the spirit of the question is: is there a way of dealing with big data
> *efficiently*?
>
> Best,
>
> Fede
>
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
>
> Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
>
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
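For the pairwise question itself, a minimal sketch under the same kind of
made-up data (DF is hypothetical): instead of n(n-1)/2 calls to identical()
on data[i,] subsets or on list elements, the row keys from the reply can be
grouped with split(), and the identical pairs read off each group with
combn(). It also shows one pitfall worth checking: identical() on two
one-row subsets compares their row names as well as their values.

```r
# Hypothetical data frame standing in for the poster's data
DF <- data.frame(id   = c(1, 2, 1, 3, 2),
                 name = c("a", "b", "a", "c", "b"),
                 stringsAsFactors = FALSE)

# Subsetting keeps the row names, so identical() sees two different
# objects even though rows 1 and 3 hold the same values
same_subset <- identical(DF[1, ], DF[3, ])                  # FALSE
same_values <- identical(unlist(DF[1, ]), unlist(DF[3, ]))  # TRUE

# One key per row, built as in the reply
v <- apply(DF, 1, function(x) paste(x, collapse = "_"))

# Row indices grouped by key; keep only keys seen more than once
groups <- split(seq_len(nrow(DF)), v)
groups <- groups[lengths(groups) > 1]

# All pairs (i, j), i < j, of identical rows, one pair per matrix row
# (here: rows 1 and 3, and rows 2 and 5)
pairs <- do.call(rbind, lapply(groups, function(g) t(combn(g, 2))))
print(pairs)
```

If no row repeats, groups is empty and do.call(rbind, ...) returns NULL, so
a caller may want to test for that before using pairs.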