[R-sig-hpc] Handling data with thousands of variables

Sean Davis sdavis2 at mail.nih.gov
Sun Jun 26 17:21:08 CEST 2011


It would be really useful if you described the analytical question
or problem, not just the data.  Having millions of data points does
not, by itself, make the problem special; R is quite happy with very
large datasets and, with a little work, can even be used with datasets
that are much larger than available memory.  What are you trying to
show with your data?  Do you have an example of another analysis,
done by someone else, that you would like to reproduce using your data?

Sean

2011/6/26 Håvard Wahl Kongsgård <haavard.kongsgaard at gmail.com>:
>> - are the response variables numeric? (integer or floating point?)
> integer
>
>> - does the order of the tuples matter?
> No.
>
>> - do you know all the possible keywords?
>>  (so that they could be encoded with numerical representations)
> Nope.
>
> The database relates to file-sharing activity; the keywords are plot
> keywords, like http://www.imdb.com/title/tt1133985/keywords
>
> -Håvard
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
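Not knowing every possible keyword in advance does not by itself prevent the numerical encoding asked about in the quoted exchange: a vocabulary can be built in a single pass, giving each keyword the next free integer the first time it appears. The following is a minimal Python sketch, assuming each record is simply a list of keyword strings (a hypothetical layout; the thread does not describe the actual schema). The function name `encode_keywords` and the sample keywords are illustrative only, and this is not the approach anyone in the thread proposed.

```python
from typing import Dict, Iterable, List, Tuple


def encode_keywords(
    records: Iterable[List[str]],
) -> Tuple[Dict[str, int], List[List[int]]]:
    """Map keywords to integer ids without knowing the keyword set up front.

    Hypothetical helper: each record is assumed to be a list of keyword strings.
    """
    vocab: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for keywords in records:
        ids = []
        for kw in keywords:
            # len(vocab) is evaluated before insertion, so a new keyword
            # receives the next unused id; a known keyword keeps its old id.
            ids.append(vocab.setdefault(kw, len(vocab)))
        encoded.append(ids)
    return vocab, encoded


if __name__ == "__main__":
    vocab, encoded = encode_keywords([["heist", "bank"], ["bank", "chase"]])
    print(vocab)    # {'heist': 0, 'bank': 1, 'chase': 2}
    print(encoded)  # [[0, 1], [1, 2]]
```

Because the order of the tuples reportedly does not matter, the integer ids could then feed a sparse keyword-by-record count matrix; the ids themselves carry no meaning beyond first-seen order.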
