[R-sig-hpc] Handling data with thousands of variables
Sean Davis
sdavis2 at mail.nih.gov
Sun Jun 26 17:21:08 CEST 2011
It would be really useful if you described the analytical question
or problem, not just the data. Dealing with millions of data points
does not make the problem special, really; R is quite happy with very
large datasets and, with a little work, can even be used with datasets
that are much larger than available memory. What is it that you are
trying to show with your data? Do you have an example of an analysis
done by someone else that you would like to reproduce using
your data?
Sean
2011/6/26 Håvard Wahl Kongsgård <haavard.kongsgaard at gmail.com>:
>> - are the response variables numeric? (integer or floating point?)
> integer
>
>> - does the order of the tuples matter ?
> no
>
>> - do you know all the possible keywords ?
>> (so that they could be encoded with numerical representations)
> nope..
>
> The database relates to file-sharing activity; the keywords are plot
> keywords, like http://www.imdb.com/title/tt1133985/keywords
>
> -Håvard
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>