[R-sig-hpc] Handling data with thousands of variables
Brian G. Peterson
brian at braverock.com
Sun Jun 26 16:06:20 CEST 2011
On Sun, 2011-06-26 at 14:09 +0200, Håvard Wahl Kongsgård wrote:
> Again, sorry about the bad example. The tuples are not the same length;
> some have 20 objects, others 150...
Some facts about your job:
  ~ 10 000 000 records
  ~ 20 000 keywords
  - each record consists of a combination of
    + a response variable and
    + a structured string-based tuple of ~20-150 keywords
So, to ask more questions and avoid more assumptions:
- are the response variables numeric? (integer or floating point?)
- does the order of the tuples matter?
- do you know all the possible keywords?
  (so that they could be encoded with numerical representations)
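
To make the last question concrete: if the keyword set is known (or can be collected in one pass), each keyword string can be replaced by a small integer ID, so a record becomes a response value plus a vector of integers instead of a vector of strings. Below is a minimal sketch in Python, not a recommendation for a particular implementation. The record layout (a response paired with a list of keywords), the sample data, and the helper names build_vocabulary and encode_records are all assumptions made up for illustration. In R, factor() gives a comparable string-to-integer encoding.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (response, [keyword, keyword, ...]).
Record = Tuple[float, List[str]]


def build_vocabulary(records: List[Record]) -> Dict[str, int]:
    """Assign each distinct keyword an integer ID in first-seen order."""
    vocab: Dict[str, int] = {}
    for _, keywords in records:
        for kw in keywords:
            if kw not in vocab:
                vocab[kw] = len(vocab)
    return vocab


def encode_records(records: List[Record],
                   vocab: Dict[str, int]) -> List[Tuple[float, List[int]]]:
    """Replace keyword strings with their IDs, keeping the original order."""
    return [(response, [vocab[kw] for kw in keywords])
            for response, keywords in records]


# Made-up sample data; real tuples would hold ~20-150 keywords each.
records = [
    (1.5, ["alpha", "beta", "gamma"]),
    (0.2, ["beta", "delta"]),
]

vocab = build_vocabulary(records)
encoded = encode_records(records, vocab)
```

The sketch keeps the keywords in their original order within each tuple, so it works whether or not order turns out to matter. If order does not matter, the integer IDs could instead serve as column indices of a sparse record-by-keyword matrix.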
Regards,
- Brian
--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock