[R] How to read a file containing two types of rows - (for the Netflix challenge data format)
Emmanuel Levy
emm@nue|@|evy @end|ng |rom gm@||@com
Fri Jan 31 10:04:19 CET 2020
Hi,
I'd like to use the Netflix challenge data and just can't figure out how to
efficiently "scan" the files.
https://www.kaggle.com/netflix-inc/netflix-prize-data
The files contain two types of row: either an ID line (e.g., "1:", "2:", etc.)
or a line of 3 comma-separated values associated with the preceding ID.
The format is as follows:
1:
value1,value2,value3
value1,value2,value3
value1,value2,value3
value1,value2,value3
2:
value1,value2,value3
value1,value2,value3
3:
value1,value2,value3
value1,value2,value3
value1,value2,value3
4:
etc ...
I want to create a matrix where each line is of the form:
ID, value1, value2, value3
So "ID" needs to be duplicated on each of its rows. I could write a Perl
script to convert this format to CSV, but I'm sure there's a simple R trick.
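To make the target concrete, here is a minimal sketch of one way this might be done in base R, assuming the whole file fits in memory and that every ID line consists only of digits followed by a colon. The function name read_id_blocks and the sample values are hypothetical, made up for illustration; only the column names value1 to value3 come from the description above.

```r
# Hypothetical helper: turn "ID:" header lines plus value rows into one
# data frame in which the ID is repeated on every row of its block.
read_id_blocks <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]           # drop empty lines
  is_id <- grepl("^[0-9]+:$", lines)      # TRUE for header lines such as "1:"
  ids   <- sub(":$", "", lines[is_id])    # header text without the colon
  block <- cumsum(is_id)                  # index of the latest header seen so far
  if (any(block == 0)) stop("value row found before the first ID line")
  keep  <- !is_id
  values <- read.csv(text = lines[keep], header = FALSE,
                     col.names = c("value1", "value2", "value3"),
                     strip.white = TRUE, stringsAsFactors = FALSE)
  cbind(ID = as.integer(ids[block[keep]]), values)
}

# Made-up sample in the format described above
sample_lines <- c("1:", "10,3,2005-09-06", "20,5,2005-05-13",
                  "2:", "30,4,2005-10-19")
res <- read_id_blocks(sample_lines)
print(res)
```

For a real file the input would presumably come from readLines() on the path of the downloaded file. The key step is cumsum() over the header indicator, which labels every value row with the position of the most recent ID line above it.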
Thanks for suggestions!
Emmanuel
More information about the R-help
mailing list