[R] How to read a file containing two types of rows - (for the Netflix challenge data format)

Emmanuel Levy emm@nue|@|evy @end|ng |rom gm@||@com
Fri Jan 31 10:04:19 CET 2020


Hi,

I'd like to use the Netflix challenge data and just can't figure out how to
efficiently "scan" the files.
https://www.kaggle.com/netflix-inc/netflix-prize-data

The files have two types of rows: either an *ID*, e.g., "1:", "2:", etc., or
3 values associated with the preceding ID.

The format is as follows:
*1:*
value1,value2, value3
value1,value2, value3
value1,value2, value3
value1,value2, value3
*2:*
value1,value2, value3
value1,value2, value3
*3:*
value1,value2, value3
value1,value2, value3
value1,value2, value3
*4:*
etc ...

And I want to create a matrix where each row is of the form:

ID, value1, value2, value3

So the "ID" needs to be repeated on every row that belongs to it. I could
write a Perl script to convert this format to CSV, but I'm sure there's a
simple R trick.
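
To make the target concrete, here is a minimal base-R sketch of the kind of
thing I have in mind. It is only an illustration under assumptions, not a
tested solution for the full files: the helper name read_id_blocks is
hypothetical, the sample values are made up, and it assumes the first
non-empty line is an ID line and that at least one value line exists. It flags
the ID lines, uses cumsum() to assign every line to the most recent ID, and
parses the remaining lines as CSV.

```r
# Hypothetical helper (not from any package): turn lines of the form
#   "1:", "v1,v2,v3", "v1,v2,v3", "2:", ...
# into a data frame with the ID repeated on every value row.
read_id_blocks <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop empty lines

  is_id <- grepl("^[0-9]+:$", lines)       # TRUE for "1:", "2:", ...
  stopifnot(length(lines) > 0, is_id[1])   # assumption: file starts with an ID

  header_ids <- sub(":$", "", lines[is_id])  # "1:" -> "1"
  block <- cumsum(is_id)                     # index of the ID each line belongs to

  values <- read.csv(text = lines[!is_id], header = FALSE,
                     col.names = c("value1", "value2", "value3"),
                     strip.white = TRUE, stringsAsFactors = FALSE)

  data.frame(ID = as.integer(header_ids[block[!is_id]]), values)
}

# Made-up sample in the same layout as the files described above.
sample_lines <- c("1:",
                  "101,3, 2005-01-01",
                  "102,5, 2005-01-02",
                  "2:",
                  "201,4, 2005-01-03")

result <- read_id_blocks(sample_lines)
print(result)

# With a real file (the path is a placeholder):
# result <- read_id_blocks(readLines("path/to/file.txt"))
```

The result is a data frame rather than a matrix because the columns may have
mixed types. For files of this size readLines() plus read.csv() may be too
slow, so a faster reader might be needed for the CSV step.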

Thanks for suggestions!

Emmanuel
