[R] How to read a file containing two types of rows - (for the Netflix challenge data format)

Rainer M Krug <Rainer using krugs.de>
Fri Jan 31 10:55:46 CET 2020


I did something similar yesterday…

Use readLines() to read the file in and identify the "1:", "2:", … lines with a regex. Then you have your dividers. In a second step, use read.csv(skip = …, nrows = …) to read the enclosed blocks, and finally combine them accordingly.

I wrote this without an R installation at hand, so please double-check the argument names.
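A minimal sketch of these three steps in R follows. It assumes that every ID line looks exactly like "1:" and that each data line holds three comma-separated fields; the function name read_blocks() is hypothetical.

```r
# Block-wise reading: find the "ID:" divider lines with a regex, read each
# enclosed block with read.csv(skip, nrows), then combine the blocks.
# Assumptions: ID lines match "^[0-9]+:$"; read_blocks() is a hypothetical name.
read_blocks <- function(path) {
  lines <- readLines(path)

  # Step 1: the dividers are the lines made of digits followed by ":"
  id_pos <- grep("^[0-9]+:$", lines)
  ids <- as.integer(sub(":$", "", lines[id_pos]))

  # Rows per block: from the line after a divider up to the line before the
  # next divider (or the end of the file for the last block)
  block_end <- c(id_pos[-1] - 1, length(lines))
  n_rows <- block_end - id_pos

  # Step 2: read each block; skip = id_pos[i] jumps past its divider line
  blocks <- lapply(seq_along(id_pos), function(i) {
    if (n_rows[i] == 0) return(NULL)  # an ID without any data lines
    block <- read.csv(path, header = FALSE, skip = id_pos[i],
                      nrows = n_rows[i], strip.white = TRUE)
    cbind(ID = ids[i], block)         # repeat the ID on every row
  })

  # Step 3: drop empty blocks and stack the rest into one data frame
  do.call(rbind, Filter(Negate(is.null), blocks))
}
```

Because read.csv() re-opens the file and skips from the top for every block, this becomes slow when there are many thousands of IDs, as in the Netflix files; reading the lines once and passing each block via read.csv(text = …) avoids the repeated scanning.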

Rainer


> On 31 Jan 2020, at 10:04, Emmanuel Levy <emmanuel.levy using gmail.com> wrote:
> 
> Hi,
> 
> I'd like to use the Netflix challenge data and just can't figure out how to
> efficiently "scan" the files.
> https://www.kaggle.com/netflix-inc/netflix-prize-data
> 
> The files have two types of rows: either an ID, e.g., "1:", "2:", etc., or
> 3 values associated with each ID:
> 
> The format is as follows:
> 1:
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> 2:
> value1,value2, value3
> value1,value2, value3
> 3:
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> 4:
> etc ...
> 
> And I want to create a matrix where each line is of the form:
> 
> ID value1, value2, value3
> 
> So "ID" needs to be duplicated. I could write a Perl script to convert
> this format to CSV, but I'm sure there's a simple R trick.
> 
> Thanks for suggestions!
> 
> Emmanuel
> 
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
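The output asked for in the question (one row per value triple, with its ID repeated) can also be built in a single pass instead of block by block. The sketch below swaps in a different technique from the one described above: cumsum() over the divider lines numbers the blocks, and that block number is used to copy each ID onto the data lines beneath it. It assumes the file starts with an ID line such as "1:" and that every other non-empty line has exactly three comma-separated values; read_with_ids() and the column names value1 to value3 are hypothetical.

```r
# Single-pass reading: carry each "ID:" value forward with cumsum() and parse
# all data lines in one read.csv() call.
# Assumptions: first line is an ID line; read_with_ids() is a hypothetical name.
read_with_ids <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]             # ignore empty lines, if any

  is_id <- grepl("^[0-9]+:$", lines)        # TRUE for divider lines
  if (!is_id[1]) stop("the file must start with an ID line such as '1:'")
  ids <- as.integer(sub(":$", "", lines[is_id]))

  # cumsum(is_id) gives 1 for the first block, 2 for the second, ...;
  # indexing ids with it duplicates each ID onto every line of its block
  block <- cumsum(is_id)

  values <- read.csv(text = lines[!is_id], header = FALSE,
                     col.names = c("value1", "value2", "value3"),
                     strip.white = TRUE)
  cbind(ID = ids[block[!is_id]], values)
}
```

An ID that has no data lines under it simply does not appear in the result. Use as.matrix() on the returned data frame if a matrix is really needed and all columns are numeric.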

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Orcid ID: 0000-0002-7490-0066

Department of Evolutionary Biology and Environmental Studies
University of Zürich
Office Y34-J-74
Winterthurerstrasse 190
8075 Zürich
Switzerland

Office:	+41 (0)44 635 47 64
Cell:       	+41 (0)78 630 66 57
email:      Rainer.Krug using uzh.ch
		Rainer using krugs.de
Skype:     RMkrug

PGP: 0x0F52F982






