[R] How to read a file containing two types of rows - (for the Netflix challenge data format)
Rainer M Krug
Rainer using krugs.de
Fri Jan 31 10:55:46 CET 2020
I did something similar yesterday…
Use readLines() to read it in and identify the ID lines (“1:”, “2:”, …) with a regex. Then you have your dividers. In a second step, use read.csv(skip = …, nrows = …) to read the enclosed blocks, and finally combine them accordingly.
This is written without an R installation at hand, so the argument names may be wrong.
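A minimal sketch of the idea, under some assumptions: the whole file fits in memory, every ID line consists only of digits followed by a colon, and every data line has exactly three comma-separated fields. The function name read_id_blocks and the column names value1..value3 are made up for illustration. Rather than calling read.csv() once per block with skip/nrows, this variant parses all data lines in a single read.csv(text = ...) call and attaches the IDs by position, which avoids a loop over the blocks.

```r
# Hypothetical helper: the function and column names are illustrative only.
read_id_blocks <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines

  # Divider lines look like "1:", "2:", ... (assumed format)
  is_id <- grepl("^\\s*[0-9]+:\\s*$", lines)
  stopifnot(length(lines) > 0, is_id[1])         # file must start with an ID line

  ids   <- as.integer(sub(":.*$", "", trimws(lines[is_id])))
  block <- cumsum(is_id)                         # index of the block each line belongs to

  # Parse all data lines at once; assumes three fields per line
  values <- read.csv(text = lines[!is_id], header = FALSE,
                     col.names = c("value1", "value2", "value3"),
                     strip.white = TRUE, stringsAsFactors = FALSE)

  # Repeat each ID for every data line in its block
  cbind(ID = ids[block[!is_id]], values)
}

# Small self-contained example using a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("1:", "10,3,2005-09-06", "11,5,2005-05-13",
             "2:", "12,4,2005-10-19"), tmp)
ratings <- read_id_blocks(tmp)
print(ratings)
```

The result is a data frame with one row per data line and the ID repeated as needed. For the full Netflix files this keeps all lines in memory at once, so it may be worth processing one file at a time.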
Rainer
> On 31 Jan 2020, at 10:04, Emmanuel Levy <emmanuel.levy using gmail.com> wrote:
>
> Hi,
>
> I'd like to use the Netflix challenge data and just can't figure out how to
> efficiently "scan" the files.
> https://www.kaggle.com/netflix-inc/netflix-prize-data
>
> The files have two types of row, either an *ID* e.g., "1:" , "2:", etc. or
> 3 values associated to each ID:
>
> The format is as follows:
> *1:*
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> *2:*
> value1,value2, value3
> value1,value2, value3
> *3:*
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> *4:*
> etc ...
>
> And I want to create a matrix where each line is of the form:
>
> ID value1, value2, value3
>
> So "ID" needs to be duplicated - I could write a Perl script to convert
> this format to CSV, but I'm sure there's a simple R trick.
>
> Thanks for suggestions!
>
> Emmanuel
>
--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Orcid ID: 0000-0002-7490-0066
Department of Evolutionary Biology and Environmental Studies
University of Zürich
Office Y34-J-74
Winterthurerstrasse 190
8075 Zürich
Switzerland
Office: +41 (0)44 635 47 64
Cell: +41 (0)78 630 66 57
email: Rainer.Krug using uzh.ch
Rainer using krugs.de
Skype: RMkrug
PGP: 0x0F52F982