[R] strange behavior when reading csv - line wraps
Martin Tomko
martin.tomko at geo.uzh.ch
Fri May 29 21:15:39 CEST 2009
Dear All,
I am observing some strange behavior, and searching the archives and help
pages didn't help much.
I have a csv with a variable number of fields in each line.
I use
dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)
to read it in, and it works. But some lines are long and 'wrap': they are
split and continue on the next row. So when I check the dim of the data
frame, the dimensions are not correct, and I can see in a printout that
such a line is split into two rows in the frame. I checked the input file
and all is good.
An example of the input is:
37;2175168475;13;8.522729;47.19537;16366682 at N00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;
where the last value ends up on the next line in the data frame.
It does not have to be the last value; in the following example, the
word "kempten" starts the next line:
39;167757703;12;10.309295;47.724545;21903142 at N00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;
What could be the reason?
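One possible explanation, offered as an assumption rather than a confirmed diagnosis: the read.table documentation states that, when col.names is not given, the number of columns is determined from the first five lines of input, so with fill = TRUE a later, longer line can spill onto an extra row. The sketch below tries to reproduce the symptom on a small made-up file (the sample contents and the names nFields and wrapped are illustrative, not from the real data) and then passes col.names sized from count.fields.

```r
# Made-up sample standing in for the real file: five short records
# followed by one longer record (hypothetical data, for illustration only).
inputFile <- tempfile(fileext = ".csv")
writeLines(c(rep("1;a;b", 5), "2;a;b;c;d"), inputFile)

# Number of fields found on each line of the file
nFields <- count.fields(inputFile, sep = ";")

# Default call: the column count comes from the first five lines only,
# so the longer sixth line is expected to wrap onto an extra row
wrapped <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)
dim(wrapped)

# Supplying one column name per field of the widest line lets every
# record stay on a single row; shorter records are padded
dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE,
                       col.names = paste0("V", seq_len(max(nFields))))
dim(dataPoints)
```

If the two dim() results differ in the number of rows, the wrapping comes from the column-count guess rather than from the file itself.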
I was thinking about solving the issue by using a different separator
for the first 7 fields and concatenating all of the remaining values
into a single string value, but could not figure out how to do such a
substitution in R. Unfortunately, on my system I cannot specify a range
for sed...
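The idea of keeping the first 7 fields and collapsing the rest into one string can be sketched in base R without sed. This is only a rough sketch under assumptions: splitRecord is a hypothetical helper, the column name "tags" and the space used to join the remaining values are arbitrary choices, and the two sample records are shortened versions of the examples above.

```r
# Hypothetical helper: keep the first nFixed fields as separate values and
# join all remaining fields into a single space-separated string.
splitRecord <- function(line, nFixed = 7, sep = ";") {
  fields <- strsplit(line, sep, fixed = TRUE)[[1]]
  fixed <- fields[seq_len(nFixed)]
  rest <- if (length(fields) > nFixed) fields[-seq_len(nFixed)] else character(0)
  c(fixed, paste(rest, collapse = " "))
}

# Shortened sample records; for a real file use: lines <- readLines(inputFile)
lines <- c(
  "37;2175168475;13;8.522729;47.19537;16366682 at N00;30;sculpture;bird;tourism;",
  "39;167757703;12;10.309295;47.724545;21903142 at N00;36;white;"
)

# One row per record: 7 fixed columns plus one column holding the joined rest
records <- do.call(rbind, lapply(lines, splitRecord))
dataPoints <- as.data.frame(records, stringsAsFactors = FALSE)
names(dataPoints) <- c(paste0("V", 1:7), "tags")
```

All columns come back as character here, so numeric fields such as the coordinates would still need as.numeric() afterwards.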
Thanks for any help/pointers
Martin