[R] strange behavior when reading csv - line wraps

Martin Tomko martin.tomko at geo.uzh.ch
Fri May 29 21:15:39 CEST 2009


Dear All,
I am observing strange behavior, and searching the archives and help 
pages didn't help much.
I have a CSV file with a variable number of fields in each line.

I use
dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)

to read it in, and it works. However, some long lines 'wrap': they are 
split and continue on the next row of the data frame. As a result, the 
dimensions of the data frame are not correct, and a printout shows such 
a line split across two rows. I checked the input file and it looks 
fine.
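A minimal sketch of how the field counts could be checked, using a small 
made-up file (the temporary file and its contents are assumptions for 
illustration, not the real data). count.fields() reports the number of 
fields on each line. According to the read.table help page, the number 
of columns is determined from the first five lines of input unless 
col.names is given, so this may be related to the wrapping; the sketch 
sizes col.names to the longest line as one possible way around it.

```r
# Hypothetical stand-in for the real input: six lines, the last one
# longer than any of the first five.
inputFile <- tempfile(fileext = ".csv")
writeLines(c("1;a;b;", "2;a;b;", "3;a;b;", "4;a;b;", "5;a;b;",
             "6;a;b;c;d;e;"), inputFile)

# Number of fields on each line of the file
nFields <- count.fields(inputFile, sep = ";")
maxFields <- max(nFields)

# Same call as above: the column count is taken from the first lines only
wrapped <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)

# Supplying col.names sized to the longest line gives every line
# enough columns to fit in a single row
dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE,
                       col.names = paste0("V", seq_len(maxFields)))

dim(wrapped)
dim(dataPoints)
```

Comparing the two dim() results against length(nFields) shows whether 
any line was split across rows.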

An example of the input is:
37;2175168475;13;8.522729;47.19537;16366682 at N00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;

where the last value ends up on the next row of the data frame.

It does not have to be the last value. In the following example, the 
word "kempten" starts the next row:
39;167757703;12;10.309295;47.724545;21903142 at N00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;

What could be the reason?

I was thinking about solving the issue by keeping the separator only 
for the first 7 fields and concatenating all of the remaining values 
into a single string value, but I could not figure out how to do such a 
substitution in R. Unfortunately, on my system I cannot specify a range 
for sed...
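One way the substitution described above might be done directly in R, 
as a minimal sketch. The helper name collapseTags, the column name 
"tags", the space used to join the remaining values, and the two 
shortened sample lines are all assumptions made for illustration.

```r
# Two sample records, shortened from the examples above (illustrative only)
inputFile <- tempfile(fileext = ".csv")
writeLines(c(
  "37;2175168475;13;8.522729;47.19537;16366682 at N00;30;sculpture;bird;tourism;",
  "39;167757703;12;10.309295;47.724545;21903142 at N00;36;white;building;"
), inputFile)

# Hypothetical helper: keep the first nFixed fields of a line and join
# everything after them into one string.
collapseTags <- function(line, nFixed = 7, sep = ";", tagSep = " ") {
  # strsplit() drops the empty field after a trailing separator
  fields <- strsplit(line, sep, fixed = TRUE)[[1]]
  fixed <- fields[seq_len(nFixed)]              # NA where a field is missing
  tags <- paste(fields[-seq_len(nFixed)], collapse = tagSep)
  c(fixed, tags)
}

rawLines <- readLines(inputFile)
parsed <- do.call(rbind, lapply(rawLines, collapseTags))

dataPoints <- as.data.frame(parsed, stringsAsFactors = FALSE)
names(dataPoints) <- c(paste0("V", 1:7), "tags")

# Convert the fixed columns to numbers where possible
dataPoints[1:7] <- lapply(dataPoints[1:7], type.convert, as.is = TRUE)
```

Each record then occupies exactly one row with 8 columns, and the 
individual values can be recovered later with strsplit(dataPoints$tags, " ").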

Thanks for any help/pointers
Martin



