[R] strange behavior when reading csv - line wraps

Martin Tomko martin.tomko at geo.uzh.ch
Sat May 30 10:32:11 CEST 2009


Jim,
the two lines I put in are the actual problematic input lines.
In these examples, there are no quotes nor # signs, although I have no 
means to make sure they do not occur in the inputs (any hints how I 
could deal with that?).
I am trying to avoid as much pre-processing outside R as possible, and I 
have to process about 500 files with up to 3000 records each, so I need 
a more or less automated batch solution; any string substitution 
will have to occur in R. But for the moment, I do not see a reason for 
substitution, and the wrapping still occurs.
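
A possible explanation, offered as a sketch and not a confirmed diagnosis of these files: with fill = TRUE, read.table() and read.csv() decide the number of columns from the first five lines of input unless col.names is supplied and is longer, so later lines with more fields can spill onto the next row. The example below counts the fields over the whole file with count.fields() and passes enough column names. Setting quote = "" and comment.char = "" turns off quote and comment handling, so stray quotes or # signs are read as ordinary text. The helper name read_ragged and the sample data are made up for illustration.

```r
# Hypothetical helper: read a ';'-separated file whose rows vary in length.
read_ragged <- function(path, sep = ";") {
  # Count the fields on every line, with quote and comment handling disabled
  n_fields <- count.fields(path, sep = sep, quote = "", comment.char = "")
  max_fields <- max(n_fields, na.rm = TRUE)

  # Supplying col.names for the widest line stops read.table() from
  # guessing the column count from the first five lines only
  read.table(path, header = FALSE, sep = sep, fill = TRUE,
             quote = "", comment.char = "",
             col.names = paste0("V", seq_len(max_fields)),
             stringsAsFactors = FALSE)
}

# Made-up demonstration data: five short lines, then one long line that
# also contains an apostrophe and a '#'
tmp <- tempfile(fileext = ".csv")
writeLines(c("1;a", "2;b", "3;c", "4;d", "5;e", "6;f;g;h;it's;#tag"), tmp)
demo <- read_ragged(tmp)
dim(demo)
```

For a batch of files, something like lapply(list.files(dir, pattern = "\\.csv$", full.names = TRUE), read_ragged) should apply the same call to each file; dir is a placeholder for the input directory.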

Cheers
Martin



jim holtman wrote:
> You need to supply the actual input line so we can see what is 
> happening.  Are you sure you do not have unbalanced quotes in your 
> input (try quote='') or do you have comment characters ("#") in your 
> input?
>
> On Fri, May 29, 2009 at 3:15 PM, Martin Tomko <martin.tomko at geo.uzh.ch> 
> wrote:
>
>     Dear All,
>     I am observing a strange behavior and searching the archives and
>     help pages didn't help much.
>     I have a csv with a variable number of fields in each line.
>
>     I use
>     dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)
>
>     to read it in, and it works. But some lines are long and 'wrap',
>     or split and continue on the next line. So when I check the dim of
>     the frame, it is not correct, and I can see in a printout
>     that the line is split into two in the frame. I checked the input
>     file and all is good.
>
>     an example of the input is:
>     37;2175168475;13;8.522729;47.19537;16366682 at N00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;
>
>     where the last value occurs on the next line in the data frame.
>
>     It does not have to be the last value; in the following example,
>     the word "kempten" starts the next line:
>     39;167757703;12;10.309295;47.724545;21903142 at N00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;
>
>     What could be the reason?
>
>     I was thinking about solving the issue by using a different
>     separator for the first 7 fields and concatenating all of the
>     remaining values into a single string value, but could not
>     figure out how to do such a substitution in R. Unfortunately,
>     on my system I cannot specify a range for sed...
>
>     Thanks for any help/pointers
>     Martin
>
>
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?



