[R] End of line marker?
jonas garcia
garcia.jonas80 at googlemail.com
Fri Mar 5 03:47:00 CET 2010
When I opened the file with a hex editor, the problematic character turned
out to be “1a”.
I am attaching a sample DAT file with 3 lines (the second line is the one
with the undesirable character).
The furthest I could get was through readBin:
> tmp <- readBin("new.dat", what = "raw", n = 100000000)
> tmp
  [1] 30 32 3a 33 35 3a 33 32 2c 20 34 34 30 33 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31
 [33] 35 35 2e 39 2c 30 30 2e 37 36 2c 31 31 35 36 0d 0a 30 32 3a 33 35 3a 33 35 2c 20 34 34 33 32 2c
 [65] 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31 35 35 2e 38 2c 1a 30 2e 38 31 2c 31 31 35 37
 [97] 0d 0a 30 32 3a 33 35 3a 33 39 2c 20 34 34 36 37 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36
[129] 2c 31 35 35 2e 38 2c 30 30 2e 38 31 2c 31 31 35 38
> tmp[87]
[1] 1a
The idea now is, as Jim suggested, to replace “1a” with (for example) “20”
in the raw vector and write the file back with
writeBin(tmp, "new2.dat")
Can I use gsub? How can I perform this operation without messing around with
the raw format?
Thanks
J
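
A note on the question above: gsub works on character strings, so one way to stay with the raw vector is plain logical indexing. Compare the vector against as.raw(0x1a) (the ASCII SUB / Ctrl-Z byte, which DOS-era software used as an end-of-file marker) and assign as.raw(0x20) to the matching positions. What follows is only a minimal sketch under that assumption; the helper name replace_sub and the temporary demo files are illustrative and do not come from this thread.

```r
# Hypothetical helper (the name is illustrative): replace every 0x1a byte
# (SUB / Ctrl-Z) in a file with a space (0x20) and write a cleaned copy.
replace_sub <- function(infile, outfile) {
  # Read the whole file as raw bytes; file.info() gives the exact size.
  bytes <- readBin(infile, what = "raw", n = file.info(infile)$size)
  bad <- bytes == as.raw(0x1a)   # logical vector marking the SUB bytes
  bytes[bad] <- as.raw(0x20)     # overwrite them with a blank
  writeBin(bytes, outfile)
  invisible(sum(bad))            # number of bytes that were replaced
}

# Self-contained demo on temporary files, so no real data is touched.
demo_in  <- tempfile(fileext = ".dat")
demo_out <- tempfile(fileext = ".dat")
writeBin(c(charToRaw("155.8,"), as.raw(0x1a), charToRaw("0.81,1157\r\n")),
         demo_in)
n_replaced <- replace_sub(demo_in, demo_out)

# The cleaned copy can now be read as ordinary comma-separated text.
cleaned <- read.csv(demo_out, header = FALSE)
```

For many files, the same helper could be called in a loop over the result of list.files(), writing each cleaned copy under a new name so the originals stay untouched.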
On Thu, Mar 4, 2010 at 8:35 PM, jim holtman <jholtman at gmail.com> wrote:
> Have you considered reading the file in as binary/raw, finding the
> offending character, replacing it with a blank (or whatever), and
> then writing the file back out? You can then probably process it
> using read.table.
>
> On Thu, Mar 4, 2010 at 12:50 PM, jonas garcia
> <garcia.jonas80 at googlemail.com> wrote:
> > Thank you so much for your reply.
> >
> > I can identify the characters very easily in a couple of files. The
> > reason I am worried is that I have thousands of files to read in. The
> > files were produced by a very old MS-DOS program that records
> > oceanographic data and geographic position during a survey.
> >
> > My main goal is to read all these files into R for further analysis.
> > Most of the files are free of these EOL markers but some are not. I only
> > noticed the problem by chance when I was looking at and comparing one of
> > them. I wonder if I can solve this problem using R, without having to
> > use a separate text editor.
> >
> > Help on this would be much appreciated.
> >
> > Thanks again
> >
> > J
> >
> >
> > On 3/4/10, David Winsemius <dwinsemius at comcast.net> wrote:
> >>
> >>
> >> On Mar 3, 2010, at 2:22 PM, jonas garcia wrote:
> >>
> >>> Dear R users,
> >>>
> >>> I am trying to read a huge file in R. For some reason, only a part of
> >>> the file is read. When I investigated further, I found that in one of
> >>> my non-numeric columns there is one odd character responsible for
> >>> this, which I reproduce below:
> >>> In case you cannot see it, it looks like a right arrow, but it is not
> >>> the one you get from Microsoft Word's "Insert Symbol" menu.
> >>>
> >>> I think my dat file is broken and that funny character is an EOL
> >>> marker that makes R not read the rest of the file. I am sure the
> >>> character is there by chance, but I fear that it might be present in
> >>> some other big files I have to work with as well. So, is there any
> >>> clever way to remove this inconvenient character in R, avoiding having
> >>> to edit the file in Notepad and remove it manually?
> >>> Code I am using:
> >>>
> >>> read.csv("new3.dat", header=F)
> >>>
> >>> Warning message:
> >>> In read.table(file = file, header = header, sep = sep, quote = quote, :
> >>>   incomplete final line found by readTableHeader on 'new3.dat'
> >>>
> >>
> >> I think you should identify the offending line by using the count.fields
> >> function and fix it with an editor.
> >>
> >>
> >> --
> >> David
> >>
> >>>
> >>> I am working with R 2.10.1 in windows XP.
> >>>
> >>> Thanks in advance
> >>>
> >>> Jonas
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >> David Winsemius, MD
> >> Heritage Laboratories
> >> West Hartford, CT
> >>
> >>
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>