[R] Fwd: Reading very large text files into R
Nick Wray
nickmwray using gmail.com
Thu Sep 29 16:51:06 CEST 2022
---------- Forwarded message ---------
From: Nick Wray <nickmwray using gmail.com>
Date: Thu, 29 Sept 2022 at 15:32
Subject: Re: [R] Reading very large text files into R
To: Ben Tupper <btupper using bigelow.org>
Hi Ben
Below is an example of the text (also attached as a text file). It's the "B",
of which there are quite a few scattered throughout the text doc, that
causes the error message on reading in (btw I don't need the "RAIN" column,
the 1's after it, or the last four elements).
1980-01-01 10:00, 225620, RAIN, 1, 1, WAHRAIN, 5091, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 226918, RAIN, 1, 1, WAHRAIN, 5124, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 228562, RAIN, 1, 1, WAHRAIN, 491, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 231581, RAIN, 1, 1, WAHRAIN, 5213, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 232671, RAIN, 1, 1, WAHRAIN, 487, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 232913, RAIN, 1, 1, WAHRAIN, 5243, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 234362, RAIN, 1, 1, WAHRAIN, 5265, 1001, 0, , 10009, 0, , , B
1980-01-01 10:00, 234682, RAIN, 1, 1, WAHRAIN, 5271, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 235389, RAIN, 1, 1, WAHRAIN, 5279, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 236466, RAIN, 1, 1, WAHRAIN, 497, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 243350, RAIN, 1, 1, SREW, 484, 1001, 0, , 9, 0, , ,
1980-01-01 10:00, 243350, RAIN, 1, 1, WAHRAIN, 484, 1001, 0, 0, 9, 9, , ,
Thanks Nick
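
One possible way to read lines like these is sketched here; it has not been tried on the full Met Office files. read.table() and read.csv() decide the number of columns from the first five lines of input unless col.names is longer, so the sketch first finds the widest line with count.fields() and passes that many column names along with fill = TRUE. The assumptions are that the fields are comma-separated as in the sample, that three of the sample lines written to a temporary file stand in for the real file, and that "columns 3 to 5 plus the last four" are the unwanted ones, which is only a guess at what is meant above. In this short sample every line may well parse to the same field count; the approach matters on a file where some lines are wider.

```r
# Three lines copied from the sample above; a real file would be read by its path.
sample_lines <- c(
  "1980-01-01 10:00, 225620, RAIN, 1, 1, WAHRAIN, 5091, 1001, 0, , 9, 0, , ,",
  "1980-01-01 10:00, 234362, RAIN, 1, 1, WAHRAIN, 5265, 1001, 0, , 10009, 0, , , B",
  "1980-01-01 10:00, 243350, RAIN, 1, 1, WAHRAIN, 484, 1001, 0, 0, 9, 9, , ,"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Widest line in the file, measured in comma-separated fields.
n_max <- max(count.fields(tmp, sep = ",", quote = "\"", comment.char = ""),
             na.rm = TRUE)

# Name n_max columns up front so no line is wider than the table;
# fill = TRUE pads any shorter line with NA.
dat <- read.csv(tmp, header = FALSE, fill = TRUE,
                col.names = paste0("V", seq_len(n_max)),
                strip.white = TRUE, stringsAsFactors = FALSE)

# Assumption: drop "RAIN" and the two 1's (columns 3 to 5) and the last four columns.
drop <- c(3:5, (n_max - 3):n_max)
dat_small <- dat[, -drop]
```

On a large file the same two calls apply unchanged, with tmp replaced by the path to the text file.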
On Thu, 29 Sept 2022 at 15:12, Ben Tupper <btupper using bigelow.org> wrote:
> Hi Nick,
>
> It's hard to know without seeing at least a snippet of the data.
> Could you do the following and paste the result into a plain text
> email? If you don't set your email client to plain text (from rich
> text or html) then we are apt to see a jumble of output on our email
> clients.
>
>
> ## start
> ## filename is the path to the text file, as a character string
> x <- readLines(filename, n = 20)
> cat(x, sep = "\n")
> ## end
>
> Cheers,
> Ben
>
>
> On Thu, Sep 29, 2022 at 9:54 AM Nick Wray <nickmwray using gmail.com> wrote:
> >
> > Hello I may be offending the R purists with this question but it is
> > linked to R, as will become clear. I have very large data sets from the
> > UK Met Office in notepad form. Unfortunately, I can’t read them directly
> > into R because, for some reason, although most lines in the text doc
> > consist of 15 elements, every so often there is a sixteenth one and R
> > doesn’t like this and gives me an error message because it has assumed
> > that every line has 15 elements and doesn’t like finding one with more.
> > I have tried playing around with the text document, inserting an extra
> > element into the top line etc, but to no avail.
> >
> > Also unfortunately you need access permission from the Met Office to get
> > the files in question so this link probably won’t work:
> >
> > https://catalogue.ceda.ac.uk/uuid/bbd6916225e7475514e17fdbf11141c1
> >
> > So what I have done is simply to copy and paste the text docs into excel
> > csv and then read them in, which is time-consuming but works. However
> > the later datasets are over the excel limit of 1048576 lines. I can
> > paste in the first 1048576 lines but then trying to isolate the remainder
> > of the text doc to paste it into a second csv doc is proving v difficult –
> > the only way I have found is to scroll down by hand and that’s taking
> > ages. I cannot find another way of editing the notepad text doc to get
> > rid of the part which I have already copied and pasted.
> >
> > Can anyone help with a) ideally being able to simply read the text tables
> > into R or b) suggest a way of editing out the bits of the text file I
> > have already pasted in without laborious scrolling?
> >
> > Thanks Nick Wray
> >
>
>
>
> --
> Ben Tupper (he/him)
> Bigelow Laboratory for Ocean Science
> East Boothbay, Maine
> http://www.bigelow.org/
> https://eco.bigelow.org
>
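
To find out which lines deviate from the usual field count before attempting a full read, the output of count.fields() can be tabulated. This is a small sketch with made-up toy lines (hypothetical, not Met Office data); on a real file, tmp would be the path to the text file.

```r
# Toy input (hypothetical): two lines with 3 fields and one with 4.
toy <- c("a,b,c", "d,e,f,g", "h,i,j")
tmp <- tempfile(fileext = ".txt")
writeLines(toy, tmp)

# Number of comma-separated fields on each line.
n_fields <- count.fields(tmp, sep = ",", quote = "\"", comment.char = "")

# How many lines have each field count.
print(table(n_fields))

# The most common field count, and the lines that differ from it.
usual <- as.integer(names(which.max(table(n_fields))))
odd_lines <- which(n_fields != usual)
print(readLines(tmp)[odd_lines])
```

Only the line numbers in odd_lines need inspecting by eye, which avoids scrolling through the whole file.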
Attachment: sample text.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20220929/feb43583/attachment.txt>