[R] Huge Dataset Dates Span two Lines
David Winsemius
dwinsemius at comcast.net
Thu Jan 8 22:41:22 CET 2015
On Jan 8, 2015, at 10:20 AM, DVL wrote:
> I'm trying to import a many gigabyte .txt file to analyze. It is asterisk
> delimited. I'm having an issue with the date field in the dataset. In the
> first 165 lines dates are listed as :
> YYYY-MM-DD HH:MM:SS
>
> Then on the 166th line and in other places the date spans two lines:
> YYYY-MM-DD
> HH:MM:SS
>
> This causes a problem because R thinks it has reached the end of a row in
> the table. How can I solve this?
It would probably be easiest to edit the file in a text editor. I suppose you could also read the file in with readLines() and do all the work in R, but that sounds a bit more painful than the first option. If the problems are exactly as you describe, here is an untested outline of a solution:
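dat <- readLines("/pat/fil.ext")
# mark lines that hold only the date fragment (YYYY-MM-DD is 10 characters)
marks <- nchar(dat) == 10
# or, if other fields precede the date, mark lines that end in a bare date
marks <- grepl("[0-9]{4}-[0-9]{2}-[0-9]{2}$", dat)
# the line after each marked line holds the rest of the record
nxt <- c(FALSE, head(marks, -1))
# append the continuation lines to the broken fragments
dat[ marks ] <- paste(dat[ marks ], dat[ nxt ])
final <- dat[ !nxt ] # remove the continuation lines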
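To make the idea concrete, here is a small sketch on made-up data. It assumes that a broken record always splits between the date and the time, so the first fragment ends in a bare YYYY-MM-DD and the very next line carries the rest, and that the last line of the file is not itself a broken fragment. The toy values and the column names (id, grp, when) are invented for illustration and do not come from the original file.

```r
# Toy input standing in for the real file (made-up values; "*" is the delimiter)
dat <- c("1*a*2015-01-08 10:20:00",
         "2*b*2015-01-08",
         "10:21:00",
         "3*c*2015-01-08 10:22:00")

# A line that ends in a bare date is the first half of a split record
marks <- grepl("[0-9]{4}-[0-9]{2}-[0-9]{2}$", dat)
# The line immediately after each marked line is its continuation
nxt <- c(FALSE, head(marks, -1))

# Glue each continuation onto its fragment, then drop the continuation lines
dat[marks] <- paste(dat[marks], dat[nxt])
final <- dat[!nxt]

# Parse the repaired lines as an asterisk-delimited table
# (hypothetical column names, chosen only for this example)
tab <- read.table(text = final, sep = "*", stringsAsFactors = FALSE,
                  col.names = c("id", "grp", "when"))
tab$when <- as.POSIXct(tab$when, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

paste() joins with a single space by default, which restores the "YYYY-MM-DD HH:MM:SS" layout of the intact rows. For a file of many gigabytes, readLines() on the whole file may not fit in memory, so the same steps might need to be applied to chunks.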
> View this message in context: http://r.789695.n4.nabble.com/Huge-Dataset-Dates-Span-two-Lines-tp4701523.html
> Sent from the R help mailing list archive at Nabble.com.
>
Nabble is not the R-help archive, and it also suppresses this message, which you should be sure to read:
*______________________________________________
*R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
*https://stat.ethz.ch/mailman/listinfo/r-help
*PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
*and provide commented, minimal, self-contained, reproducible code.
--
David Winsemius
Alameda, CA, USA