[R] Incremental ReadLines
Freds
frederiklang at gmail.com
Wed Apr 13 19:57:58 CEST 2011
Hi there,
I am having a similar problem reading in a large text file with around
550,000 observations, each with 10 to 100 lines of description. I am trying
to parse it in R, but I am having trouble with the size of the file: the
parsing seems to slow down dramatically at some point. I would be grateful
for any suggestions. Here is my code, which works fine on a subsample
of my dataset.
# Defining datasource
file <- "filename.txt"
# Creating placeholder for data and assigning column names
data <- data.frame(Id=NA)
# Starting by case = 0
case <- 0
# Opening a connection to data
input <- file(file, "rt")
# Going through cases
repeat {
  line <- readLines(input, n=1)
  if (length(line)==0) break
  if (length(grep("Id:",line)) != 0) {
    case <- case + 1
    data[case,] <- NA
    split_line <- strsplit(line,"Id:")
    data[case,1] <- as.numeric(split_line[[1]][2])
  }
}
# Closing connection
close(input)
# Saving dataframe
write.csv(data,'data.csv')
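One possible cause of the slowdown, stated as an assumption and not something measured on this file: the loop reads one line per readLines() call and grows the data frame one row at a time, and each row assignment may copy the whole data frame. A minimal sketch of an alternative reads the file in chunks and builds the data frame once at the end. The function name read_ids and the chunk_size argument are hypothetical and only serve as illustration.

```r
# Sketch only: read_ids() and chunk_size are hypothetical names.
read_ids <- function(path, chunk_size = 10000L) {
  con <- file(path, "rt")
  on.exit(close(con))            # close the connection even if an error occurs
  chunks <- list()               # one numeric vector of ids per chunk
  k <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    # Keep only the lines that contain the literal text "Id:"
    hits <- grep("Id:", lines, fixed = TRUE, value = TRUE)
    # Same extraction as the loop above: second piece after splitting on "Id:"
    pieces <- strsplit(hits, "Id:", fixed = TRUE)
    k <- k + 1L
    chunks[[k]] <- as.numeric(vapply(pieces, `[`, character(1), 2))
  }
  # Build the data frame once instead of growing it row by row
  data.frame(Id = as.numeric(unlist(chunks)))
}

# Usage (assumes the file exists):
# data <- read_ids("filename.txt")
# write.csv(data, "data.csv")
```

Whether this removes the slowdown depends on the actual file; the chunk size is a guess and can be adjusted to the available memory.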
Kind regards,
Frederik
--
View this message in context: http://r.789695.n4.nabble.com/Incremental-ReadLines-tp878581p3447859.html
Sent from the R help mailing list archive at Nabble.com.