[R] reading a text file, one line at a time
jim holtman
jholtman at gmail.com
Thu Aug 19 12:25:22 CEST 2010
Here is how I would do it, using a simple character substitution on the data:
> inFile <- textConnection(" V1 V2 V3 V4 V5
+ 1 1 b b a -0.4990719
+ 2 2 b a a 1.5134101
+ 3 3 a b b 1.9375467
+ 4 4 a a b 0.3310612
+ 5 5 a b a 0.2807520
+ 6 6 a a b 0.9646351
+ 7 7 b a b 0.6243979
+ 8 8 a b a -0.8076008
+ 9 9 a b b -1.7645273
+ 10 10 b b a 0.5460802
+ 11 11 c c b 12.3000000")
> output <- NULL # initialize output (just a vector in this case)
> while(length(input <- readLines(inFile, n=3)) > 0){
+ # replace 'b' with 'z'
+ for (i in seq_along(input)){
+ input[i] <- gsub('b', 'z', input[i])
+ }
+ output <- c(output, input) # collect the output
+ }
> close(inFile)
> print(cbind(output)) # show converted data
output
[1,] " V1 V2 V3 V4 V5"
[2,] "1 1 z z a -0.4990719"
[3,] "2 2 z a a 1.5134101"
[4,] "3 3 a z z 1.9375467"
[5,] "4 4 a a z 0.3310612"
[6,] "5 5 a z a 0.2807520"
[7,] "6 6 a a z 0.9646351"
[8,] "7 7 z a z 0.6243979"
[9,] "8 8 a z a -0.8076008"
[10,] "9 9 a z z -1.7645273"
[11,] "10 10 z z a 0.5460802"
[12,] "11 11 c c z 12.3000000"
>
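Since gsub() is vectorized over its input, the inner for loop above is not strictly needed for a plain substitution. The following is a minimal sketch of that variant, not a replacement for the template: it assumes a shortened copy of the toy data, uses fixed = TRUE to treat 'b' as a literal string, and collects the chunks in a list (the name `chunks` is only illustrative) instead of growing a vector with c().

```r
# Toy input: a header plus three data rows (values taken from the example above)
lines_in <- c("V1 V2 V3 V4 V5",
              "1 1 b b a -0.4990719",
              "2 2 b a a 1.5134101",
              "3 3 a b b 1.9375467")
inFile <- textConnection(lines_in)

chunks <- list()  # one element per chunk of up to 3 lines
while (length(input <- readLines(inFile, n = 3)) > 0) {
  # gsub() handles the whole character vector, so the chunk is converted in one call
  chunks[[length(chunks) + 1]] <- gsub("b", "z", input, fixed = TRUE)
}
close(inFile)

output <- unlist(chunks)  # flatten the chunks back into one character vector
print(cbind(output))      # show converted data
```

For processing that really does depend on one line at a time, the explicit loop over seq_along(input) is still the place to put it.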
On Wed, Aug 18, 2010 at 10:51 PM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
> Hi Jim,
>
> I was trying to use your template without success. With the toy data
> below, could you explain how to use this template to change all "b"s
> to "z"s, just as an exercise, reading in 3 lines at a time? I need to
> use this strategy for a larger problem, but I haven't been able to get
> the basics working.
>
> Thanks,
>
> Juliet
>
> myData <- structure(list(V1 = 1:11, V2 = structure(c(2L, 2L, 1L, 1L, 1L,
> 1L, 2L, 1L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"),
> V3 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L,
> 3L), .Label = c("a", "b", "c"), class = "factor"), V4 = structure(c(1L,
> 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("a",
> "b"), class = "factor"), V5 = c(-0.499071939558026, 1.51341011554134,
> 1.93754671209923, 0.331061227463955, 0.280752001959284, 0.964635079229074,
> 0.624397908891502, -0.807600774484419, -1.76452730888732,
> 0.546080229326458, 12.3)), .Names = c("V1", "V2", "V3", "V4",
> "V5"), class = "data.frame", row.names = c(NA, -11L))
>
> On Sun, Aug 15, 2010 at 1:06 PM, jim holtman <jholtman at gmail.com> wrote:
>> For efficiency of processing, look at reading in several
>> hundred/thousand lines at a time. One line read/write will probably
>> spend most of the time in the system calls to do the I/O and will take
>> a long time. So do something like this:
>>
>> con <- file('yourInputFile', 'r')
>> outfile <- file('yourOutputFile', 'w')
>> while (length(input <- readLines(con, n=1000)) > 0){
>>   output <- input  # start from the lines just read
>>   for (i in seq_along(input)){
>>     # ......your one line at a time processing, storing the result in output[i]
>>   }
>>   writeLines(output, con=outfile)
>> }
>> close(con)
>> close(outfile)
>>
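As a self-contained illustration of the chunked template quoted above, here is a minimal sketch that can be run as is. Several things in it are assumptions made for the example: the files are temporary ones created with tempfile(), the input is 2500 generated lines, and toupper() merely stands in for whatever per-line processing is actually required.

```r
# Hypothetical input and output files; tempfile() keeps the sketch self-contained
in_path  <- tempfile(fileext = ".txt")
out_path <- tempfile(fileext = ".txt")
writeLines(sprintf("row %d has a b in it", 1:2500), in_path)

con     <- file(in_path, "r")
outfile <- file(out_path, "w")
while (length(input <- readLines(con, n = 1000)) > 0) {
  output <- character(length(input))  # preallocate one slot per line in the chunk
  for (i in seq_along(input)) {
    output[i] <- toupper(input[i])    # stand-in for the real per-line processing
  }
  writeLines(output, con = outfile)   # append this chunk to the output file
}
close(con)
close(outfile)

n_out <- length(readLines(out_path))  # number of lines written
```

Only one chunk of 1000 lines is held in memory at a time, which is the point of reading from an open connection rather than from the file name.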
>> On Sun, Aug 15, 2010 at 7:58 AM, Data Analytics Corp.
>> <walt at dataanalyticscorp.com> wrote:
>>> Hi,
>>>
>>> I have an upcoming project that will involve a large text file. I want to
>>>
>>> 1. read the file into R one line at a time
>>> 2. do some string manipulations on the line
>>> 3. write the line to another text file.
>>>
>>> I can handle the last two parts. Scan and read.table seem to read the whole
>>> file in at once. Since this is a very large file (several hundred thousand
>>> lines), this is not practical. Hence the idea of reading one line at a
>>> time. The question is, can R read one line at a time? If so, how? Any
>>> suggestions are appreciated.
>>>
>>> Thanks,
>>>
>>> Walt
>>>
>>> ________________________
>>>
>>> Walter R. Paczkowski, Ph.D.
>>> Data Analytics Corp.
>>> 44 Hamilton Lane
>>> Plainsboro, NJ 08536
>>> ________________________
>>> (V) 609-936-8999
>>> (F) 609-936-3733
>>> walt at dataanalyticscorp.com
>>> www.dataanalyticscorp.com
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>