[R] the quote problem with readLines()

Dongyan Song yzhskdls at hotmail.com
Wed Mar 18 14:58:17 CET 2009


Hi,

Thank you for your concern! 

The file has 136,047,472 lines, with one value per line, and is 1.7 GB in
size. I am running R on Linux (openSUSE) with 4 GB of memory in total. The
error message is "Error: cannot allocate vector of size 2.0 Gb". Worse, even
if I read all the data into R after truncating the numbers' precision, e.g.
from 1.234567e+00 to 1.2, I cannot work with the numbers: I cannot run
ks.test, a histogram, or a kernel density estimate, which is what I want to
do with them. When I enter those commands, R gives further errors such as
"Error: cannot allocate vector of size 809.1 Mb". I can read half of the
file, but I want to know the overall distribution of the numbers. The values
in the file are not ordered, and it is not easy to pick some at random or to
sort them.

Is this information enough? Thank you again!

Best,
Dongyan



jholtman wrote:
> 
> readLines is doing exactly what you are asking:
> 
> Value
> A character vector of length the number of lines read.
> 
> You still have to convert the character strings to numeric.  Exactly
> how large is "quite large"?  What system are you running on?  How much
> memory do you have?  What is the error message that you are getting?
> Exactly what does your file look like?  Have you tried reading in
> portions of the file?  How big will it be if you could read it in?
> Will it take up more than 25% of real memory?  There is still some
> information you need to provide so an assessment can be made.
> 
> On Tue, Mar 17, 2009 at 8:50 AM, Dongyan Song <yzhskdls at hotmail.com>
> wrote:
>>
>> Dear all,
>>
>> I read a file containing only numbers with the readLines function, as below,
>>> f <- file("data.txt")
>>> a <- readLines(f)
>> but all the values in a are in the format "....", and I cannot do
>> calculations with them since they are not numeric. I wonder how I should
>> strip those quotes. Thank you for your help!
>> I have to use the readLines function instead of scan, read.table or
>> matrix, because the file is quite large and the other functions cannot
>> allocate enough memory to read it.
>>
>> Best,
>> Dongyan
>> --
>> View this message in context:
>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22558454.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?
> 
> 
> 
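
On the conversion point raised in the quoted reply: the quotes are only how R
prints character data, and as.numeric() turns the strings into numbers. The
sketch below also reads a few lines at a time from an open connection, which
is one way to "read in portions". The temporary file and its three values are
made up for illustration and stand in for the real data.txt.

```r
# Sketch: readLines() returns character strings; as.numeric() converts them.
# ASSUMPTION: the temporary file below is a stand-in for "data.txt".
path <- tempfile()
writeLines(c("1.234567e+00", "2.5", "-3"), path)

con <- file(path, open = "r")
first <- readLines(con, n = 2)   # first portion, still character
rest  <- readLines(con, n = 2)   # next portion; only one line is left
close(con)

x <- as.numeric(c(first, rest))  # numeric vector, no quotes when printed
print(x)

# A double takes 8 bytes, so the numeric values alone for
# 136,047,472 lines need roughly this many GiB:
print(136047472 * 8 / 2^30)
```

The last line gives a rough lower bound of about 1 GiB for the numeric vector
itself. The character vector returned by readLines() needs more than that,
and most computations make additional copies, which is consistent with the
allocation errors on a 4 GB machine.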


-----
Dongyan Song, MSc
Medical informatics, Uppsala University, Sweden
-- 
View this message in context: http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22579163.html
Sent from the R help mailing list archive at Nabble.com.



