[R] the quote problem with readLines()

Dongyan Song yzhskdls at hotmail.com
Thu Mar 19 11:22:44 CET 2009


Hi Jim,

I finally got the result I wanted, thanks a lot!

Best,
Dongyan


jholtman wrote:
> 
> Check out this reference:
> 
> http://tolstoy.newcastle.edu.au/R/e2/help/07/02/9709.html
> 
> 
> 
> On Wed, Mar 18, 2009 at 11:16 AM, Dongyan Song <yzhskdls at hotmail.com>
> wrote:
>>
>> Hi Jim,
>>
>> Thank you very much! I will try to sample them then.
>>
>> Best,
>> Dongyan
>>
>>
>> jholtman wrote:
>>>
>>> The amount of data that you want to read in (136M numbers) will
>>> require about 1GB of memory (8 bytes per number for floating point -
>>> truncation does not reduce this number of bytes).  So if you want to
>>> read it all in, then find a 64-bit version of R and probably at least
>>> 4GB of memory for your process.  A 32-bit version might have just
>>> enough space if you can allocate all the 4GB of memory to that
>>> process.
>>>
>>> So if you want to have it all in memory, invest in a larger computer.
>>> If you want to run on the system you have, then you will probably have
>>> to sample your data so that you can get a portion that will fit in
>>> memory to run your test, or see if there is a way of processing
>>> portions of the file and then combining for a final result.
>>>
>>> On Wed, Mar 18, 2009 at 9:58 AM, Dongyan Song <yzhskdls at hotmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thank you for your concern!
>>>>
>>>> The file has 136,047,472 lines, with one value on each line, and is 1.7G
>>>> in size. I am running on Linux (OpenSuse OS) with 4G of memory in total.
>>>> The error message is "Error: cannot allocate vector of size 2.0 Gb". And
>>>> the worst thing is that even if I read all the data into R after
>>>> truncating the numbers' precision, i.e. from 1.234567e+00 to 1.2, I cannot
>>>> manipulate these numbers; for example, I cannot run ks.test, a histogram,
>>>> or a kernel density estimator, which is what I want to do with them. After
>>>> I enter those commands, the computer also gives error messages like
>>>> "Error: cannot allocate vector of size 809.1 Mb". I can read half of the
>>>> file, but I want to know the overall distribution of those numbers, and
>>>> the values in this file are not ordered, and it is not easy to randomly
>>>> pick some numbers or sort them.
>>>>
>>>> Is this information enough? Thank you again!
>>>>
>>>> Best,
>>>> Dongyan
>>>>
>>>>
>>>>
>>>> jholtman wrote:
>>>>>
>>>>> readLines is doing exactly what you are asking:
>>>>>
>>>>> Value
>>>>> A character vector of length the number of lines read.
>>>>>
>>>>> You still have to convert the character strings to numeric.  Exactly
>>>>> how large is "quite large"?  What system are you running on?  How much
>>>>> memory do you have?  What is the error message that you are getting?
>>>>> Exactly what does your file look like?  Have you tried reading in
>>>>> portions of the file?  How big will it be if you could read it in?
>>>>> Will it take up more than 25% of real memory?  There is still some
>>>>> information you need to provide so an assessment can be made.
>>>>>
>>>>> On Tue, Mar 17, 2009 at 8:50 AM, Dongyan Song <yzhskdls at hotmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I read a file containing only numbers with the readLines function, as
>>>>>> below,
>>>>>> > f <- file("data.txt")
>>>>>> > a <- readLines(f)
>>>>>> but all the values in a are in the format "....", and I cannot do
>>>>>> calculations with them since they are not numeric. I wonder how I should
>>>>>> get rid of those quotes, thank you for your help!
>>>>>> I have to use the readLines function instead of scan, read.table or
>>>>>> matrix, because the file is quite large and the other functions cannot
>>>>>> allocate enough space/memory to read the input file.
>>>>>>
>>>>>> Best,
>>>>>> Dongyan
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22558454.html
>>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jim Holtman
>>>>> Cincinnati, OH
>>>>> +1 513 646 9390
>>>>>
>>>>> What is the problem that you are trying to solve?
>>>>>
>>>>
>>>>
>>>> -----
>>>> Dongyan Song, Msc
>>>> Medical informatics, Uppsala University, Sweden
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22579163.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>
>>
>>
>> -----
>> Dongyan Song, Msc
>> Medical informatics, Uppsala University, Sweden
>> --
>> View this message in context:
>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22581029.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
> 

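For anyone reading the archive later, here is a minimal sketch of the two points Jim made in the quoted messages: readLines() returns character strings that still need converting, and 136,047,472 doubles need roughly 1 GB. The temporary file and its five values are made up for illustration and only stand in for the real data.txt.

```r
# Hypothetical small stand-in for the large file: one number per line.
path <- tempfile(fileext = ".txt")
writeLines(c("1.234567e+00", "2.5e+00", "-7.5e-01", "1.0e+01", "3.25e+00"), path)

# readLines() returns a character vector, which is why the values print
# with quotes; as.numeric() converts the strings to numbers.
a <- readLines(path)
x <- as.numeric(a)

# Rough memory estimate for the full file from the thread:
# 136,047,472 doubles at 8 bytes each, expressed in GiB.
n_values <- 136047472
gib <- n_values * 8 / 1024^3
```

The character vector from readLines() is itself larger than the numeric result, so for a file of this size converting after a full read is unlikely to fit in 4G of memory.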

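Jim's suggestion of sampling the data, or processing portions of the file and combining the results, could look roughly like the sketch below. This is only one possible approach and not necessarily what the linked reference does; the names chunked_summary, chunk_size, breaks and sample_frac are hypothetical, and the bin breaks have to be chosen in advance.

```r
# Read a one-number-per-line file in chunks, keeping only summaries in memory.
chunked_summary <- function(path, breaks, chunk_size = 1e6, sample_frac = 0.01) {
  con <- file(path, open = "r")  # open once so each readLines() call continues
  on.exit(close(con))
  counts <- numeric(length(breaks) - 1)
  kept <- numeric(0)
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    x <- as.numeric(lines)
    n <- n + length(x)
    # Bin counts add up across chunks; values outside `breaks` are dropped.
    bins <- cut(x, breaks = breaks, include.lowest = TRUE)
    counts <- counts + as.vector(table(bins))
    # Keep each value with probability sample_frac.
    kept <- c(kept, x[runif(length(x)) < sample_frac])
  }
  list(n = n, counts = counts, sample = kept)
}

# Tiny made-up example: six values, read two lines at a time.
path <- tempfile()
writeLines(as.character(c(0.1, 0.2, 1.5, 2.5, 2.9, 3.0)), path)
res <- chunked_summary(path, breaks = 0:3, chunk_size = 2, sample_frac = 1)
```

With a small sample_frac, res$sample should be small enough to pass to ks.test() or density(), and res$counts can be drawn with barplot() as a histogram of the whole file.
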
-----
Dongyan Song, Msc
Medical informatics, Uppsala University, Sweden
-- 
View this message in context: http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22597197.html
Sent from the R help mailing list archive at Nabble.com.