[R] Value Lookup from File without Slurping

r at quantide.com r at quantide.com
Fri Jan 16 11:13:51 CET 2009


Something like this should work:

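library(R.utils)                  # provides countLines()
out = numeric()
qr = c("AAC", "ATT")
n = countLines("test.txt")
con = file("test.txt", "r")       # named 'con' so it does not mask file()
for (i in 1:n) {
  line = readLines(con, n = 1)
  if (length(line) == 0) break    # nothing left to read
  # split on runs of whitespace: the columns are separated by several blanks
  fields = strsplit(line, split = "[[:space:]]+")[[1]]
  if (is.element(fields[1], qr)) {
    value = as.numeric(fields[2])
    out = c(out, value)
  }
}
close(con)                        # release the connection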

You may want to improve execution speed by reading the data in chunks
instead of line by line. The code requires only a little modification.
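
A minimal sketch of the chunked variant, using base R only. The function
name lookup_in_chunks and the chunk_size argument are made up for this
illustration; it assumes whitespace-separated columns and that lines
starting with "#" are comments, as in the sample data quoted below.

```r
# Hypothetical helper: look up the values for the tags in `qr`,
# reading `chunk_size` lines at a time instead of the whole file.
lookup_in_chunks <- function(path, qr, chunk_size = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))                 # always release the connection
  tags <- character()
  values <- numeric()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break     # end of file
    lines <- trimws(lines)
    # drop empty lines and comment lines such as "# tag  value"
    lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
    if (length(lines) == 0) next
    fields <- strsplit(lines, "[[:space:]]+")
    tag <- vapply(fields, `[`, character(1), 1)
    hit <- tag %in% qr                # keep only the queried tags
    tags <- c(tags, tag[hit])
    values <- c(values,
                as.numeric(vapply(fields[hit], `[`, character(1), 2)))
  }
  # one value per query, in query order; NA if a tag was not found
  setNames(values[match(qr, tags)], qr)
}

# Usage on a small temporary copy of the sample data
tmp <- tempfile()
writeLines(c("# tag  value", "AAA    0.2", "AAT    0.3", "AAC   0.02",
             "AAG   0.02", "ATA    0.3", "ATT   0.7"), tmp)
res <- lookup_in_chunks(tmp, c("AAC", "ATT"), chunk_size = 2L)
```

Only the matching rows are kept between chunks, so memory use depends on
the chunk size and the number of hits rather than on the size of the file.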




Carlos J. Gil Bellosta wrote:
> On Fri, 2009-01-16 at 18:02 +0900, Gundala Viswanath wrote:
>   
>> Dear all,
>>
>> I have a repository file (let's call it repo.txt)
>> that contains two columns like this:
>>
>> # tag  value
>> AAA    0.2
>> AAT    0.3
>> AAC   0.02
>> AAG   0.02
>> ATA    0.3
>> ATT   0.7
>>
>> Given another query vector
>>
>>     
>>> qr <- c("AAC", "ATT")
>>>       
>> I would like to find the corresponding value for each query above,
>> yielding:
>>
>> 0.02
>> 0.7
>>
>> However, I want to avoid slurping the whole repo.txt into an object (e.g. a hash).
>> Is there any way to do that?
>>
>> The reason is that repo.txt is very large
>> (millions of lines,
>> with tag length > 30 bp), and my PC memory is too small to hold it.
>>
>> - Gundala Viswanath
>> Jakarta - Indonesia
>>     
>
> Hello,
>
> You can always store your repo.txt into a database, say, SQLite, and
> select only the values you want via an SQL query.
>
> Thus, you will prevent loading the full file into memory.
>
> Best regards,
>
> Carlos J. Gil Bellosta
> http://www.datanalytics.com
>