[R] Value Lookup from File without Slurping
r at quantide.com
Fri Jan 16 11:13:51 CET 2009
Something like this should work:
library(R.utils)                 # provides countLines()
qr <- c("AAC", "ATT")
out <- numeric()
n <- countLines("test.txt")      # number of lines, without loading the file
con <- file("test.txt", "r")     # named 'con' so it does not mask file()
for (i in seq_len(n)) {
  # read one line at a time and split it only once into tag and value
  fields <- strsplit(readLines(con, n = 1), split = " ")[[1]]
  if (is.element(fields[1], qr)) {
    out <- c(out, as.numeric(fields[2]))
  }
}
close(con)
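You may want to improve execution speed by reading the data in chunks
of several thousand lines instead of line by line. The code needs only
a small modification: pass a larger n to readLines() and process the
returned vector of lines.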
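A minimal sketch of the chunked variant, untested on a really large file.
It assumes the fields are separated by whitespace, the tag is the first
field and the value the second, and lines starting with "#" are comments.
The function name lookup_chunked and the default chunk_size are made up
for illustration.

```r
# Hypothetical helper: scan 'path' in chunks of 'chunk_size' lines and
# collect the values whose tag is in 'query'. Only one chunk is held in
# memory at a time.
lookup_chunked <- function(path, query, chunk_size = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  out <- numeric()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break                  # end of file reached
    # drop empty lines and comment lines (assumption: comments start with "#")
    lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
    if (length(lines) == 0L) next
    fields <- strsplit(lines, "[[:space:]]+")
    tags <- vapply(fields, `[`, character(1), 1L)
    hit <- tags %in% query
    vals <- as.numeric(vapply(fields[hit], `[`, character(1), 2L))
    names(vals) <- tags[hit]
    out <- c(out, vals)
  }
  out   # named numeric vector, in file order
}
```

Called as lookup_chunked("repo.txt", qr), it returns the matches in the
order they appear in the file; index the result by qr if you need them in
query order.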
Carlos J. Gil Bellosta wrote:
> On Fri, 2009-01-16 at 18:02 +0900, Gundala Viswanath wrote:
>
>> Dear all,
>>
>> I have a repository file (let's call it repo.txt)
>> that contains two columns like this:
>>
>> # tag value
>> AAA 0.2
>> AAT 0.3
>> AAC 0.02
>> AAG 0.02
>> ATA 0.3
>> ATT 0.7
>>
>> Given another query vector
>>
>> > qr <- c("AAC", "ATT")
>>
>> I would like to find the corresponding value for each query above,
>> yielding:
>>
>> 0.02
>> 0.7
>>
>> However, I want to avoid slurping the whole repo.txt into an object (e.g. a hash).
>> Is there any way to do that?
>>
>> The reason is that repo.txt is very, very large
>> (millions of lines,
>> with tag length > 30 bp), and my PC's memory is too small to hold it.
>>
>> - Gundala Viswanath
>> Jakarta - Indonesia
>>
>
> Hello,
>
> You can always store your repo.txt in a database, say, SQLite, and
> select only the values you want via an SQL query.
>
> That way, you will avoid loading the full file into memory.
>
> Best regards,
>
> Carlos J. Gil Bellosta
> http://www.datanalytics.com
>
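For the SQLite route Carlos suggests in the quoted message, a rough
sketch could look like the following. It assumes the DBI and RSQLite
packages are installed and that the file has the same layout as in the
question (whitespace-separated tag and value, "#" lines as comments).
The function names import_repo and lookup_sqlite, the table name repo
and the index name repo_tag are all made up for illustration.

```r
library(DBI)   # assumption: DBI and RSQLite are installed

# Hypothetical one-time import: copy the text file into an SQLite table,
# chunk by chunk, so the whole file is never in memory.
import_repo <- function(path, db_path, chunk_size = 10000L) {
  db <- dbConnect(RSQLite::SQLite(), db_path)
  on.exit(dbDisconnect(db))
  con <- file(path, "r")
  on.exit(close(con), add = TRUE)
  dbExecute(db, "CREATE TABLE IF NOT EXISTS repo (tag TEXT, value REAL)")
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break                  # end of file reached
    lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
    if (length(lines) == 0L) next
    chunk <- read.table(text = lines, col.names = c("tag", "value"),
                        colClasses = c("character", "numeric"))
    dbWriteTable(db, "repo", chunk, append = TRUE)
  }
  # an index on tag keeps later lookups from scanning the whole table
  dbExecute(db, "CREATE INDEX IF NOT EXISTS repo_tag ON repo (tag)")
  invisible(db_path)
}

# Hypothetical lookup: fetch only the rows whose tag is in 'query'.
lookup_sqlite <- function(db_path, query) {
  db <- dbConnect(RSQLite::SQLite(), db_path)
  on.exit(dbDisconnect(db))
  placeholders <- paste(rep("?", length(query)), collapse = ", ")
  sql <- paste0("SELECT tag, value FROM repo WHERE tag IN (",
                placeholders, ")")
  dbGetQuery(db, sql, params = as.list(query))
}
```

The import has to be run only once, e.g. import_repo("repo.txt",
"repo.sqlite"); after that, lookup_sqlite("repo.sqlite", qr) returns a
small data frame with the matching tags and values. The row order is
whatever SQLite returns, so sort or match against qr if order matters.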