[R] filter a tab delimited text file
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Sep 10 22:24:24 CEST 2010
On Fri, Sep 10, 2010 at 4:20 PM, Duke <duke.lists at gmx.com> wrote:
> On 9/10/10 2:49 PM, Gabor Grothendieck wrote:
>>
>> On Fri, Sep 10, 2010 at 1:24 PM, Duke<duke.lists at gmx.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have to filter a tab-delimited text file like below:
>>>
>>> "GeneNames" "value1" "value2" "log2(Fold_change)"
>>> "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change)
>>> normalized)> 4)"
>>> ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656
>>> TRUE
>>> ENSG00000177133 142 2 5.46771720082336 5.13545298955309
>>> FALSE
>>> ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982
>>> TRUE
>>> ENSG00000009724 10 162 -4.69995182667858 -5.03221603794886
>>> FALSE
>>> ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731
>>> TRUE
>>>
>>> based on the last column (TRUE), and then write to a new text file,
>>> meaning
>>> I should get something like below:
>>>
>>> "GeneNames" "value1" "value2" "log2(Fold_change)"
>>> "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change)
>>> normalized)> 4)"
>>> ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656
>>> TRUE
>>> ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982
>>> TRUE
>>> ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731
>>> TRUE
>>>
>>> I used read.table and write.table but I am still not very satisfied with
>>> the
>>> results. Here is what I did:
>>>
>>> expFC<- read.table( "test.txt", header=T, sep="\t" )
>>> expFC.TRUE<- expFC[expFC[dim(expFC)[2]]=="TRUE",]
>>> write.table (expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t"
>>> )
>>>
>>> Result:
>>>
>>> "GeneNames" "value1" "value2" "log2.Fold_change."
>>> "log2.Fold_change..normalized"
>>> "Signature.abs.log2.Fold_change..normalized....4."
>>> "ENSG00000209350" 4 35 -3.81131293562629 -4.14357714689656
>>> TRUE
>>> "ENSG00000116285" 115 1669 -4.54130810709955
>>> -4.87357231836982
>>> TRUE
>>> "ENSG00000162460" 3 31 -4.05126372834704 -4.38352793961731
>>> TRUE
>>>
>>> As you can see, there are two points:
>>>
>>> 1. The headers were altered: every special character was converted to a
>>> dot (.).
>>> 2. The gene names (first column) were quoted, although they were not
>>> quoted in the original
>>> file.
>>>
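Both points can probably be handled without leaving read.table and
write.table. The following is only a sketch: the file names and the
cut-down sample data are made up for illustration, and it assumes the last
column holds nothing but TRUE/FALSE so that read.table reads it as logical.

```r
# Hypothetical temp files and a shortened stand-in for the poster's data.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c(
  "\"GeneNames\"\t\"value1\"\t\"log2(Fold_change)\"\t\"Signature\"",
  "ENSG00000209350\t4\t-3.81\tTRUE",
  "ENSG00000177133\t142\t5.46\tFALSE"
), infile)

# check.names = FALSE keeps headers such as "log2(Fold_change)" as they are
expFC <- read.table(infile, header = TRUE, sep = "\t", check.names = FALSE)

# the last column is logical here, so it can select rows directly
keep <- expFC[expFC[[ncol(expFC)]], ]

# quote = FALSE stops write.table from quoting the gene names
write.table(keep, file = outfile, sep = "\t", row.names = FALSE, quote = FALSE)
```

Note that quote = FALSE also drops the quotes around the header fields, so
the output is still not byte-for-byte identical to the input.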
>> This will copy input lines matching pattern as well as the header to
>> the output verbatim preserving all quotes, spacing, etc.
>>
>> myFilter <- function(infile, outfile, pattern = "TRUE$") {
>>   L <- readLines(infile)
>>   # keep the header (line 1) plus every data line matching the pattern
>>   keep <- c(L[1], grep(pattern, L[-1], value = TRUE))
>>   # writeLines adds no trailing space, unlike cat(el, "\n")
>>   writeLines(keep, outfile)
>> }
>>
>> # e.g.
>> myFilter("infile.txt", "outfile.txt")
>>
>
> I like this one best! It is not as simple as the bash one-liner
> (system( "cat infile.txt | grep -v FALSE > outfile.txt", wait=TRUE )), but I
> am very happy to learn that R has functions similar to those in bash.
> If there is a document or a list of all such functions, that would be
> excellent.
>
> Thanks Gabor,
>
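For what it is worth, the grep -v one-liner has a fairly direct counterpart
in R, since grep() takes an invert argument. A minimal sketch, with made-up
temp files and sample lines standing in for infile.txt and outfile.txt:

```r
# Hypothetical temp files and sample lines, for illustration only.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c(
  "\"GeneNames\"\t\"Signature\"",
  "ENSG00000209350\tTRUE",
  "ENSG00000177133\tFALSE",
  "ENSG00000116285\tTRUE"
), infile)

L <- readLines(infile)
# invert = TRUE keeps the lines that do NOT match, like grep -v
kept <- grep("FALSE", L, invert = TRUE, value = TRUE, fixed = TRUE)
writeLines(kept, outfile)
```

Like the shell version, this drops any line containing FALSE anywhere, not
only in the last column, and the header survives only because it does not
contain that string.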
Check out these help files:
help.search(keyword = "character", package = "base")
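A few of the base string functions I would expect that search to turn up,
applied to one line of the kind shown in this thread. The sample line is
shortened and made up, and this is only a taste, not a complete list:

```r
# A made-up, shortened data line for illustration.
x <- "ENSG00000209350\t4\t35\tTRUE"

# split the line into its tab-separated fields
fields <- strsplit(x, "\t", fixed = TRUE)[[1]]

# number of characters in the first field
n <- nchar(fields[1])

# strip the "ENSG" prefix and leading zeros, in the spirit of sed 's/^ENSG0*//'
id <- sub("^ENSG0*", "", fields[1])

# TRUE if the line ends in TRUE, in the spirit of grep -q
is_true <- grepl("TRUE$", x)
```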
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com