[R] filter a tab delimited text file
Duke
duke.lists at gmx.com
Fri Sep 10 19:59:20 CEST 2010
Hi Phil,
On 9/10/10 1:45 PM, Phil Spector wrote:
> Duke -
> One possibility is to check the help files for the functions
> involved to see if there are options to control this behaviour.
> For example, the check.names= argument to read.table, or the quote=
> argument to write.table. How about
Yes, I did before posting the question to the list, but somehow I missed (or
misunderstood) the check.names option. As for the quote=FALSE option to
write.table, it does not do what I want, since all the headers are unquoted
too.
>
> expFC <- read.table("test.txt", header=TRUE, sep="\t", check.names=FALSE)
> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
> write.table(expFC.TRUE, file="test_TRUE.txt", row.names=FALSE,
> sep="\t", quote=FALSE )
This works perfectly and solves the first issue. Thanks so much, Phil.
D.
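On the remaining quoting issue (quote=FALSE also strips the quotes from the header), one possible workaround is to write the quoted header line yourself and then append the unquoted data with write.table(..., col.names = FALSE, append = TRUE). This is a minimal sketch that assumes only the header should be quoted, as in the input file. The helper name write_like_original is hypothetical, and the small data frame is a stand-in for expFC.TRUE built from rows in this thread.

```r
# Hypothetical helper (not part of base R): quoted header, unquoted data.
write_like_original <- function(df, file, sep = "\t") {
  # Header: every column name wrapped in double quotes, joined by `sep`.
  writeLines(paste0('"', names(df), '"', collapse = sep), file)
  # Data rows: nothing quoted, appended below the header line.
  write.table(df, file = file, sep = sep, quote = FALSE,
              row.names = FALSE, col.names = FALSE, append = TRUE)
}

# Small stand-in for the filtered data frame from the thread;
# check.names = FALSE keeps the special characters in the column name.
expFC.TRUE <- data.frame(
  GeneNames = c("ENSG00000209350", "ENSG00000116285", "ENSG00000162460"),
  `log2(Fold_change)` = c(-3.81131293562629, -4.54130810709955,
                          -4.05126372834704),
  check.names = FALSE
)

out <- tempfile(fileext = ".txt")
write_like_original(expFC.TRUE, out)
written <- readLines(out)
```

The header quoting here is deliberately naive: a column name that itself contained a double quote would need escaping, which none of the headers in this thread do.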
>
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spector at stat.berkeley.edu
>
>
> On Fri, 10 Sep 2010, Duke wrote:
>
>> Hi all,
>>
>> I have to filter a tab-delimited text file like the one below:
>>
>> "GeneNames" "value1" "value2" "log2(Fold_change)" "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change) normalized) > 4)"
>> ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656 TRUE
>> ENSG00000177133 142 2 5.46771720082336 5.13545298955309 FALSE
>> ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982 TRUE
>> ENSG00000009724 10 162 -4.69995182667858 -5.03221603794886 FALSE
>> ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731 TRUE
>>
>> based on the last column (keeping only the rows where it is TRUE), and
>> then write the result to a new text file, so I should get something like this:
>>
>> "GeneNames" "value1" "value2" "log2(Fold_change)" "log2(Fold_change) normalized" "Signature(abs(log2(Fold_change) normalized) > 4)"
>> ENSG00000209350 4 35 -3.81131293562629 -4.14357714689656 TRUE
>> ENSG00000116285 115 1669 -4.54130810709955 -4.87357231836982 TRUE
>> ENSG00000162460 3 31 -4.05126372834704 -4.38352793961731 TRUE
>>
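A different technique from the read.table/write.table route discussed in this thread: filter the raw lines with readLines() and keep the header plus every line whose last tab-separated field is the literal text TRUE. Because kept lines are copied unchanged, the header quotes and the unquoted gene names come out exactly as in the input. This minimal sketch assumes the flag is the last field and is not itself quoted; the helper filter_lines_by_last_field is hypothetical, and a temporary file stands in for test.txt.

```r
# Rebuild the sample file from the question (tab-separated,
# quoted header, unquoted data), written to a temporary file.
hdr <- c("GeneNames", "value1", "value2", "log2(Fold_change)",
         "log2(Fold_change) normalized",
         "Signature(abs(log2(Fold_change) normalized) > 4)")
rows <- c(
  "ENSG00000209350\t4\t35\t-3.81131293562629\t-4.14357714689656\tTRUE",
  "ENSG00000177133\t142\t2\t5.46771720082336\t5.13545298955309\tFALSE",
  "ENSG00000116285\t115\t1669\t-4.54130810709955\t-4.87357231836982\tTRUE",
  "ENSG00000009724\t10\t162\t-4.69995182667858\t-5.03221603794886\tFALSE",
  "ENSG00000162460\t3\t31\t-4.05126372834704\t-4.38352793961731\tTRUE"
)
infile <- tempfile(fileext = ".txt")
writeLines(c(paste0('"', hdr, '"', collapse = "\t"), rows), infile)

# Hypothetical helper: keep the header and every line whose last field
# equals `value`; returns (invisibly) the number of data lines kept.
filter_lines_by_last_field <- function(infile, outfile, value = "TRUE",
                                       sep = "\t") {
  lines <- readLines(infile)
  body <- lines[-1]
  fields <- strsplit(body, sep, fixed = TRUE)
  # Last field of each line; empty lines map to "" and are dropped.
  last <- vapply(fields, function(f) if (length(f)) f[length(f)] else "",
                 character(1))
  writeLines(c(lines[1], body[last == value]), outfile)
  invisible(sum(last == value))
}

outfile <- tempfile(fileext = ".txt")
n_kept <- filter_lines_by_last_field(infile, outfile)
```

The trade-off is that nothing is parsed or type-checked, so this only fits when the file is to be filtered and written back; if the columns need further processing, reading them with read.table is still the better choice.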
>> I used read.table and write.table but I am still not very satisfied
>> with the results. Here is what I did:
>>
>> expFC <- read.table( "test.txt", header=TRUE, sep="\t" )
>> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
>> write.table (expFC.TRUE, file="test_TRUE.txt", row.names=FALSE,
>> sep="\t" )
>>
>> Result:
>>
>> "GeneNames" "value1" "value2" "log2.Fold_change." "log2.Fold_change..normalized" "Signature.abs.log2.Fold_change..normalized....4."
>> "ENSG00000209350" 4 35 -3.81131293562629 -4.14357714689656 TRUE
>> "ENSG00000116285" 115 1669 -4.54130810709955 -4.87357231836982 TRUE
>> "ENSG00000162460" 3 31 -4.05126372834704 -4.38352793961731 TRUE
>>
>> As you can see, there are two problems:
>>
>> 1. The headers were altered: all the special characters were
>> converted to dots (.).
>> 2. The gene names (first column) were quoted, although they were not
>> quoted in the original file.
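Both changes come from default arguments. read.table has check.names = TRUE by default, which passes the header through make.names() and turns every character not allowed in a syntactic R name ("(", ")", space, ">") into a dot. write.table has quote = TRUE by default, which quotes character and factor columns and the column names but leaves numeric and logical columns alone. A minimal sketch reproducing both effects; the one-row data frame is built from the first sample row purely for illustration.

```r
# 1. The header mangling: read.table(check.names = TRUE) applies
#    make.names(..., unique = TRUE) to the column names.
hdr <- c("GeneNames", "value1", "value2", "log2(Fold_change)",
         "log2(Fold_change) normalized",
         "Signature(abs(log2(Fold_change) normalized) > 4)")
mangled <- make.names(hdr, unique = TRUE)
print(mangled)

# 2. The quoting: with the default quote = TRUE, the character column
#    (GeneNames) and the header are quoted; integer and logical are not.
df <- data.frame(GeneNames = "ENSG00000209350", value1 = 4L, keep = TRUE)
out <- tempfile(fileext = ".txt")
write.table(df, file = out, sep = "\t", row.names = FALSE)
written <- readLines(out)
print(written)
```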
>>
>> The second point is not much of a problem, but the first one is. How do I
>> get exactly the same headers as in the original file?
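A self-contained sketch of the check.names = FALSE approach, assuming the sample data above is saved as a tab-delimited file (a temporary file stands in for test.txt). It also relies on read.table converting the TRUE/FALSE column to logical, so expFC[[ncol(expFC)]] can index the rows directly instead of being compared with the string "TRUE"; that indexing style is an alternative to the expression used in the thread, not a requirement.

```r
# Rebuild the sample file from the question in a temporary file.
hdr <- c("GeneNames", "value1", "value2", "log2(Fold_change)",
         "log2(Fold_change) normalized",
         "Signature(abs(log2(Fold_change) normalized) > 4)")
rows <- c(
  "ENSG00000209350\t4\t35\t-3.81131293562629\t-4.14357714689656\tTRUE",
  "ENSG00000177133\t142\t2\t5.46771720082336\t5.13545298955309\tFALSE",
  "ENSG00000116285\t115\t1669\t-4.54130810709955\t-4.87357231836982\tTRUE",
  "ENSG00000009724\t10\t162\t-4.69995182667858\t-5.03221603794886\tFALSE",
  "ENSG00000162460\t3\t31\t-4.05126372834704\t-4.38352793961731\tTRUE"
)
infile <- tempfile(fileext = ".txt")
writeLines(c(paste0('"', hdr, '"', collapse = "\t"), rows), infile)

# check.names = FALSE keeps the header exactly as written in the file.
expFC <- read.table(infile, header = TRUE, sep = "\t", check.names = FALSE)

# The last column is read as logical; [[ ]] returns it as a plain vector
# that can be used directly as the row index.
expFC.TRUE <- expFC[expFC[[ncol(expFC)]], ]

outfile <- tempfile(fileext = ".txt")
write.table(expFC.TRUE, file = outfile, row.names = FALSE, sep = "\t",
            quote = FALSE)
```

If the flag column could contain NA, wrapping the index in which() drops those rows instead of returning rows filled with NA.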
>>
>> Thanks,
>>
>> D.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>