[R] filter a tab delimited text file

Duke duke.lists at gmx.com
Fri Sep 10 19:59:20 CEST 2010


Hi Phil,

On 9/10/10 1:45 PM, Phil Spector wrote:
> Duke -
>    One possibility is to check the help files for the functions
> involved to see if there are options to control this behaviour.
> For example, the check.names= argument to read.table, or the quote= 
> argument to write.table.  How about

Yes, I did before posting the question to the list, but somehow I missed (or
misunderstood) the check.names option. As for the quote=FALSE option to
write.table, it does not work as I want, since it leaves the headers
unquoted too.
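One possible workaround, sketched here under assumptions and not proposed in the thread itself: write the quoted header line yourself with writeLines(), then append the data rows with write.table(..., col.names=FALSE, quote=FALSE), so only the header carries quotes. The two-row data frame, the short column name `Signature`, and the temporary output file (standing in for test_TRUE.txt) are hypothetical.

```r
# Hypothetical stand-in for the filtered data frame; values copied from the
# post, columns trimmed and the "Signature" name shortened for illustration
expFC.TRUE <- data.frame(
  GeneNames = c("ENSG00000209350", "ENSG00000162460"),
  `log2(Fold_change)` = c(-3.81131293562629, -4.05126372834704),
  Signature = c(TRUE, TRUE),
  check.names = FALSE,        # keep the non-syntactic column name as written
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".txt")   # stands in for "test_TRUE.txt"

# 1. Header line: each column name wrapped in double quotes, tab-separated
writeLines(paste0('"', names(expFC.TRUE), '"', collapse = "\t"), out)

# 2. Data rows: appended below the header, with no column names and no quoting
write.table(expFC.TRUE, file = out, append = TRUE, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

cat(readLines(out), sep = "\n")
```

Splitting the header from the body keeps write.table's quote=FALSE behaviour for the gene names while the header is formatted independently.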

>
> expFC <- read.table("test.txt", header=TRUE, sep="\t", check.names=FALSE)
> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
> write.table(expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t", quote=FALSE)

This works perfectly and solves the first issue. Thanks so much Phil.
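For anyone who wants to try Phil's suggestion without the original test.txt, here is a minimal, self-contained sketch. The temporary input file and the trimmed four-column, three-row sample are assumptions made to keep it short. It contrasts the default check.names=TRUE, which runs the header through make.names(), with check.names=FALSE, and filters rows on the last column. That column is read as logical, so it can index rows directly instead of being compared with the string "TRUE".

```r
# Recreate a small tab-delimited input: four of the post's six columns and
# three of its rows (a trimmed, assumed stand-in for "test.txt")
input <- tempfile(fileext = ".txt")
writeLines(c(
  paste('"GeneNames"', '"value1"', '"log2(Fold_change) normalized"',
        '"Signature(abs(log2(Fold_change) normalized) > 4)"', sep = "\t"),
  "ENSG00000209350\t4\t-4.14357714689656\tTRUE",
  "ENSG00000177133\t142\t5.13545298955309\tFALSE",
  "ENSG00000162460\t3\t-4.38352793961731\tTRUE"
), input)

# Default check.names = TRUE runs the header through make.names(),
# which turns every character not allowed in an R name into "."
mangled <- read.table(input, header = TRUE, sep = "\t")
print(names(mangled))

# check.names = FALSE keeps the header exactly as it appears in the file
expFC <- read.table(input, header = TRUE, sep = "\t", check.names = FALSE)
print(names(expFC))

# read.table() converts a column of TRUE/FALSE values to logical, so the
# last column can index rows directly; which() drops any NA entries
last <- expFC[[ncol(expFC)]]
stopifnot(is.logical(last))
expFC.TRUE <- expFC[which(last), ]
print(expFC.TRUE)
```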

D.

>
>                     - Phil Spector
>                      Statistical Computing Facility
>                      Department of Statistics
>                      UC Berkeley
>                      spector at stat.berkeley.edu
>
>
> On Fri, 10 Sep 2010, Duke wrote:
>
>> Hi all,
>>
>> I have to filter a tab-delimited text file like the one below:
>>
>> "GeneNames"    "value1"    "value2"    "log2(Fold_change)" 
>> "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change) 
>> normalized) > 4)"
>> ENSG00000209350    4    35    -3.81131293562629    
>> -4.14357714689656    TRUE
>> ENSG00000177133    142    2    5.46771720082336    
>> 5.13545298955309    FALSE
>> ENSG00000116285    115    1669    -4.54130810709955    
>> -4.87357231836982 TRUE
>> ENSG00000009724    10    162    -4.69995182667858    
>> -5.03221603794886 FALSE
>> ENSG00000162460    3    31    -4.05126372834704    
>> -4.38352793961731    TRUE
>>
>> based on the last column, keeping only the rows where it is TRUE, and
>> then write them to a new text file, so that I get something like this:
>>
>> "GeneNames"    "value1"    "value2"    "log2(Fold_change)" 
>> "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change) 
>> normalized) > 4)"
>> ENSG00000209350    4    35    -3.81131293562629    
>> -4.14357714689656    TRUE
>> ENSG00000116285    115    1669    -4.54130810709955    
>> -4.87357231836982 TRUE
>> ENSG00000162460    3    31    -4.05126372834704    
>> -4.38352793961731    TRUE
>>
>> I used read.table and write.table but I am still not very satisfied 
>> with the results. Here is what I did:
>>
>> expFC <- read.table("test.txt", header=TRUE, sep="\t")
>> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
>> write.table(expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t")
>>
>> Result:
>>
>> "GeneNames"    "value1"    "value2"    "log2.Fold_change." 
>> "log2.Fold_change..normalized" 
>> "Signature.abs.log2.Fold_change..normalized....4."
>> "ENSG00000209350"    4    35    -3.81131293562629    
>> -4.14357714689656 TRUE
>> "ENSG00000116285"    115    1669    -4.54130810709955    
>> -4.87357231836982 TRUE
>> "ENSG00000162460"    3    31    -4.05126372834704    
>> -4.38352793961731 TRUE
>>
>> As you can see, there are two problems:
>>
>> 1. The headers were altered: all the special characters were
>> converted to dots (.).
>> 2. The gene names (first column) were quoted, although they were not
>> quoted in the original file.
>>
>> The second problem is only a minor annoyance, but the first one matters.
>> How do I keep exactly the same headers as in the original file?
>>
>> Thanks,
>>
>> D.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


