[R] filter a tab delimited text file

Phil Spector spector at stat.berkeley.edu
Fri Sep 10 19:45:21 CEST 2010


Duke -
    The help files for the functions involved document options
that control both behaviours: the check.names= argument to
read.table (set it to FALSE to keep the header names unchanged),
and the quote= argument to write.table (set it to FALSE to
suppress the quotes).  How about

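# check.names=FALSE keeps the header names exactly as they appear in the file
expFC <- read.table("test.txt", header=TRUE, sep="\t", check.names=FALSE)
# keep only the rows whose last column is TRUE
expFC.TRUE <- expFC[expFC[[ncol(expFC)]] == "TRUE", ]
# quote=FALSE writes headers and character fields without quotes
write.table(expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t", quote=FALSE)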
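
One caveat: quote=FALSE also removes the quotes from the header line, whereas the desired output in the question keeps the header quoted and only the data unquoted. A minimal sketch of one way to get that, assuming a small made-up input (the temporary files, the gene names GENE_A to GENE_C and the values are invented for illustration, and the column set is shortened): write the quoted header line first, then append the unquoted rows. It also uses which(), so that an NA in the flag column would be dropped rather than turned into a row of NAs.

```r
# Build a small tab-delimited file resembling the one in the question
# (file names and values are made up for this illustration).
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c(
  "\"GeneNames\"\t\"value1\"\t\"log2(Fold_change)\"\t\"Signature(abs(log2(Fold_change) normalized) > 4)\"",
  "GENE_A\t4\t-4.5\tTRUE",
  "GENE_B\t142\t5.1\tFALSE",
  "GENE_C\t3\t-4.2\tTRUE"
), infile)

# Default behaviour: the headers are passed through make.names()
mangled <- read.table(infile, header = TRUE, sep = "\t")
# check.names = FALSE keeps them as they are in the file
expFC <- read.table(infile, header = TRUE, sep = "\t", check.names = FALSE)

# which() returns only the positions that are TRUE, skipping any NA
keep <- which(expFC[[ncol(expFC)]] == "TRUE")
expFC.TRUE <- expFC[keep, ]

# Write the quoted header line first, then append the unquoted data rows
writeLines(paste0("\"", names(expFC.TRUE), "\"", collapse = "\t"), outfile)
write.table(expFC.TRUE, file = outfile, append = TRUE, quote = FALSE,
            sep = "\t", row.names = FALSE, col.names = FALSE)

print(names(mangled))     # headers altered by the default check.names = TRUE
print(readLines(outfile)) # quoted header followed by unquoted rows
```

If unquoted headers are acceptable, the single write.table call with quote=FALSE is simpler and is all that is needed.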

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu


On Fri, 10 Sep 2010, Duke wrote:

> Hi all,
>
> I have to filter a tab-delimited text file like below:
>
> "GeneNames"    "value1"    "value2"    "log2(Fold_change)"    "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change) normalized) > 4)"
> ENSG00000209350    4    35    -3.81131293562629    -4.14357714689656    TRUE
> ENSG00000177133    142    2    5.46771720082336    5.13545298955309    FALSE
> ENSG00000116285    115    1669    -4.54130810709955    -4.87357231836982    TRUE
> ENSG00000009724    10    162    -4.69995182667858    -5.03221603794886    FALSE
> ENSG00000162460    3    31    -4.05126372834704    -4.38352793961731    TRUE
>
> based on the last column (TRUE), and then write to a new text file, meaning I 
> should get something like below:
>
> "GeneNames"    "value1"    "value2"    "log2(Fold_change)"    "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change) normalized) > 4)"
> ENSG00000209350    4    35    -3.81131293562629    -4.14357714689656    TRUE
> ENSG00000116285    115    1669    -4.54130810709955    -4.87357231836982    TRUE
> ENSG00000162460    3    31    -4.05126372834704    -4.38352793961731    TRUE
>
> I used read.table and write.table but I am still not very satisfied with the 
> results. Here is what I did:
>
> expFC <- read.table( "test.txt", header=T, sep="\t" )
> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
> write.table (expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t" )
>
> Result:
>
> "GeneNames"    "value1"    "value2"    "log2.Fold_change."    "log2.Fold_change..normalized"    "Signature.abs.log2.Fold_change..normalized....4."
> "ENSG00000209350"    4    35    -3.81131293562629    -4.14357714689656    TRUE
> "ENSG00000116285"    115    1669    -4.54130810709955    -4.87357231836982    TRUE
> "ENSG00000162460"    3    31    -4.05126372834704    -4.38352793961731    TRUE
>
> As you can see, there are two problems:
>
> 1. The headers were altered. All the special characters were converted to 
> dots (.).
> 2. The gene names (first column) were quoted (they were not quoted in the 
> original file).
>
> The second point is not very annoying, but the first one is. How do I get 
> exactly the same headers as in the original file?
>
> Thanks,
>
> D.