[R] Analyzing large files faster

Wed Jun 13 00:06:57 CEST 2012

Hello,

The trick is to use logical index vectors. Each comparison is evaluated on the whole column at once, so no loop is needed.
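
A minimal sketch of the idea on a made-up vector (x, big and small are names chosen only for this illustration, they are not part of your data):

```r
# Toy values standing in for a logFC column (hypothetical data)
x <- c(-2.0, 0.5, 1.5, -0.3)

# Each comparison returns one TRUE/FALSE per element
big   <- x >= 1
small <- x <= -1

# Subset with a logical vector; combine with & and negate with !
x[big]
x[!big & !small]
```

The same pattern scales to a whole data frame column, which is why the loop becomes unnecessary.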

Try the following.

muscle <- read.table(text='
"ID"	            "adj.P.Val"	"logFC"	       "Gene.symbol"
"1419156_at"	"5.32e-12"	"2.6462565"	"Sox4"
"1433575_at"	"5.32e-12"	"3.9417089"	"Sox4"
"1428942_at"	"2.64e-11"	"3.9163618"	"Mt2"
"1454699_at"	"2.69e-10"	"1.8654677"	"LOC100047324///Sesn1"
"1416926_at"	"3.19e-10"	"2.172342"	"Trp53inp1"
"1422557_s_at"	"1.58e-09"	"2.9569254"	"Mt1"
', header=TRUE, stringsAsFactors=FALSE)

muscle

# Significance threshold for the adjusted p-value
p_thresh <- 6.51e-06

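# Create logical index vectors (one TRUE/FALSE per row)
gsym <- muscle$Gene.symbol != ""            # row has a gene symbol
this_pval <- muscle$adj.P.Val <= p_thresh   # adjusted p-value is significant
this_Ma <- muscle$logFC > -1                # logFC above the lower cutoff
this_Mb <- muscle$logFC < 1                 # logFC below the upper cutoff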

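# Use them: combine with & and negate with !
# logFC <= -1 and significant
downregulated_list <- muscle$Gene.symbol[gsym & !this_Ma & this_pval]
# logFC >= 1 and significant
upregulated_list <- muscle$Gene.symbol[gsym & !this_Mb & this_pval]
# -1 < logFC < 1 (no p-value filter here)
nochange <- muscle$Gene.symbol[gsym & this_Ma & this_Mb]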

# See the result (with the full data set, wrap these in head())
upregulated_list
downregulated_list
nochange
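
If one label per row is more convenient than three separate vectors, the same index vectors can fill a single column. This is only a sketch on hypothetical toy data (toy, sig and status are names I made up), using the same cutoffs as above:

```r
# Hypothetical toy data, not the original file
toy <- data.frame(
  Gene.symbol = c("A", "B", "", "D"),
  adj.P.Val   = c(1e-08, 1e-07, 1e-09, 0.5),
  logFC       = c(2.5, -1.7, 3.0, 0.2),
  stringsAsFactors = FALSE
)
p_thresh <- 6.51e-06

# Start with NA everywhere, then assign labels through logical indices
status <- rep(NA_character_, nrow(toy))
has_sym <- toy$Gene.symbol != ""
sig <- has_sym & toy$adj.P.Val <= p_thresh

status[sig & toy$logFC >= 1]  <- "up"
status[sig & toy$logFC <= -1] <- "down"
status[has_sym & toy$logFC > -1 & toy$logFC < 1] <- "nochange"

# Count how many rows fall in each class (NA = unclassified)
table(status, useNA = "ifany")
```

Rows without a gene symbol, or with a large fold change that is not significant, stay NA, so nothing is silently put in the wrong group.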

Hope this helps,

Rui Barradas

On 12-06-2012 21:55, mousy0815 wrote:
> upregulated_list = c()
> downregulated_list = c()
> nochange = c()
> p_thresh = 6.51e-06
> x=1
>
> while (x <= nrow(muscle)) {
> this_pval = muscle[x,"adj.P.Val"]
> this_M = muscle[x, "logFC"]