[R] how to "singlify" entries

Charles Plessy charles-r-nospam at plessy.org
Mon May 30 16:54:59 CEST 2005


On Mon, May 30, 2005 at 09:09:27AM -0400, Gabor Grothendieck wrote:

> Try using reshape, e.g. if dd is your data frame:
> 
> reshape(dd, dir = "wide", idvar = "F1", timevar = "F2", 
>     varying = list(c("VX","VY")))


Thank you very much, and thanks to Petr Pikal too. reshape() is exactly what I had forgotten.

The bad news is that I had simplified my example; my real situation is
slightly more complex.

I have three factors and one value:

> count_per_tc[1:10,]
   rna   lib           tc x
1  CAB 114BA T01F00380F47 1
2  CAE 114BB T01F00381273 1
3  CAJ 114BA T01F0048F6D1 1
4  CAB 114BC T01F0048F6D1 1
5  CAB 114BA T01F00498689 2
6  CAC 114BA T01F00498689 1
7  CAE 114BA T01F00498689 2
8  CAG 114BA T01F00498689 2
9  CAH 114BA T01F00498689 1
10 CAI 114BA T01F00498689 2

I would like a data frame giving the value of x for each combination of
"rna" and "lib", for each "tc".

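Since x is a count, a cross-tabulation may give this table directly. A minimal sketch that swaps in base R's xtabs() in place of reshape(), assuming that an rna/lib/tc combination absent from the data should count as 0; the ten rows are retyped from the printout above, and the names rna_lib, tab and wide are illustrative only.

```r
# The ten rows printed above, retyped so the sketch runs on its own.
count_per_tc <- read.table(header = TRUE, text = "
rna lib   tc           x
CAB 114BA T01F00380F47 1
CAE 114BB T01F00381273 1
CAJ 114BA T01F0048F6D1 1
CAB 114BC T01F0048F6D1 1
CAB 114BA T01F00498689 2
CAC 114BA T01F00498689 1
CAE 114BA T01F00498689 2
CAG 114BA T01F00498689 2
CAH 114BA T01F00498689 1
CAI 114BA T01F00498689 2
")

# One label per rna/lib combination, e.g. "CAB-114BA" (hypothetical column).
count_per_tc$rna_lib <- paste(count_per_tc$rna, count_per_tc$lib, sep = "-")

# Sum x in each (tc, rna_lib) cell; combinations that never occur get 0.
tab <- xtabs(x ~ tc + rna_lib, data = count_per_tc)

# Plain data frame: one row per tc (as row names), one column per rna-lib pair.
wide <- as.data.frame.matrix(tab)
wide
```

The result is already in the "tc down, rna-lib across" orientation, so no transposition is needed, and each column such as wide[["CAB-114BA"]] is an ordinary numeric vector.
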
> reshape(count_per_tc[1:10,], direction="wide", timevar="tc", idvar=c("rna","lib"))
   rna   lib x.T01F00380F47 x.T01F00381273 x.T01F0048F6D1 x.T01F00498689
1  CAB 114BA              1             NA             NA              2
2  CAE 114BB             NA              1             NA             NA
3  CAJ 114BA             NA             NA              1             NA
4  CAB 114BC             NA             NA              1             NA
6  CAC 114BA             NA             NA             NA              1
7  CAE 114BA             NA             NA             NA              2
8  CAG 114BA             NA             NA             NA              2
9  CAH 114BA             NA             NA             NA              1
10 CAI 114BA             NA             NA             NA              2

Oops, I wanted it the other way round:

> t(reshape(count_per_tc[1:10,], direction="wide", timevar="tc", idvar=c("rna","lib")))
               1       2       3       4       6       7       8       9       10     
rna            "CAB"   "CAE"   "CAJ"   "CAB"   "CAC"   "CAE"   "CAG"   "CAH"   "CAI"  
lib            "114BA" "114BB" "114BA" "114BC" "114BA" "114BA" "114BA" "114BA" "114BA"
x.T01F00380F47 " 1"    NA      NA      NA      NA      NA      NA      NA      NA     
x.T01F00381273 NA      " 1"    NA      NA      NA      NA      NA      NA      NA     
x.T01F0048F6D1 NA      NA      " 1"    " 1"    NA      NA      NA      NA      NA     
x.T01F00498689 " 2"    NA      NA      NA      " 1"    " 2"    " 2"    " 1"    " 2"   
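t() on a data frame goes through as.matrix(), and because rna and lib are not numeric the result is a character matrix, which is why the counts above print as quoted strings like " 1". A sketch that asks reshape() for the wanted orientation directly, with tc as idvar and one combined label per rna/lib pair as timevar; the column name rna_lib is an illustrative assumption.

```r
# The ten rows printed above, retyped so the sketch runs on its own.
count_per_tc <- read.table(header = TRUE, text = "
rna lib   tc           x
CAB 114BA T01F00380F47 1
CAE 114BB T01F00381273 1
CAJ 114BA T01F0048F6D1 1
CAB 114BC T01F0048F6D1 1
CAB 114BA T01F00498689 2
CAC 114BA T01F00498689 1
CAE 114BA T01F00498689 2
CAG 114BA T01F00498689 2
CAH 114BA T01F00498689 1
CAI 114BA T01F00498689 2
")

# Combine the two factors into one label, e.g. "CAB-114BA".
count_per_tc$rna_lib <- paste(count_per_tc$rna, count_per_tc$lib, sep = "-")

# tc identifies a row; each rna_lib label becomes a column x.<label>.
wide <- reshape(count_per_tc[, c("tc", "rna_lib", "x")],
                direction = "wide", idvar = "tc", timevar = "rna_lib")

# tc stays character and the x.* counts stay integer (with NA where absent),
# instead of everything being turned into character as with t().
sapply(wide, class)
```

If a missing combination really means zero, wide[is.na(wide)] <- 0 afterwards replaces the NA entries.
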

The ultimate goal is (after proper renaming of the columns) to do things like

plot(`CAA-114BA`[`CAA-114BA` > 0 & `CAA-114BB` > 0], `CAA-114BB`[`CAA-114BA` > 0 & `CAA-114BB` > 0])

(This combination only appears when I reshape the whole data frame, which has 200,000 rows.)

and then to run proper statistical tests (which I still have to learn, or
remember from 12 years ago).
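A sketch of that kind of plot, assuming absent combinations mean a count of 0 (xtabs() fills them with 0, so the > 0 tests never meet NA). CAB-114BA and CAE-114BA are used only because they share a tc in the ten example rows; a pair such as CAA-114BA and CAA-114BB in the full data would be handled the same way. Names like these are not syntactic R names, so they need [[ ]] or backquotes.

```r
# The ten rows printed above, retyped so the sketch runs on its own.
count_per_tc <- read.table(header = TRUE, text = "
rna lib   tc           x
CAB 114BA T01F00380F47 1
CAE 114BB T01F00381273 1
CAJ 114BA T01F0048F6D1 1
CAB 114BC T01F0048F6D1 1
CAB 114BA T01F00498689 2
CAC 114BA T01F00498689 1
CAE 114BA T01F00498689 2
CAG 114BA T01F00498689 2
CAH 114BA T01F00498689 1
CAI 114BA T01F00498689 2
")
count_per_tc$rna_lib <- paste(count_per_tc$rna, count_per_tc$lib, sep = "-")

# One row per tc, one column per rna-lib pair, 0 where a pair is absent.
wide <- as.data.frame.matrix(xtabs(x ~ tc + rna_lib, data = count_per_tc))

# Two illustrative columns to compare.
a <- wide[["CAB-114BA"]]
b <- wide[["CAE-114BA"]]

# Keep only the tc entries counted in both columns.
keep <- a > 0 & b > 0
plot(a[keep], b[keep], xlab = "CAB-114BA", ylab = "CAE-114BA")
```
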

Once again, thank you, and please warn me if I am doing something stupid by
transposing the reshaped table.

Best regards,

-- 
Charles



