[R] data frame pointers?

David Winsemius dwinsemius at comcast.net
Thu Oct 24 02:24:19 CEST 2013


On Oct 23, 2013, at 4:36 PM, Jon BR wrote:

> Hello,
>    I've been running several programs in the unix shell, and it's time to
> combine results from several different pipelines.  I've been writing shell
> scripts with heavy use of awk and grep to make big text files, but I'm
> thinking it would be better to have all my data in one big structure in R
> so that I can query whatever attributes I like, and print several
> corresponding tables to separate files.
> 
> I haven't used R in years, so I was hoping somebody might be able to
> suggest a solution or combination of functions that could help me get
> oriented.
> 
> Right now, I can import my data into a data frame that looks like this:
> 
> df <-
> data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
>> df
>    case  gene issue
> 1 case_1 gene1  nsyn
> 2 case_1 gene1   amp
> 3 case_2 gene1   del
> 4 case_3 gene2   UTR
> 
> 
> I'd like to cook up some combination of functions/scripting that can
> convert a table like df to produce a list or a data frame/ matrix that
> looks like df2:
> 
>> df2
>        case_1 case_2 case_3
> gene1 nsyn,amp    del      0
> gene2        0      0    UTR
> 
> I can build df2 manually, like this:
> df2
> <-data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
> rownames(df2)<-c("gene1","gene2")

Factors will be a hassle, so build the data frame with character columns instead:

 df <-
data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"), stringsAsFactors=FALSE)
df
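
As a small sketch of why this matters (df_f and df_c are made-up names for illustration): a factor column carries its integer codes and full level set into every subset, whereas a character column is just strings. Note that since R 4.0.0 the default is already stringsAsFactors=FALSE, so the explicit argument only makes a difference on older versions.

```r
# Hypothetical illustration: the same column built with and without factors
df_f <- data.frame(issue = c("nsyn", "amp", "del", "UTR"), stringsAsFactors = TRUE)
df_c <- data.frame(issue = c("nsyn", "amp", "del", "UTR"), stringsAsFactors = FALSE)

class(df_f$issue)   # factor
class(df_c$issue)   # character

# An existing frame with factor columns can be converted in place
df_f[] <- lapply(df_f, as.character)
class(df_f$issue)   # character after conversion
```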

dmat <- with( df, matrix( tapply(issue, list(gene, case), list) ,
                   nrow=length(unique(gene)),ncol=length(unique(case)) )
      )
dmat

     [,1]        [,2]  [,3] 
[1,] Character,2 "del" NA   
[2,] NA          NA    "UTR"

> dmat[1,1]
[[1]]
[1] "nsyn" "amp" 

> as.data.frame(dmat)
         V1  V2  V3
1 nsyn, amp del  NA
2        NA  NA UTR
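
If the end goal is the comma-separated table df2 from the question, one possible variation is to have tapply() paste the issues together rather than collect them in a list. This is only a sketch under that assumption; the name m is chosen here for illustration, and the NA cells are replaced with "0" to match the layout that was asked for.

```r
# Rebuild the example data with character columns
df <- data.frame(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"),
                 stringsAsFactors = FALSE)

# One cell per gene/case pair; multiple issues are joined with commas
m <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# Combinations that never occur come back as NA; use "0" as in the question
m[is.na(m)] <- "0"

# Row names (genes) and column names (cases) are carried over from tapply()
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
df2
```

Because tapply() keeps the dimnames, the genes and cases label the rows and columns without any further work.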


> 
> but obviously do not want to do this by hand; I want R to generate df2 from
> df.
> 
> Any pointers/ideas would be most welcome!
> 
> Thanks,
> Jonathan
> 
> 	[[alternative HTML version deleted]]

R-help is a plain text mailing list. Old school, admittedly, but much better for coding questions. Surely an awk user can appreciate the wisdom of that request?

-- 
David Winsemius
Alameda, CA, USA


