[R] data frame pointers?
David Winsemius
dwinsemius at comcast.net
Thu Oct 24 02:24:19 CEST 2013
On Oct 23, 2013, at 4:36 PM, Jon BR wrote:
> Hello,
> I've been running several programs in the unix shell, and it's time to
> combine results from several different pipelines. I've been writing shell
> scripts with heavy use of awk and grep to make big text files, but I'm
> thinking it would be better to have all my data in one big structure in R
> so that I can query whatever attributes I like, and print several
> corresponding tables to separate files.
>
> I haven't used R in years, so I was hoping somebody might be able to
> suggest a solution or combination of functions that could help me get
> oriented.
>
> Right now, I can import my data into a data frame that looks like this:
>
> df <- data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
>> df
>     case  gene issue
> 1 case_1 gene1  nsyn
> 2 case_1 gene1   amp
> 3 case_2 gene1   del
> 4 case_3 gene2   UTR
>
>
> I'd like to cook up some combination of functions/scripting that can
> convert a table like df to produce a list or a data frame/ matrix that
> looks like df2:
>
>> df2
>         case_1 case_2 case_3
> gene1 nsyn,amp    del      0
> gene2        0      0    UTR
>
> I can build df2 manually, like this:
> df2 <- data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
> rownames(df2) <- c("gene1","gene2")
Factors will be a hassle, so build the data frame with character columns:

df <- data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"), stringsAsFactors=FALSE)
df

# Group issue by gene (rows) and case (columns); each cell holds the issues for that pair
dmat <- with( df, matrix( tapply(issue, list(gene, case), list),
                          nrow=length(unique(gene)), ncol=length(unique(case)) )
            )
dmat
     [,1]        [,2]  [,3]
[1,] Character,2 "del" NA
[2,] NA          NA    "UTR"

dmat[1,1]
[[1]]
[1] "nsyn" "amp"

as.data.frame(dmat)
         V1  V2  V3
1 nsyn, amp del  NA
2        NA  NA UTR
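
If plain comma-separated strings are preferred over list-valued cells, one possible variation (a sketch, not part of the reply above) is to hand paste() with collapse="," to tapply() and then fill the empty cells with "0". The names wide and df2 are only illustrative; dropping the matrix() wrapper is a choice made here so that the gene and case labels survive as row and column names.

```r
# Same toy data as in the question; keep the columns as character, not factor.
df <- data.frame(
  case  = c("case_1", "case_1", "case_2", "case_3"),
  gene  = c("gene1", "gene1", "gene1", "gene2"),
  issue = c("nsyn", "amp", "del", "UTR"),
  stringsAsFactors = FALSE
)

# Collapse every issue for a gene/case pair into one comma-separated string.
# tapply() returns a character matrix with genes as rows and cases as columns,
# and NA wherever a gene/case pair has no rows in df.
wide <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# Use "0" for empty cells to match the df2 layout requested in the question.
wide[is.na(wide)] <- "0"

# Convert to a data frame; the dimnames become row and column names.
df2 <- as.data.frame(wide, stringsAsFactors = FALSE)
df2
```

From there, something like write.table(df2, file="genes_by_case.txt", sep="\t", quote=FALSE, col.names=NA) should write the table out; the file name is hypothetical.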
>
> but obviously do not want to do this by hand; I want R to generate df2 from
> df.
>
> Any pointers/ideas would be most welcome!
>
> Thanks,
> Jonathan
>
> [[alternative HTML version deleted]]
R is a plain text mailing list. Old school, admittedly, but much better for coding questions. Surely an awk user can appreciate the wisdom of that request?
--
David Winsemius
Alameda, CA, USA