[R] data frame pointers?
David Winsemius
dwinsemius at comcast.net
Thu Oct 24 02:39:16 CEST 2013
On Oct 23, 2013, at 5:24 PM, David Winsemius wrote:
>
> On Oct 23, 2013, at 4:36 PM, Jon BR wrote:
>
>> Hello,
>> I've been running several programs in the unix shell, and it's time to
>> combine results from several different pipelines. I've been writing shell
>> scripts with heavy use of awk and grep to make big text files, but I'm
>> thinking it would be better to have all my data in one big structure in R
>> so that I can query whatever attributes I like, and print several
>> corresponding tables to separate files.
>>
>> I haven't used R in years, so I was hoping somebody might be able to
>> suggest a solution or combination of functions that could help me get
>> oriented..
>>
>> Right now, I can import my data into a data frame that looks like this:
>>
>> df <-
>> data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
>>> df
>> case gene issue
>> 1 case_1 gene1 nsyn
>> 2 case_1 gene1 amp
>> 3 case_2 gene1 del
>> 4 case_3 gene2 UTR
>>
>>
>> I'd like to cook up some combination of functions/scripting that can
>> convert a table like df to produce a list or a data frame/ matrix that
>> looks like df2:
>>
>>> df2
>> case_1 case_2 case_3
>> gene1 nsyn,amp del 0
>> gene2 0 0 UTR
>>
>> I can build df2 manually, like this:
>> df2 <- data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
>> rownames(df2)<-c("gene1","gene2")
>
> Factors will be a hassle:
>
> df <-
> data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"), stringsAsFactors=FALSE)
Note also that stringsAsFactors can be set globally with options(), as well as per call in the input functions, i.e. read.table and any of its cousins (read.csv, read.delim, ...).
> df
>
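A minimal sketch of the per-call form, assuming the data arrive as a whitespace-delimited text file with a header row. The inline `txt` string and the name `df_in` are made up for illustration and stand in for a real file. (In R 4.0.0 and later, stringsAsFactors already defaults to FALSE, so the argument only matters on older versions.)

```r
# Hypothetical stand-in for a text file produced by the shell pipeline
txt <- "case gene issue
case_1 gene1 nsyn
case_1 gene1 amp
case_2 gene1 del
case_3 gene2 UTR"

# stringsAsFactors = FALSE keeps the columns as character, not factor
df_in <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(df_in)
```

With a real file, `text = txt` would be replaced by the file path as the first argument.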
> dmat <- with( df, matrix( tapply(issue, list(gene, case), list) ,
> nrow=length(unique(gene)),ncol=length(unique(case)) )
> )
>> dmat
>
> [,1] [,2] [,3]
> [1,] Character,2 "del" NA
> [2,] NA NA "UTR"
>
>> dmat[1,1]
> [[1]]
> [1] "nsyn" "amp"
>
>> as.data.frame(dmat)
> V1 V2 V3
> 1 nsyn, amp del NA
> 2 NA NA UTR
>
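If the goal is exactly the df2 layout from the question (comma-joined strings, "0" for empty cells, genes as row names), one possible base-R variation is to have tapply paste the issues together instead of wrapping them in a list. This is only a sketch of that alternative, not the approach shown above; the names `m` and `df2` are chosen here for illustration.

```r
df <- data.frame(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"),
                 stringsAsFactors = FALSE)

# One comma-separated string per gene/case cell; empty cells come back as NA
m <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# Use "0" for gene/case combinations that have no issue, as in the question
m[is.na(m)] <- "0"

# Row names (genes) and column names (cases) carry over from the matrix
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
df2
```

Because each cell is a plain string rather than a list, the result can be written out directly with write.table.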
If you are coming back to R after many years, you may not be familiar with data.table. It is particularly well suited to large text files. Its syntax, which works through the arguments to "[", is quite different.
> library(data.table)
> dt <- data.table(df)
# To make a list in each category you would need to supply a "doubly `list`-ed" argument to "j".
> dt[ , list(list(issue)), by=c("gene", 'case') ]
gene case V1
1: gene1 case_1 nsyn,amp
2: gene1 case_2 del
3: gene2 case_3 UTR
> dt[ , list(issue), by=c("gene", 'case') ]
gene case issue
1: gene1 case_1 nsyn
2: gene1 case_1 amp
3: gene1 case_2 del
4: gene2 case_3 UTR
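The grouped results above are still in long form. To get one column per case, as in the requested df2, a possible next step is data.table's dcast. The following is a sketch that assumes a reasonably recent data.table (one that provides a dcast method for data.tables); the name `wide` is made up for illustration.

```r
library(data.table)

dt <- data.table(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"))

# One row per gene, one column per case; multiple issues in a cell are
# pasted together, and missing gene/case combinations are filled with "0"
wide <- dcast(dt, gene ~ case, value.var = "issue",
              fun.aggregate = function(x) paste(x, collapse = ","),
              fill = "0")
wide
```

Here the gene ends up as an ordinary column rather than as row names, which is the usual data.table convention.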
>
>>
>> but obviously do not want to do this by hand; I want R to generate df2 from
>> df.
>>
>> Any pointers/ideas would be most welcome!
>>
>> Thanks,
>> Jonathan
>>
>> [[alternative HTML version deleted]]
>
> R is a plain text mailing list. Old school, admittedly, but much better for coding questions. Surely an awk user can appreciate the wisdom of that request?
>
> --
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA