[R] frequency, count rows, data for heat map
Jan van der Laan
djvanderlaan at gmail.com
Thu Aug 26 09:02:48 CEST 2010
Please reply to r-help and not only to me personally. That way
others can also help, or perhaps benefit from the answers.
You can use strsplit to remove the last part of the strings. strsplit
returns a list of character vectors from which you (if I understand
you correctly) only want to select the first element. I use laply from
the plyr package for this, although there are probably other ways
of doing this.
library(plyr)
# Split each ID (e.g. "1079_17891") at "_" and keep only the first piece ("1079")
dat$V3 <- laply(strsplit(as.character(dat$V1), '_'), function(l) l[1])
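The laply() line above can also be written without plyr. A minimal base-R sketch, assuming the IDs are character strings (which as.character() ensures): sub() deletes the first underscore and everything after it, and vapply() over strsplit() gives the same result. The four IDs are taken from the attached mock.txt; the variable names are illustrative.

```r
# Base-R alternatives to the laply/strsplit line: keep the text before the
# first "_" in each ID (the sample number), whatever its length.
ids <- c("1079_17891", "1079_14794", "111_463428", "5576_315871")

# "_.*$" matches the first underscore and everything after it
prefix <- sub("_.*$", "", ids)
print(prefix)

# Same result with strsplit(): take the first piece of each split string
prefix2 <- vapply(strsplit(ids, "_", fixed = TRUE), `[`, character(1), 1)
print(identical(prefix, prefix2))
```

Applied to the data frame in this thread, the first form would read `dat$V3 <- sub("_.*$", "", as.character(dat$V1))`.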
After that you can use daply as I showed in my previous post
[daply(dat, V3 ~ V2, nrow)] or use the methods suggested by Dennis
Murphy to build your table.
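Dennis Murphy's suggestions are not reproduced in this thread, so the following is only one possible base-R route (not necessarily his method): table() cross-tabulates two columns directly and fills in 0 for sample/virus pairs that never occur; xtabs() with a formula gives the same counts. The data is the small example from the original question.

```r
# Count reads per (sample, virus) pair using the example from the question
dat <- data.frame(
  V1 = c("111", "111", "111", "1079", "1079", "1079", "5576", "5576", "5576"),
  V2 = c("abc", "sdf", "xyz", "abc", "xyz", "xyz", "abc", "sdf", "sdf"),
  stringsAsFactors = FALSE
)

# Rows = samples, columns = viruses; combinations that never occur are 0
counts <- table(dat$V1, dat$V2)
print(counts)

# Formula interface, producing the same counts
counts2 <- xtabs(~ V1 + V2, data = dat)

# Drop the "table" class to get a plain integer matrix with dimnames
m <- unclass(counts)
```

Note that rows and columns come out sorted by their labels, so the row order may differ from the order in the input file.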
Regards,
Jan
On Thu, Aug 26, 2010 at 1:41 AM, Trip Sweeney <tripsweeney at gmail.com> wrote:
> Jan,
> Thanks for responding to my post to the listserv about arranging a data
> matrix for a heat map.
> I am still a beginner, so below is the code I used for the matrix; I have
> not yet learned how to input a 'data.frame' (which I need to know to use
> your code). The code below works, and the mock.txt file is attached.
> There is one thing, though. The input in column 1 of mock.txt is tricky.
> I need it to sum per unique ID based on the characters before the "_".
> For example, the current script treats 1079_17891 and 1079_14794 as
> separate IDs, when I want them tallied together since they both belong to
> the same sample, 1079. Occasionally a sample has three characters before
> the "_", like 111_463428 in mock.txt. The substring after the "_" is of
> variable length. In the end, there should be one row for 1079, one for
> 111, and one for 5576.
> Can you help me with this modification of the code? Any advice is much
> appreciated. Sincerely, Trip
>
> # Read the two-column, tab-delimited file
> dat<-read.table('mock.txt',sep="\t")
> # One row per unique ID (column 1), one column per unique virus (column 2)
> sumData=matrix(NA,nrow=length(unique(dat[,1])),ncol=length(unique(dat[,2])))
> rownames(sumData)<-unique(dat[,1])
> colnames(sumData)<-unique(dat[,2])
>
> # Count the rows matching each (ID, virus) combination
> for (i in 1:dim(sumData)[1]){
>   for(j in 1:dim(sumData)[2]){
>     sumData[i,j]<-sum(dat[,1]==unique(dat[,1])[i] &
>                       dat[,2]==unique(dat[,2])[j])
>   }
> }
>
> write.table(sumData,"SummarizedData.txt",sep="\t",col.names=NA)
>
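The nested loops above could be kept and fed the prefixes instead of the full IDs, but a shorter sketch (one option among several, not the code either poster used) replaces them with sub() plus table(). The six input rows are copied from the attached mock.txt and read with the same sep = "\t"; writing the result under tempdir() is an illustrative choice in place of "SummarizedData.txt" in the working directory.

```r
# Tally reads per sample prefix (the part of column 1 before "_") and per
# reference sequence (column 2). A few rows from mock.txt stand in for the
# real file; with real data use read.table("mock.txt", sep = "\t", ...).
lines <- c(
  "1079_346\t281416490|ref|NC_013643.1|",
  "1079_378\t9629367|ref|NC_001803.1|",
  "1079_1292\t9629367|ref|NC_001803.1|",
  "111_469706\t21426072|ref|NC_004003.1|",
  "111_469706\t21426072|ref|NC_004003.1|",
  "5576_327936\t9629267|ref|NC_001798.1|"
)
dat <- read.table(text = lines, sep = "\t", stringsAsFactors = FALSE)

# Collapse IDs such as 1079_378 and 1079_1292 into their sample number 1079
sample_id <- sub("_.*$", "", dat[, 1])

# One row per sample, one column per reference; absent pairs are 0
sumData <- table(sample_id, dat[, 2])

# Same tab-delimited layout as the original write.table() call;
# unclass() drops the "table" class so a plain count matrix is written
out_file <- file.path(tempdir(), "SummarizedData.txt")
write.table(unclass(sumData), out_file, sep = "\t", col.names = NA)
```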
On Wed, Aug 25, 2010 at 4:53 PM, rtsweeney <tripsweeney at gmail.com> wrote:
>
> Hi all,
> I have read posts about heat map creation, but I am one step before that.
> Here is what I am trying to do; does anyone have any tips?
> We are trying to map sequence reads from tumors to viral genomes.
>
> Example input file :
> 111 abc
> 111 sdf
> 111 xyz
> 1079 abc
> 1079 xyz
> 1079 xyz
> 5576 abc
> 5576 sdf
> 5576 sdf
>
> How many xyz's are there for 1079 and 111? How many abc's, etc.?
> That is, how many times did reads from sample 1079 align to virus xyz?
> In some cases there are thousands per virus in a given sample, sometimes one.
> The original file (two columns by tens of thousands of rows; 20 MB) is a
> tab-delimited text file.
>
> Output file:
>       abc sdf xyz
> 111     1   1   1
> 1079    1   0   2
> 5576    1   2   0
>
> Or are there other ways to generate this data so that I can then use it
> to create a heat map?
>
> Thanks for any help you may have,
>
> rtsweeney
> palo alto, ca
> --
> View this message in context: http://r.789695.n4.nabble.com/frequency-count-rows-data-for-heat-map-tp2338363p2338363.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
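For the heat map step the question asks about, a minimal sketch assuming base R graphics are acceptable: stats::heatmap() takes a numeric matrix such as the desired output in the question. Rowv = NA and Colv = NA turn off the dendrograms so rows and columns keep the given order, and scale = "none" plots the raw counts (the default scales each row). The PDF file name is illustrative.

```r
# Draw a heat map from a sample-by-virus count matrix with stats::heatmap().
# The counts are the desired output shown in the original question.
counts <- matrix(
  c(1, 1, 1,
    1, 0, 2,
    1, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("111", "1079", "5576"), c("abc", "sdf", "xyz"))
)

# Write the plot to a PDF in the session's temporary directory
pdf_file <- file.path(tempdir(), "heatmap.pdf")
pdf(pdf_file)
# Rowv/Colv = NA: no dendrograms or reordering; scale = "none": raw counts
heatmap(counts, Rowv = NA, Colv = NA, scale = "none")
invisible(dev.off())
```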
-------------- next part --------------
1079_346 281416490|ref|NC_013643.1|
1079_346 281416323|ref|NC_013646.1|
1079_378 9629367|ref|NC_001803.1|
1079_588 30984428|ref|NC_004812.1|
1079_1292 9629367|ref|NC_001803.1|
1079_3956 9629357|ref|NC_001802.1|
1079_4736 9629357|ref|NC_001802.1|
1079_7732 21427641|ref|NC_004015.1|
1079_7855 118197620|ref|NC_008584.1|
1079_8618 32453484|ref|NC_004928.1|
1079_11540 10140926|ref|NC_002531.1|
1079_14794 9629367|ref|NC_001803.1|
1079_15738 109255272|ref|NC_008168.1|
1079_17891 299778956|ref|NC_014260.1|
1079_18414 157781212|ref|NC_009823.1|
1079_18414 157781216|ref|NC_009824.1|
1079_20312 9629367|ref|NC_001803.1|
1079_20497 9629357|ref|NC_001802.1|
1079_26750 9629367|ref|NC_001803.1|
1079_27926 9628113|ref|NC_001659.1|
1079_27926 9628113|ref|NC_001659.1|
1079_28033 84662653|ref|NC_007710.1|
1079_30020 47835019|ref|NC_004333.2|
1079_30371 9629367|ref|NC_001803.1|
1079_35750 50313241|ref|NC_001491.2|
1079_35750 50313241|ref|NC_001491.2|
111_463428 56694721|ref|NC_006560.1|
111_464636 114680053|ref|NC_008349.1|
111_464636 9627742|ref|NC_001623.1|
111_465190 9627186|ref|NC_001539.1|
111_467613 51557483|ref|NC_006151.1|
111_467613 51557483|ref|NC_006151.1|
111_467975 9627742|ref|NC_001623.1|
111_467975 114680053|ref|NC_008349.1|
111_467975 23577820|ref|NC_004323.1|
111_469706 21426072|ref|NC_004003.1|
111_469706 21426072|ref|NC_004003.1|
111_469793 146261990|ref|NC_001826.2|
111_470996 203454602|ref|NC_011273.1|
111_473637 281415946|ref|NC_013650.1|
111_473637 203458877|ref|NC_011269.1|
111_473637 109393216|ref|NC_008207.1|
111_473637 203457352|ref|NC_011272.1|
111_473637 203460520|ref|NC_011270.1|
111_473637 29566511|ref|NC_004687.1|
111_473637 204305660|ref|NC_011271.1|
5576_315871 168804017|ref|NC_010356.1|
5576_316443 9629198|ref|NC_001781.1|
5576_324191 148727082|ref|NC_009541.1|
5576_327936 9629267|ref|NC_001798.1|
5576_327936 9629267|ref|NC_001798.1|
5576_327936 9629267|ref|NC_001798.1|
5576_330546 216905965|ref|NC_011645.1|
5576_333512 57659681|ref|NC_006659.1|
5576_333512 57753428|ref|NC_006634.1|
5576_333512 57659681|ref|NC_006659.1|
5576_353878 20522096|ref|NC_003795.1|
5576_354562 9627186|ref|NC_001539.1|
5576_354577 19718363|ref|NC_003461.1|
5576_358444 48696722|ref|NC_005881.1|
5576_358444 48696722|ref|NC_005881.1|
5576_366975 9629178|ref|NC_001753.1|
5576_368020 239505241|ref|NC_012783.1|
5576_371413 48696722|ref|NC_005881.1|
5576_371413 48696722|ref|NC_005881.1|
5576_375881 48696722|ref|NC_005881.1|