[BioC] Average based on group
Kevin R. Coombes
kevin.r.coombes at gmail.com
Thu May 12 19:36:24 CEST 2011
It would probably be better to construct a meaningful factor that
reflects the correct interpretation. (I tend to dislike code that
assumes that the order of things is always preserved and no rows got
accidentally omitted....) I assume you really want to relate things
based on their offset from the actual SNP position. So you might want
to compute the minimum position within each SNP id group and then
compute "offset" relative to that minimum. You could then use the
offset as the new grouping factor for the averages you want.
Here is (completely untested and written on the fly) pseudo-code to do this:
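# Smallest position within each SNP, named by SNP id
startpos <- tapply(df$position, df$snp.id, min)
# Index by name (not by factor code) so each row gets its own SNP's minimum
offset <- df$position - startpos[as.character(df$snp.id)]
# Average the scores that share the same offset
myavg <- tapply(df$score, offset, mean)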
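
As a quick check, here is a minimal sketch of the same idea run on the 20-row example quoted further down in this thread. The data frame is rebuilt by hand with data.frame() rather than from the original structure() call, and the seqnames column is left out because it is not needed; the column names are taken from the quoted example. Treat it as an illustration under those assumptions, not a tested solution for real data.

```r
# Example data copied from the post quoted below (seqnames column omitted).
df <- data.frame(
  snp.id   = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  position = c(71916552:71916561, 71916726:71916735),
  score    = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
               0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

# Smallest position within each SNP, named by SNP id.
startpos <- tapply(df$position, df$snp.id, min)

# Offset of every row from its own SNP's first position (0, 1, ..., 9 here).
offset <- df$position - startpos[as.character(df$snp.id)]

# Mean score across SNPs at each offset.
myavg <- tapply(df$score, offset, mean)
print(round(myavg, 2))
```

On this data the ten averages come out as 0.35 0.45 0.35 0.80 0.50 1.00 2.30 0.80 0.90 0.90, which matches the output requested in the quoted message. Because the grouping is by offset rather than by row order, a SNP with a missing position simply contributes nothing at that offset instead of shifting every later row into the wrong group.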
Kevin
> Ok I get it now,
> If your data is as shown i.e. sorted, then can you just create a dummy
> variable:
> rep(1:10,n) where n is the number of groups and then use by or tapply?
> So in your example:
> by(df[,4],rep(1:10,2),mean)
>
> cheers,
> Achilleas
>
> On Thu, May 12, 2011 at 12:38 PM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
>
>> Thanks for your reply. But it does not serve my purpose. In fact, there
>> are two SNPs in the example, rs9971029 and rs9971030.
>>
>> I expect the following output from the following data:
>>
>> 0.35 0.45 0.35 0.80 0.50 1.00 2.30 0.80 0.90 0.90
>>
>> You can run this example to get the above values:
>>
>> -----------------------------R code------------------------------------
>> df<-structure(list(seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label =
>> "chr10", class = "factor"),
>> snp.id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("rs9971029",
>> "rs9971030"), class = "factor"), position = c(71916552L,
>> 71916553L, 71916554L, 71916555L, 71916556L, 71916557L, 71916558L,
>> 71916559L, 71916560L, 71916561L, 71916726L, 71916727L, 71916728L,
>> 71916729L, 71916730L, 71916731L, 71916732L, 71916733L, 71916734L,
>> 71916735L), score = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9,
>> 0.8, 0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)), .Names =
>> c("seqnames",
>> "snp.id", "position", "score"), class = "data.frame", row.names = c(NA,
>> -20L))
>>
>> a<-df[1:10,]
>> b<-df[11:20,]
>> cbind(a,b)->c
>> (c[,4]+c[,8])/2
>> ----------------------------------------------------------------
>>
>> The data is :
>>
>> chr10 rs9971029 71916552 0.1
>> chr10 rs9971029 71916553 0.4
>> chr10 rs9971029 71916554 0.3
>> chr10 rs9971029 71916555 0.9
>> chr10 rs9971029 71916556 1
>> chr10 rs9971029 71916557 2
>> chr10 rs9971029 71916558 4
>> chr10 rs9971029 71916559 0.8
>> chr10 rs9971029 71916560 0.9
>> chr10 rs9971029 71916561 0.8
>> chr10 rs9971030 71916726 0.6
>> chr10 rs9971030 71916727 0.5
>> chr10 rs9971030 71916728 0.4
>> chr10 rs9971030 71916729 0.7
>> chr10 rs9971030 71916730 0
>> chr10 rs9971030 71916731 0
>> chr10 rs9971030 71916732 0.6
>> chr10 rs9971030 71916733 0.8
>> chr10 rs9971030 71916734 0.9
>> chr10 rs9971030 71916735 1
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>