[R] conversion of data for use within barchart
Deepayan Sarkar
deepayan.sarkar at gmail.com
Wed Jul 2 22:50:29 CEST 2008
On 7/2/08, Karin Lagesen <karinlag at studmed.uio.no> wrote:
>
>
> I have a data matrix like this:
>
>
> > data[1:10,]
>    aaname grp   cluster count
> 1     Ala All Singleton   432
> 2     Arg All Singleton  1239
> 3     Asn All Singleton   396
> 4     Asp All Singleton   152
> 5     Cys All Singleton   206
> 6     Gln All Singleton   370
> 7     Glu All Singleton   211
> 8     Gly All Singleton   594
> 9     His All Singleton   213
> 10    Ile All Singleton    44
>
> where the cluster column has three levels.
>
> > levels(data$cluster)
> [1] "Array" "Singleton" "rRNA"
> >
>
> Now, I would like to plot this like this:
>
> barchart(aaname~count|grp, group = cluster, data = data, stack = TRUE)
>
> I am thus using the cluster as the grouping.
>
> I would like to plot the relative abundance within each grouping, such
> that the max level in my plot is always one (or 100). This would for
> instance mean for the Ala in the All grp that the Singleton cluster
> constitutes, let's say, 40% of the Ala in the All grp, whereas the Array
> and rRNA clusters make up 30% each. In this case I would get in my plot a
> Singleton stretching to 40%, whereas the other two would be 30% each,
> all in all making 100%.
>
> I am uncertain of whether I am managing to describe what I want, so I
> hope somebody understands what I want!
So you basically need to compute sum(count) within each group, and
divide each count by the corresponding sum. Consider using ave(). For example:
> foo <- data.frame(g = gl(3, 3), count = rpois(9, lambda=20))
> foo
  g count
1 1    14
2 1    16
3 1    20
4 2    21
5 2    24
6 2    16
7 3    15
8 3    24
9 3    12
> with(foo, ave(count, g, FUN = sum))
[1] 50 50 50 61 61 61 51 51 51
> foo$gsum <- with(foo, ave(count, g, FUN = sum))
> foo
  g count gsum
1 1    14   50
2 1    16   50
3 1    20   50
4 2    21   61
5 2    24   61
6 2    16   61
7 3    15   51
8 3    24   51
9 3    12   51
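
The remaining step is the division itself. Here is a minimal sketch of it, first on the counts printed above and then on a made-up data frame shaped like the one in the question. The names `dat`, `prop`, `total` and `pct` are illustrative assumptions, and so is the choice to sum within each aaname and grp combination, which is one reading of "relative abundance within each grouping".

```r
library(lattice)

# The toy example from above, with the printed counts typed in by hand
# so the result is reproducible.
foo <- data.frame(g = gl(3, 3),
                  count = c(14, 16, 20, 21, 24, 16, 15, 24, 12))
foo$gsum <- with(foo, ave(count, g, FUN = sum))
foo$prop <- foo$count / foo$gsum   # proportions add up to 1 within each g

# Made-up data with the same columns as in the question (values are
# invented for illustration only).
dat <- expand.grid(aaname  = c("Ala", "Arg", "Asn"),
                   grp     = c("All", "Other"),
                   cluster = c("Array", "Singleton", "rRNA"))
set.seed(1)
dat$count <- rpois(nrow(dat), lambda = 200)

# Total count per bar, i.e. per aaname within grp, repeated on every
# row that belongs to that bar.
dat$total <- with(dat, ave(count, aaname, grp, FUN = sum))

# Share of each cluster within its bar, in percent.
dat$pct <- 100 * dat$count / dat$total

# Check: the percentages in every bar should add up to 100.
bar_sums <- with(dat, tapply(pct, list(aaname, grp), sum))

# Same call as in the question, with the percentage in place of the
# raw count. The plot is stored rather than drawn; print(p) draws it.
p <- barchart(aaname ~ pct | grp, groups = cluster, data = dat,
              stack = TRUE, auto.key = TRUE)
```

With the real data, the same two lines computing `total` and `pct` should apply unchanged, provided the columns are named as in the question.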
-Deepayan