Wed Jul 18 13:27:34 CEST 2012

Hello,

All problems should be easy.

     gene      variable    value gender line rep
1 CG10000 X208.F1.30456 4.758010 Female  208   1
2 CG10000 X365.F2.30478 4.915395 Female  365   2
3 CG10000 X799.F2.30509 4.641636 Female  799   2
4 CG10000 X306.M2.32650 4.550676   Male  306   2
5 CG10000 X712.M2.30830 4.633811   Male  712   2
6 CG10000 X732.M2.30504 4.857564   Male  732   2
7 CG10000 X707.F1.31120 5.104165 Female  707   1
8 CG10000 X514.F2.30493 4.730814 Female  514   2

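# See what we have (d is the data frame shown above)
str(d)

# Mean and sd of value for each gene/gender combination;
# f could also be written inline as function(x) ... in the aggregate() call
f <- function(x) c(mean = mean(x), sd = sd(x))
aggregate(value ~ gene + gender, data = d, FUN = f)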
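
Note that aggregate() returns one row per gene/gender combination. If the
statistics are wanted on every row of the original data frame, as the
question asks, ave() is one possible way. A minimal sketch, assuming the
example rows above are read into d; the column names value.mean and
value.sd are my own choice, not something from the original data:

```r
# Rebuild the example rows shown above (the header has one field fewer
# than the data lines, so read.table() uses the first column as row names)
d <- read.table(text = "gene variable value gender line rep
1 CG10000 X208.F1.30456 4.758010 Female 208 1
2 CG10000 X365.F2.30478 4.915395 Female 365 2
3 CG10000 X799.F2.30509 4.641636 Female 799 2
4 CG10000 X306.M2.32650 4.550676 Male 306 2
5 CG10000 X712.M2.30830 4.633811 Male 712 2
6 CG10000 X732.M2.30504 4.857564 Male 732 2
7 CG10000 X707.F1.31120 5.104165 Female 707 1
8 CG10000 X514.F2.30493 4.730814 Female 514 2",
                header = TRUE, stringsAsFactors = FALSE)

# ave() applies FUN within each gene/gender group and returns a vector
# as long as the input, so the group statistic is repeated on every row.
# value.mean and value.sd are hypothetical column names.
d$value.mean <- ave(d$value, d$gene, d$gender, FUN = mean)
d$value.sd   <- ave(d$value, d$gene, d$gender, FUN = sd)

head(d)
```

With 2 000 000 rows this may take a little while, but it avoids a separate
merge() of the aggregate() result back into the data frame.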

Hope this helps,

On 18-07-2012 10:54, robgriffin247 wrote:
> Hi
> I think/hope there will be a simple solution to this, but googling has
> provided no answers (I am probably not using the right words).
>
> I have a long data frame of >2 000 000 rows, and 6 columns. Across this
> there are 24 000 combinations of gene in a column (n=12000) and gender in a
> column (n=2... obviously). I want to create 2 new columns in the data frame
> that on each row gives, in one column the mean value (of gene expression, in
> the column called "value") for that row's gene&gender combination, and in
> the other column the standard deviation for the gene&gender combination.
>
> Any suggestions?
>
> Rob
>
> Example of the top of the data frame:
>
>
