[R] Repeatedly apply multiple functions to subsets of data.
Claire Jouseau
claire.jouseau at gmail.com
Sun Apr 13 19:47:59 CEST 2008
Dear R-users:
I have a large dataframe with the following format:
> plants
id      trt  year  size   num  spA  spB  spZ
1011a   1    1     23.2   3    12   3.2  8
1011a   1    2     17.9   2    10   5.1  2.8
1011a   1    3     12.5   7    12   0    0.5
1011b   2    1     NA     NA   NA   NA   NA
1011b   2    2     6      6    4    2    0
1011b   2    3     100.3  5    3    95   2.3
28105a  1    1     9.1    8    0.5  0    8.6
28105a  1    2     16.6   4    2    12   4.6
28105a  1    3     8.7    7    1    0.2  7.5
I am looking for advice on how to select a subset of rows with
non-sequential id numbers, apply a series of functions to the subset
(excluding rows with missing data), and print the output to a new dataframe
containing the output from each unique id. I need to perform the following
calculations on each subset of id numbers:
1) for all columns: mean, standard deviation, and variance
2) for columns "spA" to "spZ": sum of the covariance matrix and sum of the
variance of each column
3) for columns "size" and "year": linear regression of form lm(size~year)
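As a minimal sketch of calculations 1) and 2), assuming the species columns are exactly spA, spB and spZ and that a row counts as missing when any of its values is NA, the data can be split by id and summarised one piece at a time. The names summarise.one, sp.cols and the read.table() call that rebuilds the example data are illustrative only:

```r
# Rebuild the example data so the sketch runs on its own
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id trt year size num spA spB spZ
1011a 1 1 23.2 3 12 3.2 8
1011a 1 2 17.9 2 10 5.1 2.8
1011a 1 3 12.5 7 12 0 0.5
1011b 2 1 NA NA NA NA NA
1011b 2 2 6 6 4 2 0
1011b 2 3 100.3 5 3 95 2.3
28105a 1 1 9.1 8 0.5 0 8.6
28105a 1 2 16.6 4 2 12 4.6
28105a 1 3 8.7 7 1 0.2 7.5
")

sp.cols <- c("spA", "spB", "spZ")  # assumption: these are all the species columns

# Hypothetical helper: summarise the rows belonging to one id
summarise.one <- function(d) {
  d <- d[complete.cases(d), ]      # drop rows with missing data
  sp.cov <- var(d[, sp.cols])      # covariance matrix of the species columns
  data.frame(
    id = d$id[1], trt = d$trt[1],
    mean.size = mean(d$size), sd.size = sd(d$size),
    mean.num = mean(d$num), sd.num = sd(d$num),
    sum.spcovar = sum(sp.cov),       # sum of every entry of the covariance matrix
    sum.spvar = sum(diag(sp.cov)),   # sum of the per-column variances
    mean.spA = mean(d$spA), sd.spA = sd(d$spA), var.spA = var(d$spA)
  )
}

# One row per id, stacked into a single data frame
plants.calc <- do.call(rbind, lapply(split(plants, plants$id), summarise.one))
print(plants.calc)
```

Further columns (for example mean.spB) could be added to the data.frame() call in the same way.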
Ideally my new dataframes would have the following formats:
> plants.calc
id     trt  mean.size  sd.size  mean.num  sd.num  sum.spcovar  sum.spvar  mean.spA  sd.spA  var.spA
1011a  a    17.9       5.4      4.0       2.6     17.12        22.74      11.33     1.15    1.33
> plants.lm
id     intercept  se.intercept  estimate  se.estimate  adj.Rsq  Tvalue  Pvalue  N
1011a  28.57      0.06          -5.35     0.03         0.9999   458.09  0.0014  3
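A rough sketch of how a table like plants.lm might be built: fit lm(size ~ year) within each id and pull the numbers out of the coefficient matrix returned by summary(). The helper name fit.one is hypothetical. Tvalue and Pvalue are taken from the intercept row here because that is what reproduces the 458.09 in the example row; swapping "(Intercept)" for "year" would give the slope's values instead. An id with fewer than three complete rows (1011b here) leaves no residual degrees of freedom, so its standard errors come back as NaN or Inf and R may warn.

```r
# Rebuild the example data so the sketch runs on its own
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id trt year size num spA spB spZ
1011a 1 1 23.2 3 12 3.2 8
1011a 1 2 17.9 2 10 5.1 2.8
1011a 1 3 12.5 7 12 0 0.5
1011b 2 1 NA NA NA NA NA
1011b 2 2 6 6 4 2 0
1011b 2 3 100.3 5 3 95 2.3
28105a 1 1 9.1 8 0.5 0 8.6
28105a 1 2 16.6 4 2 12 4.6
28105a 1 3 8.7 7 1 0.2 7.5
")

# Hypothetical helper: regression summary for the rows of one id
fit.one <- function(d) {
  d <- d[complete.cases(d[, c("size", "year")]), ]  # drop rows with missing data
  s <- summary(lm(size ~ year, data = d))
  co <- coef(s)  # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(
    id = d$id[1],
    intercept = co["(Intercept)", "Estimate"],
    se.intercept = co["(Intercept)", "Std. Error"],
    estimate = co["year", "Estimate"],
    se.estimate = co["year", "Std. Error"],
    adj.Rsq = s$adj.r.squared,
    Tvalue = co["(Intercept)", "t value"],   # intercept row, see note above
    Pvalue = co["(Intercept)", "Pr(>|t|)"],
    N = nrow(d)
  )
}

plants.lm <- do.call(rbind, lapply(split(plants, plants$id), fit.one))
print(plants.lm)
```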
I am very new to R and have written the following code from which I can
successfully extract the summed covariance values but not anything else
because I cannot figure out, if possible, how to extract the relevant
columns from a list. Any help you can offer would be greatly appreciated.
Thanks,
Claire.
n <- length(unique(plants$id))
output <- lapply(split(plants, plants$id), head, 3)
out <- as.array(output)
sum.spcovar <- NULL
col.mean <- NULL
col.sd <- NULL
col.var <- NULL
sum.spvar <- NULL
for (i in 1:n) {
  spcovar <- function(x) {colSums(var(x))}
  sum.spcovar[i] <- sum(spcovar(out[[i]]))
  col.mean[i] <- colMeans(out[[i]])
  col.sd[i] <- sd(out[[i]])
  col.var[i] <- (sd(out[[i]])^2)
  sum.spvar[i] <- sum((sd(out[[i]]))^2)
}
plants.calc <- data.frame(unique(plants$id),
                          rep(1:2, length(unique(plants$id))), sum.spcovar,
                          sum.spvar, col.mean, col.sd, col.var)
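One likely obstacle in the loop above is that colMeans() returns one value per column, so the result cannot be stored in the single slot col.mean[i], and the non-numeric id column has to be left out first. A possible workaround, sketched here under the assumption that the columns of interest are size, num, spA, spB and spZ, is to let sapply() collect one vector per id into a matrix. Note that na.rm = TRUE drops NAs column by column, which matches dropping whole rows only when, as in the example data, a missing row is missing in every column:

```r
# Rebuild the example data so the sketch runs on its own
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id trt year size num spA spB spZ
1011a 1 1 23.2 3 12 3.2 8
1011a 1 2 17.9 2 10 5.1 2.8
1011a 1 3 12.5 7 12 0 0.5
1011b 2 1 NA NA NA NA NA
1011b 2 2 6 6 4 2 0
1011b 2 3 100.3 5 3 95 2.3
28105a 1 1 9.1 8 0.5 0 8.6
28105a 1 2 16.6 4 2 12 4.6
28105a 1 3 8.7 7 1 0.2 7.5
")

num.cols <- c("size", "num", "spA", "spB", "spZ")  # assumption: columns to summarise

# Split only the numeric columns, one data frame per id
pieces <- split(plants[, num.cols], plants$id)

# Each call returns a vector (one value per column); sapply() binds them
# into a matrix, and t() puts ids in rows and columns in columns
col.mean <- t(sapply(pieces, colMeans, na.rm = TRUE))
col.sd   <- t(sapply(pieces, function(d) sapply(d, sd, na.rm = TRUE)))
col.var  <- col.sd^2

print(col.mean)
```

Individual values can then be looked up by name, for example col.mean["1011a", "size"].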