[R] Repeatedly apply multiple functions to subsets of data.

Claire Jouseau claire.jouseau at gmail.com
Sun Apr 13 19:47:59 CEST 2008


Dear R-users:

I have a large data frame with the following format:

> plants

id      trt  year  size   num  spA  spB   spZ
1011a   1    1     23.2   3    12   3.2   8
1011a   1    2     17.9   2    10   5.1   2.8
1011a   1    3     12.5   7    12   0     0.5
1011b   2    1     NA     NA   NA   NA    NA
1011b   2    2     6      6    4    2     0
1011b   2    3     100.3  5    3    95    2.3
28105a  1    1     9.1    8    0.5  0     8.6
28105a  1    2     16.6   4    2    12    4.6
28105a  1    3     8.7    7    1    0.2   7.5
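To make the suggestions below easy to try, here is a sketch that rebuilds the table above as an R data frame. It assumes `id` should be read as a character column (not a factor) and that the NA cells are genuine missing values.

```r
# Rebuild the example table from the post as a data frame.
# "NA" is read as a missing value by read.table()'s default na.strings.
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id     trt year  size num  spA  spB  spZ
1011a    1    1  23.2   3 12.0  3.2  8.0
1011a    1    2  17.9   2 10.0  5.1  2.8
1011a    1    3  12.5   7 12.0  0.0  0.5
1011b    2    1    NA  NA   NA   NA   NA
1011b    2    2   6.0   6  4.0  2.0  0.0
1011b    2    3 100.3   5  3.0 95.0  2.3
28105a   1    1   9.1   8  0.5  0.0  8.6
28105a   1    2  16.6   4  2.0 12.0  4.6
28105a   1    3   8.7   7  1.0  0.2  7.5
")

str(plants)                      # 9 rows, 8 columns
plants[!complete.cases(plants), ]  # the one row with missing data
```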


I am looking for advice on how to select the subset of rows belonging to each
id (the ids are not sequential numbers), apply a series of functions to each
subset (excluding rows with missing data), and write the output to a new data
frame containing one row of results per unique id. I need to perform the
following calculations on each subset:

1) for all columns: mean, standard deviation, and variance

2) for columns "spA" to "spZ": the sum of the covariance matrix and the sum
of the variances of the individual columns

3) for columns "size" and "year": a linear regression of the form lm(size ~ year)
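A minimal sketch of the three calculations on a single subset, assuming the column names shown in the table above. The variances of the species columns are the diagonal of their covariance matrix, so sum(diag(cov(...))) gives the summed variances and sum(cov(...)) gives the sum of the whole matrix.

```r
# Example data from the post (assumed: NA cells are missing values)
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id     trt year  size num  spA  spB  spZ
1011a    1    1  23.2   3 12.0  3.2  8.0
1011a    1    2  17.9   2 10.0  5.1  2.8
1011a    1    3  12.5   7 12.0  0.0  0.5
1011b    2    1    NA  NA   NA   NA   NA
1011b    2    2   6.0   6  4.0  2.0  0.0
1011b    2    3 100.3   5  3.0 95.0  2.3
28105a   1    1   9.1   8  0.5  0.0  8.6
28105a   1    2  16.6   4  2.0 12.0  4.6
28105a   1    3   8.7   7  1.0  0.2  7.5
")

# Rows for one id, with incomplete rows dropped
d <- na.omit(subset(plants, id == "1011a"))
num <- d[, c("size", "num", "spA", "spB", "spZ")]  # columns to summarise
sp <- d[, c("spA", "spB", "spZ")]                  # species columns

# 1) mean, standard deviation and variance of each column
col.mean <- colMeans(num)
col.sd <- sapply(num, sd)
col.var <- sapply(num, var)

# 2) sum of the whole covariance matrix and sum of the column variances
cv <- cov(sp)
sum.spcovar <- sum(cv)
sum.spvar <- sum(diag(cv))

# 3) linear regression of size on year
fit <- lm(size ~ year, data = d)
coef(fit)
```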


Ideally, my new data frames would have the following formats:

> plants.calc

id     trt  mean.size  sd.size  mean.num  sd.num  sum.spcovar  sum.spvar  mean.spA  sd.spA  var.spA
1011a  a    17.9       5.4      4.0       2.6     17.12        22.74      11.33     1.15    1.33
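One way to build plants.calc is to write a function that summarises a single id and then apply it to every piece produced by split(). The helper name summarise.id is hypothetical, and the column set (size, num and the species columns) is my reading of the desired output. Because the matrix arguments to data.frame() are named, the resulting columns come out as mean.size, sd.num, var.spA and so on.

```r
# Example data from the post (assumed: NA cells are missing values)
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id     trt year  size num  spA  spB  spZ
1011a    1    1  23.2   3 12.0  3.2  8.0
1011a    1    2  17.9   2 10.0  5.1  2.8
1011a    1    3  12.5   7 12.0  0.0  0.5
1011b    2    1    NA  NA   NA   NA   NA
1011b    2    2   6.0   6  4.0  2.0  0.0
1011b    2    3 100.3   5  3.0 95.0  2.3
28105a   1    1   9.1   8  0.5  0.0  8.6
28105a   1    2  16.6   4  2.0 12.0  4.6
28105a   1    3   8.7   7  1.0  0.2  7.5
")

sp.cols <- c("spA", "spB", "spZ")
num.cols <- c("size", "num", sp.cols)

# Hypothetical helper: summarise one id's rows as a one-row data frame
summarise.id <- function(d) {
  d <- na.omit(d)              # drop rows with missing data
  cv <- cov(d[, sp.cols])      # covariance matrix of the species columns
  data.frame(id = d$id[1], trt = d$trt[1],
             mean = t(colMeans(d[, num.cols])),
             sd = t(sapply(d[, num.cols], sd)),
             var = t(sapply(d[, num.cols], var)),
             sum.spcovar = sum(cv),
             sum.spvar = sum(diag(cv)))
}

# Apply the summary to every id and stack the one-row results
plants.calc <- do.call(rbind, lapply(split(plants, plants$id), summarise.id))
rownames(plants.calc) <- NULL
plants.calc
```

Columns can be put in any order afterwards by indexing with a vector of names, e.g. plants.calc[, c("id", "trt", "mean.size", "sd.size")].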


> plants.lm

id     intercept  se.intercept  estimate  se.estimate  adj.Rsq  Tvalue  Pvalue  N
1011a  28.57      0.06          -5.35     0.03         0.9999   458.09  0.0014  3
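A similar sketch for plants.lm, pulling the statistics out of summary(lm(...)): coef() on the summary returns a matrix with columns "Estimate", "Std. Error", "t value" and "Pr(>|t|)", and the adjusted R-squared is stored in adj.r.squared. The helper fit.id is hypothetical, and Tvalue/Pvalue are assumed to refer to the slope.

```r
# Example data from the post (assumed: NA cells are missing values)
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id     trt year  size num  spA  spB  spZ
1011a    1    1  23.2   3 12.0  3.2  8.0
1011a    1    2  17.9   2 10.0  5.1  2.8
1011a    1    3  12.5   7 12.0  0.0  0.5
1011b    2    1    NA  NA   NA   NA   NA
1011b    2    2   6.0   6  4.0  2.0  0.0
1011b    2    3 100.3   5  3.0 95.0  2.3
28105a   1    1   9.1   8  0.5  0.0  8.6
28105a   1    2  16.6   4  2.0 12.0  4.6
28105a   1    3   8.7   7  1.0  0.2  7.5
")

# Hypothetical helper: fit size ~ year for one id, return a one-row data frame
fit.id <- function(d) {
  d <- na.omit(d)
  s <- summary(lm(size ~ year, data = d))
  co <- coef(s)  # rows: (Intercept), year
  data.frame(id = d$id[1],
             intercept = co["(Intercept)", "Estimate"],
             se.intercept = co["(Intercept)", "Std. Error"],
             estimate = co["year", "Estimate"],
             se.estimate = co["year", "Std. Error"],
             adj.Rsq = s$adj.r.squared,
             Tvalue = co["year", "t value"],
             Pvalue = co["year", "Pr(>|t|)"],
             N = nrow(d))
}

plants.lm <- do.call(rbind, lapply(split(plants, plants$id), fit.id))
rownames(plants.lm) <- NULL
plants.lm
```

Note that id 1011b has only two complete rows, so its fit has no residual degrees of freedom and its standard errors, t value and p-value are not meaningful.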


I am very new to R and have written the code below. It successfully extracts
the summed covariance values but nothing else, because I cannot figure out
how (or whether it is possible) to extract the relevant columns from a list.
Any help you can offer would be greatly appreciated.
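On extracting columns from a list: split() returns a named list of data frames, so [[ picks out one element (by position or by id) and $ or [ , ] then picks columns from it. A short sketch; sapply() applies the same extraction to every element at once.

```r
# Example data from the post (assumed: NA cells are missing values)
plants <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id     trt year  size num  spA  spB  spZ
1011a    1    1  23.2   3 12.0  3.2  8.0
1011a    1    2  17.9   2 10.0  5.1  2.8
1011a    1    3  12.5   7 12.0  0.0  0.5
1011b    2    1    NA  NA   NA   NA   NA
1011b    2    2   6.0   6  4.0  2.0  0.0
1011b    2    3 100.3   5  3.0 95.0  2.3
28105a   1    1   9.1   8  0.5  0.0  8.6
28105a   1    2  16.6   4  2.0 12.0  4.6
28105a   1    3   8.7   7  1.0  0.2  7.5
")

groups <- split(plants, plants$id)  # named list: one data frame per id

names(groups)                                 # the ids
groups[["1011a"]]                             # one id's data frame, by name
groups[[1]]$size                              # one column of the first element
groups[["1011a"]][, c("spA", "spB", "spZ")]   # several columns at once

# The same extraction applied to every element of the list
mean.size <- sapply(groups, function(d) mean(d$size, na.rm = TRUE))
mean.size
```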

Thanks,
Claire.


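# Drop rows with missing values, then split into one data frame per id
plants.cc <- na.omit(plants)
out <- split(plants.cc, plants.cc$id, drop = TRUE)
n <- length(out)

sp.cols <- c("spA", "spB", "spZ")          # species columns
num.cols <- c("size", "num", sp.cols)      # columns to summarise

# Pre-allocate: one value, or one row of values, per id
sum.spcovar <- numeric(n)
sum.spvar <- numeric(n)
col.mean <- matrix(NA_real_, n, length(num.cols),
                   dimnames = list(NULL, num.cols))
col.sd <- col.mean
col.var <- col.mean

for (i in seq_len(n)) {
  d <- out[[i]]                            # data frame for the i-th id
  spcov <- var(d[, sp.cols])               # covariance matrix of species columns
  sum.spcovar[i] <- sum(spcov)             # sum of the whole matrix
  sum.spvar[i] <- sum(diag(spcov))         # sum of the column variances
  col.mean[i, ] <- colMeans(d[, num.cols])
  col.sd[i, ] <- sapply(d[, num.cols], sd) # sd() works on one column at a time
  col.var[i, ] <- col.sd[i, ]^2
}

# Named matrix arguments give columns mean.size, sd.size, var.spA, ...
plants.calc <- data.frame(id = names(out),
                          trt = sapply(out, function(d) d$trt[1]),
                          sum.spcovar, sum.spvar,
                          mean = col.mean, sd = col.sd, var = col.var,
                          row.names = NULL)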

-- 
View this message in context: http://www.nabble.com/Repeatedly-apply-multiple-functions-to-subsets-of-data.-tp16661991p16661991.html
Sent from the R help mailing list archive at Nabble.com.