[R] Computing stats on common parts of multiple dataframes
Erik Iverson
iverson at biostat.wisc.edu
Tue Feb 13 21:42:21 CET 2007
Murali -
I've come up with something that might work, with gratuitous use of
the *apply functions. See ?apply, ?lapply, and ?mapply for how this
would work. Basically, just set my.list equal to a list of the data.frames
you would like included. I made this to work with matrices first, so
it does use as.matrix() in my function. Also, this could be turned into
a general function so that you could specify a function other
than "median".
#Make my.list equal to a list of the dataframes you want,
#e.g. list(a, b, d) for your example
my.list <- list(df1,df2)
#What's the shortest?
minrow <- min(sapply(my.list,nrow))
#Chop all to the shortest
tmp <- lapply(my.list, function(x) x[(nrow(x)-(minrow-1)):nrow(x),])
#Do the computation; could change median to mean, or a user-defined
#function. The 2 in minrow*2 and in ncol=2 is the number of columns
#(currency pairs) in each dataframe.
matrix(apply(mapply("[", lapply(tmp, as.matrix),
                    MoreArgs=list(1:(minrow*2))), 1, median),
       ncol=2)
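As a rough sketch of the "general function" idea mentioned above, something like the following might work. The name common_rows_stat and its arguments are made up for illustration and are not from any package. It assumes every data frame has the same columns in the same order, and that the most recent rows are at the bottom. It stacks the trimmed matrices into a 3-d array instead of using mapply, so the number of columns no longer has to be hard-coded.

```r
# Hypothetical helper: apply FUN element-wise across the last rows
# that all data frames in 'dfs' have in common.
common_rows_stat <- function(dfs, FUN = median, ...) {
  mats <- lapply(dfs, as.matrix)
  nc <- unique(sapply(mats, ncol))
  stopifnot(length(nc) == 1)          # all inputs need the same number of columns
  minrow <- min(sapply(mats, nrow))   # size of the shortest input
  # Keep only the last 'minrow' rows of each matrix
  tails <- lapply(mats, function(m)
    m[nrow(m) - minrow + seq_len(minrow), , drop = FALSE])
  # Stack into a minrow x nc x length(dfs) array, then reduce over the 3rd dim
  stacked <- array(unlist(tails), dim = c(minrow, nc, length(tails)))
  out <- apply(stacked, c(1, 2), FUN, ...)
  colnames(out) <- colnames(mats[[1]])
  out
}

# The three data frames from the original message
a <- data.frame(
  "EUR-USD" = c(1.23, 1.22, 1.26, 1.24, 1.21, 1.26, 1.29, 1.25, 1.27, 1.23,
                1.22, 1.24, 1.27, 1.21, 1.23, 1.25, 1.25, 1.23, 1.28, 1.27),
  "NOK-SEK" = c(1.33, 1.43, 1.42, 1.50, 1.36, 1.60, 1.44, 1.36, 1.39, 1.48,
                1.26, 1.29, 1.57, 1.55, 1.35, 1.41, 1.30, 1.11, 1.37, 1.23),
  check.names = FALSE)
b <- data.frame(
  "EUR-USD" = c(1.23, 1.21, 1.28, 1.23, 1.21),
  "NOK-SEK" = c(1.22, 1.36, 1.61, 1.34, 1.22),
  check.names = FALSE)
d <- data.frame(
  "EUR-USD" = c(1.27, 1.23, 1.22, 1.24, 1.27, 1.21,
                1.23, 1.25, 1.25, 1.23, 1.28, 1.27),
  "NOK-SEK" = c(1.39, 1.48, 1.26, 1.29, 1.57, 1.55,
                1.35, 1.41, 1.33, 1.11, 1.37, 1.23),
  check.names = FALSE)

res <- common_rows_stat(list(a, b, d))            # medians
res_mean <- common_rows_stat(list(a, b, d), mean) # or any other function
print(res)
```

With the data from your message, res should have the same five rows as the answer you listed. Extra arguments such as na.rm = TRUE are passed on to FUN through "...".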
HTH. Whether or not this is any "better" than your for-loop solution is
left up to you.
Erik
Murali Menon wrote:
> Folks,
>
> I have three dataframes storing some information about
> two currency pairs, as follows:
>
> R> a
>
> EUR-USD NOK-SEK
> 1.23 1.33
> 1.22 1.43
> 1.26 1.42
> 1.24 1.50
> 1.21 1.36
> 1.26 1.60
> 1.29 1.44
> 1.25 1.36
> 1.27 1.39
> 1.23 1.48
> 1.22 1.26
> 1.24 1.29
> 1.27 1.57
> 1.21 1.55
> 1.23 1.35
> 1.25 1.41
> 1.25 1.30
> 1.23 1.11
> 1.28 1.37
> 1.27 1.23
>
>
>
> R> b
> EUR-USD NOK-SEK
> 1.23 1.22
> 1.21 1.36
> 1.28 1.61
> 1.23 1.34
> 1.21 1.22
>
>
>
> R> d
>
> EUR-USD NOK-SEK
> 1.27 1.39
> 1.23 1.48
> 1.22 1.26
> 1.24 1.29
> 1.27 1.57
> 1.21 1.55
> 1.23 1.35
> 1.25 1.41
> 1.25 1.33
> 1.23 1.11
> 1.28 1.37
> 1.27 1.23
>
> The twist is that these entries correspond to dates where the
> *last* rows in each frame are today's entries, and so on
> backwards in time.
>
> I would like to create a matrix of medians (a median for each row
> and for each currency pair), but only for those rows where all
> dataframes have entries.
>
> My answer in this case should look like:
>
> EUR-USD NOK-SEK
>
> 1.25 1.41
> 1.25 1.33
> 1.23 1.11
> 1.28 1.37
> 1.27 1.23
>
> where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc.
>
> Notice that the output is of the same dimensions as the smallest dataframe
> (in this case 'b').
>
> I can do it in a clumsy fashion by first obtaining the number
> of rows in the smallest matrix, chopping off the top rows
> of the other matrices to reduce them to this size, then doing a
> for-loop across each currency pair, row-wise, to create a
> 3-vector which I then apply median() on.
>
> Surely there's a better way to do this?
>
> Please advise.
>
> Thanks,
>
> Murali Menon
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.