[R] Average 2 Columns when possible, or return available value
Joris Meys
jorismeys at gmail.com
Sat Jun 26 02:24:08 CEST 2010
Just want to add that if you want to clean out the NA rows in a matrix
or data frame, take a look at ?complete.cases (it keeps only the rows
without any NA). It can be handy with big datasets. I got curious, so
I ran the code given here on a big dataset, before and after removing
the NA rows. I have to be honest: this is a striking illustration of
the power of rowMeans. I'm amazed myself.
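Because Eric wants to keep the rows that have only one reading, a filter that drops only the rows where every value is missing may fit his case better than complete.cases(). Here is a minimal sketch on a few rows of his data. The name `small` is only for illustration.

```r
# A few rows taken from Eric's example (his rows 1, 2, 9 and 24)
small <- data.frame(A = c(22.60, NA, 102.00, 18.30),
                    B = c(NA,    NA, NA,     18.2))

# complete.cases() keeps only rows with no NA in any column
only_complete <- small[complete.cases(small), ]

# Keep every row with at least one non-NA reading:
# !is.na() gives a logical matrix, rowSums() counts available values per row
at_least_one <- small[rowSums(!is.na(small)) > 0, ]

nrow(only_complete)  # 1: only the row with both readings survives
nrow(at_least_one)   # 3: only the all-NA row is dropped
```

In Eric's 30-row example, row 24 is the only complete row. On the 300,000-row data frame in the timings below, `DF[complete.cases(DF),]` therefore leaves 10,000 rows, and the "after" timings are measured on far less data.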
> # Replicate Eric's 30-row example (quoted below) 10000 times: 300,000 rows
> DF <- data.frame(
+   A = rep(DF$A, 10000),
+   B = rep(DF$B, 10000)
+ )
> system.time(apply(DF,1,mean,na.rm=TRUE))
user system elapsed
13.26 0.06 13.46
> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
user system elapsed
0.03 0.00 0.03
> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )
Timing stopped at: 227.84 1.03 249.31 -- I got impatient and pressed Escape
> DF <- DF[complete.cases(DF),]
> system.time(apply(DF,1,mean,na.rm=TRUE))
user system elapsed
0.39 0.00 0.39
> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
user system elapsed
0.01 0.00 0.02
> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )
user system elapsed
10.01 0.07 13.40
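For Eric's original question, the rowMeans() call timed above already averages the rows where both readings exist and returns the single reading otherwise. One detail remains. When both values in a row are NA, rowMeans(..., na.rm = TRUE) returns NaN rather than the NA Eric asked for. The apply() version with mean(..., na.rm = TRUE) behaves the same way. Here is a minimal sketch on a few rows of his data. The names `small`, `avg` and `res` are illustrative.

```r
# A few rows from Eric's example (his rows 1, 2, 10 and 24)
small <- data.frame(A = c(22.60, NA, 19.20, 18.30),
                    B = c(NA,    NA, NA,    18.2))

# Mean of the available readings in each row. With na.rm = TRUE a row
# where both readings are NA has nothing left to average and gives NaN.
avg <- rowMeans(small, na.rm = TRUE)

# Convert those NaN values to NA, as in Eric's desired output
avg[is.nan(avg)] <- NA

# One-column matrix, as in the timings above
res <- matrix(avg, ncol = 1)
res
```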
Cheers
Joris
On Sat, Jun 26, 2010 at 1:08 AM, emorway <emorway at engr.colostate.edu> wrote:
>
> Forum,
>
> Using the following data:
>
> DF<-read.table(textConnection("A B
> 22.60 NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> 102.00 NA
> 19.20 NA
> 19.20 NA
> NA NA
> NA NA
> NA NA
> 11.80 NA
> 7.62 NA
> NA NA
> NA NA
> NA NA
> NA NA
> NA NA
> 75.00 NA
> NA NA
> 18.30 18.2
> NA NA
> NA NA
> 8.44 NA
> 18.00 NA
> NA NA
> 12.90 NA"),header=T)
> closeAllConnections()
>
> The second column is a duplicate reading of the first column, and when two
> values are available, I would like to average column 1 and 2 (example code
> below). But if there is only one reading, I would like to retain it, but I
> haven't found a good way to exclude NA's using the following code:
>
> t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))
>
> Currently, row 24 is the only row with a returned value. I'd like the
> result to return column "A" if it is the only available value, and average
> where possible. Of course, if both columns are NA, NA is the only possible
> result.
>
> The result I'm after would look like this (row 24 is an avg):
>
> 22.60
> NA
> NA
> NA
> NA
> NA
> NA
> NA
> 102.00
> 19.20
> 19.20
> NA
> NA
> NA
> 11.80
> 7.62
> NA
> NA
> NA
> NA
> NA
> 75.00
> NA
> 18.25
> NA
> NA
> 8.44
> 18.00
> NA
> 12.90
>
> This is a small example from a much larger data frame, so if you're
> wondering what the deal is with list(), that will come into play for the
> larger problem I'm trying to solve.
>
> Respectfully,
> Eric
> --
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
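Eric says the list() grouping matters for his larger problem. The thread does not say what that problem is. If it means averaging several groups of duplicate columns, the rowMeans() approach can be applied group by group. This is only a guess, and the columns A1, A2, B1, B2, the data frame `wide` and the grouping vector `grp` below are all hypothetical.

```r
# Hypothetical wider data: two quantities, each with a duplicate reading
wide <- data.frame(A1 = c(22.6, NA, 18.3),
                   A2 = c(NA,   NA, 18.2),
                   B1 = c(5.0,  NA, NA),
                   B2 = c(7.0,  3.0, NA))

# Hypothetical grouping vector: which output column each input column
# belongs to, playing the role of the list() grouping in aggregate()
grp <- c("A", "A", "B", "B")

# For each group of column indices, average the available readings per row
res <- sapply(split(seq_along(wide), grp), function(cols) {
  m <- rowMeans(wide[, cols, drop = FALSE], na.rm = TRUE)
  m[is.nan(m)] <- NA   # rows where the whole group is NA: NaN -> NA
  m
})
res
```

split() orders the groups by factor level (alphabetically for a character `grp`), so the output columns follow that order rather than the order in which groups first appear.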
--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php