[R] Average 2 Columns when possible, or return available value
Joshua Wiley
jwiley.psych at gmail.com
Sat Jun 26 02:36:15 CEST 2010
On Fri, Jun 25, 2010 at 5:24 PM, Joris Meys <jorismeys at gmail.com> wrote:
> Just want to add that if you want to clean out the NA rows in a matrix
> or data frame, take a look at ?complete.cases. Can be handy to use
> with big datasets. I got curious, so I just ran the code given here
> on a big dataset, before and after removing NA rows. I have to be
> honest, this is surely an illustration of the power of rowMeans. I'm
> amazed myself.
I was too...the documentation (?rowMeans) wasn't joking:
"These functions are equivalent to use of 'apply' with 'FUN = mean' or
'FUN = sum' with appropriate margins, but are a lot faster."
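
For the archive, here is a minimal sketch of the rowMeans approach on a few rows shaped like Eric's data. The toy data frame `d` and the name `avg` are made up for illustration and are not from the thread. One thing to watch: with na.rm=TRUE, a row where both readings are missing comes back as NaN rather than NA, so the sketch converts those back to match the result Eric described.

```r
# Hypothetical toy data in the same shape as the original DF
d <- data.frame(A = c(22.60, NA, 18.30, 12.90),
                B = c(NA,    NA, 18.2,  NA))

# Average the two readings where both exist, keep the single value otherwise
avg <- rowMeans(d, na.rm = TRUE)

# Rows with no readings at all give NaN; turn them back into NA
avg[is.nan(avg)] <- NA

avg
```

If a one-column matrix is wanted, as in the timings quoted below, wrapping the result in matrix(avg, ncol=1) should do it.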
>
> DF <- data.frame(
> A=rep(DF$A,10000),
> B=rep(DF$B,10000)
> )
>
>> system.time(apply(DF,1,mean,na.rm=TRUE))
> user system elapsed
> 13.26 0.06 13.46
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
> user system elapsed
> 0.03 0.00 0.03
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
>
> Timing stopped at: 227.84 1.03 249.31 -- I got impatient and pressed Escape
>
>> DF <- DF[complete.cases(DF),]
>
>> system.time(apply(DF,1,mean,na.rm=TRUE))
> user system elapsed
> 0.39 0.00 0.39
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
> user system elapsed
> 0.01 0.00 0.02
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
> user system elapsed
> 10.01 0.07 13.40
>
> Cheers
> Joris
>
>
> On Sat, Jun 26, 2010 at 1:08 AM, emorway <emorway at engr.colostate.edu> wrote:
>>
>> Forum,
>>
>> Using the following data:
>>
>> DF<-read.table(textConnection("A B
>> 22.60 NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> 102.00 NA
>> 19.20 NA
>> 19.20 NA
>> NA NA
>> NA NA
>> NA NA
>> 11.80 NA
>> 7.62 NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> 75.00 NA
>> NA NA
>> 18.30 18.2
>> NA NA
>> NA NA
>> 8.44 NA
>> 18.00 NA
>> NA NA
>> 12.90 NA"),header=TRUE)
>> closeAllConnections()
>>
>> The second column is a duplicate reading of the first column, and when two
>> values are available, I would like to average column 1 and 2 (example code
>> below). But if there is only one reading, I would like to retain it, but I
>> haven't found a good way to exclude NA's using the following code:
>>
>> t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))
>>
>> Currently, row 24 is the only row with a returned value. I'd like the
>> result to return column "A" if it is the only available value, and average
>> where possible. Of course, if both columns are NA, NA is the only possible
>> result.
>>
>> The result I'm after would look like this (row 24 is an avg):
>>
>> 22.60
>> NA
>> NA
>> NA
>> NA
>> NA
>> NA
>> NA
>> 102.00
>> 19.20
>> 19.20
>> NA
>> NA
>> NA
>> 11.80
>> 7.62
>> NA
>> NA
>> NA
>> NA
>> NA
>> 75.00
>> NA
>> 18.25
>> NA
>> NA
>> 8.44
>> 18.00
>> NA
>> 12.90
>>
>> This is a small example from a much larger data frame, so if you're
>> wondering what the deal is with list(), that will come into play for the
>> larger problem I'm trying to solve.
>>
>> Respectfully,
>> Eric
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> Joris.Meys at Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>
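
One caveat on the complete.cases() timings, as far as I can tell: it keeps only rows where every column is non-missing, so in Eric's data it would keep row 24 alone and drop the rows with a single reading. That probably explains much of the speed-up after filtering, and it is not quite the subset Eric wants. A rough sketch of the difference follows; the toy data frame `d` and the names `keep`, `has_any`, `slow` and `fast` are my own, for illustration only.

```r
# Hypothetical toy data; row 2 has no readings, row 3 has both
d <- data.frame(A = c(22.60, NA, 18.30, 12.90),
                B = c(NA,    NA, 18.2,  NA))

# complete.cases() is TRUE only for rows with no NA in any column
keep <- complete.cases(d)

# Rows with at least one reading, which is closer to what Eric asked for
has_any <- rowSums(!is.na(d)) > 0

# apply() and rowMeans() agree on the result; only the speed differs
slow <- apply(d[has_any, ], 1, mean, na.rm = TRUE)
fast <- rowMeans(d[has_any, ], na.rm = TRUE)

all.equal(slow, fast)
```

Filtering with has_any instead of keep drops only the rows that cannot produce a value, so the single readings survive.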
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/