[R] Computing means of multiple variables based on a condition
Thierry Onkelinx
thierry.onkelinx at inbo.be
Thu May 26 15:55:30 CEST 2016
Another option would be to convert the data into a long format and add
columns for each condition.
library(dplyr)
library(tidyr)
DF %>%
  gather(key = "key", value = "value", -a, -d) %>%
  mutate(
    "d>=2" = ifelse(d >= 2, value, NA),
    "d>=4" = ifelse(d >= 4, value, NA),
    "d>=6" = ifelse(d >= 6, value, NA)
  ) %>%
  select(-d, -value) %>%
  gather(key = "condition", value = "value", -a, -key, na.rm = TRUE) %>%
  group_by(a, key, condition) %>%
  summarise(mean = mean(value)) %>%
  spread(key = key, value = mean) %>%
  arrange(condition, a)
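
If you want to cross-check the numbers without loading any packages, the same long-format idea can be sketched in base R. This is only a rough equivalent, not a replacement for the pipeline above: DF is the sample data from Jeff's message quoted below, and the names thresholds, long, stacked and means are my own choices for illustration.

```r
# Sample data as given in Jeff Newmiller's message
DF <- data.frame(
  a = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B"),
  b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
  c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
  d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
  stringsAsFactors = FALSE
)
thresholds <- c(2, 4, 6)  # cut-offs taken from the original question

# Long format: one row per observation and variable (b or c)
long <- data.frame(
  a = rep(DF$a, times = 2),
  d = rep(DF$d, times = 2),
  key = rep(c("b", "c"), each = nrow(DF)),
  value = c(DF$b, DF$c),
  stringsAsFactors = FALSE
)

# Stack one copy of the long data per condition, keeping the rows that meet it
stacked <- do.call(rbind, lapply(thresholds, function(dmin) {
  sub <- long[long$d >= dmin, c("a", "key", "value")]
  sub$condition <- paste0("d>=", dmin)
  sub
}))

# Mean of value for every condition/group pair (rows) and variable (columns)
means <- with(stacked, tapply(value, list(paste(condition, a), key), mean))
print(means)
```

The result is a 6 x 2 matrix rather than a data frame, so it would still need as.data.frame() and a split of the row names before handing it to ggplot2.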
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2016-05-26 8:34 GMT+02:00 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>:
> Thank you for including some sample data, but I have to ask that you
> please invest some time in learning how to edit your code in a text editor
> and to post in plain text. The quote marks in your example were "curly",
> which R does not understand. There are other ways in which HTML email leads
> to corruption on this mailing list as well, so you will save everyone
> numerous headaches by investing this time sooner rather than later.
>
> The type of operation you are looking for is referred to as an "outer
> join" in SQL nomenclature, and it is intrinsically slow because the only
> way to accomplish it is computationally equivalent to a for loop that
> successively applies each minimum "d" value to your whole data set.
>
> Having said that, you can accomplish this in the "dplyr" syntax instead of
> using a for loop, if that makes you happy, but it is not really any
> "better" than a for loop (and some people might consider it misleading to
> drape a for loop in such fancy syntax):
>
> DF <- data.frame( a = c( "A", "B", "A", "B", "A", "B", "A", "B", "A", "B" )
>                 , b = c( 15, 35, 20, 99, 75, 64, 33, 78, 45, 20 )
>                 , c = c( 111, 234, 456, 876, 246, 662, 345, 480, 512, 179 )
>                 , d = c( 1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3 )
>                 , stringsAsFactors = FALSE
>                 )
> passes <- data.frame( dmin = c( 2, 4, 6 ) )
>
> library(dplyr)
>
> DF2 <- (   passes
>        %>% rowwise
>        %>% do({ # run once for each row in "passes"
>              dmin <- .$dmin # dot here refers to row of "passes" data frame
>              (   DF
>              %>% filter( d >= dmin )
>              %>% group_by( a )
>              %>% summarise( meanb = mean( b )
>                           , meanc = mean( c )
>                           )
>              %>% mutate( condition = paste0( "d>=", dmin ) )
>              )
>            })
>        %>% select( a, condition, meanb, meanc )
>        %>% as.data.frame
>        )
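
For comparison, the explicit for loop that Jeff describes might look like the sketch below in base R. It assumes the same sample data and thresholds; dmins, results and DF2_loop are names I made up for this example, and aggregate() stands in for the group_by()/summarise() step.

```r
# Sample data as given in Jeff Newmiller's message
DF <- data.frame(
  a = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B"),
  b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
  c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
  d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
  stringsAsFactors = FALSE
)
dmins <- c(2, 4, 6)  # one pass over the data per minimum value of d

results <- vector("list", length(dmins))
for (i in seq_along(dmins)) {
  # Rows that satisfy the current condition
  keep <- DF[DF$d >= dmins[i], ]
  # Group means of b and c by a
  means <- aggregate(keep[c("b", "c")], by = list(a = keep$a), FUN = mean)
  names(means) <- c("a", "meanb", "meanc")
  means$condition <- paste0("d>=", dmins[i])
  results[[i]] <- means[, c("a", "condition", "meanb", "meanc")]
}

# Combine the per-condition summaries into one data frame
DF2_loop <- do.call(rbind, results)
print(DF2_loop)
```

Each pass filters the whole data set again, which is the cost Jeff refers to; the loop only makes that visible.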
>
>
> On Wed, 25 May 2016, KMNanus wrote:
>
>> These will be overlapping subgroups from the same data frame. For
>> example, d>=2 will have length=9, d>=4 will have length=7, etc.
>>
>>
>> Ken
>> kmnanus at gmail.com
>> 914-450-0816 (tel)
>> 347-730-4813 (fax)
>>
>>
>>
>> On May 25, 2016, at 9:06 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>>
>>> Just to be clear, do you really want your 'condition' groups to be
>>> subsets
>>> of one another? Most (all?) of the *ply functions assume you want
>>> non-overlapping groups so they do a split-summarize-combine sequence.
>>> You would have to replace the split part of that.
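
A small base R sketch of why a plain split does not fit here, using the d values from the sample data (the names dmins and member are mine, purely for illustration): each observation can belong to several conditions at once, so the group sizes add up to more than the number of rows.

```r
# d values from the sample data in the original question
d <- c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)
dmins <- c(2, 4, 6)

# Membership matrix: one row per observation, one column per condition
member <- sapply(dmins, function(dmin) d >= dmin)
colnames(member) <- paste0("d>=", dmins)

# Size of each condition group
group_sizes <- colSums(member)
print(group_sizes)

# Number of conditions each observation satisfies (0 to 3)
print(rowSums(member))
```

The group sizes are 9, 7 and 6 for 10 observations, which a single grouping factor cannot express.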
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>> On Wed, May 25, 2016 at 3:37 PM, KMNanus <kmnanus at gmail.com> wrote:
>>> I have a large dataset, a sample of which is:
>>>
>>> a<- c(“A”, “B”,“A”, “B”,“A”, “B”,“A”, “B”,“A”, “B”)
>>> b <-c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20)
>>> c<- c( 111, 234, 456, 876, 246, 662, 345, 480, 512, 179)
>>> d<- c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)
>>>
>>> df <- data.frame(a,b,c,d)
>>>
>>> I'm trying to construct a data frame that shows the means of c & b based
>>> on the condition of d and grouped by a.
>>>
>>> I want to create the data frame below, then use ggplot2 to create a line
>>> plot of b at various conditions of d.
>>>
>>> I can compute the grouped means (d>=2, d>=4, etc.) one at a time using
>>> dplyr but haven't figured out how to put them all together or put them in
>>> one data frame.
>>>
>>> I'd rather not use a loop and am relatively new to R. Is there a way I
>>> can use tapply and set it to the conditions above so that I can create the
>>> df below?
>>>
>>>
>>> a    condition   mean(b)   mean(c)
>>> A    d>=2        ____      _____
>>> B    d>=2        ____      _____
>>> A    d>=4        ____      _____
>>> B    d>=4        ____      _____
>>> A    d>=6        ____      _____
>>> B    d>=6        ____      _____
>>>
>>>
>>>
>>> Ken
>>> kmnanus at gmail.com
>>> 914-450-0816 (tel)
>>> 347-730-4813 (fax)
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k