[R] (no subject)

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Tue Sep 17 02:26:58 CEST 2024


Hmmm... typos and thinkos ?

Maybe:
mean_narm <- function(x) {
   # mean() of an all-NA vector with na.rm = TRUE is NaN; report NA instead
   m <- mean(x, na.rm = TRUE)
   if (is.nan(m)) NA else m
}
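
A small check of why the is.nan() test is there, as a sketch rather than anything definitive: with na.rm = TRUE, mean() drops the NAs first, and the mean of what is left of an all-NA vector (nothing) is NaN. The vectors all_missing and some_missing are made-up illustrations, not data from this thread.

```r
# Same helper as above, repeated so the snippet runs on its own.
mean_narm <- function(x) {
  m <- mean(x, na.rm = TRUE)
  if (is.nan(m)) NA else m
}

# Hypothetical inputs, chosen only to show the two cases.
all_missing  <- c(NA_real_, NA_real_)
some_missing <- c(2, NA, 4)

raw_all  <- mean(all_missing, na.rm = TRUE)  # NaN: mean of an empty vector
safe_all <- mean_narm(all_missing)           # NA
safe_mix <- mean_narm(some_missing)          # 3

print(c(raw_all = raw_all, safe_all = safe_all, safe_mix = safe_mix))
```

Note that is.nan(NA) is FALSE, so the helper only rewrites the NaN case and leaves ordinary means untouched.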

-- Bert

On Mon, Sep 16, 2024 at 4:40 PM CALUM POLWART <polc1410 using gmail.com> wrote:
>
> Rui's solution is good.
>
> Bert's suggestion is also good!
>
> For Bert's suggestion you'd make the list bit
>
> list(mean = mean_narm)
>
> But prior to that define a function:
>
> mean_narm<- function(x) {
>
> m <- mean(x, na.rm = T)
>
> if (!is.Nan (m)) {
> m <- NA
> }
>
> return (m)
> }
>
> Would do what you suggested in your reply to Bert.
>
> On Mon, 16 Sep 2024, 19:48 Rui Barradas, <ruipbarradas using sapo.pt> wrote:
>
> > At 15:23 on 16/09/2024, Francesca wrote:
> > > Sorry for posting unreadable code. On my screen the dataset
> > > looked correct.
> > >
> > >
> > > I recreated my dataset, following your example:
> > >
> > > # one named column per variable, 24 rows
> > > test <- data.frame(
> > >   cp1     = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
> > >               2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
> > >   cp2     = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
> > >               NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
> > >   role    = c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3,
> > >               4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
> > >   groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
> > >               8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6))
> > >
> > > What I have done so far is the following, that works:
> > >   test %>%
> > >    group_by(groupid) %>%
> > >    mutate(across(starts_with("cp"), list(mean = mean)))
> > >
> > > But the problem is with NA: every time mean() encounters an NA, it
> > > returns NA for all group members.
> > > I need the software to calculate the mean ignoring NA. So when the group
> > > is made of three people, take the mean of the three.
> > > If the group has two values and an NA, calculate the mean of the two.
> > >
> > > My code works: it creates a mean at each position for three subjects,
> > > replacing each individual value with the group mean.
> > > But when an NA appears, the whole group gets NA.
> > >
> > > Perhaps there is a different way to obtain the same result.
> > >
> > >
> > >
> > > On Mon, 16 Sept 2024 at 11:35, Rui Barradas <ruipbarradas using sapo.pt> wrote:
> > >
> > >> At 08:28 on 16/09/2024, Francesca wrote:
> > >>> Dear Contributors,
> > >>> I hope someone has found a similar issue.
> > >>>
> > >>> I have this data set,
> > >>>
> > >>>
> > >>>
> > >>>    cp1 cp2 role groupid
> > >>> 1   10  13    4       5
> > >>> 2    5  10    3       1
> > >>> 3    7   7    4       6
> > >>> 4   10   4    2       7
> > >>> 5    5   8    3       2
> > >>> 6    8   7    4       4
> > >>> 7    8   8    4       7
> > >>> 8   10  15    3       3
> > >>> 9   15  10    2       2
> > >>> 10   5   5    2       4
> > >>> 11  20  20    2       5
> > >>> 12   9  11    3       6
> > >>> 13  10  13    4       3
> > >>> 14  12   6    4       2
> > >>> 15   7   4    4       1
> > >>> 16  10   0    3       7
> > >>> 17  20  15    3       8
> > >>> 18  10   7    3       4
> > >>> 19   8  13    3       5
> > >>> 20  10   9    2       6
> > >>>
> > >>>
> > >>>
> > >>> I need to average by group, using the values of column groupid, and
> > >>> create a twin dataset in which each individual value is replaced by
> > >>> the mean of its group.
> > >>> So for example, for groupid 3, I calculate the mean (12+18)/2 and then,
> > >>> in the new data frame, I put that mean in the same positions instead
> > >>> of 12 and 18.
> > >>> I found this solution, where db10_means is the output dataset and db10
> > >>> is my initial data.
> > >>>
> > >>> db10_means<-db10 %>%
> > >>>     group_by(groupid) %>%
> > >>>     mutate(across(starts_with("cp"), list(mean = mean)))
> > >>>
> > >>> It works perfectly, except for NA values: it assigns NA to all
> > >>> group members, while in some cases the group is made of some NAs and
> > >>> some values.
> > >>> So, when I have a group of two values and one NA, I would like those
> > >>> with a value to get the mean, and those with NA to keep the NA.
> > >>> Here the mean function does not have the na.rm=T option set, but it
> > >>> appears that this option cannot be used in this case. I am not
> > >>> even sure that this would be enough to solve my problem.
> > >>> Thanks for any help provided.
> > >>>
> > >> Hello,
> > >>
> > >> Your data is a mess, please don't post HTML; this is a plain-text-only
> > >> list. Anyway, I managed to create a data frame by copying the data to a
> > >> file named "rhelp.txt" and then running
> > >>
> > >>
> > >>
> > >> db10 <- scan(file = "rhelp.txt", what = character())
> > >> header <- db10[1:4]
> > >> db10 <- db10[-(1:4)] |> as.numeric()
> > >> db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
> > >>     as.data.frame() |>
> > >>     setNames(header)
> > >>
> > >> str(db10)
> > >> #> 'data.frame':    25 obs. of  4 variables:
> > >> #>  $ cp1    : num  1 5 3 7 10 5 2 4 8 10 ...
> > >> #>  $ cp2    : num  10 2 1 4 4 5 6 4 4 15 ...
> > >> #>  $ role   : num  13 5 3 6 2 8 8 7 7 3 ...
> > >> #>  $ groupid: num  4 10 7 4 7 3 7 8 8 3 ...
> > >>
> > >>
> > >> And here is the data in dput format.
> > >>
> > >>
> > >>
> > >> db10 <-
> > >>     structure(list(
> > >>       cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
> > >>               2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
> > >>       cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
> > >>               4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
> > >>       role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
> > >>                11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
> > >>       groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
> > >>                   20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
> > >>       class = "data.frame", row.names = c(NA, -25L))
> > >>
> > >>
> > >>
> > >> As for the problem, I am not sure if you want summarise instead of
> > >> mutate but here is a summarise solution.
> > >>
> > >>
> > >>
> > >> library(dplyr)
> > >>
> > >> db10 %>%
> > >>     group_by(groupid) %>%
> > >>     summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))
> > >>
> > >> # same result; summarise's new argument .by avoids the need for group_by
> > >> db10 %>%
> > >>     summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)),
> > >>               .by = groupid)
> > >>
> > >>
> > >>
> > >> Can you post the expected output too?
> > >>
> > >> Hope this helps,
> > >>
> > >> Rui Barradas
> > >>
> > >>
> > >>
> > >
> > >
> > Hello,
> >
> > Something like this?
> >
> >
> > test <-
> >    structure(list(
> >      cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
> >              2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
> >      cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
> >              4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
> >      role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
> >               11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
> >      groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
> >                  20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
> >      class = "data.frame", row.names = c(NA, -25L))
> >
> > library(dplyr)
> >
> > test %>%
> >    group_by(groupid) %>%
> >    mutate(across(starts_with("cp"), list(mean = ~ mean(.x, na.rm = TRUE))))
> > #> # A tibble: 25 × 6
> > #> # Groups:   groupid [11]
> > #>      cp1   cp2  role groupid cp1_mean cp2_mean
> > #>    <dbl> <dbl> <dbl>   <dbl>    <dbl>    <dbl>
> > #>  1     1    10    13       4     7        8
> > #>  2     5     2     5      10     5        2
> > #>  3     3     1     3       7     6.17     5.17
> > #>  4     7     4     6       4     7        8
> > #>  5    10     4     2       7     6.17     5.17
> > #>  6     5     5     8       3    10.7     13.3
> > #>  7     2     6     8       7     6.17     5.17
> > #>  8     4     4     7       8     5        4
> > #>  9     8     4     7       8     5        4
> > #> 10    10    15     3       3    10.7     13.3
> > #> # ℹ 15 more rows
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > https://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

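For the requirement in the original question (members with a value get the group mean, members with NA keep NA), here is a minimal base-R sketch. It swaps the dplyr pipeline for ave(), which is a different technique from the one used in the thread, and the data frame toy is hypothetical, not taken from any post above.

```r
# Helper from the top of this message, repeated so the snippet is
# self-contained.
mean_narm <- function(x) {
  m <- mean(x, na.rm = TRUE)
  if (is.nan(m)) NA else m
}

# Hypothetical toy data: group 1 is mixed, group 2 is all NA,
# group 3 has a single value.
toy <- data.frame(
  cp1     = c(8, NA, 5, NA, NA, 10),
  groupid = c(1, 1, 1, 2, 2, 3)
)

# ave() returns a vector as long as its input, with FUN applied within
# each group and the result recycled to every member of that group.
group_mean <- ave(toy$cp1, toy$groupid, FUN = mean_narm)

# Keep NA wherever the individual value was NA; otherwise use the group mean.
toy$cp1_mean <- ifelse(is.na(toy$cp1), NA, group_mean)

print(toy)
```

With dplyr, the equivalent inside mutate(across(...)) would presumably be list(mean = mean_narm), as suggested earlier in the thread, followed by the same ifelse() step if NA members should stay NA.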

