[R] (no subject)
Francesca
|r@nce@c@@p@ncotto @end|ng |rom gm@||@com
Tue Sep 17 08:27:26 CEST 2024
Sorry, my typing was corrected by the computer.
When I have an NA, the result should stay a missing value.
So, if a group has two values and an NA, the two rows that have values should be
replaced by the mean of the two, and
the third should stay NA.
The NA is a participant who dropped out.
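
To make the rule above concrete, here is a minimal sketch in base R. It uses ave() rather than the dplyr pipeline quoted below, and the helper name group_mean_keep_na and the small cp1/groupid vectors are made up for illustration only; they are not from my real data.

```r
# Hypothetical helper (not from the thread): replace each value by the mean
# of its group, ignoring NA, while rows that were NA stay NA.
group_mean_keep_na <- function(x, g) {
  # ave() returns a vector as long as x, holding the per-group result
  m <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))
  # rows that were missing stay missing; this also turns the NaN that
  # mean() gives for an all-NA group into NA
  m[is.na(x)] <- NA
  m
}

# Made-up example: group 1 has two values and one NA,
# group 2 has a single value, group 3 is all NA.
cp1     <- c(8, 2, NA, 5, NA, NA)
groupid <- c(1, 1, 1, 2, 3, 3)

res <- group_mean_keep_na(cp1, groupid)
print(res)
```

With dplyr, the same idea would presumably be an ifelse(is.na(.x), NA, mean(.x, na.rm = TRUE)) inside across(), but I have only sketched the base R version here.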
On Tue, 17 Sept 2024 at 02:27, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> Hmmm... typos and thinkos ?
>
> Maybe:
> mean_narm <- function(x) {
>   m <- mean(x, na.rm = TRUE)
>   # mean() of an all-NA vector with na.rm = TRUE is NaN; return NA instead
>   if (is.nan(m)) NA else m
> }
>
> -- Bert
>
> On Mon, Sep 16, 2024 at 4:40 PM CALUM POLWART <polc1410 using gmail.com> wrote:
> >
> > Rui's solution is good.
> >
> > Bert's suggestion is also good!
> >
> > For Bert's suggestion, the list part would become
> >
> > list(mean = mean_narm)
> >
> > But before that, define a function:
> >
> > mean_narm <- function(x) {
> >   m <- mean(x, na.rm = TRUE)
> >   # an all-NA group yields NaN; turn that into NA
> >   if (is.nan(m)) {
> >     m <- NA
> >   }
> >   return(m)
> > }
> >
> > Would do what you suggested in your reply to Bert.
> >
> > On Mon, 16 Sep 2024, 19:48 Rui Barradas, <ruipbarradas using sapo.pt> wrote:
> >
> > > At 15:23 on 16/09/2024, Francesca wrote:
> > > > Sorry for posting unreadable code. On my screen the dataset
> > > > looked correct.
> > > >
> > > >
> > > > I recreated my dataset, following your example:
> > > >
> > > > test <- data.frame(cbind(
> > > >   c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
> > > >     2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
> > > >   c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
> > > >     NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
> > > >   c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3,
> > > >     4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
> > > >   c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
> > > >     8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6)))
> > > > colnames(test) <-c("cp1","cp2","role","groupid")
> > > >
> > > > What I have done so far is the following, that works:
> > > > test %>%
> > > > group_by(groupid) %>%
> > > > mutate(across(starts_with("cp"), list(mean = mean)))
> > > >
> > > > But the problem is with NA: every time the mean encounters an NA, it
> > > > creates NA for all group members.
> > > > I need the software to calculate the mean ignoring NA. So when the
> > > > group is made of three people, take the mean of the three.
> > > > If the group has two values and an NA, calculate the mean of the two.
> > > >
> > > > My code works: it creates a mean at each position for the three subjects,
> > > > replacing each individual value with the group mean.
> > > > But when an NA appears, the whole group gets NA.
> > > >
> > > > Perhaps there is a different way to obtain the same result.
> > > >
> > > >
> > > >
> > > > On Mon, 16 Sept 2024 at 11:35, Rui Barradas <ruipbarradas using sapo.pt> wrote:
> > > >
> > > >> At 08:28 on 16/09/2024, Francesca wrote:
> > > >>> Dear Contributors,
> > > >>> I hope someone has found a similar issue.
> > > >>>
> > > >>> I have this data set,
> > > >>>
> > > >>>
> > > >>>
> > > >>>      cp1  cp2  role  groupid
> > > >>>  1    10   13     4        5
> > > >>>  2     5   10     3        1
> > > >>>  3     7    7     4        6
> > > >>>  4    10    4     2        7
> > > >>>  5     5    8     3        2
> > > >>>  6     8    7     4        4
> > > >>>  7     8    8     4        7
> > > >>>  8    10   15     3        3
> > > >>>  9    15   10     2        2
> > > >>> 10     5    5     2        4
> > > >>> 11    20   20     2        5
> > > >>> 12     9   11     3        6
> > > >>> 13    10   13     4        3
> > > >>> 14    12    6     4        2
> > > >>> 15     7    4     4        1
> > > >>> 16    10    0     3        7
> > > >>> 17    20   15     3        8
> > > >>> 18    10    7     3        4
> > > >>> 19     8   13     3        5
> > > >>> 20    10    9     2        6
> > > >>>
> > > >>>
> > > >>>
> > > >>> I need to take the average within groups, using the values of column
> > > >>> groupid, and create a twin dataset in which the group mean replaces
> > > >>> the individual values.
> > > >>> So for example, for groupid 3, I calculate the mean (12+18)/2 and then
> > > >>> put the corresponding mean into the new data frame, in the same
> > > >>> positions, instead of 12 and 18.
> > > >>> I found this solution, where db10_means is the output dataset and db10
> > > >>> is my initial data.
> > > >>>
> > > >>> db10_means<-db10 %>%
> > > >>> group_by(groupid) %>%
> > > >>> mutate(across(starts_with("cp"), list(mean = mean)))
> > > >>>
> > > >>> It works perfectly, except for NA values: it assigns NA to all
> > > >>> group members, while in some cases the group is made of some NA
> > > >>> and some values.
> > > >>> So, when I have a group of two values and one NA, I would like those
> > > >>> with a value to get the mean, and those with NA to keep the NA.
> > > >>> Here the mean function does not have the na.rm=T option set, but it
> > > >>> appears that this option cannot be used in this case. I am not
> > > >>> even sure that this would be enough to solve my problem.
> > > >>> Thanks for any help provided.
> > > >>>
> > > >> Hello,
> > > >>
> > > >> Your data is a mess, please don't post HTML, this is a plain-text-only
> > > >> list. Anyway, I managed to create a data frame by copying the data to a
> > > >> file named "rhelp.txt" and then running
> > > >>
> > > >>
> > > >>
> > > >> db10 <- scan(file = "rhelp.txt", what = character())
> > > >> header <- db10[1:4]
> > > >> db10 <- db10[-(1:4)] |> as.numeric()
> > > >> db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
> > > >> as.data.frame() |>
> > > >> setNames(header)
> > > >>
> > > >> str(db10)
> > > >> #> 'data.frame': 25 obs. of 4 variables:
> > > >> #> $ cp1 : num 1 5 3 7 10 5 2 4 8 10 ...
> > > >> #> $ cp2 : num 10 2 1 4 4 5 6 4 4 15 ...
> > > >> #> $ role : num 13 5 3 6 2 8 8 7 7 3 ...
> > > >> #> $ groupid: num 4 10 7 4 7 3 7 8 8 3 ...
> > > >>
> > > >>
> > > >> And here is the data in dput format.
> > > >>
> > > >>
> > > >>
> > > >> db10 <-
> > > >> structure(list(
> > > >> cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
> > > >> 2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
> > > >> cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
> > > >> 4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
> > > >> role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
> > > >> 11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
> > > >> groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
> > > >> 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
> > > >> class = "data.frame", row.names = c(NA, -25L))
> > > >>
> > > >>
> > > >>
> > > >> As for the problem, I am not sure if you want summarise instead of
> > > >> mutate but here is a summarise solution.
> > > >>
> > > >>
> > > >>
> > > >> library(dplyr)
> > > >>
> > > >> db10 %>%
> > > >> group_by(groupid) %>%
> > > >> summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))
> > > >>
> > > >> # same result, summarise's new argument .by avoids the need to group_by
> > > >> db10 %>%
> > > >>   summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)),
> > > >>             .by = groupid)
> > > >>
> > > >>
> > > >>
> > > >> Can you post the expected output too?
> > > >>
> > > >> Hope this helps,
> > > >>
> > > >> Rui Barradas
> > > >>
> > > >>
> > > >> --
> > > >> This e-mail has been checked for viruses by AVG antivirus software.
> > > >> www.avg.com
> > > >>
> > > >
> > > >
> > > Hello,
> > >
> > > Something like this?
> > >
> > >
> > > test <-
> > > structure(list(
> > > cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
> > > 2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
> > > cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
> > > 4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
> > > role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
> > > 11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
> > > groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
> > > 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
> > > class = "data.frame", row.names = c(NA, -25L))
> > >
> > > library(dplyr)
> > >
> > > test %>%
> > > group_by(groupid) %>%
> > > mutate(across(starts_with("cp"), list(mean = ~ mean(.x, na.rm = TRUE))))
> > > #> # A tibble: 25 × 6
> > > #> # Groups: groupid [11]
> > > #> cp1 cp2 role groupid cp1_mean cp2_mean
> > > #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
> > > #> 1 1 10 13 4 7 8
> > > #> 2 5 2 5 10 5 2
> > > #> 3 3 1 3 7 6.17 5.17
> > > #> 4 7 4 6 4 7 8
> > > #> 5 10 4 2 7 6.17 5.17
> > > #> 6 5 5 8 3 10.7 13.3
> > > #> 7 2 6 8 7 6.17 5.17
> > > #> 8 4 4 7 8 5 4
> > > #> 9 8 4 7 8 5 4
> > > #> 10 10 15 3 3 10.7 13.3
> > > #> # ℹ 15 more rows
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > >
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > https://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > [[alternative HTML version deleted]]
> >
>
--
Francesca
----------------------------------