[R] (no subject)
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Mon Sep 16 11:35:18 CEST 2024
At 08:28 on 16/09/2024, Francesca wrote:
> Dear Contributors,
> I hope someone has found a similar issue.
>
> I have this data set,
>
>
>
> cp1
> cp2
> role
> groupid
> 1
> 10
> 13
> 4
> 5
> 2
> 5
> 10
> 3
> 1
> 3
> 7
> 7
> 4
> 6
> 4
> 10
> 4
> 2
> 7
> 5
> 5
> 8
> 3
> 2
> 6
> 8
> 7
> 4
> 4
> 7
> 8
> 8
> 4
> 7
> 8
> 10
> 15
> 3
> 3
> 9
> 15
> 10
> 2
> 2
> 10
> 5
> 5
> 2
> 4
> 11
> 20
> 20
> 2
> 5
> 12
> 9
> 11
> 3
> 6
> 13
> 10
> 13
> 4
> 3
> 14
> 12
> 6
> 4
> 2
> 15
> 7
> 4
> 4
> 1
> 16
> 10
> 0
> 3
> 7
> 17
> 20
> 15
> 3
> 8
> 18
> 10
> 7
> 3
> 4
> 19
> 8
> 13
> 3
> 5
> 20
> 10
> 9
> 2
> 6
>
>
>
> I need to average within groups, using the values of column groupid, and
> create a twin dataset in which the group mean replaces the
> individual values.
> So, for example, for groupid 3 I calculate the mean (12+18)/2 and then, in
> the new dataframe, put that mean in the same positions in place of 12 and
> 18.
> I found this solution, where db10_means is the output dataset, db10 is my
> initial data.
>
> db10_means<-db10 %>%
> group_by(groupid) %>%
> mutate(across(starts_with("cp"), list(mean = mean)))
>
> It works perfectly except for NA values: it assigns NA to every member of
> a group, even when the group contains both NAs and actual values.
> So, when a group has two values and one NA, I would like the rows with a
> value to receive the mean and the rows with NA to keep the NA.
> Here the mean function does not have the na.rm=T option set, but it
> appears that this option cannot be used in this case. I am not
> even sure that it would be enough to solve my problem.
> Thanks for any help provided.
>
Hello,
Your data is a mess. Please don't post HTML; this is a plain-text-only
list. Anyway, I managed to create a data frame by copying the data to a
file named "rhelp.txt" and then running
db10 <- scan(file = "rhelp.txt", what = character())
header <- db10[1:4]
db10 <- db10[-(1:4)] |> as.numeric()
db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
  as.data.frame() |>
  setNames(header)
str(db10)
#> 'data.frame': 25 obs. of 4 variables:
#> $ cp1 : num 1 5 3 7 10 5 2 4 8 10 ...
#> $ cp2 : num 10 2 1 4 4 5 6 4 4 15 ...
#> $ role : num 13 5 3 6 2 8 8 7 7 3 ...
#> $ groupid: num 4 10 7 4 7 3 7 8 8 3 ...
And here is the data in dput format.
db10 <-
structure(list(
cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
class = "data.frame", row.names = c(NA, -25L))
As for the problem, I am not sure whether you want summarise instead of
mutate, but here is a summarise solution.
library(dplyr)
db10 %>%
  group_by(groupid) %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))
# same result, summarise's new argument .by avoids the need to group_by
db10 %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)), .by = groupid)
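If mutate is what is wanted after all, the following is a minimal sketch of one possible reading of the question: every row with a value gets the group mean of the non-missing values, and every row that was NA stays NA. The data frame `toy` and the object name `toy_means` are made up for illustration, since the data as posted contains no NA values. The `.names` pattern is chosen to give the same `cp1_mean`, `cp2_mean` column names that `list(mean = mean)` produces in the original code.

```r
library(dplyr)

# Hypothetical toy data, not the poster's:
# group 1 mixes values and an NA, group 2 is complete, group 3 is all NA.
toy <- data.frame(
  groupid = c(1, 1, 1, 2, 2, 3),
  cp1     = c(2, 4, NA, 10, 20, NA),
  cp2     = c(1, NA, 3, 5, 7, NA)
)

toy_means <- toy %>%
  group_by(groupid) %>%
  mutate(across(
    starts_with("cp"),
    # group mean of the non-missing values; rows that were NA remain NA
    ~ if_else(is.na(.x), NA_real_, mean(.x, na.rm = TRUE)),
    .names = "{.col}_mean"
  )) %>%
  ungroup()

toy_means$cp1_mean
#> [1]  3  3 NA 15 15 NA
toy_means$cp2_mean
#> [1]  2 NA  2  6  6 NA
```

The `is.na()` guard also covers a group made up entirely of NA values, where `mean(.x, na.rm = TRUE)` alone would return NaN.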
Can you post the expected output too?
Hope this helps,
Rui Barradas