[R] Fill NA values in columns with values of another column

Wed Aug 28 17:24:22 CEST 2024

Why not use na.omit() and then go from there? Unless NA is handled differently in different groups, there is no point in processing the data by group just to remove NA, even if later analysis steps do require group information.
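
A minimal sketch of that idea, assuming the two columns are first put in a data frame (the names `dat`, `complete` and `lookup` are illustrative, and only the first nine rows of the posted data are used). Note that na.omit() drops rows rather than filling them, so the result is a table of known values per group, not a filled column.

```r
# First nine rows of the posted data, as a data frame.
dat <- data.frame(
  VB1d = c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L),
  s1id = c(8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L)
)

# na.omit() removes every row that contains at least one NA.
complete <- na.omit(dat)

# Rows with the same value and the same group collapse to one row per group.
lookup <- unique(complete)
lookup
```

Groups whose known values do not appear in these nine rows (1, 2 and 3 here) are simply absent from `lookup`.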

Tim

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rui Barradas
Sent: Wednesday, August 28, 2024 4:19 AM
To: Francesca PANCOTTO <francesca.pancotto using unimore.it>; r-help using r-project.org
Subject: Re: [R] Fill NA values in columns with values of another column

At 11:23 on 27/08/2024, Francesca PANCOTTO via R-help wrote:
> Dear Contributors,
> I have a problem with a database composed of many individuals for many
> periods, for which I need to perform a manipulation of data as follows.
> Here I report the procedure I need to do for the first 32 observations
> of the first period.
>
>
> cbind(VB1d[,1],s1id[,1])
>        [,1] [,2]
>   [1,]    6    8
>   [2,]    9    5
>   [3,]   NA    1
>   [4,]    5    6
>   [5,]   NA    7
>   [6,]   NA    2
>   [7,]    4    4
>   [8,]    2    7
>   [9,]    2    7
> [10,]   NA    3
> [11,]   NA    2
> [12,]   NA    4
> [13,]    5    6
> [14,]    9    5
> [15,]   NA    5
> [16,]   NA    6
> [17,]   10    3
> [18,]    7    2
> [19,]    2    1
> [20,]   NA    7
> [21,]    7    2
> [22,]   NA    8
> [23,]   NA    4
> [24,]   NA    5
> [25,]   NA    6
> [26,]    2    1
> [27,]    4    4
> [28,]    6    8
> [29,]   10    3
> [30,]   NA    3
> [31,]   NA    8
> [32,]   NA    1
>
>
> In column s1id, I have numbers from 1 to 8, which are the ids of 8
> groups, randomly mixed in the larger group of 32.
> For each group, I want the value that is reported for only two group
> members to be assigned to all four group members.
>
> For example, the value 8 in the first row, second column, is group 8.
> The value of the variable VB1d for group 8 is 6. At row 28, again for
> s1id equal to 8, I have 6.
> But in row 22, where the second variable is also 8, VB1d is NA.
> The pattern is the same in each group: only two members have the
> correct number, the other two are NA.
> I need each group, identified by the values of the variable s1id,
> to correctly report the value of the variable VB1d that is present
> for just two group members.
>
> I hope my explanation is acceptable.
> The task appears complex to me right now, especially because I will
> need to repeat this procedure for x12x14 similar databases.
>
> Has anyone ever encountered a similar problem?
> Thanks in advance for any help provided.
>
> ----------------------------------
>
> Francesca Pancotto
>
> Associate Professor Political Economy
>
> University of Modena, Largo Santa Eufemia, 19, Modena
>
> Office Phone: +39 0522 523264
>
> Web: https://sites.google.com/view/francescapancotto/home
>
>   ----------------------------------
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> https://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hello,

Here is a solution.
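Split the 1st column by the 2nd, keep only the non-NA values and unlist, to get a named vector whose names are the group id followed by the position within the group (e.g. "81", "82").
Then recover the group id from the names and put it together with the values using cbind.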

mat <- structure(
   c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, NA, NA, 5L,
     9L, NA, NA, 10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, 2L, 4L, 6L,
     10L, NA, NA, NA, 8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L, 3L, 2L,
     4L, 6L, 5L, 5L, 6L, 3L, 2L, 1L, 7L, 2L, 8L, 4L, 5L, 6L, 1L, 4L,
     8L, 3L, 3L, 8L, 1L), dim = c(32L, 2L))

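# split VB1d by group id, drop the NA's, flatten to a named vector
res <- split(mat[, 1L], mat[, 2L]) |>
   lapply(\(x) x[!is.na(x)]) |>
   unlist()
# names are group id + position, e.g. "81"; dropping the last
# character recovers the id (assumes < 10 non-NA values per group)
nms <- names(res)
res <- cbind(
   VB1d = res,
   s1id = substr(nms, 1, nchar(nms) - 1L) |> as.integer()
)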
res
#>    VB1d s1id
#> 11    2    1
#> 12    2    1
#> 21    7    2
#> 22    7    2
#> 31   10    3
#> 32   10    3
#> 41    4    4
#> 42    4    4
#> 51    9    5
#> 52    9    5
#> 61    5    6
#> 62    5    6
#> 71    2    7
#> 72    2    7
#> 81    6    8
#> 82    6    8
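
If the goal is instead to keep every row and fill the NA entries in place, as the original question describes, one possible sketch uses base R's ave(). The helper name `fill_by_group` and the small vectors `vb` and `id` are made up for illustration; it assumes, as stated in the question, that the non-NA values within a group are all equal, so the first known value is used.

```r
# Toy data (hypothetical): two groups of four, two known values each.
vb <- c(6L, NA, 9L, NA, 6L, 9L, NA, NA)
id <- c(8L, 8L, 5L, 5L, 8L, 5L, 8L, 5L)

# Replace the NA's of `value` by the first non-NA value of the same group.
fill_by_group <- function(value, group) {
  ave(value, group, FUN = function(x) {
    known <- x[!is.na(x)]
    if (length(known) == 0L) return(x)  # nothing to fill from, leave as is
    x[is.na(x)] <- known[1L]
    x
  })
}

filled <- fill_by_group(vb, id)
cbind(VB1d = filled, s1id = id)
```

With the matrix above, the same helper would be called as fill_by_group(mat[, 1L], mat[, 2L]); the row order is preserved, so the result can be put back next to the group column.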

Hope this helps,

Rui Barradas

