[R] Fill NA values in columns with values of another column

Wed Aug 28 10:18:42 CEST 2024

At 11:23 on 27/08/2024, Francesca PANCOTTO via R-help wrote:
> Dear Contributors,
> I have a problem with a database composed of many individuals for many
> periods, for which I need to perform a manipulation of data as follows.
> Here I report the procedure I need to do for the first 32 observations of
> the first period.
> 
> 
> cbind(VB1d[,1],s1id[,1])
>        [,1] [,2]
>   [1,]    6    8
>   [2,]    9    5
>   [3,]   NA    1
>   [4,]    5    6
>   [5,]   NA    7
>   [6,]   NA    2
>   [7,]    4    4
>   [8,]    2    7
>   [9,]    2    7
> [10,]   NA    3
> [11,]   NA    2
> [12,]   NA    4
> [13,]    5    6
> [14,]    9    5
> [15,]   NA    5
> [16,]   NA    6
> [17,]   10    3
> [18,]    7    2
> [19,]    2    1
> [20,]   NA    7
> [21,]    7    2
> [22,]   NA    8
> [23,]   NA    4
> [24,]   NA    5
> [25,]   NA    6
> [26,]    2    1
> [27,]    4    4
> [28,]    6    8
> [29,]   10    3
> [30,]   NA    3
> [31,]   NA    8
> [32,]   NA    1
> 
> 
> In column s1id, I have numbers from 1 to 8, which are the ids of 8 groups,
> randomly mixed in the larger group of 32.
> For each group, I want the value that is reported for only two group
> members to be assigned to all four group members.
> 
> For example, the value 8 in the first row, second column, is group 8. The
> value of the variable VB1d for group 8 is 6. At row 28, again for s1id equal
> to 8, I have 6.
> But in row 22, where the second variable is also 8, VB1d is NA.
> It is the same in each group: only two values have the correct number, and
> the other two are NA.
> I need each group, identified by the values of the variable s1id,
> to correctly report the value of the variable VB1d that is present for just
> two group members.
> 
> I hope my explanation is acceptable.
> The task appears complex to me right now, especially because I will need to
> repeat this procedure for x12x14 similar databases.
> 
> Has anyone ever encountered a similar problem?
> Thanks in advance for any help provided.
> 
> ----------------------------------
> 
> Francesca Pancotto
> 
> Associate Professor Political Economy
> 
> University of Modena, Largo Santa Eufemia, 19, Modena
> 
> Office Phone: +39 0522 523264
> 
> Web: https://sites.google.com/view/francescapancotto/home
> 
>   ----------------------------------
> 
Hello,

Here is a solution.
Split the 1st column by the 2nd, keep only the non-NA values, and unlist
to get a named vector.
Then use cbind to combine the values with the group ids recovered from the
names.
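
As a small illustration of the naming that the steps above rely on, here is
a sketch on made-up toy data (the vectors `values` and `groups` are
hypothetical and not part of the original data). After split() and unlist(),
each name is the group id followed by the position within the group.

```r
# Toy data, made up for illustration: two groups with ids 1 and 2.
values <- c(6, NA, 9, 6, NA, 9)
groups <- c(1, 1, 2, 1, 2, 2)

by_group <- split(values, groups)                    # list with elements "1" and "2"
kept <- lapply(by_group, function(x) x[!is.na(x)])   # drop the NAs inside each group
flat <- unlist(kept)                                 # flatten to one named vector

# Each name is <group id><position within group>.
names(flat)
```

Dropping the last character of each name gives back the group id, which only
works as long as no group keeps 10 or more values.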

mat <- structure(
   c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, NA, NA, 5L,
     9L, NA, NA, 10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, 2L, 4L, 6L,
     10L, NA, NA, NA, 8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L, 3L, 2L,
     4L, 6L, 5L, 5L, 6L, 3L, 2L, 1L, 7L, 2L, 8L, 4L, 5L, 6L, 1L, 4L,
     8L, 3L, 3L, 8L, 1L), dim = c(32L, 2L))

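# Split VB1d by group id, drop the NAs within each group, and flatten.
# unlist() names each element <group id><position>, e.g. "11", "12", "21".
res <- split(mat[, 1L], mat[, 2L]) |> lapply(\(x) x[!is.na(x)]) |> unlist()
nms <- names(res)
# Recover the group id by dropping the trailing position digit
# (this assumes fewer than 10 non-NA values per group).
res <- cbind(
   VB1d = res,
   s1id = substr(nms, 1, nchar(nms) - 1L) |> as.integer()
)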
res
#>    VB1d s1id
#> 11    2    1
#> 12    2    1
#> 21    7    2
#> 22    7    2
#> 31   10    3
#> 32   10    3
#> 41    4    4
#> 42    4    4
#> 51    9    5
#> 52    9    5
#> 61    5    6
#> 62    5    6
#> 71    2    7
#> 72    2    7
#> 81    6    8
#> 82    6    8
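
The result above lists the known values per group. If the goal is instead to
keep all 32 rows and fill the NAs in place, one possible sketch uses ave().
The helper name fill_by_group and the second "period" below are hypothetical;
the sketch also assumes that all non-NA values within a group agree, as they
do in the posted data.

```r
# First-period data from the thread: VB1d values and s1id group ids.
vb <- c(6, 9, NA, 5, NA, NA, 4, 2, 2, NA, NA, NA, 5, 9, NA, NA,
        10, 7, 2, NA, 7, NA, NA, NA, NA, 2, 4, 6, 10, NA, NA, NA)
id <- c(8, 5, 1, 6, 7, 2, 4, 7, 7, 3, 2, 4, 6, 5, 5, 6,
        3, 2, 1, 7, 2, 8, 4, 5, 6, 1, 4, 8, 3, 3, 8, 1)

# Hypothetical helper: give every member of a group the group's first
# non-NA value. A group with no non-NA value at all stays NA.
fill_by_group <- function(x, g) {
  ave(x, g, FUN = function(v) v[!is.na(v)][1L])
}

filled <- fill_by_group(vb, id)
cbind(VB1d = filled, s1id = id)

# Several periods stored as matrix columns: fill column j of the values
# using column j of the ids. The second period here is made up
# (period 1 in reverse order), only to show the shape of the call.
VB <- cbind(vb, rev(vb))
ID <- cbind(id, rev(id))
VB_filled <- sapply(seq_len(ncol(VB)),
                    function(j) fill_by_group(VB[, j], ID[, j]))
```

The same helper could be applied with lapply() over a list of databases,
provided each one stores values and group ids in matching columns.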

Hope this helps,

Rui Barradas
