[R] Fill NA values in columns with values of another column

Thu Aug 29 23:29:16 CEST 2024

Petr et.al:

I think using merge is a very nice idea! (note that the email omitted the
last rows of the result, though your code of course produced them)

The only minor problem is that the order of the rows in the result is
changed from the original. If the OP needs to preserve the original
ordering, that can be easily done. Here is a complete implementation of
your idea (I think).

## assume that dat is a data frame with the first two columns as in the
OP's post, i.e. the first column is the Value with NA's and the second is
the Group

 spl <-  dat |> nrow() |> seq_len() |> split(dat[[2]]) |> unlist() ## for
reordering
 dat[spl, 1:2] <-
    dat[, 1:2] |>
    na.omit() |>  ## remove rows with NA's
    unique() |>  ## remove duplicate rows
    merge(dat[, 1:2], by.x=2, by.y=2) |> _[, 2:1]  ## and now merge()

 Note the final reordering of the first two columns because of the way
merge() works.
I suspect there may be a slicker way to do this using unsplit(), but I
could not figure out how

The result is:
> dat
   Value Group
1      6     8
2      9     5
3      2     1
4      5     6
5      2     7
6      7     2
7      4     4
8      2     7
9      2     7
10    10     3
11     7     2
12     4     4
13     5     6
14     9     5
15     9     5
16     5     6
17    10     3
18     7     2
19     2     1
20     2     7
21     7     2
22     6     8
23     4     4
24     9     5
25     5     6
26     2     1
27     4     4
28     6     8
29    10     3
30    10     3
31     6     8
32     2     1

Cheers,
Bert

On Wed, Aug 28, 2024 at 10:53 PM Petr Pikal <petr.pik using gmail.com> wrote:

> Hallo Francesca
>
> If you had an object with correct setting, something like template
>
> > dput(res)
> structure(list(V1 = c("1", "2", "3", "4", "5", "6", "7", "8"),
>     V2 = c(2, 7, 10, 4, 9, 5, 2, 6)), class = "data.frame", row.names =
> c("1",
> "2", "3", "4", "5", "6", "7", "8"))
>
> you could merge it with your object where some values are missing
>
> > dput(daf)
> structure(list(X1 = c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA,
> NA, NA, 5L, 9L, NA, NA, 10L, 7L, 2L, NA, 7L, NA, NA, NA, NA,
> 2L, 4L, 6L, 10L, NA, NA, NA), X2 = c(8L, 5L, 1L, 6L, 7L, 2L,
> 4L, 7L, 7L, 3L, 2L, 4L, 6L, 5L, 5L, 6L, 3L, 2L, 1L, 7L, 2L, 8L,
> 4L, 5L, 6L, 1L, 4L, 8L, 3L, 3L, 8L, 1L)), class = "data.frame", row.names =
> c(NA,
> -32L))
>
> > merge(daf, res, by.x="X2", by.y="V1")
>    X2 X1 V2
> 1   1 NA  2
> 2   1 NA  2
> 3   1  2  2
> 4   1  2  2
> 5   2 NA  7
> 6   2 NA  7
> 7   2  7  7
> 8   2  7  7
> 9   3 10 10
> 10  3 NA 10
> 11  3 10 10
> 12  3 NA 10
> 13  4  4  4
> 14  4 NA  4
> 15  4  4  4
> 16  4 NA  4
> 17  5  9  9
> 18  5 NA  9
> 19  5 NA  9
>
> Cheers.
> Petr
>
>
>
>
> st 28. 8. 2024 v 0:45 odesílatel Francesca PANCOTTO via R-help <
> r-help using r-project.org> napsal:
>
> > Dear Contributors,
> > I have a problem with a database composed of many individuals for many
> > periods, for which I need to perform a manipulation of data as follows.
> > Here I report the procedure I need to do for the first 32 observations of
> > the first period.
> >
> >
> > cbind(VB1d[,1],s1id[,1])
> >       [,1] [,2]
> >  [1,]    6    8
> >  [2,]    9    5
> >  [3,]   NA    1
> >  [4,]    5    6
> >  [5,]   NA    7
> >  [6,]   NA    2
> >  [7,]    4    4
> >  [8,]    2    7
> >  [9,]    2    7
> > [10,]   NA    3
> > [11,]   NA    2
> > [12,]   NA    4
> > [13,]    5    6
> > [14,]    9    5
> > [15,]   NA    5
> > [16,]   NA    6
> > [17,]   10    3
> > [18,]    7    2
> > [19,]    2    1
> > [20,]   NA    7
> > [21,]    7    2
> > [22,]   NA    8
> > [23,]   NA    4
> > [24,]   NA    5
> > [25,]   NA    6
> > [26,]    2    1
> > [27,]    4    4
> > [28,]    6    8
> > [29,]   10    3
> > [30,]   NA    3
> > [31,]   NA    8
> > [32,]   NA    1
> >
> >
> > In column s1id, I have numbers from 1 to 8, which are the id of 8 groups
> ,
> > randomly mixed in the larger group of 32.
> > For each group, I want the value that is reported for only to group
> > members, to all the four group members.
> >
> > For example, value 8 in first row , second column, is group 8. The value
> > for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to
> > 8, I have 6.
> > But in row 22, the value 8 of the second variable, reports a value NA.
> > in each group is the same, only two values have the correct number, the
> > other two are NA.
> > I need that each group, identified by the values of the variable S1id,
> > correctly report the number of variable VB1d that is present for just two
> > group members.
> >
> > I hope my explanation is acceptable.
> > The task appears complex to me right now, especially because I will need
> to
> > multiply this procedure for x12x14 similar databases.
> >
> > Anyone has ever encountered a similar problem?
> > Thanks in advance for any help provided.
> >
> > ----------------------------------
> >
> > Francesca Pancotto
> >
> > Associate Professor Political Economy
> >
> > University of Modena, Largo Santa Eufemia, 19, Modena
> >
> > Office Phone: +39 0522 523264
> >
> > Web: *https://sites.google.com/view/francescapancotto/home
> > <https://sites.google.com/view/francescapancotto/home>*
> >
> >  ----------------------------------
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > https://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]