[R] Fill NA values in columns with values of another column

Wed Aug 28 04:38:14 CEST 2024

Thanks, Calum. After rereading the post, I came to interpret it
as you did. So glad that we agree.

"easier" of course is in the mind of the beholder. But I'm glad that
you presented a "tidyverse" approach. There are other issues of
dependencies and efficiency that also might be relevant.

Anyway, here is another "simpler" approach -- in the sense that only
base R without any dependencies is needed. Beyond that, I make no
claims.

I first extracted the data from the post and converted them into a
data frame, dat, with two (numeric) columns named "Value" and "Group".
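
For anyone who wants to follow along, here is a minimal sketch of how
dat could be built. The numbers are copied by hand from the cbind()
output in the original post; the exact construction used above is not
shown in the message, so this is an assumption rather than the code
that was actually run.

```r
## Values copied from the cbind(VB1d[,1], s1id[,1]) output in the
## original post (32 rows; NA marks the missing entries).
Value <- c(6, 9, NA, 5, NA, NA, 4, 2, 2, NA, NA, NA, 5, 9, NA, NA,
           10, 7, 2, NA, 7, NA, NA, NA, NA, 2, 4, 6, 10, NA, NA, NA)
Group <- c(8, 5, 1, 6, 7, 2, 4, 7, 7, 3, 2, 4, 6, 5, 5, 6,
           3, 2, 1, 7, 2, 8, 4, 5, 6, 1, 4, 8, 3, 3, 8, 1)

## Two numeric columns named as in the text.
dat <- data.frame(Value = Value, Group = Group)
```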

Then the following does what (I think) was requested:

spl <- split(seq_len(nrow(dat)), dat$Group)
## a list giving all row numbers per group, named by group

for (grp in unique(dat$Group)) {
   ## extract the row indices for the group; index by name, so this
   ## also works when the group ids are not exactly 1, 2, ..., n
   ix <- spl[[as.character(grp)]]
   ## set all values in the Value column for these rows to the
   ## group's first non-NA value
   dat[ix, 'Value'] <- na.omit(dat[ix, 'Value'])[1]
}

yielding:

> dat
   Value Group
1      6     8
2      9     5
3      2     1
4      5     6
5      2     7
6      7     2
7      4     4
8      2     7
9      2     7
10    10     3
11     7     2
12     4     4
13     5     6
14     9     5
15     9     5
16     5     6
17    10     3
18     7     2
19     2     1
20     2     7
21     7     2
22     6     8
23     4     4
24     9     5
25     5     6
26     2     1
27     4     4
28     6     8
29    10     3
30    10     3
31     6     8
32     2     1
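
The same fill can probably also be written without an explicit loop
using ave() from base R. The sketch below is only an alternative under
the same assumptions (every group has at least one non-NA value, and
all non-NA values within a group agree); it rebuilds the data from the
original post so that it runs on its own, and the names filled and
first_non_na are made up for illustration.

```r
## Data copied from the original post.
Value <- c(6, 9, NA, 5, NA, NA, 4, 2, 2, NA, NA, NA, 5, 9, NA, NA,
           10, 7, 2, NA, 7, NA, NA, NA, NA, 2, 4, 6, 10, NA, NA, NA)
Group <- c(8, 5, 1, 6, 7, 2, 4, 7, 7, 3, 2, 4, 6, 5, 5, 6,
           3, 2, 1, 7, 2, 8, 4, 5, 6, 1, 4, 8, 3, 3, 8, 1)
dat <- data.frame(Value = Value, Group = Group)

## Helper (hypothetical name): first non-NA element of a vector,
## or NA if there is none.
first_non_na <- function(v) v[!is.na(v)][1]

## ave() applies the function within each group and recycles the
## single result back to every row of that group.
filled <- ave(dat$Value, dat$Group, FUN = first_non_na)
dat$Value <- filled
```

A group consisting only of NAs would simply stay NA here.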

As Calum said, whether this is really a good approach depends on what
the OP wants to do after this.

Cheers,
Bert

On Tue, Aug 27, 2024 at 5:07 PM CALUM POLWART <polc1410 using gmail.com> wrote:
>
> Bert
>
> I thought she meant she wanted to replace the NAs with the 6. But I could be wrong.
>
> It looks like the data is combined from cbind.
>
> I'm going to give tidyverse examples because it's (/s) *"always"* (/s) easier.
>
> require(tidyverse)
> # impute the missing NAs
> myData <- cbind(VB1d[,1],s1id[,1])
>
> myData |> said[
>   filter(!is.na(1)) |>  #uses col1 would be better to use a name
>   unique() -> referenceData
>
> myData |>
>   select(2) |> #better to name
>   left_join(referenceData) -> cleanData
>
> You will notice I've used column numbers. I suspect cbind will name the columns oddly. And I'm typing this on my phone so it's untested.
>
> If you wanted counts
>
> myData |>
>   filter (!is.na(1)) |>
>   group_by(2) |>
>   summarise (n())
>
> I won't answer the c(5,5) that Bert mentions because that's an extra question of what you do next with the data to know how best to present it.
>
>
> On Wed, 28 Aug 2024, 00:06 Bert Gunter, <bgunter.4567 using gmail.com> wrote:
>>
>> Sorry, not clear to me.
>>
>> For group 8 in your example, do you want to extract the values in column
>> 1 that are not NA, i.e. one value, 6; or do you want to extract the
>> number of values -- that is, the count --  that are not NA, i.e. 1?
>>
>> ... and for group 5, would it be c(9,9) for the values; or 2 for the count?
>>
>> Or something else entirely if I have completely misunderstood.
>>
>> Either of the above are easy and quick to do. You can also just remove
>>  the NA's via a version of ?na.omit if that's what you want.
>>
>> Of course, feel free to ignore this and wait for a more helpful
>> response from someone who understands your query better than I.
>>
>> Cheers,
>> Bert
>>
>> On Tue, Aug 27, 2024 at 3:45 PM Francesca PANCOTTO via R-help
>> <r-help using r-project.org> wrote:
>> >
>> > Dear Contributors,
>> > I have a problem with a database composed of many individuals for many
>> > periods, for which I need to perform a manipulation of data as follows.
>> > Here I report the procedure I need to do for the first 32 observations of
>> > the first period.
>> >
>> >
>> > cbind(VB1d[,1],s1id[,1])
>> >       [,1] [,2]
>> >  [1,]    6    8
>> >  [2,]    9    5
>> >  [3,]   NA    1
>> >  [4,]    5    6
>> >  [5,]   NA    7
>> >  [6,]   NA    2
>> >  [7,]    4    4
>> >  [8,]    2    7
>> >  [9,]    2    7
>> > [10,]   NA    3
>> > [11,]   NA    2
>> > [12,]   NA    4
>> > [13,]    5    6
>> > [14,]    9    5
>> > [15,]   NA    5
>> > [16,]   NA    6
>> > [17,]   10    3
>> > [18,]    7    2
>> > [19,]    2    1
>> > [20,]   NA    7
>> > [21,]    7    2
>> > [22,]   NA    8
>> > [23,]   NA    4
>> > [24,]   NA    5
>> > [25,]   NA    6
>> > [26,]    2    1
>> > [27,]    4    4
>> > [28,]    6    8
>> > [29,]   10    3
>> > [30,]   NA    3
>> > [31,]   NA    8
>> > [32,]   NA    1
>> >
>> >
>> > In column s1id, I have numbers from 1 to 8, which are the id of 8 groups ,
>> > randomly mixed in the larger group of 32.
>> > For each group, I want the value that is reported for only two group
>> > members to be assigned to all four group members.
>> >
>> > For example, value 8 in first row , second column, is group 8. The value
>> > for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to
>> > 8, I have 6.
>> > But in row 22, the value 8 of the second variable reports a value NA.
>> > The same happens in each group: only two values have the correct number, the
>> > other two are NA.
>> > I need that each group, identified by the values of the variable S1id,
>> > correctly report the number of variable VB1d that is present for just two
>> > group members.
>> >
>> > I hope my explanation is acceptable.
>> > The task appears complex to me right now, especially because I will need to
>> > multiply this procedure for x12x14 similar databases.
>> >
>> > Has anyone ever encountered a similar problem?
>> > Thanks in advance for any help provided.
>> >
>> > ----------------------------------
>> >
>> > Francesca Pancotto
>> >
>> > Associate Professor Political Economy
>> >
>> > University of Modena, Largo Santa Eufemia, 19, Modena
>> >
>> > Office Phone: +39 0522 523264
>> >
>> > Web: *https://sites.google.com/view/francescapancotto/home
>> > <https://sites.google.com/view/francescapancotto/home>*
>> >
>> >  ----------------------------------
>> >
>> > ______________________________________________
>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.