[R] Fill NA values in columns with values of another column
Bert Gunter
bgunter.4567 using gmail.com
Wed Aug 28 04:38:14 CEST 2024
Thanks, Calum. After rereading the post, I came to interpret it
as you did. So glad that we agree.
"easier" of course is in the mind of the beholder. But I'm glad that
you presented a "tidyverse" approach. There are other issues of
dependencies and efficiency that also might be relevant.
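For comparison, here is the same fill written as a grouped dplyr pipeline. This is a minimal sketch, not the code from either post. It assumes dplyr 1.0 or later, R 4.1 or later for the native pipe, and a data frame `dat` with columns `Value` and `Group` transcribed from the OP's 32 rows. Within each group, every `Value` is replaced by that group's first non-NA value.

```r
library(dplyr)

# The OP's first 32 rows, transcribed from the post
dat <- data.frame(
  Value = c(6, 9, NA, 5, NA, NA, 4, 2, 2, NA,
            NA, NA, 5, 9, NA, NA, 10, 7, 2, NA,
            7, NA, NA, NA, NA, 2, 4, 6, 10, NA, NA, NA),
  Group = c(8, 5, 1, 6, 7, 2, 4, 7, 7, 3,
            2, 4, 6, 5, 5, 6, 3, 2, 1, 7,
            2, 8, 4, 5, 6, 1, 4, 8, 3, 3, 8, 1)
)

# Within each group, take the first non-NA Value; mutate() recycles the
# length-1 result to every row of the group and keeps the original row order.
filled <- dat |>
  group_by(Group) |>
  mutate(Value = Value[!is.na(Value)][1]) |>
  ungroup()
```

A group whose values are all NA stays NA, because indexing an empty vector with `[1]` returns NA.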
Anyway, here is another "simpler" approach -- in the sense that only
base R without any dependencies is needed. Beyond that, I make no
claims.
I first extracted the data from the post and converted them into a
data frame, dat, with two (numeric) columns named "Value" and "Group".
Then the following does what (I think) was requested:
spl <- split(seq_len(nrow(dat)), dat$Group)
## a list giving the row numbers for each group, named by group
for (grp in names(spl)) {
  ix <- spl[[grp]] ## row indices for this group (looked up by name)
  ## set every Value in these rows to the group's first non-NA value
  dat[ix, "Value"] <- na.omit(dat[ix, "Value"])[1]
}
yielding:
> dat
Value Group
1 6 8
2 9 5
3 2 1
4 5 6
5 2 7
6 7 2
7 4 4
8 2 7
9 2 7
10 10 3
11 7 2
12 4 4
13 5 6
14 9 5
15 9 5
16 5 6
17 10 3
18 7 2
19 2 1
20 2 7
21 7 2
22 6 8
23 4 4
24 9 5
25 5 6
26 2 1
27 4 4
28 6 8
29 10 3
30 10 3
31 6 8
32 2 1
As Calum said, whether this is really a good approach depends on what
the OP wants to do after this.
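If the goal is instead one of the alternatives raised elsewhere in this thread (the non-NA values per group, their count, or simply dropping the incomplete rows), each is a one-liner in base R. A sketch on the same 32 rows; the object names `vals`, `counts` and `complete` are illustrative only.

```r
# The OP's first 32 rows, transcribed from the post
dat <- data.frame(
  Value = c(6, 9, NA, 5, NA, NA, 4, 2, 2, NA,
            NA, NA, 5, 9, NA, NA, 10, 7, 2, NA,
            7, NA, NA, NA, NA, 2, 4, 6, 10, NA, NA, NA),
  Group = c(8, 5, 1, 6, 7, 2, 4, 7, 7, 3,
            2, 4, 6, 5, 5, 6, 3, 2, 1, 7,
            2, 8, 4, 5, 6, 1, 4, 8, 3, 3, 8, 1)
)

# Non-NA values of Value within each group (a list named by group)
vals <- lapply(split(dat$Value, dat$Group), function(x) x[!is.na(x)])

# Number of non-NA values in each group
counts <- tapply(!is.na(dat$Value), dat$Group, sum)

# Drop every row that contains an NA
complete <- na.omit(dat)
```

With these data, `vals[["5"]]` is `c(9, 9)` and every group has a count of 2.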
Cheers,
Bert
On Tue, Aug 27, 2024 at 5:07 PM CALUM POLWART <polc1410 using gmail.com> wrote:
>
> Bert
>
> I thought she meant she wanted to replace the NAs with the 6. But I could be wrong.
>
> It looks like the data was combined with cbind().
>
> I'm going to give tidyverse examples because it's (/s) *"always"* (/s) easier.
>
> library(tidyverse)
> # impute the missing NAs
> # cbind() returns an unnamed matrix; as.data.frame() names its columns V1, V2
> myData <- as.data.frame(cbind(VB1d[,1], s1id[,1]))
>
> myData |>
>   filter(if_all(1, \(x) !is.na(x))) |> # uses col 1; better to use a name
>   unique() -> referenceData
>
> myData |>
>   select(2) |> # better to name
>   left_join(referenceData, by = "V2") -> cleanData
>
> You will notice I've used column numbers. I suspect cbind will name the columns oddly. And I'm typing this on my phone so it's untested.
>
> If you wanted counts
>
> myData |>
>   filter(if_all(1, \(x) !is.na(x))) |>
>   group_by(pick(2)) |>
>   summarise(n = n())
>
> I won't answer the c(9,9) question for group 5 that Bert raises, because how best to present the data depends on what you do next with it.
>
>
> On Wed, 28 Aug 2024, 00:06 Bert Gunter, <bgunter.4567 using gmail.com> wrote:
>>
>> Sorry, not clear to me.
>>
>> For group 8 in your example, do you want to extract the values in column
>> 1 that are not NA, i.e. one value, 6; or do you want to extract the
>> number of values -- that is, the count -- that are not NA, i.e. 1?
>>
>> ... and for group 5, would it be c(9,9) for the values; or 2 for the count?
>>
>> Or something else entirely if I have completely misunderstood.
>>
>> Either of the above is easy and quick to do. You can also just remove
>> the NAs via a version of ?na.omit if that's what you want.
>>
>> Of course, feel free to ignore this and wait for a more helpful
>> response from someone who understands your query better than I.
>>
>> Cheers,
>> Bert
>>
>> On Tue, Aug 27, 2024 at 3:45 PM Francesca PANCOTTO via R-help
>> <r-help using r-project.org> wrote:
>> >
>> > Dear Contributors,
>> > I have a problem with a database of many individuals observed over many
>> > periods, in which I need to manipulate the data as follows.
>> > Here is the procedure I need for the first 32 observations of
>> > the first period.
>> >
>> >
>> > cbind(VB1d[,1],s1id[,1])
>> > [,1] [,2]
>> > [1,] 6 8
>> > [2,] 9 5
>> > [3,] NA 1
>> > [4,] 5 6
>> > [5,] NA 7
>> > [6,] NA 2
>> > [7,] 4 4
>> > [8,] 2 7
>> > [9,] 2 7
>> > [10,] NA 3
>> > [11,] NA 2
>> > [12,] NA 4
>> > [13,] 5 6
>> > [14,] 9 5
>> > [15,] NA 5
>> > [16,] NA 6
>> > [17,] 10 3
>> > [18,] 7 2
>> > [19,] 2 1
>> > [20,] NA 7
>> > [21,] 7 2
>> > [22,] NA 8
>> > [23,] NA 4
>> > [24,] NA 5
>> > [25,] NA 6
>> > [26,] 2 1
>> > [27,] 4 4
>> > [28,] 6 8
>> > [29,] 10 3
>> > [30,] NA 3
>> > [31,] NA 8
>> > [32,] NA 1
>> >
>> >
>> > In column s1id, I have numbers from 1 to 8, which are the ids of 8 groups,
>> > randomly mixed in the larger group of 32.
>> > For each group, I want to copy the value that is reported for only two group
>> > members to all four group members.
>> >
>> > For example, the value 8 in the first row, second column, is group 8. The value
>> > of the variable VB1d for group 8 is 6. At row 28, again for s1id equal to
>> > 8, I have 6.
>> > But in row 22, where the second variable is also 8, VB1d is NA.
>> > Every group is like this: only two values have the correct number, and the
>> > other two are NA.
>> > For each group, identified by the values of the variable s1id, I need all
>> > members to report the value of VB1d that is present for just two of them.
>> >
>> > I hope my explanation is clear.
>> > The task seems complex to me right now, especially because I will need to
>> > repeat this procedure for x12x14 similar databases.
>> >
>> > Has anyone ever encountered a similar problem?
>> > Thanks in advance for any help provided.
>> >
>> > ----------------------------------
>> >
>> > Francesca Pancotto
>> >
>> > Associate Professor Political Economy
>> >
>> > University of Modena, Largo Santa Eufemia, 19, Modena
>> >
>> > Office Phone: +39 0522 523264
>> >
>> > Web: https://sites.google.com/view/francescapancotto/home
>> >
>> > ----------------------------------
>> >
>> >
>> > ______________________________________________
>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.