[R] Replacing NAs with the average

Rui Barradas
Mon Oct 18 16:58:38 CEST 2021


Hello,

Please don't post in HTML; post in plain text, as the posting guide 
asks. Your data is unreadable.

Here are two test data sets with about 25% of the values in each numeric 
column set to NA: one with all columns numeric, the other with only some 
columns numeric.


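# \(x) is shorthand for function(x), available from R 4.1.0
wbpractice1 <- mtcars  # all columns are numeric
wbpractice2 <- iris    # not all columns are numeric
# set a random 25% of the values in each column to NA
wbpractice1[] <- lapply(wbpractice1, \(x){
   is.na(x) <- sample(length(x), 0.25*length(x))
   x
})
# column 5 of iris (Species) is a factor, so leave it untouched
wbpractice2[-5] <- lapply(wbpractice2[-5], \(x){
   is.na(x) <- sample(length(x), 0.25*length(x))
   x
})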


#---

If all columns are numeric, just lapply an anonymous function over them, 
replacing the values where is.na is TRUE with the column mean.


wbpractice1[] <- lapply(wbpractice1, \(x){
   x[is.na(x)] <- mean(x, na.rm = TRUE)
   x
})
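
To check that the replacement worked, something like the sketch below 
might help. It is self-contained, so it rebuilds the test data; the helper 
name impute_mean, the object name df and the seed are my own choices for 
illustration, not anything from your script. It uses function(x) instead 
of \(x) so that it also runs on R versions before 4.1.0.

```r
# Hypothetical helper (name chosen for this example): replace the NAs in a
# numeric vector with the mean of its non-missing values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

set.seed(2021)                      # arbitrary seed, only for reproducibility
df <- mtcars
df[] <- lapply(df, function(x) {
  is.na(x) <- sample(length(x), 8)  # 8 of the 32 values (25%) become NA
  x
})

na_before <- colSums(is.na(df))     # number of NAs per column before filling
df[] <- lapply(df, impute_mean)
na_after <- sum(is.na(df))          # total number of NAs left after filling

na_before
na_after
```

Assigning to df[] rather than to df keeps the result a data.frame instead 
of turning it into a plain list.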


But if some columns are not numeric, first determine which columns are 
numeric, then apply the same code to that subset.


num_cols <- sapply(wbpractice2, is.numeric)
wbpractice2[num_cols] <- lapply(wbpractice2[num_cols], \(x){
   x[is.na(x)] <- mean(x, na.rm = TRUE)
   x
})
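
As an illustration on data shaped like yours, here is a sketch that types 
in the seven sample rows as far as I could recover them from the HTML 
post. The object name wb is just a placeholder, and I am assuming that 
hd17employ is read as character; that is probably why your loop 
complained, since mean() on a non-numeric column returns NA with a 
warning, but you did not show the messages so I cannot be sure.

```r
# The seven sample rows from the question, typed in by hand.
# 'wb' is a placeholder name for this example.
wb <- data.frame(
  household.id    = 1:7,
  hd17.perm       = c(2, 2, 6, 8, 6, 8, 8),
  hd17employ      = c("yes", "yes", "no", NA, NA, "yes", "no"),
  health.exp      = c(1654, NA, 2547, 2365, 1254, 1236, NA),
  total.food.exp  = c(23654, NA, 123311, 13648, 36549, 236541, 13264),
  total.nfood.exp = c(23655, 65984, 52416, 12544, 12365, 26522, 23698)
)

# Find the numeric columns, then fill the NAs in those columns only.
num_cols <- vapply(wb, is.numeric, logical(1))
wb[num_cols] <- lapply(wb[num_cols], function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

wb
```

The NAs in hd17employ are left alone, since a mean is not defined for 
yes/no values. Note also that household.id is numeric, so it would be 
averaged too if it ever had missing values; you may want to exclude it 
explicitly.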


And here are dplyr solutions.


library(dplyr)

wbpractice1 %>%
   mutate(across(everything(),
                 ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

wbpractice2 %>%
   mutate(across(where(is.numeric),
                 ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
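
Note that the pipes above only print the result; to keep it, assign it to 
an object. Below is a sketch of that, using dplyr's coalesce() as one 
possible alternative to ifelse(). The small data frame and the names df 
and df_filled are made up for the example, and I am assuming all numeric 
columns are of type double.

```r
library(dplyr)

# Made-up example data: one character column, two double columns with NAs.
df <- data.frame(
  id    = c("a", "b", "c", "d"),
  score = c(1, NA, 3, 5),
  cost  = c(NA, 10, 20, 30)
)

# coalesce() takes the first non-missing value at each position, so each NA
# in a numeric column is filled with that column's mean.
df_filled <- df %>%
  mutate(across(where(is.numeric),
                ~coalesce(.x, mean(.x, na.rm = TRUE))))

df_filled
```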



Hope this helps,

Rui Barradas


At 13:38 on 18/10/21, Admire Tarisirayi Chirume wrote:
> Good day colleagues. Below is the CSV file I am using in my analysis.
>>
>> household.id  hd17.perm  hd17employ  health.exp  total.food.exp  total.nfood.exp
>>            1          2         yes        1654           23654            23655
>>            2          2         yes          NA              NA            65984
>>            3          6          no        2547          123311            52416
>>            4          8          NA        2365           13648            12544
>>            5          6          NA        1254           36549            12365
>>            6          8         yes        1236          236541            26522
>>            7          8          no          NA           13264            23698
>>
>> So I created a data frame from the above CSV file as follows:
>>
>> wbpractice <- read.csv("world_practice.csv")
>>
>> Now I am doing data cleaning and trying to replace all missing values with
>> the averages of the respective columns.
>>
>> The dimensions of the actual dataset are:
>>
>> dim(wbpractice)
> [1] 31998    6
> 
> I ran the following script, but I got some error messages:
> 
> for(i in 1:ncol(wbpractice)){
>       wbpractice[is.na(wbpractice[,i]), i] <- mean(wbpractice[,i], na.rm = TRUE)
>      }
> 
> Any help to replace all NAs with average values in my dataframe?
> 
> 
> 	[[alternative HTML version deleted]]
> 