[R] Replacing NAs in long format

jim holtman jholtman at gmail.com
Sat Nov 3 19:30:47 CET 2012


> x <- read.table(text = "idr  schyear year
+  1       8    0
+  1       9    1
+  1      10   NA
+  2       4   NA
+  2       5   -1
+  2       6    0
+  2       7    1
+  2       8    2
+  2       9    3
+  2      10    4
+  2      11   NA
+  2      12    6
+  3       4   NA
+  3       5   -2
+  3       6   -1
+  3       7    0
+  3       8    1
+  3       9    2
+  3      10    3
+  3      11   NA", header = TRUE)
>  # you did not specify if there might be multiple contiguous NAs,
>  # so there are a lot of checks to be made
>  x.l <- lapply(split(x, x$idr), function(.idr){
+     # if every year is NA there is nothing to count from -- return as is
+     if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
+     # repeat until all NAs have been fixed; takes care of contiguous ones
+     while (any(is.na(.idr$year))){
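+         # visit each NA position in this subject's rows
+         for (i in which(is.na(.idr$year))){
+             if ((i == 1L) && (!is.na(.idr$year[i + 1L]))){
+                 # leading NA: count back from the next value
+                 .idr$year[i] <- .idr$year[i + 1L] - 1
+             } else if ((i > 1L) && (!is.na(.idr$year[i - 1L]))){
+                 # previous value known: count forward from it
+                 .idr$year[i] <- .idr$year[i - 1L] + 1
+             } else if ((i < nrow(.idr)) && (!is.na(.idr$year[i + 1L]))){
+                 # only the next value is known: count back from it
+                 .idr$year[i] <- .idr$year[i + 1L] - 1
+             }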
+         }
+     }
+     return(.idr)
+ })
> do.call(rbind, x.l)
     idr schyear year
1.1    1       8    0
1.2    1       9    1
1.3    1      10    2
2.4    2       4   -2
2.5    2       5   -1
2.6    2       6    0
2.7    2       7    1
2.8    2       8    2
2.9    2       9    3
2.10   2      10    4
2.11   2      11    5
2.12   2      12    6
3.13   3       4   -3
3.14   3       5   -2
3.15   3       6   -1
3.16   3       7    0
3.17   3       8    1
3.18   3       9    2
3.19   3      10    3
3.20   3      11    4
>
>
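
If year always moves in step with schyear within a subject (true in the
sample data, but only an assumption about the full data set), then
year - schyear is constant per idr and the NAs can be filled without a
loop. A minimal sketch under that assumption; fill_year_by_offset is a
made-up helper name, not a function from any package:

```r
# Hypothetical helper: fills NA years assuming year - schyear is constant
# within each idr. That assumption is NOT checked here.
fill_year_by_offset <- function(df) {
  # per-row offset; NA wherever year is missing
  offset <- df$year - df$schyear
  # spread the first known offset of each subject over all of its rows
  # (a subject with no known year at all keeps NA)
  known <- ave(offset, df$idr,
               FUN = function(d) rep(d[!is.na(d)][1L], length(d)))
  # keep observed years, fill only the gaps
  df$year <- ifelse(is.na(df$year), df$schyear + known, df$year)
  df
}

# first seven rows of the sample data
x <- data.frame(
  idr     = c(1, 1, 1, 2, 2, 2, 2),
  schyear = c(8, 9, 10, 4, 5, 6, 7),
  year    = c(0, 1, NA, NA, -1, 0, 1)
)
filled <- fill_year_by_offset(x)
```

Unlike the loop above, this trusts schyear to have no gaps of its own; if
a subject skips a school year the two approaches will give different
answers, so check which one matches what you mean by "year".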


On Sat, Nov 3, 2012 at 1:14 PM, Christopher Desjardins
<cddesjardins at gmail.com> wrote:
> Hi,
> I have the following data:
>
>> data[1:20,c(1,2,20)]
> idr  schyear year
> 1       8    0
> 1       9    1
> 1      10   NA
> 2       4   NA
> 2       5   -1
> 2       6    0
> 2       7    1
> 2       8    2
> 2       9    3
> 2      10    4
> 2      11   NA
> 2      12    6
> 3       4   NA
> 3       5   -2
> 3       6   -1
> 3       7    0
> 3       8    1
> 3       9    2
> 3      10    3
> 3      11   NA
>
> What I want to do is replace the NAs in the year variable so that the
> data look like this:
>
> idr  schyear year
> 1       8    0
> 1       9    1
> 1      10    2
> 2       4   -2
> 2       5   -1
> 2       6    0
> 2       7    1
> 2       8    2
> 2       9    3
> 2      10    4
> 2      11    5
> 2      12    6
> 3       4   -3
> 3       5   -2
> 3       6   -1
> 3       7    0
> 3       8    1
> 3       9    2
> 3      10    3
> 3      11    4
>
> I have no idea how to do this. For each subject (idr), an NA should become
> the preceding year value plus 1 when there is one, or the following year
> value minus 1 when the NA comes first.
>
> Does that make sense? I could do this in Excel but I am at a loss for how
> to do this in R. Please reply to me as well as the list if you respond.
>
> Thanks!
> Chris
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



