[R] Comparing Variables and Writing a New Column

David Winsemius dwinsemius at comcast.net
Mon Feb 1 17:36:20 CET 2010


On Feb 1, 2010, at 11:11 AM, Jerry Floren wrote:

>
> HI,
>
> I am using Windows XP and R version 2.9.2. I have a data frame written by R
> similar to the following:
>
> Lab_ID  Analysis_Soil     Results    -4MAD  -2.5MAD  +2.5MAD    +4MAD
> 55003   Calcium-2008-116      900      961   1121.5   1656.5     1817
> 55003   Calcium-2008-117     3321     2175   2380.5   3065.5     3271
> 55003   Calcium-2008-118     3342     3155     4019     6899     7763
> 55003   Calcium-2008-119     1664   1005.6  1147.92  1622.32  1764.64
> 55003   Calcium-2008-120     2570     1880     2072     2712     2904
>
> Previously, I took this table and finished my analysis in Excel using
> Excel's "=if" function.

I'll bet that was painful.
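For anyone who wants to play along, here is one way to get your table into R as a data frame called df.soil (the name is my assumption and is reused below). check.names = FALSE matters here: headers such as "-4MAD" and "+2.5MAD" are not syntactic R names, and with the default setting read.table() rewrites them through make.names(). The text= argument is available in current R releases; with an older R you could save the lines to a file and read that instead.

```r
# Hypothetical reconstruction of the posted table.
# check.names = FALSE keeps headers like "-4MAD" exactly as written.
df.soil <- read.table(header = TRUE, check.names = FALSE,
                      stringsAsFactors = FALSE, text = "
Lab_ID  Analysis_Soil     Results    -4MAD  -2.5MAD  +2.5MAD    +4MAD
55003   Calcium-2008-116      900      961   1121.5   1656.5     1817
55003   Calcium-2008-117     3321     2175   2380.5   3065.5     3271
55003   Calcium-2008-118     3342     3155     4019     6899     7763
55003   Calcium-2008-119     1664   1005.6  1147.92  1622.32  1764.64
55003   Calcium-2008-120     2570     1880     2072     2712     2904
")
str(df.soil)
```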

> However, I am sure it can be done in R. What I want
> to do is set up a new data.frame with a new column for Accuracy Flags
> (a_flag) as shown below.
>
> Lab_ID  Analysis_Soil     Results    -4MAD  -2.5MAD  +2.5MAD    +4MAD  a_flag
> 55003   Calcium-2008-116      900      961   1121.5   1656.5     1817  **L
> 55003   Calcium-2008-117     3321     2175   2380.5   3065.5     3271  **H
> 55003   Calcium-2008-118     3342     3155     4019     6899     7763  *L
> 55003   Calcium-2008-119     1664   1005.6  1147.92  1622.32  1764.64  *H
> 55003   Calcium-2008-120     2570     1880     2072     2712     2904
>

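# Columns 3:7 are assumed to hold Results and the four MAD cutoffs;
# apply() passes each row as a numeric vector, and that row's own
# cutoffs become the findInterval() breakpoints.
df.soil$a_flag <- apply(df.soil[, 3:7], 1, function(.x)
           switch(findInterval(.x[1], c(-Inf, .x[2:5], Inf)),
                  "**L", "*L", "", "*H", "**H") )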
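Since every comparison is between columns of the same row, you don't actually need apply() at all. Here is a vectorized sketch with nested ifelse(), using the five rows from your post typed in by hand (df.soil and the column names are assumptions carried over from your table):

```r
# Example rows typed in from the post; check.names = FALSE keeps the
# non-syntactic column names such as "-4MAD" as written.
df.soil <- data.frame(
  Lab_ID        = 55003,
  Analysis_Soil = paste("Calcium-2008-", 116:120, sep = ""),
  Results       = c(900, 3321, 3342, 1664, 2570),
  "-4MAD"       = c(961, 2175, 3155, 1005.6, 1880),
  "-2.5MAD"     = c(1121.5, 2380.5, 4019, 1147.92, 2072),
  "+2.5MAD"     = c(1656.5, 3065.5, 6899, 1622.32, 2712),
  "+4MAD"       = c(1817, 3271, 7763, 1764.64, 2904),
  check.names = FALSE, stringsAsFactors = FALSE)

# Whole-column comparisons: each Results value is tested against the
# cutoffs in its own row, outer (4 MAD) limits before inner (2.5 MAD).
df.soil$a_flag <- with(df.soil,
  ifelse(Results < `-4MAD`,   "**L",
  ifelse(Results > `+4MAD`,   "**H",
  ifelse(Results < `-2.5MAD`, "*L",
  ifelse(Results > `+2.5MAD`, "*H", "")))))

df.soil[, c("Analysis_Soil", "Results", "a_flag")]
```

The order of the tests matters: a result beyond +4 MAD is also beyond +2.5 MAD, so the outer limits have to be checked first or every extreme result would come out as a single-star flag.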



> For each row I need to compare the "Results" submitted by the labs to the
> four "MAD" columns. If the Results are less than -4.0 MAD units from the
> median, labs are flagged "**L" (very low). For results greater than +4.0 MAD
> units, labs are flagged "**H" (very high). Likewise for -2.5 MAD and +2.5
> MAD (*L and *H respectively). As shown in the last row, labs are not flagged
> for results within -2.5 MAD to +2.5 MAD units.
>
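Incidentally, in all five posted rows the four cutoffs fit median +/- k * MAD for k = 2.5 and 4. Assuming that holds for the rest of your data, you can express Results in MAD units and then use the fixed breakpoints -4, -2.5, 2.5, 4, classifying every row in one findInterval() call. The variable names below are mine, purely for illustration:

```r
# Example values typed in from the post.
res   <- c(900, 3321, 3342, 1664, 2570)         # Results
lo2.5 <- c(1121.5, 2380.5, 4019, 1147.92, 2072) # -2.5MAD cutoffs
hi2.5 <- c(1656.5, 3065.5, 6899, 1622.32, 2712) # +2.5MAD cutoffs

# Assumption: cutoffs are median +/- k * MAD, so the median is the
# midpoint of the +/-2.5 cutoffs and one MAD unit is their span / 5.
med      <- (lo2.5 + hi2.5) / 2
mad.unit <- (hi2.5 - lo2.5) / 5
score    <- (res - med) / mad.unit   # Results in MAD units

# findInterval() is vectorized over its first argument, so all rows
# are binned at once; the bin index picks the matching flag.
flags  <- c("**L", "*L", "", "*H", "**H")
a_flag <- flags[findInterval(score, c(-Inf, -4, -2.5, 2.5, 4, Inf))]
round(score, 2)
a_flag
```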
> Can anyone get me started on how to look at each row, compare the
> "Results" variable with each of the four "__MAD" variables, and then write
> the appropriate flag for Results outside the -2.5 MAD to +2.5 MAD range
> around the median?
>
> Thanks,
>
> Jerry Floren
> Minnesota Department of Agriculture
>
> -- 
> View this message in context: http://n4.nabble.com/Comparing-Variables-and-Writing-a-New-Column-tp1458947p1458947.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


