[R] Comparing Variables and Writing a New Column
David Winsemius
dwinsemius at comcast.net
Mon Feb 1 17:53:36 CET 2010
On Feb 1, 2010, at 11:36 AM, David Winsemius wrote:
>
> On Feb 1, 2010, at 11:11 AM, Jerry Floren wrote:
>
>>
>> Hi,
>>
>> I am using Windows XP and R version 2.9.2. I have a data frame written
>> by R similar to the following:
>>
>> Lab_ID  Analysis_Soil     Results  -4MAD   -2.5MAD  +2.5MAD  +4MAD
>> 55003   Calcium-2008-116  900      961     1121.5   1656.5   1817
>> 55003   Calcium-2008-117  3321     2175    2380.5   3065.5   3271
>> 55003   Calcium-2008-118  3342     3155    4019     6899     7763
>> 55003   Calcium-2008-119  1664     1005.6  1147.92  1622.32  1764.64
>> 55003   Calcium-2008-120  2570     1880    2072     2712     2904
>>
>> Previously, I took this table and finished my analysis in Excel using
>> Excel's "=if" function.
>
> I'll bet that was painful.
>
>> However, I am sure it can be done in R. What I want
>> to do is set up a new data.frame with a new column for Accuracy Flags
>> (a_flag) as shown below.
>>
>> Lab_ID  Analysis_Soil     Results  -4MAD   -2.5MAD  +2.5MAD  +4MAD    a_flag
>> 55003   Calcium-2008-116  900      961     1121.5   1656.5   1817     **L
>> 55003   Calcium-2008-117  3321     2175    2380.5   3065.5   3271     **H
>> 55003   Calcium-2008-118  3342     3155    4019     6899     7763     *L
>> 55003   Calcium-2008-119  1664     1005.6  1147.92  1622.32  1764.64  *H
>> 55003   Calcium-2008-120  2570     1880    2072     2712     2904
>>
>
> df.soil$a_flag <- apply(df.soil, 1, function(.x)
> switch(findInterval(.x$Result, c(-Inf,-4,-2.5,2.5,4,Inf)) ,
> "**L", "*L", " ", "*H",
> "**H") )
Sorry I didn't note the irregularity of the limits. Try instead:
df.soil$a_flag <- apply(df.soil, 1, function(.x)
    switch(findInterval(.x[3], as.numeric(c(-Inf, .x[4:7], Inf))),
           "**L", "*L", " ", "*H", "**H"))
I wasn't sure at first why passing the rows through the function made it
necessary to add the as.numeric, but I am guessing that .x became a
character vector in order to hold the first two columns. That would not be
needed if you passed only the last 5 columns, df.soil[3:7], as the argument
to apply.
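
A minimal, self-contained sketch of that last suggestion follows, passing only the numeric columns so that no coercion is needed. The data frame is rebuilt from the five example rows in the question. The column names m4MAD, m2.5MAD, p2.5MAD and p4MAD and the helper name flag_row are made up for illustration (names such as "-4MAD" are not syntactically valid in R), so treat this as one possible arrangement rather than the layout of the real data.

```r
# Example data rebuilt from the table in the original post.
# Column names are illustrative stand-ins for -4MAD, -2.5MAD, +2.5MAD, +4MAD.
df.soil <- data.frame(
  Lab_ID        = rep(55003, 5),
  Analysis_Soil = paste("Calcium-2008", 116:120, sep = "-"),
  Results       = c(900, 3321, 3342, 1664, 2570),
  m4MAD         = c(961, 2175, 3155, 1005.6, 1880),
  m2.5MAD       = c(1121.5, 2380.5, 4019, 1147.92, 2072),
  p2.5MAD       = c(1656.5, 3065.5, 6899, 1622.32, 2712),
  p4MAD         = c(1817, 3271, 7763, 1764.64, 2904),
  stringsAsFactors = FALSE
)

# Hypothetical helper: .x is a numeric vector holding Results followed by
# the four limits for one row. findInterval() returns 1..5, which indexes
# the vector of flags.
flag_row <- function(.x) {
  idx <- findInterval(.x[1], c(-Inf, .x[2:5], Inf))
  c("**L", "*L", " ", "*H", "**H")[idx]
}

# Only columns 3:7 are passed, so apply() works on a numeric matrix.
df.soil$a_flag <- apply(df.soil[3:7], 1, flag_row)
df.soil[c("Analysis_Soil", "Results", "a_flag")]
```

Note that findInterval() treats each interval as closed on the left, so a result exactly equal to the +2.5MAD limit would be flagged "*H" while one exactly equal to the -2.5MAD limit would not be flagged. If the boundaries should be handled differently, the comparison would need adjusting.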
>
>
>
>> For each row I need to compare the "Results" submitted by the labs to
>> the four "MAD" columns. If the Results are less than -4.0 MAD units from
>> the median, labs are flagged "**L" (very low). For results greater than
>> +4.0 MAD units, labs are flagged "**H" (very high). Likewise for -2.5 MAD
>> and +2.5 MAD (*L and *H respectively). As shown in the last row, labs are
>> not flagged for results within -2.5 MAD to +2.5 MAD units.
>>
>> Can anyone get me started on how to look at each row, compare the
>> "Results" variable with each of the four "__MAD" variables, and then
>> write the appropriate flag for Results outside the range of -2.5 MAD to
>> +2.5 MAD units from the median?
>>
>> Thanks,
>>
>> Jerry Floren
>> Minnesota Department of Agriculture
>>
>> --
>> View this message in context: http://n4.nabble.com/Comparing-Variables-and-Writing-a-New-Column-tp1458947p1458947.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT