[R] Comparing Variables and Writing a New Column

David Winsemius dwinsemius at comcast.net
Mon Feb 1 19:47:14 CET 2010


On Feb 1, 2010, at 1:35 PM, Jerry Floren wrote:

>
> Hi David,
>
> Once again, thanks for your help. I still need some help. My original post
> was quite simplified, and perhaps that was a mistake.
>
> Here is the actual code and screen output from R:
>
> # Set the working directory
>
> setwd("C:\\Documents and Settings\\jfloren\\My Documents\\TestRSoil")
>
> # Read from the table generated in step 1
>
> labinfo <- read.table(file = "./TableTestSoil1-25-10.txt", header = TRUE,
>   sep = "\t")
>
> attach(labinfo)
>
> soil = data.frame(NAPT_ID, Date_Recd, Soil, Primary_A, Anlysis,
>   Anlysis_Soil, Unit, Results, count, minall, maxall, amed, rndmedian,
>   aMAD, rndMAD, rndNoF, rndm4, rndm2, rndp2, rndp4, rndavg, rndsd)
>
> detach(labinfo)
> attach(soil)
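As an aside to the session above, the attach()/data.frame()/detach() sequence can be replaced by plain column selection by name, which avoids objects being masked on the search path. This is a minimal sketch, not the poster's code: the small labinfo below is made up and only mimics a few of the real column names.

```r
# Hypothetical stand-in for the table read with read.table(); the real
# file has 22+ columns, only four are mimicked here.
labinfo <- data.frame(
  NAPT_ID = c("13003", "14001"),
  Results = c("12.5", "30.1"),
  rndm4   = c("1.0", "2.0"),
  unused  = c("x", "y")
)

# Names of the columns to keep (shortened; the real call lists 22 names).
keep <- c("NAPT_ID", "Results", "rndm4")

# Selecting by name builds the new data frame directly, so attach() and
# detach() are not needed.
soil <- labinfo[, keep]
str(soil)
```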
>
> # NOTE: Here is the soil data frame. This is from another table that has
> # already calculated the median, Median Absolute Deviation (MAD), and the
> # -4.0 MAD to +4.0 MAD values:
>>
>> str(soil)
> 'data.frame':   9563 obs. of  22 variables:
> $ NAPT_ID      : Factor w/ 27 levels "13003","14001",..: 26 16 25 12 9 5 17 14 20 19 ...
> $ Date_Recd    : Factor w/ 46 levels "1/0/00","10/15/2008",..: 9 9 9 2 10 3 7 11 9 12 ...
> $ Soil         : Factor w/ 21 levels "2008-116","2008-117",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ Primary_A    : Factor w/ 9 levels "Bases","Buffer pH, Lime Req.",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ Anlysis     : Factor w/ 32 levels "Anlysis","B",..: 5 5 5 5 5 5 5 5 5 5 ...
> $ Anlysis_Soil: Factor w/ 611 levels "Anlysis_Soil",..: 2 2 2 2 2 2 2 2 2 2 ...
> $ Unit         : Factor w/ 4 levels "%","(dS/m)","mg/kg",..: 3 3 3 3 3 3 3 3 3 3 ...
> $ Results      : Factor w/ 3796 levels "0.05","0.06",..: 887 456 567 574 575 603 614 626 627 641 ...
> $ count        : Factor w/ 19 levels "10","11","12",..: 12 12 12 12 12 12 12 12 12 12 ...
> $ minall       : Factor w/ 477 levels "0.05","0.06",..: 150 150 150 150 150 150 150 150 150 150 ...
> $ maxall       : Factor w/ 516 levels "0.12","0.17",..: 115 115 115 115 115 115 115 115 115 115 ...
> $ amed         : Factor w/ 552 levels "0.1","0.105",..: 133 133 133 133 133 133 133 133 133 133 ...
> $ rndmedian    : Factor w/ 531 levels "0.1","0.11","0.13",..: 122 122 122 122 122 122 122 122 122 122 ...
> $ aMAD         : Factor w/ 433 levels "0.00499999999999989",..: 193 193 193 193 193 193 193 193 193 193 ...
> $ rndMAD       : Factor w/ 341 levels "0","0.01","0.02",..: 107 107 107 107 107 107 107 107 107 107 ...
> $ rndNoF       : Factor w/ 335 levels "0.2","0.9","1",..: 108 108 108 108 108 108 108 108 108 108 ...
> $ rndm4        : Factor w/ 537 levels "-0.04","-0.05",..: 533 533 533 533 533 533 533 533 533 533 ...
> $ rndm2        : Factor w/ 529 levels "-0.04","-0.1",..: 126 126 126 126 126 126 126 126 126 126 ...
> $ rndp2        : Factor w/ 554 levels "0.13","0.14",..: 143 143 143 143 143 143 143 143 143 143 ...
> $ rndp4        : Factor w/ 545 levels "0.15","0.16",..: 151 151 151 151 151 151 151 151 151 151 ...
> $ rndavg       : Factor w/ 561 levels "0.1","0.11","0.13",..: 127 127 127 127 127 127 127 127 127 127 ...
> $ rndsd        : Factor w/ 416 levels "0.02","0.03",..: 256 256 256 256 256 256 256 256 256 256 ...
>>
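Every column in that str() output, including Results and the rnd* cutoffs, is a factor, so read.table() found something non-numeric in each of them. The levels "Anlysis" and "Anlysis_Soil" suggest that header lines may be repeated inside the data, but that is only a guess. Calling as.numeric() directly on a factor returns the internal level codes, not the printed values; going through as.character() first gives the numbers (entries that are not numbers become NA with a warning). A minimal sketch on a made-up factor and a hypothetical two-column frame:

```r
# A made-up factor whose labels look like numbers.
f <- factor(c("10.5", "2.25", "10.5"))

# Wrong: gives the integer level codes (levels sort as text: "10.5", "2.25").
as.numeric(f)

# Right: convert the labels, not the codes.
as.numeric(as.character(f))

# The same conversion applied to every column of a (hypothetical) data frame.
df <- data.frame(Results = f, rndm4 = factor(c("1", "1", "3")))
df[] <- lapply(df, function(col) as.numeric(as.character(col)))
str(df)
```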
> # NOTE: In terms of my original posting, -4MAD is rndm4 (col 17), -2.5MAD is
> # rndm2 (col 18), +2.5MAD is rndp2 (col 19), and +4MAD is rndp4 (col 20). I
> # changed the columns to 17:20 in the following as.numeric code you supplied:
>
>>
>> df.soil$a_flag <- apply(df.soil, 1, function(.x)
> +           switch(findInterval(.x[3], as.numeric( c(-Inf,.x[17:20],Inf)) ),


You also need to change the position of the "Results" column to match
your new dataframe structure. It's now the 8th column.

So perhaps:
             switch(findInterval(.x[8], as.numeric( c(-Inf,.x[17:20],Inf)) ),
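To see what that line does for a single row: findInterval() returns 1 to 5 according to which interval, delimited by the four cutoffs, the result falls in, and switch() picks the flag in that position. A sketch on invented numbers; flag_result() is a hypothetical helper, not part of the original code.

```r
# Invented cutoffs for one row: -4 MAD, -2.5 MAD, +2.5 MAD, +4 MAD.
cutoffs <- c(rndm4 = 5, rndm2 = 7, rndp2 = 13, rndp4 = 15)

# Hypothetical helper: flag one numeric result against the cutoffs.
flag_result <- function(result, cutoffs) {
  # Position 1..5 of 'result' among -Inf < m4 < m2 < p2 < p4 < Inf.
  k <- findInterval(result, c(-Inf, cutoffs, Inf))
  switch(k, "**L", "*L", " ", "*H", "**H")
}

sapply(c(4, 6, 10, 14, 20), flag_result, cutoffs = cutoffs)
```

A value exactly equal to a cutoff lands in the interval above it, so flag_result(5, cutoffs) gives "*L".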


> +                                           "**L", "*L", " ", "*H", "**H") )
> Error in apply(df.soil, 1, function(.x) switch(findInterval(.x[3], as.numeric(c(-Inf,  :
>   object 'df.soil' not found
>>
>
> # End of R code with the error message "object 'df.soil' not found"

Your dataframe is named soil. Mine was named df.soil. They're just
names, not functions.
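With both corrections (your dataframe name soil, and Results no longer in column 3), the call would be along these lines. This sketch uses a made-up soil holding only the columns the flag needs, and it picks columns by name rather than by position (8 and 17:20 in your data), which is a variation on the call above, not the call itself. apply() turns each row into a named character vector, hence the as.numeric() on both the result and the cutoffs.

```r
# Made-up stand-in for 'soil': only the columns the flag needs, stored as
# text, like the factor columns shown by str(soil).
soil <- data.frame(
  Results = c("4", "10", "20"),
  rndm4   = c("5", "5", "5"),
  rndm2   = c("7", "7", "7"),
  rndp2   = c("13", "13", "13"),
  rndp4   = c("15", "15", "15")
)

# Each row arrives as a named character vector, so columns can be picked
# by name; as.numeric() turns the text (including "-Inf"/"Inf") into numbers.
soil$a_flag <- apply(soil, 1, function(.x)
  switch(findInterval(as.numeric(.x["Results"]),
                      as.numeric(c(-Inf, .x[c("rndm4", "rndm2", "rndp2", "rndp4")], Inf))),
         "**L", "*L", " ", "*H", "**H"))
soil$a_flag
```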

>
> I also thought that perhaps "df.soil" was an abbreviation for "data.frame.soil".
> However, with either "df.soil" or "data.frame.soil", R sends the error message,
> "object 'df.soil' ('data.frame.soil') not found."

Well, you never told me what your dataframe was named.

>
> Are you able to see what I am doing wrong?

Maybe ... can't be sure. No data has yet been forwarded in a form that can be
easily tested. You could provide an unambiguous example by using:

dput(head(soil))

... or you could follow the Posting Guide, which uses dump().
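For anyone unfamiliar with them: dput() prints R code that recreates an object, and dump() writes the same kind of code, with an assignment, to a file that source() can read back. A minimal sketch; toy is a made-up frame standing in for head(soil).

```r
# A small made-up data frame standing in for head(soil).
toy <- data.frame(NAPT_ID = c("13003", "14001"),
                  Results = c(12.5, 30.1))

# dput() prints code that recreates 'toy'; that text is what goes in the email.
dput(toy)

# A reader can rebuild an identical copy from the printed text.
txt <- capture.output(dput(toy))
rebuilt <- eval(parse(text = txt))
identical(rebuilt, toy)

# dump() writes "toy <- ..." to a file; source() recreates the object.
f <- tempfile(fileext = ".R")
dump("toy", file = f)
rm(toy)
source(f)
exists("toy")
```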

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


