[R] using subset() in data frame

Chuck Cleland ccleland at optonline.net
Sat Feb 23 12:16:28 CET 2008


On 2/23/2008 6:09 AM, Chuck Cleland wrote:
> On 2/22/2008 8:01 PM, Robert Walters wrote:
>> R folks,
>> As an R novice, I struggle with the mystery of subsetting. Textbook 
>> and online examples of this seem quite straightforward yet I cannot 
>> get my mind around it. For practice, I'm using the code in MASS Ch. 6, 
>> "whiteside data" to analyze a different data set with similar 
>> variables and structure.
>> Here is my data frame:
>>
>> ###subset one of three cases for the variable 'position'
>>  > data.b <- data.a[data.a$position == "inrow", ]
>>  > print(data.b)
>>       position  porosity    x       y
>> 1     inrow     macro     1.40   16.5
>> 2     inrow     macro      .       .
>>         .          .        .       .
>>         .          .        .       .
>> 7     inrow     micro
>> 8     inrow     micro
>>
>> Now I want to do separate lm's for each case of porosity, macro and 
>> micro. The code as given in MASS, p.141, slightly modified would be:
>>
>> fit1 <- lm(y ~ x, data=data.b, subset = porosity == "macro")
>> fit2 <- update(fit1, subset = porosity == "micro")
>>
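   For what it's worth, the subset= form should do what you want once
data.b is a proper data frame. Here is a minimal sketch on a made-up
data frame (the object d and all of its values are invented for
illustration; they are not your data):

```r
# Hypothetical data, invented for illustration only
d <- data.frame(
  porosity = rep(c("macro", "micro"), each = 4),
  x = rep(1:4, times = 2),
  y = c(2.1, 3.9, 6.2, 7.8, 1.2, 1.9, 3.1, 4.2)
)

# The subset argument is evaluated using the columns of `data`,
# so a bare `porosity` is fine here
fit_macro <- lm(y ~ x, data = d, subset = porosity == "macro")

# update() re-runs the same call with only the subset changed
fit_micro <- update(fit_macro, subset = porosity == "micro")

# Number of observations used in each fit
nobs(fit_macro)
nobs(fit_micro)
```

   Checking nobs() is a quick way to confirm that each fit used only
the rows you meant it to use.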
>> ###simplest code with subscripting
>> fit1 <- lm(y ~ x, data.b[porosity=="macro"])
> 
>   Assuming data.b has two dimensions, you need a comma after 
> porosity=="macro" to indicate that you are selecting a subset of rows of 
> the data frame:
> 
> fit1 <- lm(y ~ x, data.b[porosity=="macro",])

   Actually, that should be:

fit1 <- lm(y ~ x, data.b[data.b$porosity=="macro",])

   because the index expression is evaluated before [.data.frame ever 
sees it, so R looks for porosity in your workspace rather than inside 
data.b unless you point it there with data.b$porosity.
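
   To make the difference concrete, here is a small sketch on a made-up
data frame (the object d and its values are invented for illustration,
not taken from your data). It also shows with() as one way to avoid
repeating the data frame name:

```r
# Hypothetical data, invented for illustration only
d <- data.frame(
  porosity = rep(c("macro", "micro"), each = 4),
  x = rep(1:4, times = 2),
  y = c(2.1, 3.9, 6.2, 7.8, 1.2, 1.9, 3.1, 4.2)
)

# Explicit reference to the column inside the data frame
macro1 <- d[d$porosity == "macro", ]

# Same rows; with() evaluates the condition using d's columns
macro2 <- d[with(d, porosity == "macro"), ]

# A bare `porosity` fails unless an object of that name happens to
# exist in the calling environment or on the search path
bare <- try(d[porosity == "macro", ], silent = TRUE)
inherits(bare, "try-error")
```

   The last line should report TRUE in a clean session, which is the
"object not found" failure you would otherwise hit.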

>> ###following example in ?subset
>> fit1 <- lm(y ~ x, data.b, subset(data.b, porosity, select=macro))
> 
>   The select argument to subset is meant to select variables (i.e., it 
> indicates "columns to select from a data frame") and you are misusing it 
> by specifying the level of a factor.  If you make your call to subset by 
> itself (a good idea when you are learning how a function works), you 
> should get an error like this:
> 
>  > subset(whiteside, Insul, select=Before)
> Error in subset.data.frame(whiteside, Insul, select = Before) :
>   'subset' must evaluate to logical
> 
>  What I think you intended was this:
> 
> subset(data.b, porosity == "macro")
> 
>   Even with the correct call to subset, you also don't want both data.b 
> and the subset piece, because subset returns a data frame.  In other 
> words, you would be passing lm() two different data frames.  So try this 
> instead:
> 
> fit1 <- lm(y ~ x, subset(data.b, porosity == "macro"))
> 
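
   As a sketch of how the two arguments to subset() divide the work,
again on a made-up data frame (d and its values are invented for
illustration): the second argument picks rows with a logical
condition, and select picks columns.

```r
# Hypothetical data, invented for illustration only
d <- data.frame(
  porosity = rep(c("macro", "micro"), each = 4),
  x = rep(1:4, times = 2),
  y = c(2.1, 3.9, 6.2, 7.8, 1.2, 1.9, 3.1, 4.2)
)

# Second argument: logical condition choosing rows
# select: the columns to keep
macro_xy <- subset(d, porosity == "macro", select = c(x, y))

# Inspect the result on its own before handing it to lm()
str(macro_xy)

# One data frame only, passed as the data argument
fit_macro <- lm(y ~ x, data = macro_xy)
coef(fit_macro)
```

   The fit should match what you get from the bracket form,
d[d$porosity == "macro", ], since both select the same rows.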
>> None of the above, plus many permutations thereof, works.
>> Can anyone educate me?
>>
>> Thanks,
>>
>> Robert Walters
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.  

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


