[R] subset function unexpected behavior

David Katz david at davidkatzconsulting.com
Tue Feb 2 17:08:20 CET 2010


Thanks, that helps! Subset evaluates its condition inside the data frame, so
a column name can clash with a variable of the same name in my loop. So if I
don't want to check for that possibility, I should use a special kind of
index like .sch, or avoid subset:

for(sch in school.list){
  print(sch)
  print(input.data[input.data[,school.var] == sch,])
}

which works no matter what variable names I use. That seems like a
reasonable requirement for good code.
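
A minimal sketch of the same idea wrapped in a helper, so that the column
name is only ever used as a string and no loop variable can be masked by a
column. The helper name rows_for_school and its argument names are made up
for illustration; the data is the example from the original post.

```r
input.data <- data.frame(sch = c(1, 1, 2, 2),
                         pop = c(100, 200, 300, 400))
school.var <- "sch"
school.list <- 1:2

# Hypothetical helper: keep the rows where the column named by `var`
# equals `value`. The column is looked up by name with [[ ]], so nothing
# is evaluated inside the data frame and no name clash can occur.
rows_for_school <- function(data, var, value) {
  data[data[[var]] == value, , drop = FALSE]
}

for (sch in school.list) {
  print(sch)
  print(rows_for_school(input.data, school.var, sch))
}
```

Because the loop index is passed in as an ordinary argument, it can be
called sch, s, or anything else without changing the result.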

(Checking for a name clash would be at least theoretically needed since
school.var is a parameter that can be any character name.)

Although subset conveniently avoids extra typing in many cases (not here),
this suggests to me that it's not ideal for code that can be used in a
variety of contexts. Note that unlike "attach", subset does not issue a
warning! 
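
To see the clash directly, here is a small sketch that roughly mimics how
subset evaluates its condition: names are looked up in the data frame's
columns first, and only then in the calling environment. That description
is my reading of subset.data.frame, and the variable names clash, ok and
index.name are made up for the example.

```r
input.data <- data.frame(sch = c(1, 1, 2, 2),
                         pop = c(100, 200, 300, 400))
school.var <- "sch"
sch <- 1  # plays the role of the loop index

# Evaluate the condition with the data frame's columns visible first,
# falling back to the current environment for all other names.
clash <- eval(quote(input.data[, school.var] == sch),
              input.data, environment())
print(clash)  # `sch` resolves to the column, so the column is compared with itself

# With an index name that is not a column, the lookup falls through as intended.
.sch <- 1
ok <- eval(quote(input.data[, school.var] == .sch),
           input.data, environment())
print(ok)

# A cheap guard for code that must accept arbitrary column names.
index.name <- "sch"
if (index.name %in% names(input.data)) {
  message("'", index.name, "' is also a column; subset() would use the column")
}
```

The guard at the end is the kind of check I would rather not have to write,
which is why plain indexing looks safer to me here.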

-----

Hi: 

Try this for your second loop instead: 

for(s in school.list){ 
  print(s) 
  print(subset(input.data, sch == s)) 
 } 
[1] 1 
  sch pop 
1   1 100 
2   1 200 
[1] 2 
  sch pop 
3   2 300 
4   2 400 

Don't confound the 'sch' variable in your data frame with the 
index in your loop :) 

HTH, 
Dennis 

On Mon, Feb 1, 2010 at 8:17 PM, David Katz <[hidden email]> wrote:

> 
> I was surprised to see this unexpected behavior of subset in a for loop. I 
> looked in subset.data.frame and it seemed to me that both versions should 
> work, since the subset call should be evaluated in the global environment -
> but perhaps I don't understand environments well enough. Can someone
> enlighten me? In any case, this is a bit of a gotcha for naive users of 
> subset. 
> 
> input.data <- 
>  data.frame(sch=c(1,1,2,2), 
>             pop=c(100,200,300,400)) 
> 
> school.var <- "sch" 
> 
> school.list <- 1:2 
> 
> for(sch in school.list){ 
>  print(sch) 
>  #do this before subset!: 
>  right.sch.p <- 
>    input.data[,school.var] == sch 
>  print(  subset(input.data,right.sch.p)) #this is what I expected 
> } 
> 
> ## [1] 1 
> ##   sch pop 
> ## 1   1 100 
> ## 2   1 200 
> ## [1] 2 
> ##   sch pop 
> ## 3   2 300 
> ## 4   2 400 
> 
> 
> for(sch in school.list){ 
>  print(sch) 
>  print(subset(input.data,input.data[,school.var] == sch)) #note - compact version fails!
> } 
> 
> ## [1] 1 
> ##   sch pop 
> ## 1   1 100 
> ## 2   1 200 
> ## 3   2 300 
> ## 4   2 400 
> ## [1] 2 
> ##   sch pop 
> ## 1   1 100 
> ## 2   1 200 
> ## 3   2 300 
> ## 4   2 400 
> 

-- 
View this message in context: http://n4.nabble.com/subset-function-unexpected-behavior-tp1459535p1460057.html
Sent from the R help mailing list archive at Nabble.com.


