[Rd] How does "subset" replace arguments? (PR#4193)
Thomas Lumley
tlumley at u.washington.edu
Tue Sep 16 14:11:48 MEST 2003
On Tue, 16 Sep 2003 axel.benz at iao.fhg.de wrote:
> Full_Name: Axel Benz
> Version: 1.7.1
> OS: Windows
> Submission from: (NULL) (137.251.33.43)
>
>
> Hello, I guess many people will answer me again that this is an S
> language feature, but I am only a stupid computer scientist and I simply
> do not understand this logic, despite reading a lot about S:
The point they are trying to make is that you should send this sort of
question to r-devel or r-help, not r-bugs. r-bugs is meant to be a
repository for bug reports, not a discussion list.
> > test
>     field tuckey
> 4  Kreis2     -1
> 5  Kreis5     -2
> 9  Metall     -3
> 17 Kreis1     -4
> 19 Kreis8     -5
>
> > subset(test,field=="Metall")
>    field tuckey
> 9 Metall     -3
>
> > subset(test,toString(field)=="Metall")
> [1] field tuckey
> <0 rows> (or 0-length row.names)
>
> This happens every time I use a function with the column name ("field", in this
> case) as a parameter in the logical expression in "subset", instead of using the
> column name at the top level. I have the impression that the column name is only
> replaced when it stands in top-level position. I would call that "very lazy
> evaluation" ;-) ;-)
> Thank you for a friendly answer, this language is really weird to me.
>
Your impression is incorrect. The problem with toString is that it
collapses a vector to a single string, so toString(field) is the string
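"Kreis2, Kreis5, Metall, Kreis1, Kreis8". There is no record whose
`field' is equal to that string: the comparison gives a single FALSE,
which subset() recycles over every row, so no rows are selected. Did you
check to see that toString did what you thought it did?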
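A small R sketch of the difference in lengths. It rebuilds a data frame like `test` from the printout above; how the poster actually created `test` is not shown, so this reconstruction is an assumption.

```r
# Hypothetical reconstruction of `test` from the printed output above
test <- data.frame(field  = c("Kreis2", "Kreis5", "Metall", "Kreis1", "Kreis8"),
                   tuckey = c(-1, -2, -3, -4, -5),
                   row.names = c("4", "5", "9", "17", "19"))

# Element-wise comparison: one logical value per row (only row "9" is TRUE)
cond_ok <- test$field == "Metall"

# toString() collapses the whole column into one string:
# "Kreis2, Kreis5, Metall, Kreis1, Kreis8"
collapsed <- toString(test$field)

# Comparing that single string gives a single FALSE,
# which subset() then recycles over every row
cond_bad <- collapsed == "Metall"

length(cond_ok)   # 5
length(cond_bad)  # 1
nrow(subset(test, field == "Metall"))            # 1
nrow(subset(test, toString(field) == "Metall"))  # 0
```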
subset() will work as I think you expect if the output of the function is
the same length as the input.
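Roughly speaking, subset() captures the condition unevaluated and then evaluates it with the data frame's columns visible as variables, so a column name is found wherever it appears in the expression, not only at top level. The function below is a simplified, hypothetical sketch of that idea: `subset_rows` is not part of R, and the real subset.data.frame also handles `select` and `drop`.

```r
# Hypothetical, simplified sketch of how subset() picks rows;
# NOT the actual source of subset.data.frame
subset_rows <- function(x, condition) {
  # Capture the unevaluated expression, e.g. quote(toString(field) == "Metall")
  e <- substitute(condition)
  # Evaluate it with the columns of `x` available as variables anywhere in
  # the expression; other names are looked up in the caller's environment
  r <- eval(e, x, parent.frame())
  # Treat NA as FALSE
  r <- r & !is.na(r)
  # Logical row indexing: a length-1 result is recycled over all rows
  x[r, , drop = FALSE]
}

d <- data.frame(field = c("Kreis2", "Metall"), tuckey = c(-1, -3))
subset_rows(d, field == "Metall")                # one row
subset_rows(d, toString(field) == "Metall")      # zero rows
subset_rows(d, as.character(field) == "Metall")  # one row
```

The last call works because as.character() returns one value per row, which is exactly the "same length as the input" condition.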
For example, consider one of the built-in data sets:
> data(esoph)
> subset(esoph, toString(agegp)=="75+")
[1] agegp alcgp tobgp ncases ncontrols
<0 rows> (or 0-length row.names)
but
> subset(esoph, as.character(agegp)=="75+")
   agegp     alcgp    tobgp ncases ncontrols
78   75+ 0-39g/day 0-9g/day      1        18
79   75+ 0-39g/day    10-19      2         6
80   75+ 0-39g/day      30+      1         3
81   75+     40-79 0-9g/day      2         5
82   75+     40-79    10-19      1         3
83   75+     40-79    20-29      0         3
84   75+     40-79      30+      1         1
85   75+    80-119 0-9g/day      1         1
86   75+    80-119    10-19      1         1
87   75+      120+ 0-9g/day      2         2
88   75+      120+    10-19      1         1
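The same length check on esoph, as a short R sketch (esoph ships with R's datasets package; nothing else is assumed):

```r
data(esoph)

# toString() returns one string for the whole column ...
length(toString(esoph$agegp))                       # 1
# ... while as.character() returns one string per row
length(as.character(esoph$agegp)) == nrow(esoph)    # TRUE

# Only the per-row version can select individual rows
none <- subset(esoph, toString(agegp) == "75+")
some <- subset(esoph, as.character(agegp) == "75+")
nrow(none)                                          # 0
all(as.character(some$agegp) == "75+")              # TRUE
```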
Or, to take a really extreme example:
> subset(esoph, substr(paste(as.character(agegp),toupper(as.character(agegp))),3,6)== "+ 75")
   agegp     alcgp    tobgp ncases ncontrols
78   75+ 0-39g/day 0-9g/day      1        18
79   75+ 0-39g/day    10-19      2         6
80   75+ 0-39g/day      30+      1         3
81   75+     40-79 0-9g/day      2         5
82   75+     40-79    10-19      1         3
83   75+     40-79    20-29      0         3
84   75+     40-79      30+      1         1
85   75+    80-119 0-9g/day      1         1
86   75+    80-119    10-19      1         1
87   75+      120+ 0-9g/day      2         2
88   75+      120+    10-19      1         1
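Unpacking the extreme condition for the single value "75+" shows why it still works: paste(), toupper() and substr() are all vectorised, so the result keeps one value per row. A minimal R sketch; the variable names are illustrative only.

```r
data(esoph)

# Step by step for one value
a      <- "75+"
pasted <- paste(a, toupper(a))   # "75+ 75+" (paste() separates with a space)
piece  <- substr(pasted, 3, 6)   # characters 3 to 6: "+ 75"
piece == "+ 75"                  # TRUE

# Applied to the whole column, every step returns one value per row
ag   <- as.character(esoph$agegp)
cond <- substr(paste(ag, toupper(ag)), 3, 6) == "+ 75"
length(cond) == nrow(esoph)      # TRUE
```

Only the "75+" rows produce "+ 75" in characters 3 to 6, so this selects the same rows as as.character(agegp) == "75+".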
-thomas