[R] How to reference or sort rownames in a data frame
Robert A. LaBudde
ral at lcfltd.com
Sun May 27 22:55:41 CEST 2007
As I was working through elementary examples, I was using dataset
"plasma" of package "HSAUR".
I performed a logistic regression on the data and made the
diagnostic plots (R-2.5.0):
data(plasma,package='HSAUR')
plasma_1 <- glm(ESR ~ fibrinogen * globulin, data = plasma, family = binomial())
layout(matrix(1:4,nrow=2))
plot(plasma_1)
I find that the data points corresponding to rownames 17 and 23 are
outliers with high leverage.
I would then like to perform a fit without these two rows.
In principle this should be easy, using an update() with subset=-c(17,23).
The problem is that the rownames in this dataset are not ordered,
and, in fact, the relevant rows are 30 and 31, not 17 and 23.
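To illustrate the mismatch, here is a minimal sketch on a small made-up data frame (the name `toy` and its values are assumptions for illustration, not the plasma data). It shows only that a negative numeric index drops rows by position rather than by rowname:

```r
# Made-up data frame (not plasma): row names are out of positional order.
toy <- data.frame(x = c(10, 20, 30, 40),
                  row.names = c("3", "1", "4", "2"))

# A negative numeric index drops by position, not by row name:
# this removes the rows named "3" and "1", not those named "1" and "2".
remaining <- toy[-c(1, 2), , drop = FALSE]
rownames(remaining)
```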
This brings up the following (elementary?) questions:
1. How do you reference rows in "subset=" for which you know the
rownames, but not the row numbers?
2. How do you discover the row numbers corresponding to particular
rownames? (Using plasma[rownames(plasma)==17,] shows the data, but
NOT the row number!) (Probably the same answer as in Q. 1 above.)
3. How do you sort (order) the rows of an existing data frame so that
the rownames are in order?
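For concreteness, the three questions can be sketched on a made-up data frame. This is only one possible approach using base R (`%in%`, `match()`, `order()`), not necessarily the most elegant; `toy`, `drop_names` and the other names are hypothetical, and `lm()` stands in for the `glm()` fit to keep the example small:

```r
# Hypothetical stand-in for plasma: row names deliberately out of order.
toy <- data.frame(x = 1:6,
                  y = c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3),
                  row.names = c("4", "6", "1", "5", "2", "3"))

fit <- lm(y ~ x, data = toy)

# Q1: exclude rows by name; subset= accepts a logical vector.
drop_names <- c("1", "2")
keep <- !(rownames(toy) %in% drop_names)
fit_trimmed <- update(fit, subset = keep)

# Q2: positions of the given row names, in the order requested.
pos <- match(drop_names, rownames(toy))

# Q3: reorder the rows so that the row names are in numeric order.
toy_sorted <- toy[order(as.numeric(rownames(toy))), , drop = FALSE]
```

If this carries over to the plasma fit, the analogous call would presumably be update(plasma_1, subset = !(rownames(plasma) %in% c("17", "23"))). Note that rownames are character strings, so order() is applied to as.numeric(rownames(...)) here to avoid "10" sorting before "2".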
I don't seem to know the magic words to find the answers to these
questions in the help systems.
Obviously this can be done by writing new, brute force, functions
scanning the subscripts, but there must be an (obvious?) direct way
of doing this more elegantly.
Thanks for any pointers.
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd. URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239 Fax: 757-467-2947
"Vere scire est per causas scire"