[R] How to reference or sort rownames in a data frame

Gabor Grothendieck ggrothendieck at gmail.com
Mon May 28 04:29:09 CEST 2007


On 5/27/07, Robert A. LaBudde <ral at lcfltd.com> wrote:
> As I was working through elementary examples, I was using dataset
> "plasma" of package "HSAUR".
>
> In performing a logistic regression of the data, and making the
> diagnostic plots (R-2.5.0)
>
> data(plasma,package='HSAUR')
> plasma_1<- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
> layout(matrix(1:4,nrow=2))
> plot(plasma_1)
>
> I find that data points corresponding to rownames 17 and 23 are
> outliers and high leverage.
>
> I would then like to perform a fit without these two rows.
>
> In principle this should be easy, using an update() with subset=-c(17,23).
>
> The problem is that the rownames in this dataset are not ordered,
> and, in fact, the relevant rows are 30 and 31, not 17 and 23.
>
> This brings up the following (elementary?) questions:
>
> 1. How do you reference rows in "subset=" for which you know the
> rownames, but not the row numbers?

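Use a logical vector built from the rownames. The one below is TRUE for
the two rows in question, so negate it with ! to drop them in subset=:

   rownames(plasma) %in% c(17, 23)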
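
As a rough sketch of how that plugs into update(), here is a toy example.
The data frame d, its values, and the use of lm() are all made up for
illustration (they stand in for plasma and the glm() fit, which need the
HSAUR package); only the subset= idiom is the point. Note that %in%
compares the character rownames against the numbers after coercing the
numbers to character, so c(17, 23) and c("17", "23") behave the same here.

```r
# Hypothetical stand-in for plasma: rownames are NOT in row order
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(1.1, 1.9, 3.2, 3.9, 5.1, 6.2),
                row.names = c("3", "17", "1", "23", "2", "5"))

fit <- lm(y ~ x, data = d)

# TRUE for every row we want to keep, i.e. everything except 17 and 23
keep <- !(rownames(d) %in% c(17, 23))

# Refit on the remaining rows only
fit2 <- update(fit, subset = keep)
nobs(fit2)
```

A logical vector sidesteps the row-number problem entirely, since it is
computed from the rownames themselves rather than from positions.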

>
> 2. How do you discover the rows corresponding to particular
> rownames? (Using plasma[rownames(plasma)==17,] shows the data, but
> NOT the row number!) (Probably the same answer as in Q. 1 above.)

  which(rownames(plasma) %in% c(17, 23)) # 30, 31
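
A small self-contained illustration, again on a made-up data frame d
rather than plasma. which() returns the positions in row order; match()
is an alternative (my addition, not needed for the question as asked)
that returns them in the order the rownames were requested.

```r
# Hypothetical toy data with unordered rownames
d <- data.frame(x = 1:6,
                row.names = c("3", "17", "1", "23", "2", "5"))

# Positions of rownames 17 and 23, in row order
pos <- which(rownames(d) %in% c(17, 23))

# Same lookup, but results follow the order of the requested names
pos_by_name <- match(c("23", "17"), rownames(d))

pos
pos_by_name
```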

>
> 3. How do you sort (order) the rows of an existing data frame so that
> the rownames are in order?


  plasma[order(as.numeric(rownames(plasma))), ]
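
The as.numeric() matters because rownames are character strings, and
ordering them as strings puts "17" before "2". A quick sketch on a
made-up data frame d (not plasma) shows the difference:

```r
# Hypothetical toy data with unordered rownames
d <- data.frame(x = 1:6,
                row.names = c("3", "17", "1", "23", "2", "5"))

# Character ordering: compares the strings, not the numbers
by_string <- rownames(d[order(rownames(d)), , drop = FALSE])

# Numeric ordering: convert the rownames first
by_number <- rownames(d[order(as.numeric(rownames(d))), , drop = FALSE])

by_string
by_number
```

The drop = FALSE is only there because the toy data frame has a single
column; with several columns, as in plasma, it is not needed.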


