[R] Eliminate cases in a subset of a dataframe
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Sep 14 18:55:22 CEST 2009
Hi Holger,
On Sep 14, 2009, at 10:57 AM, Hollix wrote:
>
> Hi folks,
>
> I created a subset of a dataframe (i.e., selected only men):
>
> subdata <- subset(data,data$gender==1)
>
> After a residual diagnostic of a regression analysis, I detected three
> outliers:
>
> linmod <- lm(y ~ x, data=subdata)
> plot(linmod)
>
> Say, the cases 11,22, and 33 were outliers.
>
> Here comes the problem: When I want to exclude these three cases in a
> further regression analysis,
> - for instance with linmod2 <- lm(y[-c(11,22,33)] ~ x[-c(11,22,33)],
> data=subdata) - it does not work.
I suspect that your x is actually a 2d matrix, so you might need to do:
R> lm(y[-c(11,22,33)] ~ x[-c(11,22,33),])
Note the trailing comma after the -c() vector when indexing into x!
It drops whole rows and keeps all of the columns.
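
To make the effect of the trailing comma concrete, here is a small sketch with made-up data (the original x and y aren't shown in the post, so the matrix shape and values below are assumptions for illustration only):

```r
# Made-up stand-ins for the poster's variables: x is a 50 x 2 matrix.
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)
y <- rnorm(50)
drop <- c(11, 22, 33)

# With the trailing comma, three whole rows are removed.
dim(x[-drop, ])

# Without it, x is indexed as one long vector and only three
# single elements are removed, so the lengths no longer line up with y.
length(x[-drop])

# Response and predictors now have the same number of rows.
fit <- lm(y[-drop] ~ x[-drop, ])
```

Without the comma the response and the predictor end up with different lengths, which is one way a call like the one in the question can fail.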
Perhaps you can just remove those rows from your data and keep your
formula "clean", like so?
R> linmod2 <- lm(y ~ x, data=subdata[-c(11,22,33),])
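
A minimal sketch of that approach on simulated data. The column names y, x and gender come from the post; the values, the sample size, and the dropped positions are assumptions, since the real data isn't shown:

```r
# Simulated stand-in for the poster's data; values are made up.
set.seed(1)
data <- data.frame(
  gender = rep(c(1, 2), each = 50),
  x      = rnorm(100)
)
data$y <- 2 * data$x + rnorm(100)

subdata <- subset(data, gender == 1)

# Drop rows by POSITION within subdata and keep the formula clean.
drop_pos <- c(11, 22, 33)
linmod2  <- lm(y ~ x, data = subdata[-drop_pos, ])

nobs(linmod2)  # number of rows actually used in the fit
```

Keeping the row selection in the data argument means the formula stays the same between the two fits, which makes them easier to compare.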
> I guess this has something to do with this strange "row.names"-vector
> which has been added to the dataframe when creating the subset. I find
> it very strange why R gives the case numbers in the diagnostics but then
> doesn't allow me to use these numbers for further exclusion.
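Hmm, not sure what you mean, but the row names won't get in your way as
long as you use integers to index into your data.frame, since integers
always refer to row positions. Just keep in mind that subset() keeps the
original row names, and those are what the diagnostic plots show as case
labels.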
> Can anybody tell me:
> 1. what this row.names vector is
> 2. How I can refer to cases after creating a subset (e.g., in order to
> exclude them).
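Refer to them by their position in the data.frame, as you would if you
hadn't created a subset. If the case numbers you have are the labels
from the diagnostic plots, match them against rownames(subdata) rather
than using them as positions.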
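
Here is a rough sketch of the difference between row names and positions after subset(). The data is simulated and the outlier labels are hypothetical (chosen only so that they exist in the example); the column names follow the post:

```r
# Made-up data in which the men (gender == 1) sit on the odd-numbered rows.
set.seed(1)
data <- data.frame(gender = rep(c(1, 2), times = 50), x = rnorm(100))
data$y <- 2 * data$x + rnorm(100)

subdata <- subset(data, gender == 1)

# subset() keeps the original row names, so they no longer run 1..nrow.
head(rownames(subdata))

# Hypothetical case labels as they would be read off plot(linmod);
# these are row names (character), not positions.
outliers <- c("11", "33", "55")

# Exclude by row NAME.
keep    <- !(rownames(subdata) %in% outliers)
linmod2 <- lm(y ~ x, data = subdata[keep, ])

# Or translate the labels into positions if integer indices are preferred.
pos <- match(outliers, rownames(subdata))
```

In this example subdata[-c(11, 33, 55), ] would remove different rows than the ones labelled "11", "33" and "55", which is why the labels have to be matched against rownames(subdata) first.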
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact