[R] Deleting certain observations (and their imprint?)
Sarah Goslee
sarah.goslee at gmail.com
Thu Nov 29 17:50:57 CET 2012
Hi Kirk,
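It's because tension is a factor with three levels, as you can see with
str(warpbreaks). Subsetting removes the rows but leaves the factor's set of
levels untouched, so summary() still reports M and H with counts of 0.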
Factors are one of the mysteries of R that distinguish a novice from
an initiate.
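A minimal sketch of why this happens, using a small made-up factor (f and f_sub are illustrative names, not part of warpbreaks). A factor stores integer codes plus a separate set of levels, and indexing only touches the codes:

```r
# A toy factor with the same three levels as warpbreaks$tension
f <- factor(c("L", "M", "H", "L"), levels = c("L", "M", "H"))

# Keep only the "L" values; the levels attribute is carried along unchanged
f_sub <- f[f == "L"]

levels(f_sub)   # still lists L, M and H
table(f_sub)    # M and H appear with a count of 0
```

This is the same "imprint" that shows up in summary() after subsetting a data frame.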
Reading ?subset directs you to ?droplevels. Here's an example:
> summary(warpbreaks)
     breaks      wool   tension
 Min.   :10.00   A:27   L:18
 1st Qu.:18.25   B:27   M:18
 Median :26.00          H:18
 Mean   :28.15
 3rd Qu.:34.00
 Max.   :70.00
> str(warpbreaks)
'data.frame':   54 obs. of  3 variables:
 $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
 $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...
> ?subset
> wb.subset <- warpbreaks[which(warpbreaks$tension=="L"),]
> summary(wb.subset)
     breaks      wool  tension
 Min.   :14.00   A:9   L:18
 1st Qu.:26.00   B:9   M: 0
 Median :29.50         H: 0
 Mean   :36.39
 3rd Qu.:49.25
 Max.   :70.00
> wb.subset <- droplevels(wb.subset)
> summary(wb.subset)
     breaks      wool  tension
 Min.   :14.00   A:9   L:18
 1st Qu.:26.00   B:9
 Median :29.50
 Mean   :36.39
 3rd Qu.:49.25
 Max.   :70.00
>
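For reference, a minimal sketch of a few ways to get the same result. The object names wb, wb_all, wb_one and wb_sub are made up for illustration; droplevels(), factor() and subset() are base R:

```r
# Subset as in the original question; tension still has 3 levels here
wb <- warpbreaks[warpbreaks$tension == "L", ]

# Option 1: drop unused levels from every factor column of the data frame
wb_all <- droplevels(wb)

# Option 2: rebuild a single factor column, which keeps only the levels in use
wb_one <- wb
wb_one$tension <- factor(wb_one$tension)

# Option 3: subset and drop unused levels in one step
wb_sub <- droplevels(subset(warpbreaks, tension == "L"))

nlevels(wb$tension)       # before dropping
levels(wb_all$tension)    # after dropping
```

Option 2 is handy when only one column should be changed; droplevels() on the whole data frame affects every factor column at once.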
Sarah
On Thu, Nov 29, 2012 at 11:32 AM, Stodola, Kirk <kstodola at illinois.edu> wrote:
> I'm manipulating a large dataset and need to eliminate some observations based on specific identifiers. This isn't a problem in and of itself (using which.. or subset..), but an imprint of the deleted observations seems to remain, even though they have 0 observations. This is causing me problems later on. I'll use the dataset warpbreaks to illustrate; I apologize if this isn't in the best format.
>
> ##Summary of warpbreaks suggests three tension levels (H, M, L)
>> summary(warpbreaks)
>
>      breaks      wool   tension
>  Min.   :10.00   A:27   L:18
>  1st Qu.:18.25   B:27   M:18
>  Median :26.00          H:18
>  Mean   :28.15
>  3rd Qu.:34.00
>  Max.   :70.00
>
> ## Subset the dataset and keep only those observations with "L"
>> wb.subset <- warpbreaks[which(warpbreaks$tension=="L"),]
>
>
> ##Summary of the subsetted data shows: L=18, M=0, H=0. Why are M and H still included?
>> summary(wb.subset)
>
>      breaks      wool  tension
>  Min.   :14.00   A:9   L:18
>  1st Qu.:26.00   B:9   M: 0
>  Median :29.50         H: 0
>  Mean   :36.39
>  3rd Qu.:49.25
>  Max.   :70.00
>
> ##The subsetted dataset does not show M or H
>> wb.subset
>
> Is there a way that M & H can be completely eliminated (i.e. they don't show up in summary)? The only way I found was to export the dataset and then reimport, which seems pretty cumbersome. Thanks in advance for any help. -Kirk
>
--
Sarah Goslee
http://www.functionaldiversity.org