[R] Selecting groups with R

David Winsemius dwinsemius at comcast.net
Sat Aug 22 00:50:53 CEST 2009


On Aug 21, 2009, at 6:36 PM, Don McKenzie wrote:

> Right, but he just wanted to eliminate "BLUE" as far as I could see.

Read his message again. He already showed three methods all of which  
gave results identical to the one you offered. He asked to be shown  
why the 0's were appearing in table().
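
A minimal sketch of why the zero count appears, assuming Color is stored as a factor. The data frame here is a hypothetical reconstruction of the nine rows quoted below, not the poster's real data:

```r
# Hypothetical reconstruction of the poster's data; Color is made a factor explicitly.
dataset <- data.frame(
  Color = factor(rep(c("RED", "WHITE", "BLUE"), each = 3),
                 levels = c("RED", "WHITE", "BLUE")),
  Score = c(10, 13, 12, 22, 27, 25, 18, 17, 16)
)

# Subsetting removes the BLUE rows but keeps all three factor levels.
myComp1 <- subset(dataset, Color != "BLUE")
levels(myComp1$Color)
table(myComp1$Color)   # BLUE is still tabulated, with a count of 0

# Re-creating the factor discards levels that no longer occur in the data.
myComp1$Color <- factor(myComp1$Color)
levels(myComp1$Color)
table(myComp1$Color)
```

Calling factor() on the subsetted column is one way to drop the empty level; the rows themselves are unchanged.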

He thought that aspect was also the cause of his problems with t.test,
although it wasn't. t.test was coercing his character vector,
dataset$Color, to numeric NA's and then complaining about a lack of
variability in the vector.
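
A hedged sketch of the t.test point, again using a hypothetical reconstruction of the quoted data. It assumes the failing call passed Color as one of the two data vectors; the formula interface shown here treats Color as the grouping variable instead:

```r
# Hypothetical reconstruction of the poster's data, as in the quoted message.
dataset <- data.frame(
  Color = factor(rep(c("RED", "WHITE", "BLUE"), each = 3)),
  Score = c(10, 13, 12, 22, 27, 25, 18, 17, 16)
)
twoGroups <- subset(dataset, Color != "BLUE")

# The mean of a non-numeric vector is NA (with a warning), which is
# consistent with the "missing value where TRUE/FALSE needed" error.
colorMean <- suppressWarnings(mean(twoGroups$Color))
is.na(colorMean)

# Use the numeric scores as the response and Color as the grouping variable.
result <- t.test(Score ~ Color, data = twoGroups)
result$p.value
```

With the formula interface the comparison is between the Score values of the two remaining groups, so the unused BLUE level does not have to be the obstacle it appeared to be.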

>
>
> On 21-Aug-09, at 3:33 PM, David Winsemius wrote:
>
>>
>> On Aug 21, 2009, at 6:16 PM, Don McKenzie wrote:
>>
>>> dataset[dataset$Color != "BLUE",]
>>
>> Will return a data.frame with Color still a factor with three levels.
>>
>>>
>>> On 21-Aug-09, at 3:08 PM, jlwoodard wrote:
>>>
>>>>
>>>> I have a data set similar to the following:
>>>>
>>>> Color  Score
>>>> RED      10
>>>> RED      13
>>>> RED      12
>>>> WHITE   22
>>>> WHITE   27
>>>> WHITE   25
>>>> BLUE     18
>>>> BLUE     17
>>>> BLUE     16
>>>>
>>>> and I am trying to select just the values of Color that are  
>>>> equal to RED
>>>> or WHITE, excluding the BLUE.
>>>>
>>>> I've tried the following:
>>>> myComp1<-subset(dataset, Color =="RED" | Color == "WHITE")
>>>> myComp1<-subset(dataset, Color != "BLUE")
>>>> myComp1<-dataset[which(dataset$Color != "BLUE"),]
>>>>
>>>> Each of the above lines successfully excludes the BLUE subjects,  
>>>> but the
>>>> "BLUE" category is still present in my data set; that is, if I try
>>>> table(Color)  I get
>>>>
>>>> RED  WHITE  BLUE
>>>> 82     151      0
>>>>
>>>> If I try to do a t-test (since I've presumably gone from three  
>>>> groups to two
>>>> groups), I get:
>>>> Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx),  
>>>> abs(my)))
>>>> stop("data are essentially constant") :
>>>> missing value where TRUE/FALSE needed
>>>> In addition: Warning message:
>>>> In mean.default(y) : argument is not numeric or logical:  
>>>> returning NA
>>>>
>>>> and describe.by(score,Color) gives me descriptives for RED and  
>>>> WHITE, and
>>>> BLUE also shows up as NULL.
>>>>
>>>> How can I eliminate the BLUE category completely so I can do a
>>>> t-test using
>>>> Color (with just the RED and WHITE subjects)?
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>
> Don McKenzie, Research Ecologist
> Pacific Wildland Fire Sciences Lab
> US Forest Service
>
> Affiliate Professor
> School of Forest Resources, College of the Environment
> CSES Climate Impacts Group
> University of Washington
>
> desk: 206-732-7824
> cell: 206-321-5966
> dmck at u.washington.edu
> donaldmckenzie at fs.fed.us
>
>
>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



