[R] spss, string factors, selecting

James Reilly reilly at stat.auckland.ac.nz
Tue Nov 27 09:52:47 CET 2007


It does sound like there could be a problem with the merging process. I 
have two questions about your merge command:
chaffmerge2 <- merge(chaff, chafffat, by.x = c("RINGNO", "FAT", "FATMTD"),
                     by.y = c("RINGNO", "FAT", "FATMTD"), all = T)

1. What is the reason for matching on "FAT" and "FATMTD"? From your 
description of the data, I assume that "RINGNO" is the individual 
identifier. I'd have thought matching on that alone would be appropriate.

2. What happens if you omit the "all=T" argument? In particular, how
does the number of rows in the merged dataset compare with the number
in each input?
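To illustrate both questions, here is a minimal R sketch on made-up toy data. It assumes, hypothetically, that file 1 also has FAT and FATMTD columns (merge() would refuse them in by.x otherwise) but that they are empty there; the names HABITAT and WEIGHT are invented for the example.

```r
# Toy stand-ins for the two files (hypothetical values and column names).
# Assumption: file 1 has FAT and FATMTD columns, but they are empty.
chaff <- data.frame(RINGNO = c(120, 121, 132),
                    HABITAT = c(3, 4, 3),
                    WEIGHT = c(20.2, 20.0, 21.2),
                    FAT = NA_real_, FATMTD = NA_character_,
                    stringsAsFactors = FALSE)
chafffat <- data.frame(RINGNO = c(120, 121, 132),
                       FAT = c(4, 5, 4),
                       FATMTD = c(" E", " B", ""),
                       stringsAsFactors = FALSE)

keys <- c("RINGNO", "FAT", "FATMTD")

# Matching on all three keys: FAT/FATMTD never agree, so nothing matches.
nrow(merge(chaff, chafffat, by = keys))   # 0 rows without all = TRUE
m3 <- merge(chaff, chafffat, by = keys, all = TRUE)
nrow(m3)  # 6: each bird appears twice, once with WEIGHT and once with FAT

# Matching on the individual identifier alone (fat columns taken from file 2).
m1 <- merge(chaff[, c("RINGNO", "HABITAT", "WEIGHT")], chafffat, by = "RINGNO")
nrow(m1)  # 3: one complete row per bird
```

If the real merge behaves like m3, that would account for both the "sea of NAs" and the missing weights after selecting on FATMTD.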

-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

On 27/11/07 2:31 PM, Katherine Jones wrote:
> Hi,
> 
> This is probably a case where someone has to see what is happening on
> my computer, and it is complicated by my data being from SPSS (not my
> choice). It is quite hard to share my data because it is such a large
> dataset. I have analysed 9 other datasets that work fine, but this
> particular dataset was entered incorrectly, so it requires merging two
> datasets. This may be the problem.
> 
> Example of data:
> File 1: [1] Individual [2] Habitat type [3] Weight
> File 2: [1] Individual [2] Fat [3] Fat method
> 
> I merge the two files to create:
> [1] Individual [2] Habitat type [3] Weight [4] Fat [5] Fat method
> 
> My merging appears to work in the sense that I can plot Weight versus
> Fat and get data, but if I view the data frame I see a sea of NAs. So
> I'm not sure how there can be data to plot, list levels for and create
> tables from when I can't see it in the data frame. I do get the plot I
> want.
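One likely explanation, sketched with a made-up data frame: plot() and table() silently skip NA values, so they can show data even when most cells are NA. Counting NAs per column and checking complete.cases() shows how many rows actually have both values.

```r
# Made-up merged data: most cells are NA, but one row has both values.
merged <- data.frame(Weight = c(20.2, NA, 21.2, NA),
                     Fat    = c(4,    5,  NA,   NA))

colSums(is.na(merged))         # Weight: 2, Fat: 2 missing
table(merged$Fat)              # counts only 4 and 5; NA is left out by default
ok <- complete.cases(merged)   # rows where both Weight and Fat are present
merged[ok, ]                   # the single row a Weight-vs-Fat plot can draw
# plot(Fat ~ Weight, data = merged) would show just that point,
# dropping the incomplete rows without any warning.
```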
> 
> Fat method contains either blank cells, " B" or " E".
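Because the values carry a leading space and blanks are empty strings rather than missing values, a comparison such as == "E" silently matches nothing. A minimal sketch of normalising the column first, on a made-up sample (trimws() needs R >= 3.2.0; sub("^ +", "", x) does the leading-space part in older versions):

```r
# Values as described: " B", " E" or blank (made-up sample).
fatmtd <- c(" E", " B", "", " E")

fatmtd <- trimws(fatmtd)      # strip leading/trailing spaces
fatmtd[fatmtd == ""] <- NA    # treat blank cells as missing rather than a level
fatmtd <- factor(fatmtd)

levels(fatmtd)                     # "B" "E": no padded or empty level left
sum(fatmtd == "E", na.rm = TRUE)   # 2
```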
> 
> I wish to select all the rows in columns 1-4 which contain an " E" in  
> Fat method.
> 
> e.g.
> 120, 3, 20.2, 4, E
> 121, 4, 20.0, 5, B
> 132, 3, 21.2, 4,
> 
> I want to select only the rows containing " E", so I can plot Fat vs.
> Habitat and Weight vs. Fat.
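A sketch of that selection on the three example rows, with hypothetical column names. subset() keeps only the requested columns and, unlike [, drops rows where the condition is NA.

```r
# The three example rows above; column names are assumed for illustration.
Data <- data.frame(Individual = c(120, 121, 132),
                   Habitat    = c(3, 4, 3),
                   Weight     = c(20.2, 20.0, 21.2),
                   Fat        = c(4, 5, 4),
                   Fatmethod  = c(" E", " B", ""),
                   stringsAsFactors = FALSE)

# Rows whose Fat method is " E" (leading space included), columns 1-4 only.
selectE <- subset(Data, Fatmethod == " E", select = 1:4)
selectE   # one row: individual 120, habitat 3, weight 20.2, fat 4
```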
> 
> I have been doing this by using
> 
> selectE<-Data[Fatmethod==" E",]
> 
> However, this does not work. It turns all of my data in the other
> columns into NA, and I am left with only the Fat method and Fat scores.
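One common way to get rows full of NA from this kind of indexing is an NA in the logical condition, which merge(..., all = TRUE) introduces for unmatched rows: [ returns an all-NA row for every NA. Note also that the command uses a bare Fatmethod rather than Data$Fatmethod; if that variable comes from a different (for example an attached, unmerged) data frame, its rows need not line up with Data. A minimal sketch of the NA pitfall and three ways around it, on made-up data:

```r
# Made-up data; the NA stands in for an unmatched row from merge(all = TRUE).
Data <- data.frame(Weight    = c(20.2, 20.0, 21.2),
                   Fat       = c(4, 5, 4),
                   Fatmethod = c(" E", " B", NA),
                   stringsAsFactors = FALSE)

Data$Fatmethod == " E"                  # TRUE FALSE NA
Data[Data$Fatmethod == " E", ]          # 2 rows: the " E" row plus an all-NA row
Data[which(Data$Fatmethod == " E"), ]   # which() ignores NA: 1 row
subset(Data, Fatmethod == " E")         # subset() treats NA as FALSE: 1 row
Data[Data$Fatmethod %in% " E", ]        # %in% never returns NA: 1 row
```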
> 
> It is odd that it works with other datasets but not this one. With my
> other datasets, when I select " E" I can still see " B" using
> levels(Fatmethod), but there are no rows with it, so my plots are
> correct.
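That behaviour is expected: subsetting a factor keeps all of its original levels, even those with no remaining observations. A short sketch on a made-up factor (droplevels() is available in newer R versions; calling factor() again gives the same result):

```r
fm <- factor(c(" E", " B", " E"))
e_only <- fm[fm == " E"]

levels(e_only)              # " B" " E": the unused level is still listed
table(e_only)               # " B" appears with a count of 0
levels(droplevels(e_only))  # " E"
levels(factor(e_only))      # " E" (same effect, works in older R too)
```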
> 
> Sorry this is long. I'm having difficulty explaining it.
> 
> Katherine
> 
> 
> On 26-Nov-07, at 5:09 PM, jim holtman wrote:
> 
>> That should give you back a subset of 'data' (with all its columns),
>> for the rows with " E" in 'column'.  Can you show an example of your data
>> and what the desired output would be?  The posting guide asks you to "provide
>> commented, minimal, self-contained, reproducible code" so we don't
>> have to speculate on what you want.
>>
>> On Nov 26, 2007 5:04 PM, Katherine Jones  
>> <kajones at connect.carleton.ca> wrote:
>>> This sort of works. It does select the E data, but unfortunately it
>>> doesn't select the data from the other columns; I want to select data
>>> across about 5 columns by the factor level " E" in one of the columns.
>>> It should be easy, but for some reason it is not working. The leading
>>> spaces being added don't help.
>>>
>>> It seems to work on my non-merged data files, although the merged file
>>> contains all the data I need.
>>>
>>> Thanks for the subset command though. Hadn't thought of using that.
>>>
>>>
>>>
>>> On 26-Nov-07, at 4:46 PM, jim holtman wrote:
>>>> ?subset
>>>>
>>>> subset(data, column == " E")
>>>>
>>
>>
>> -- 
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem you are trying to solve?
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


