[R] Sorting Data?

Jonathan Baron baron at psych.upenn.edu
Fri May 28 15:18:58 CEST 2004


On 05/28/04 12:21, Martin Klaffenboeck wrote:
>On 27.05.2004 22:43:02, Jonathan Baron wrote:
>> On 05/27/04 21:58, Martin Klaffenboeck wrote:
>> >Hello,
>> >
>> >I'm reading through some manuals, but I cannot find my answer.
>> >
>> >I have a file containing a lot of data:
>> >
>> >Vpn	Code	Family	Age	F1	F2	...	F17
>> >1	1	M	46	1	2	...	1
>> >2	1	D	18	3	2	...	4
>> >3	2	M	50	3	3	...	3
>> >...
>> >and so on.
>> >
>> >Now I can read it by:
>> >
>> >F = read.table("file", header=T)
>> >
>> >but now I want to separate the mothers (M) and daughters (D) of the
>> >family, keeping all the data in all other fields.  How can I do that?
>> >
>> >The 'Code' tells me which mother belongs to which daughter.  I want
>> >to make a matrix where I have the mothers on one axis and the
>> >daughters on the other, and compare the distance for every question
>> >(F1...F17) and the distance for the sum of these questions.  The
>> >questions are semantic differentials with 5 values.  F4 and F7 must
>> >have reverse polarity in this case.
>>
>> The following is not tested and probably contains at least one error.
>
>Thanks, that helps me a lot, as I am an R newbie.
>
>> Let's assume that there is one mother per daughter and one
>> daughter per mother, and your file is Myfile, and the Codes are
>> in order.  One way is this:
>
>OK, we really have only one daughter per mother in our sample.
>I'm sorting by:
>
>Myfile <- read.table("Fragebogen.data", header=TRUE)
>Myfile <- Myfile[order(Myfile[, 'Code'], Myfile[, 'Family']), ]
>
>Code is the same for a mother and her daughter, so I know which
>mother has which daughter.  Family tells me whether the person is the
>mother or the daughter.
>
>> Myfile$F4 <- -Myfile$F4 # reverse polarity
>> Myfile$F7 <- -Myfile$F7
>
>Is this also true if we have a semantic differential with 5 steps
>(from 1 to 5)?  I have one missing value (NA); should I set it to 0?
>(Please also tell me if I use incorrect words.)
>You didn't know that, I assume.  So now I'm doing:
>
>Myfile$F3 <- 5-Myfile$F3
>
>That seems to be good for me; please tell me what you think.
>
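
On the polarity question, here is a minimal sketch, assuming the items
are coded 1 to 5 as described.  On such a scale the usual reversal is
6 - x, which maps 1 to 5 and 5 to 1; 5 - x would give values from 4
down to 0 instead.  The vector F4 below is made-up data, not taken from
the real file.  A missing value simply stays NA; recoding it to 0 would
put a value outside the scale into any sum or distance.

```r
# Toy 5-point item with one missing value (made-up data)
F4 <- c(1, 2, 3, 4, 5, NA)

# Reverse polarity on a 1..5 scale: 1 <-> 5, 2 <-> 4, 3 stays 3
F4_rev <- 6 - F4
F4_rev

# NA is left as NA; a sum that should skip it can use na.rm = TRUE
total <- sum(F4_rev, na.rm = TRUE)
```
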
>> Mothers <- Myfile[Family="M",]
>> Daughters <- Myfile[Family="D",]
>
>Hm.  This does not seem to work for me.  I was testing around, and
>this seems to work:
>
>Mothers <- Myfile[Myfile[["Family"]]=="M",]
>Daughters <- ...

My mistake here was to use = instead of ==.  My method should work
with == too, provided Family is visible to R, for example by writing
Myfile$Family=="M" or after attach(Myfile).

>I hope we have the same results now. ;-)  I'm really a newbie in R.
>
>> Itemdiffs <- Mothers[,-(1:4)]-Daughters[,-(1:4)] # the -(1:4)
>>                                                  # removes cols 1:4
>
>OK, this also seems to work, but I don't really know what I am doing
>with it.

Type the names of the variables to find out what you are doing.
If they are too big, then subset them, for example, 

Mothers[1:5,]
Daughters[1:5,]
Itemdiffs[1:5,]
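
To see what each step does, it may help to run it on a tiny made-up
data set first.  The sketch below assumes the column layout from the
original post but uses only three items (F1 to F3) and invented values;
head() and str() are just convenient ways to look at the result.

```r
# Made-up data in the layout of the original file, with only F1..F3
Myfile <- data.frame(
  Vpn    = 1:4,
  Code   = c(1, 1, 2, 2),
  Family = c("M", "D", "M", "D"),
  Age    = c(46, 18, 50, 22),
  F1     = c(1, 3, 3, 2),
  F2     = c(2, 2, 3, 5),
  F3     = c(4, 1, 2, 2)
)

# Logical row selection: note == (comparison), not = (argument naming)
Mothers   <- Myfile[Myfile$Family == "M", ]
Daughters <- Myfile[Myfile$Family == "D", ]

# Drop columns 1:4 (Vpn, Code, Family, Age) and subtract item by item.
# Row 1 of Mothers is paired with row 1 of Daughters, and so on,
# so both must be in the same Code order.
Itemdiffs <- Mothers[, -(1:4)] - Daughters[, -(1:4)]

head(Itemdiffs)  # first rows
str(Itemdiffs)   # column names and types
```

Each row of Itemdiffs is one mother/daughter pair, and each column is
the mother's answer minus the daughter's answer on that item.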

>Also the other things.
>
>I have to test the hypothesis: does a daughter answer the questions
>(semantic differential) more like her own mother and less like the
>mothers of the other daughters?

I don't think this is a trivial problem at all, so I am hesitant
to offer advice.  I see now that you really do want a matrix,
where you have mothers in the columns and daughters in the rows,
and distance (difference, similarity) measures in the cells.
Perhaps you have several such measures, so you want a
three-dimensional array.

You might do something like this.  First define a function to
measure your distance, like

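# sum of absolute item differences; -(1:4) drops the 4 non-item columns
Itemdist <- function(x,y) {sum(abs(x[-(1:4)]-y[-(1:4)]))}

# one row per daughter, one column per mother
Dists <- matrix(NA,nrow(Daughters),nrow(Mothers))

for (i in 1:nrow(Daughters))
 {for (j in 1:nrow(Mothers))
  {Dists[i,j] <- Itemdist(Daughters[i,],Mothers[j,])}}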
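
The same daughters-by-mothers matrix can also be built without an
explicit loop.  The following is only a sketch on made-up data (three
pairs, three items), assuming the first four columns are the non-item
columns as above; it uses outer() with a vectorized helper, which is
one of several ways to do this.

```r
# Made-up data: 3 mother/daughter pairs, 4 id columns plus items F1..F3
Mothers <- data.frame(Vpn = c(1, 3, 5), Code = 1:3, Family = "M",
                      Age = c(46, 50, 41),
                      F1 = c(1, 3, 5), F2 = c(2, 3, 4), F3 = c(1, 1, 2))
Daughters <- data.frame(Vpn = c(2, 4, 6), Code = 1:3, Family = "D",
                        Age = c(18, 22, 17),
                        F1 = c(1, 4, 2), F2 = c(2, 3, 1), F3 = c(2, 1, 5))

# Item columns as plain numeric matrices (drops the 4 id columns)
M <- as.matrix(Mothers[, -(1:4)])
D <- as.matrix(Daughters[, -(1:4)])

# Sum of absolute differences between daughter i and mother j, all i, j
Dists <- outer(seq_len(nrow(D)), seq_len(nrow(M)),
               Vectorize(function(i, j) sum(abs(D[i, ] - M[j, ]))))
dimnames(Dists) <- list(daughter = Daughters$Code, mother = Mothers$Code)

Dists
diag(Dists)  # each daughter against her own mother
```

Labelling rows and columns with Code makes it easier to check that the
diagonal really holds the true pairs.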

Then you would want to show that the diagonal of the resulting
matrix is higher (or lower) than the other cells.  Here is where
I yield to experts.  If I were doing it, I might consider
comparing these cells to some measure of the expectation of what
they ought to be, but I would not just do a t test comparing them
to all the other cells because the other cells are not
independent of each other.  (One daughter might be odd and be
dissimilar from everyone, and this would show up in an entire
row.)  I'm sure there is some simple idea that I'm missing here.
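
One way to get "some measure of the expectation" is to re-pair
daughters and mothers at random many times and see where the true
pairing falls.  This is a permutation-style sketch of my own choosing,
not a recommendation from an expert, and I am not claiming it settles
the dependence issue.  The matrix below is made up, and the number of
replications (2000) is arbitrary.

```r
set.seed(1)  # only so the sketch is reproducible

# Made-up distance matrix: daughters in rows, mothers in columns,
# true pairs on the diagonal
Dists <- matrix(c(1, 4, 6,
                  4, 1, 3,
                  6, 7, 9), nrow = 3, byrow = TRUE)

observed <- mean(diag(Dists))  # mean distance within true pairs

# Mean distance when daughters are randomly re-paired with mothers
n <- nrow(Dists)
perm_means <- replicate(2000, {
  p <- sample(n)                        # random assignment of mothers
  mean(Dists[cbind(seq_len(n), p)])     # distance for that assignment
})

# Share of random pairings at least as close as the true pairing
p_value <- mean(perm_means <= observed)
```

With only three pairs this is a toy; with the real sample the share of
random pairings that do as well as the true one would be more
informative.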

>I hope you can understand my
>English. ;-)

It seems that many people on the list read German (including me,
but I'm scared to write it), but the official language is
English.

>Thanks,
>Martin
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page:            http://www.sas.upenn.edu/~baron
R search page:               http://finzi.psych.upenn.edu/



