[R] spss.read factor reversal
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Jul 27 11:01:19 CEST 2005
Adaikalavan Ramasamy <ramasamy at cancer.org.uk> writes:
> I think it is doing what is supposed to do but I never used read.spss,
> so take this with a pinch of salt.
>
> In R, when you use as.integer on a factor, the first level gets the
> value 1, the second level gets 2, and so on. The order of the levels
> can be determined with the levels() function.
>
> f <- factor( c("Green", "Green", "Red", "Blue"),
> levels=c("Red", "Blue", "Green") )
> levels(f)
> [1] "Red" "Blue" "Green"
>
> as.integer(f)
> [1] 3 3 1 2
>
> But the levels of a factor can be changed
>
> as.integer( factor( f, levels=c("Green", "Blue", "Red" ) ) )
> [1] 1 1 3 2
Doesn't explain why 1 2 3 in the input file comes out as Green Blue
Red, does it?
> You can also try setting use.value.labels=FALSE in read.spss function
> and then creating a factor out of it.
Would be interesting to see this. I would suspect that the damage is
already done at that point though.
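
A minimal sketch of what "creating a factor out of it" could look like,
assuming the raw codes come back as plain numbers and the label table is
the one printed in Joel's attr(,"label.table")$COLOR. The helper name
codes_to_factor is hypothetical, and the data are typed in from the post
rather than read from the .sav file. Whether the codes returned with
use.value.labels=FALSE are themselves intact is exactly what remains to
be checked.

```r
# Raw codes as shown in the SPSS data editor in the original post.
codes <- c(1, 2, 3, 1, 1, 1, 2, 2, 2, 3, 3)

# Label table as printed in attr(,"label.table")$COLOR (note the reverse order).
vl <- c(green = 3, blue = 2, red = 1)

# Hypothetical helper: sort the label table by code before building the
# factor, so that level i corresponds to SPSS code i
# (assumes the codes are the sequential integers 1..n).
codes_to_factor <- function(codes, value_labels) {
  value_labels <- sort(value_labels)  # ascending by code; names are kept
  factor(codes, levels = unname(value_labels), labels = names(value_labels))
}

color <- codes_to_factor(codes, vl)
levels(color)      # "red" "blue" "green"
as.integer(color)  # 1 2 3 1 1 1 2 2 2 3 3
```

With the real file, codes would be the COLOR component returned by
read.spss(file, use.value.labels=FALSE) instead of the literal vector.
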
I notice that the value labels are in reverse order. That shouldn't
matter to read.spss, which has

    rval[[nm]] <- factor(rval[[nm]], levels = vl[[v]],
                         labels = trim(names(vl[[v]])))

i.e. levels and labels should be in the correct order.

But something is odd. You'd expect the following effect:

    > x <- 1:3
    > factor(x,levels=3:1,labels=c("G","B","R"))
    [1] R B G
    Levels: G B R
    > as.integer(factor(x,levels=3:1,labels=c("G","B","R")))
    [1] 3 2 1

but Joel's output has the levels in the order R B G, which contradicts
the attr(,"label.table")$COLOR in his output.
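
To make the comparison concrete, here is a sketch that applies the same
factor() call to Joel's data, assuming the codes from his data editor
listing and the label table from his output. trimws() stands in for the
internal trim() helper, and vl and v are set up by hand here rather than
taken from read.spss.

```r
# Codes as listed in the SPSS data editor in the original post.
codes <- c(1, 2, 3, 1, 1, 1, 2, 2, 2, 3, 3)

# Label table as printed in Joel's output, wrapped in a list like vl.
vl <- list(COLOR = c(green = 3, blue = 2, red = 1))
v <- "COLOR"

# Same shape as the call quoted from read.spss;
# trimws() is an assumed substitute for the internal trim().
expected <- factor(codes, levels = vl[[v]],
                   labels = trimws(names(vl[[v]])))

as.character(expected)  # red blue green red red red blue blue blue green green
levels(expected)        # "green" "blue" "red"
as.integer(expected)    # 3 2 1 3 3 3 2 2 2 1 1
```

Under these assumptions the integer codes agree with Joel's
as.integer(out$COLOR), but the level order (green blue red) and hence
the printed labels are the reverse of what he reports.
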
BTW, this is R 2.1.1, I hope Joel isn't wasting our time by using an
older version...
-p
> Regards, Adai
>
>
>
> On Tue, 2005-07-26 at 17:04 -0700, Joel Bremson wrote:
> > Hi,
> >
> > I'm having a problem with read.spss reversing my factor input.
> >
> > Here is the input copied from the spss data editor:
> >
> > color cost
> > 1 2.30
> > 2 2.40
> > 3 3.00
> > 1 2.10
> > 1 1.00
> > 1 2.00
> > 2 4.00
> > 2 3.20
> > 2 2.33
> > 3 2.44
> > 3 2.55
> >
> > For color, red=1, blue=2, and green=3. Its type is 'String' and
> >
> > >out=read.spss(file)
> > >out
> >
> > $COLOR
> > [1] green blue red green green green blue blue blue red red
> > Levels: red blue green
> >
> > $COST
> > [1] 2.30 2.40 3.00 2.10 1.00 2.00 4.00 3.20 2.33 2.44 2.55
> >
> > attr(,"label.table")
> > attr(,"label.table")$COLOR
> > green blue red
> > 3 2 1
> >
> > attr(,"label.table")$COST
> > NULL
> >
> > attr(,"variable.labels")
> > COLOR COST
> > "color" "cost"
> >
> > =====EOF===================
> >
> > Notice that the $COLOR factor data are inverted. Looking at the
> > integer output we see:
> >
> > > as.integer(out$COLOR)
> > [1] 3 2 1 3 3 3 2 2 2 1 1
> >
> > The spss original data looks like this:
> > 1 2 3 1 1 1 2 2 2 3 3
> >
> > I can easily invert the output mathematically with:
> > q = sapply(m, function(x) { x + 2*(median(unique(m)) - x) })
> >
> > (m is composed of sequential integers starting at one),
> > but it seems as though something is going wrong in read.spss.
> >
> > Any ideas?
> >
> > Joel Bremson
> > Graduate Student
> > UC Davis
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907