[R] spss.read factor reversal

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Jul 27 11:01:19 CEST 2005


Adaikalavan Ramasamy <ramasamy at cancer.org.uk> writes:

> I think it is doing what it is supposed to do, but I have never used read.spss,
> so take this with a pinch of salt.
> 
> In R, when you use as.integer on a factor, the first level gets the value 1,
> the second level gets 2, and so on. The order of the levels can be
> determined with the levels() function.
> 
>    f <- factor( c("Green", "Green", "Red", "Blue"), 
>                 levels=c("Red", "Blue", "Green") )
>    levels(f)
>    [1] "Red"   "Blue"  "Green"
> 
>    as.integer(f)
>    [1] 3 3 1 2
> 
> But the levels of a factor can be changed:
> 
>    as.integer( factor( f, levels=c("Green", "Blue", "Red" ) ) )
>    [1] 1 1 3 2

That doesn't explain why 1 2 3 in the input file comes out as green blue
red, does it?
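
A small sketch of why re-specifying the levels cannot be the whole story,
reusing Adai's own example (the name g is introduced here only for
illustration): changing the level order changes the integer codes, but the
label attached to each observation stays the same.

```r
# Adai's example factor, with levels in the order Red, Blue, Green.
f <- factor(c("Green", "Green", "Red", "Blue"),
            levels = c("Red", "Blue", "Green"))

# Same data, levels re-specified in the opposite order.
g <- factor(f, levels = c("Green", "Blue", "Red"))

# The integer codes differ between f and g ...
print(as.integer(f))
print(as.integer(g))

# ... but every observation still carries the label it started with.
print(identical(as.character(f), as.character(g)))
```

So reordering levels can flip the codes, but it would not by itself turn a
"red" observation into a "green" one.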
 
> You can also try setting use.value.labels=FALSE in the read.spss function
> and then creating a factor out of the result.

Would be interesting to see this. I would suspect that the damage is
already done at that point though.
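
For reference, a minimal sketch of what that suggestion might look like. It
assumes that read.spss(file, use.value.labels=FALSE) would hand back the raw
codes as they appear in the SPSS data editor; since Joel's file is not
available, the vectors color and codes below are typed in by hand from his
message and are purely illustrative.

```r
# Raw codes as shown in Joel's SPSS data editor (typed in by hand; an
# assumption about what read.spss(file, use.value.labels = FALSE) returns).
color <- c(1, 2, 3, 1, 1, 1, 2, 2, 2, 3, 3)

# Value labels as stated in Joel's message: red = 1, blue = 2, green = 3.
codes <- c(red = 1, blue = 2, green = 3)

# Build the factor explicitly, so the code-to-label mapping is under our control.
f <- factor(color, levels = codes, labels = names(codes))

print(f)
print(as.integer(f))
```

If the codes really arrive unchanged, the integer codes of f should reproduce
the original 1 2 3 1 1 1 ... sequence; if they arrive already reversed, the
problem lies upstream of the factor() call.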

I notice that the value labels are in reverse order. Shouldn't matter
to read.spss which has

            rval[[nm]] <- factor(rval[[nm]], levels = vl[[v]],
                labels = trim(names(vl[[v]])))

i.e. levels and labels should be in the correct order. 

But something is odd, you'd expect the following effect:

> x <- 1:3
> factor(x,levels=3:1,labels=c("G","B","R"))
[1] R B G
Levels: G B R
> as.integer(factor(x,levels=3:1,labels=c("G","B","R")))
[1] 3 2 1

but Joel's output has the levels in the order R B G, which contradicts the
order shown in

attr(,"label.table")$COLOR
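
To make the expectation concrete, here is a sketch that applies the same kind
of factor() call to Joel's numbers. The vectors vl and x are typed in by hand
from his message (the label table as printed, and the codes from the data
editor), and trim() is left out because these labels carry no padding; none
of this is read from his actual file.

```r
# Label table as printed in Joel's output: names are labels, values are codes.
vl <- c(green = 3, blue = 2, red = 1)

# Codes as they appear in the SPSS data editor (typed in by hand).
x <- c(1, 2, 3, 1, 1, 1, 2, 2, 2, 3, 3)

# Same shape as the call quoted from read.spss, without trim().
f <- factor(x, levels = vl, labels = names(vl))

print(levels(f))        # level order follows the label table
print(as.character(f))  # label attached to each observation
print(as.integer(f))    # internal integer codes
```

Under these assumptions the levels come out as green blue red and the first
observation is labelled red, so integer codes of 3 2 1 ... are not in
themselves a symptom; it is the labels and the level order in Joel's printout
that differ from this.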

BTW, this is R 2.1.1; I hope Joel isn't wasting our time by using an
older version...

        -p


> Regards, Adai
> 
> 
> 
> On Tue, 2005-07-26 at 17:04 -0700, Joel Bremson wrote:
> > Hi,
> > 
> > I'm having a problem with read.spss reversing my factor input.
> > 
> > Here is the input copied from the spss data editor:
> > 
> > color cost
> > 1 2.30
> > 2 2.40
> > 3 3.00
> > 1 2.10
> > 1 1.00
> > 1 2.00
> > 2 4.00
> > 2 3.20
> > 2 2.33
> > 3 2.44
> > 3 2.55
> > 
> > For color, red=1, blue=2, and green=3. Its type is 'String' and
> > 
> > >out=read.spss(file)
> > >out
> > 
> > $COLOR
> > [1] green blue red green green green blue blue blue red red 
> > Levels: red blue green
> > 
> > $COST
> > [1] 2.30 2.40 3.00 2.10 1.00 2.00 4.00 3.20 2.33 2.44 2.55
> > 
> > attr(,"label.table")
> > attr(,"label.table")$COLOR
> > green blue red 
> > 3 2 1 
> > 
> > attr(,"label.table")$COST
> > NULL
> > 
> > attr(,"variable.labels")
> > COLOR COST 
> > "color" "cost" 
> > 
> > =====EOF===================
> > 
> > Notice that the $COLOR factor data are inverted. Looking at the integer
> > output, we see:
> > 
> > > as.integer(out$COLOR)
> > [1] 3 2 1 3 3 3 2 2 2 1 1
> > 
> > The spss original data looks like this:
> > 1 2 3 1 1 1 2 2 2 3 3
> > 
> > I can easily invert the output mathematically with:
> > q = sapply(m,function(x){ x + 2*(median(unique(m))-x)})
> > 
> > (m is composed of sequential integers starting at one)
> > 
> > but it seems as though something is going wrong in read.spss.
> > 
> > Any ideas?
> > 
> > Joel Bremson
> > Graduate Student
> > UC Davis
> > 
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
> 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



