[R] "Denormalize" data

RobinLovelace rob00x at hotmail.com
Thu Aug 11 12:40:32 CEST 2011


Hi Jeff, yes I can confirm that the xtabs solution does not work with the
original method.
Here's how it went:

HHum02 <- Hum02[1:30,] # select subset for demonstration purposes
> HHum02[1:5,]
         CASW Btype   Yr CO2Group NumVeh
170597 00CCFA  CARS 2002        C      2
170598 00CCFA  CARS 2002        D      2
170599 00CCFA  CARS 2002        E     22
170600 00CCFA  CARS 2002        F     32
170601 00CCFA  CARS 2002        G     32

HHH <- xtabs(NumVeh~CASW+CO2Group, data = HHum02)

Result:
...
...
  38UFHG  0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
  38UFHH  0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
 [ reached getOption("max.print") -- omitted 1303 rows ]]


As I said, I thought this was due to R remembering the invisible rows that
were supposed to be removed by the HHum02 <- Hum02[1:30,] command.
However, I found that the full dataset also gives only 0 outputs:

> head(xtabs(NumVeh~CASW+CO2Group, data = Hum02))
        CO2Group
CASW        A: Up to 100  B  C  D  E  F  G  H  I  J  K  K(L)  K(M) non-cars
         0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
  00AAFA 0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
  00AAFE 0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
  00AAFQ 0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
  00AAFS 0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
  00AAFT 0             0  0  0  0  0  0  0  0  0  0  0     0     0        0
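
A note on the "invisible rows" idea: subsetting the rows of a data frame
keeps all the levels of its factor columns, and xtabs tabulates every level,
used or not. The sketch below shows this on made-up toy data. The column
names mirror Hum02, but the values are invented, and it is only an assumption
that CASW and CO2Group are stored as factors in the real data.

```r
# Toy data (hypothetical values); factor() is called explicitly so the
# example behaves the same regardless of the stringsAsFactors default.
toy <- data.frame(
  CASW     = factor(c("00AAFA", "00AAFE", "00CCFA", "00CCFA", "00CCFB")),
  CO2Group = factor(c("A", "B", "C", "D", "C")),
  NumVeh   = c(5, 7, 2, 2, 1)
)

sub <- toy[3:5, ]          # row subset: the factor levels are NOT reduced
levels(sub$CASW)           # still lists all four CASW codes

# xtabs builds one row/column per level, so unused levels give all-zero rows
tab_all <- xtabs(NumVeh ~ CASW + CO2Group, data = sub)
dim(tab_all)               # 4 x 4

# droplevels() removes levels that no longer occur in the subset
sub2 <- droplevels(sub)
tab_used <- xtabs(NumVeh ~ CASW + CO2Group, data = sub2)
dim(tab_used)              # 2 x 2
```

If the real columns are factors, something like droplevels(HHum02) before
the xtabs call might be worth trying; this has not been run on the real data.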


Interestingly, melt and cast do not suffer from this problem:

mdata <- melt(HHum02)
cast(mdata, CASW~CO2Group~variable, sum)

Result:

CASW      C  D  E  F  G  H  I  J  K  K(L)  K(M) non-cars
  00CCFA  2  2 22 32 32 12 12  9  9     8     2       66
  00CCFB  0  1 11 12 20 17  4 10  5     2     1       35

When you run xtabs on an edited dataset, it does work:
head(HHum.alt)
     cas btyp    y co2 numv
1 00CCFA CARS 2002   C    2
2 00CCFA CARS 2002   D    2
3 00CCFA CARS 2002   E   22
4 00CCFA CARS 2002   F   32
5 00CCFA CARS 2002   G   32
6 00CCFA CARS 2002   H   12

> head(xtabs( numv ~ cas+co2, data= HHum.alt)) 
        co2
cas       C  D  E  F  G  H  I  J  K  K(L)  K(M) non-cars
  00CCFA  2  2 22 32 32 12 12  9  9     8     2       66
  00CCFB  0  1 11 12 20 17  4 10  5     2     1       35


So all in all it looks like xtabs does work, but that it gets put off by the
superfluous column beginning 170597 in Hum02.
That raises a couple of questions:
(1) How do I remove the superfluous column (sorry for asking a noobie
question; I did look, promise!)?
(2) Why can melt-cast deal with this superfluous column while xtabs cannot?
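
On question (1), a minimal sketch, assuming the leading 170597, 170598, ...
values are the data frame's row names rather than a real column (the printed
header has no name above them, which is how R prints row names). The data
frame df below is hypothetical toy data, not Hum02.

```r
# Hypothetical toy data with row names left over from a larger data frame
df <- data.frame(
  cas  = factor(c("00CCFA", "00CCFA", "00CCFB")),
  co2  = factor(c("C", "D", "C")),
  numv = c(2, 2, 1)
)
rownames(df) <- c("170597", "170598", "170599")

names(df)                  # the row labels are not among the columns

before <- xtabs(numv ~ cas + co2, data = df)

rownames(df) <- NULL       # reset the row names to 1, 2, 3, ...
after <- xtabs(numv ~ cas + co2, data = df)

all(before == after)       # compare the two tables cell by cell
```

In this toy case the row names make no difference to the table, so if the
same holds for Hum02 the zeros would have to come from somewhere else.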

My conclusion is that melt-cast is more robust. 

All data and history should be available here, if anyone wants to replicate
this:
http://dl.dropbox.com/u/15008199/Work-Rgeo1.zip
Many thanks,
Robin





 

