[R] "Denormalize" data
RobinLovelace
rob00x at hotmail.com
Thu Aug 11 12:40:32 CEST 2011
Hi Jeff, yes I can confirm that the xtabs solution does not work with the
original method.
Here's how it went:
HHum02 <- Hum02[1:30,] # select subset for demonstration purposes
> HHum02[1:5,]
         CASW Btype   Yr CO2Group NumVeh
170597 00CCFA  CARS 2002        C      2
170598 00CCFA  CARS 2002        D      2
170599 00CCFA  CARS 2002        E     22
170600 00CCFA  CARS 2002        F     32
170601 00CCFA  CARS 2002        G     32
HHH <- xtabs(NumVeh~CASW+CO2Group, data = HHum02)
Result:
...
...
38UFHG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38UFHH 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[ reached getOption("max.print") -- omitted 1303 rows ]]
As I said, I thought this was due to R remembering the invisible rows that
were supposed to be removed by the HHum02 <- Hum02[1:30,] command.
However, I found that the full dataset also gives only 0 outputs:
> head(xtabs(NumVeh~CASW+CO2Group, data = Hum02))
CO2Group
CASW A: Up to 100 B C D E F G H I J K K(L) K(M) non-cars
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
00AAFA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
00AAFE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
00AAFQ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
00AAFS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
00AAFT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
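One possible explanation for the all-zero rows, offered as a minimal sketch rather than a diagnosis: if CASW and CO2Group are stored as factors (the output above does not confirm this), row subsetting keeps every factor level, and xtabs prints a row or column for each level whether or not it still occurs in the data. The toy data frame below is hypothetical; only the column names are taken from the post.

```r
# Hypothetical toy data; column names mirror the post, values are made up.
full <- data.frame(
  CASW     = factor(c("00AAFA", "00AAFE", "00CCFA", "00CCFA", "00CCFB")),
  CO2Group = factor(c("A", "B", "C", "D", "C")),
  NumVeh   = c(5, 7, 2, 2, 1)
)

# Row subsetting keeps all factor levels, including ones no longer present.
sub <- full[3:5, ]
nlevels(sub$CASW)   # still 4

# xtabs then emits an all-zero row/column for every unused level.
tab_all <- xtabs(NumVeh ~ CASW + CO2Group, data = sub)

# droplevels() removes the unused levels before tabulating.
tab_used <- xtabs(NumVeh ~ CASW + CO2Group, data = droplevels(sub))

# Equivalent here: ask xtabs itself to drop them.
tab_used2 <- xtabs(NumVeh ~ CASW + CO2Group, data = sub,
                   drop.unused.levels = TRUE)

print(tab_all)
print(tab_used)
```

If Hum02 was itself cut out of a larger data frame, the same effect would apply to the full Hum02 as well, which would be consistent with head() showing only zeros.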
Interestingly, melt and cast do not suffer from this problem:
mdata <- melt(HHum02)
cast(mdata, CASW~CO2Group~variable, sum)
Result:
CASW   C D  E  F  G  H  I  J K K(L) K(M) non-cars
00CCFA 2 2 22 32 32 12 12  9 9    8    2       66
00CCFB 0 1 11 12 20 17  4 10 5    2    1       35
When you run xtabs on an edited dataset it does work:
head(HHum.alt)
     cas btyp    y co2 numv
1 00CCFA CARS 2002   C    2
2 00CCFA CARS 2002   D    2
3 00CCFA CARS 2002   E   22
4 00CCFA CARS 2002   F   32
5 00CCFA CARS 2002   G   32
6 00CCFA CARS 2002   H   12
> head(xtabs( numv ~ cas+co2, data= HHum.alt))
        co2
cas      C D  E  F  G  H  I  J K K(L) K(M) non-cars
  00CCFA 2 2 22 32 32 12 12  9 9    8    2       66
  00CCFB 0 1 11 12 20 17  4 10 5    2    1       35
So all in all it looks like xtabs does work, but that it gets put off by the
superfluous column beginning 170597 in Hum02.
That raises a couple of questions:
(1) how do I remove the superfluous column (sorry for asking a noobie
question - I did look, promise!)?
(2) why can melt-cast deal with this superfluous column while xtabs cannot?
My conclusion is that melt-cast is more robust.
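On question (1), a hedged sketch: the numbers 170597, 170598, ... in the printout look like row names carried over from a larger data frame rather than an actual column, although that cannot be confirmed without the data. Under that assumption they can be reset as below. The data frame here is a hypothetical stand-in for Hum02.

```r
# Hypothetical stand-in for Hum02: a slice taken from a larger data frame.
big <- data.frame(CASW   = c("00AAFA", "00CCFA", "00CCFA"),
                  NumVeh = c(0, 2, 22))
part <- big[2:3, ]

# The leading numbers in the printout are row names, not a column.
rownames(part)   # "2" "3"
ncol(part)       # 2

# Assigning NULL renumbers the rows from 1.
rownames(part) <- NULL
rownames(part)   # "1" "2"
```

Row names play no part in how xtabs builds its table, so resetting them only changes how the data frame prints.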
All data and history should be available here
http://dl.dropbox.com/u/15008199/Work-Rgeo1.zip
if anyone wants to replicate this. Many thanks,
Robin