[R] Question about datatypes/plotting issue
David Winsemius
dwinsemius at comcast.net
Wed Mar 11 03:58:41 CET 2009
You need to convert W$Date into a real date variable. At the moment it
is a factor (see the str() output below), not a date.
> str(W)
'data.frame': 265 obs. of 23 variables:
 $ Date            : Factor w/ 265 levels " ","April 1987",..: 1 90 68 156 2 178 134 ...
 $ AZ.Phoenix      : Factor w/ 236 levels "","100.00","100.43",..: 236 1 1 1 1 1 1 1 1 1 ...
 $ CA.Los.Angeles  : Factor w/ 260 levels "100.00","100.02",..: 260 113 114 115 116 ...
 $ CA.San.Diego    : Factor w/ 261 levels "100.00","101.07",..: 261 109 110 111 112 ...
 $ CA.San.Francisco: Factor w/ 256 levels "100.00","102.70",..: 256 108 109 110 111 ...
.(output trimmed)
.
.
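The str() output also shows that the price columns were read in as factors, which would explain why as.numeric() on such a column returns level codes rather than prices. A minimal sketch of the usual workaround, using a made-up factor rather than the real column (f, codes and values are illustrative names only, not from the original data):

```r
# Toy factor mimicking an imported column: a ticker label followed by prices.
f <- factor(c("SFXR", "46.61", "46.87", "100.00", "102.70"))

# as.numeric() on a factor returns the internal level codes, not the values.
# Levels are sorted as strings, so "100.00" sorts before "46.61".
codes <- as.numeric(f)

# Going through as.character() first recovers the printed values; the
# non-numeric label "SFXR" becomes NA (the coercion warning is suppressed).
values <- suppressWarnings(as.numeric(as.character(f)))
```

With the real data one would presumably drop the first row (the ticker labels) before converting, or prevent the factor conversion at import time.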
?Date # not the variable name, the R class name
?format.Date
?strptime
Unfortunately I seem to be at one of the many limits to my knowledge:
This code behaves in the manner I expected:
> format(Sys.time(), "%a %b %d %X %Y %Z")
[1] "Tue Mar 10 22:19:28 2009 EDT"
> strptime(format(Sys.time(), "%a %b %d %X %Y %Z"), format="%a %b %d %X %Y %Z")
[1] "2009-03-10 22:20:04"
Whereas this code does not:
> format(Sys.Date(), "%B %Y")
[1] "March 2009"
> as.Date(format(Sys.Date(), "%B %Y"), "%B %Y")
# would have assumed one was the inverse of the other, but ...
[1] NA
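A possible explanation, going by the note in ?strptime: when a month is specified, the day of the month has to be specified as well, so "%B %Y" on its own does not describe a complete date and as.Date() returns NA. A small sketch of the common workaround of pasting on a dummy day, assuming an English locale for the month names (x, incomplete and dates are illustrative names only):

```r
# Month-year strings in the same form as the Date column in the spreadsheet.
x <- c("January 1987", "February 1987", "March 2009")

# Without a day of the month the date is incomplete, so every element is NA.
incomplete <- as.Date(x, format = "%B %Y")

# Prepending a dummy day makes the date fully specified.
dates <- as.Date(paste("1", x), format = "%d %B %Y")
format(dates)   # "1987-01-01" "1987-02-01" "2009-03-01"
```

The same idea would apply to W$Date after as.character(), since the column is a factor.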
For some reason I cannot get the space-delimited month-YYYY combination
to convert. I can get other space-delimited formats to work for input or
output:
> as.Date("03 1998", "%M %Y")
[1] "1998-03-10"
> format(Sys.Date(), "%B %Y")
[1] "March 2009"
Puzzled;
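A note on the "%M %Y" example, offered as a likely reading rather than a definitive one: per ?strptime, %M is the minute and %m is the month as a number. With "%M %Y" the "03" is therefore taken as minutes, and the month and day are filled in from the current date, which would account for "1998-03-10" on March 10. A sketch of the difference (p and d are illustrative names only):

```r
# %M is "minute", so "03" is read as minutes; month and day are not given
# and are filled in from the current date.
p <- strptime("03 1998", format = "%M %Y")
p$min   # 3

# %m is "month as a number"; an explicit day makes the date complete.
d <- as.Date("01 03 1998", format = "%d %m %Y")
format(d)   # "1998-03-01"
```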
--
David Winsemius
On Mar 10, 2009, at 9:15 PM, Oscar Bonilla wrote:
> Hi,
>
> I am trying to plot the Case-Shiller index found at: http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls
>
> The way I'm importing it into R is as follows:
>
> library(gdata)
> W <- read.xls("http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls", header=TRUE)
> attach(W)
>
> To give you an idea of what the data looks like:
>
> > head(W)
>            Date AZ.Phoenix CA.Los.Angeles CA.San.Diego CA.San.Francisco
> 1                     PHXR           LXXR         SDXR             SFXR
> 2  January 1987                     59.33        54.67            46.61
> 3 February 1987                     59.65        54.89            46.87
> 4    March 1987                     59.99        55.16            47.32
> 5    April 1987                     60.81        55.85            47.69
> 6      May 1987                     61.67        56.35            48.31
> CO.Denver DC.Washington FL.Miami FL.Tampa GA.Atlanta IL.Chicago
> MA.Boston
> 1 DNXR WDXR MIXR TPXR ATXR
> CHXR BOXR
> 2 50.20 64.11 68.50 77.33
> 53.55 70.04
> 3 49.96 64.77 68.76 77.93
> 54.64 70.08
> 4 50.15 65.71 69.23 77.76
> 54.80 70.00
> 5 50.55 66.40 69.20 77.56
> 54.88 70.70
> 6 50.63 67.27 69.46 77.85
> 55.43 71.51
> MI.Detroit MN.Minneapolis NC.Charlotte NV.Las.Vegas NY.New.York
> OH.Cleveland
> 1 DEXR MNXR CRXR LVXR
> NYXR CEXR
> 2 63.39 66.36
> 74.42 53.53
> 3 63.94 67.03
> 75.43 53.50
> 4 64.17 67.34
> 76.25 53.68
> 5 64.81 67.88
> 77.34 53.75
> 6 65.18 67.90
> 79.16 54.71
> OR.Portland TX.Dallas WA.Seattle Composite.10 Composite.20
> 1 POXR DAXR SEXR CSXR SPCS20R
> 2 41.05 62.82
> 3 41.28 63.39
> 4 41.06 63.87
> 5 40.96 64.57
> 6 41.24 65.56
>
>
> Now on to the problem... if I just run
>
> plot(CA.San.Francisco ~ Date)
>
> I get:
> <pastedGraphic.png>
>
> Which I suspect is a problem because the Date column is not really a
> Date, it is a "factor"
>
> > class(Date)
> [1] "factor"
>
> If I run:
> plot(as.numeric(CA.San.Francisco), type="l")
>
> I get:
>
> <pastedGraphic.png>
>
>
> which is wrong, as CA.San.Francisco has no such discontinuity.
>
> > CA.San.Francisco
> [1] SFXR 46.61 46.87 47.32 47.69 48.31 48.83 49.49 49.94
> 50.69
> [11] 51.33 51.80 52.03 52.24 52.64 53.19 54.19 56.09 58.22
> 58.70
> [21] 59.00 59.50 60.37 61.31 62.20 62.66 63.32 64.64 66.27
> 67.77
> [31] 69.26 70.27 71.36 72.31 72.95 73.25 73.02 72.87 72.95
> 73.50
> [41] 74.57 75.12 75.15 74.81 74.45 74.24 73.44 72.58 71.47
> 71.17
> [51] 70.27 69.56 69.46 70.13 70.83 71.39 71.52 71.55 71.21
> 70.69
> [61] 70.05 69.67 69.48 69.17 69.26 69.86 70.02 70.00 69.64
> 69.51
> [71] 69.28 68.85 68.21 67.77 67.44 67.09 67.59 67.90 67.99
> 67.65
> [81] 67.63 67.50 67.18 66.77 66.27 65.98 65.79 66.37 67.05
> 67.70
> [91] 68.15 68.38 68.40 68.21 68.17 68.04 67.93 67.73 67.40
> 66.79
> [101] 67.08 67.31 67.50 67.72 67.78 67.76 67.30 66.80 66.43
> 66.15
> [111] 65.97 65.92 66.44 67.05 67.67 68.02 68.35 68.43 68.53
> 68.72
> [121] 68.69 68.80 68.81 69.78 71.09 72.19 73.12 73.75 74.43
> 74.76
> [131] 75.22 75.31 75.81 76.19 76.53 77.48 79.08 80.82 82.41
> 83.52
> [141] 84.41 85.06 85.05 84.66 84.50 85.03 85.93 87.51 89.21
> 90.82
> [151] 92.52 94.20 95.14 96.15 96.72 97.87 98.90 100.00 102.70
> 106.56
> [161] 110.97 115.01 118.45 119.48 119.95 120.94 123.08 125.66 128.58
> 131.16
> [171] 133.27 134.10 134.38 134.09 132.64 130.95 129.15 128.60 128.01
> 126.99
> [181] 125.47 125.13 126.06 128.79 132.62 136.07 139.35 141.02 141.93
> 142.29
> [191] 142.74 143.06 142.40 141.90 142.19 143.00 144.69 145.53 146.53
> 147.75
> [201] 148.72 150.25 151.75 153.36 154.62 155.93 158.11 160.90 164.65
> 167.76
> [211] 171.51 173.85 175.89 178.15 180.75 183.15 185.72 189.35 193.50
> 198.30
> [221] 201.86 205.52 208.92 211.56 212.86 214.73 215.55 215.70 215.11
> 214.78
> [231] 215.50 216.04 217.52 218.37 218.12 217.63 217.22 216.37 215.42
> 213.84
> [241] 212.13 211.78 210.78 211.09 211.47 210.89 209.48 208.64 208.15
> 206.46
> [251] 202.03 195.49 189.23 183.81 174.54 168.38 164.63 162.70 159.83
> 156.88
> [261] 151.42 145.53 139.44 135.28 130.12
> 256 Levels: 100.00 102.70 106.56 110.97 115.01 118.45 119.48
> 119.95 ... SFXR
>
> However, as.numeric(CA.San.Francisco) does have it!
>
> > as.numeric(CA.San.Francisco)
> [1] 256 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
> 123 124
> [19] 125 126 127 128 129 130 131 132 133 134 140 164 185 199 205 211
> 214 217
> [37] 215 213 214 219 224 227 228 226 223 221 218 212 207 203 199 190
> 187 198
> [55] 201 206 208 209 204 200 197 192 188 184 185 194 196 195 191 189
> 186 183
> [73] 173 164 154 149 156 166 168 158 157 155 150 144 140 138 135 141
> 147 160
> [91] 171 175 176 173 172 170 167 162 153 145 148 152 155 161 165 163
> 151 146
> [109] 142 139 137 136 143 147 159 169 174 177 178 180 179 181 182
> 193 202 210
> [127] 216 220 222 225 229 230 231 232 233 234 235 236 237 238 239
> 244 243 241
> [145] 240 242 245 246 247 248 249 250 251 252 253 254 255 1 2
> 3 4 5
> [163] 6 7 8 9 10 13 17 23 26 28 29 27 25 22 20
> 18 16 15
> [181] 12 11 14 19 24 31 32 34 36 38 40 42 39 35 37
> 41 43 44
> [199] 45 46 47 48 50 51 52 53 55 57 60 61 63 64 66
> 67 68 69
> [217] 71 73 74 76 77 79 83 89 92 94 99 100 96 95 98
> 101 104 107
> [235] 106 105 103 102 97 93 91 90 85 87 88 86 84 82 81
> 80 78 75
> [253] 72 70 65 62 59 58 56 54 49 44 33 30 21
>
> What I'd like to get is a graph like this (the red line):
>
> <pastedGraphic.png>
>
> I'm really puzzled about what's going on here. Any help would be
> greatly appreciated.
>
> Thanks,
>
> -Oscar
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT