[R] Question about datatypes/plotting issue

Oscar Bonilla obonilla at galileo.edu
Wed Mar 11 17:48:48 CET 2009


David,

I struggled with this for a while. I think the problem with the dates
I have is that they are not specific dates; they are "partial" dates.
A workaround I got from someone else on the list was:

as.Date(paste(x$Date, '1'), '%B %Y %d')

to make them specific dates (the first of the month).
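
As far as I can tell from ?strptime, this works because a month given
without a day of the month is treated as an incomplete date, which
would explain the NA that David got from the "%B %Y" conversion quoted
below. Here is a minimal sketch, using a hypothetical data frame x with
a few made-up "Month YYYY" values in place of the real data, and
assuming an English locale (since %B matches locale-specific month
names):

```r
# Hypothetical sample values in the same "Month YYYY" form as the Date column.
x <- data.frame(Date = c("January 1987", "February 1987", "March 1987"),
                stringsAsFactors = FALSE)

# Month and year only: the day is missing, so the result is expected to be NA.
incomplete <- as.Date(x$Date, "%B %Y")

# Appending a day of the month completes the date.
# %B assumes English month names in the current locale.
first_of_month <- as.Date(paste(x$Date, "1"), "%B %Y %d")

print(incomplete)
print(first_of_month)
```

paste() also converts a factor to character, so the same call should
work on the Date column as it was imported.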

Cheers,

-Oscar

On Mar 10, 2009, at 7:58 PM, David Winsemius wrote:

> You need to convert W$Date into a real date variable. At the moment
> it is a factor, not a date.
>
> > str(W)
> 'data.frame':	265 obs. of  23 variables:
> $ Date            : Factor w/ 265 levels " ","April 1987",..: 1 90 68 156 2 178 134 ...
> $ AZ.Phoenix      : Factor w/ 236 levels "","100.00","100.43",..: 236 1 1 1 1 1 1 1 1 1 ...
> $ CA.Los.Angeles  : Factor w/ 260 levels "100.00","100.02",..: 260 113 114 115 116 ...
> $ CA.San.Diego    : Factor w/ 261 levels "100.00","101.07",..: 261 109 110 111 112 ...
> $ CA.San.Francisco: Factor w/ 256 levels "100.00","102.70",..: 256 108 109 110 111 ...
> .(output trimmed)
> .
> .
> ?Date  # not the variable name, the R class name
> ?format.Date
> ?strptime
>
> Unfortunately I seem to be at one of the many limits to my knowledge:
> This code behaves in the manner I expected:
>
> > format(Sys.time(), "%a %b %d %X %Y %Z")
> [1] "Tue Mar 10 22:19:28 2009 EDT"
> > strptime(format(Sys.time(), "%a %b %d %X %Y %Z"), format="%a %b %d %X %Y %Z")
> [1] "2009-03-10 22:20:04"
>
> Whereas this code does not:
> > format(Sys.Date(), "%B %Y")
> [1] "March 2009"
> > as.Date(format(Sys.Date(), "%B %Y"), "%B %Y")
> # would have assumed one was the inverse of the other, but ...
>
> [1] NA
>
> For some reason I cannot get the space-delimited Month-YYYY combo to
> convert. I can get other space-delimited formats to work for input
> or output:
> > as.Date("03 1998", "%M %Y")
> [1] "1998-03-10"
>
> > format(Sys.Date(), "%B %Y")
> [1] "March 2009"
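
A note on the "03 1998" example above: in strptime formats %M is
minutes and %m is the numeric month, so that call most likely read 03
as minutes and filled in the month and day from the current date, which
happened to be March 10. A small sketch of the difference, with a day
added so the date is complete (the variable names are only
illustrative):

```r
# %M parses minutes; the month and day are not taken from the string.
as_minutes <- strptime("03 1998", "%M %Y")
print(as_minutes$min)

# %m parses the numeric month; adding %d makes the date complete.
as_month <- as.Date("03 1998 01", "%m %Y %d")
print(as_month)
```
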
>
> Puzzled;
> -- 
> David Winsemius
>
> On Mar 10, 2009, at 9:15 PM, Oscar Bonilla wrote:
>
>> Hi,
>>
>> I am trying to plot the Case-Shiller index found at: http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls
>>
>> The way I'm importing it into R is as follows:
>>
>> 	library(gdata)
>> 	W <- read.xls("http://www2.standardandpoors.com/spf/pdf/index/CSHomePrice_History_022445.xls", header=TRUE)
>> 	attach(W)
>>
>> To give you an idea of what the data looks like:
>>
>> > head(W)
>>            Date AZ.Phoenix CA.Los.Angeles CA.San.Diego CA.San.Francisco
>> 1                     PHXR           LXXR         SDXR             SFXR
>> 2  January 1987                     59.33        54.67            46.61
>> 3 February 1987                     59.65        54.89            46.87
>> 4    March 1987                     59.99        55.16            47.32
>> 5    April 1987                     60.81        55.85            47.69
>> 6      May 1987                     61.67        56.35            48.31
>>   CO.Denver DC.Washington FL.Miami FL.Tampa GA.Atlanta IL.Chicago MA.Boston
>> 1      DNXR          WDXR     MIXR     TPXR       ATXR       CHXR      BOXR
>> 2     50.20         64.11    68.50    77.33                 53.55     70.04
>> 3     49.96         64.77    68.76    77.93                 54.64     70.08
>> 4     50.15         65.71    69.23    77.76                 54.80     70.00
>> 5     50.55         66.40    69.20    77.56                 54.88     70.70
>> 6     50.63         67.27    69.46    77.85                 55.43     71.51
>>   MI.Detroit MN.Minneapolis NC.Charlotte NV.Las.Vegas NY.New.York OH.Cleveland
>> 1       DEXR           MNXR         CRXR         LVXR        NYXR         CEXR
>> 2                                  63.39        66.36       74.42        53.53
>> 3                                  63.94        67.03       75.43        53.50
>> 4                                  64.17        67.34       76.25        53.68
>> 5                                  64.81        67.88       77.34        53.75
>> 6                                  65.18        67.90       79.16        54.71
>> OR.Portland TX.Dallas WA.Seattle Composite.10 Composite.20
>> 1        POXR      DAXR       SEXR         CSXR     SPCS20R
>> 2       41.05                             62.82
>> 3       41.28                             63.39
>> 4       41.06                             63.87
>> 5       40.96                             64.57
>> 6       41.24                             65.56
>>
>>
>> Now on to the problem... if I just run
>>
>> 	plot(CA.San.Francisco ~ Date)
>>
>> I get:
>> <pastedGraphic.png>
>>
>> Which I suspect is a problem because the Date column is not really  
>> a Date, it is a "factor"
>>
>> 	> class(Date)
>> 	[1] "factor"
>>
>> If I run:
>> 	plot(as.numeric(CA.San.Francisco), type="l")
>>
>> I get:
>>
>> <pastedGraphic.png>
>>
>>
>> which is wrong, as CA.San.Francisco has no such discontinuity.
>>
>> > CA.San.Francisco
>> [1] SFXR   46.61  46.87  47.32  47.69  48.31  48.83  49.49  49.94   
>> 50.69
>> [11] 51.33  51.80  52.03  52.24  52.64  53.19  54.19  56.09  58.22   
>> 58.70
>> [21] 59.00  59.50  60.37  61.31  62.20  62.66  63.32  64.64  66.27   
>> 67.77
>> [31] 69.26  70.27  71.36  72.31  72.95  73.25  73.02  72.87  72.95   
>> 73.50
>> [41] 74.57  75.12  75.15  74.81  74.45  74.24  73.44  72.58  71.47   
>> 71.17
>> [51] 70.27  69.56  69.46  70.13  70.83  71.39  71.52  71.55  71.21   
>> 70.69
>> [61] 70.05  69.67  69.48  69.17  69.26  69.86  70.02  70.00  69.64   
>> 69.51
>> [71] 69.28  68.85  68.21  67.77  67.44  67.09  67.59  67.90  67.99   
>> 67.65
>> [81] 67.63  67.50  67.18  66.77  66.27  65.98  65.79  66.37  67.05   
>> 67.70
>> [91] 68.15  68.38  68.40  68.21  68.17  68.04  67.93  67.73  67.40   
>> 66.79
>> [101] 67.08  67.31  67.50  67.72  67.78  67.76  67.30  66.80   
>> 66.43  66.15
>> [111] 65.97  65.92  66.44  67.05  67.67  68.02  68.35  68.43   
>> 68.53  68.72
>> [121] 68.69  68.80  68.81  69.78  71.09  72.19  73.12  73.75   
>> 74.43  74.76
>> [131] 75.22  75.31  75.81  76.19  76.53  77.48  79.08  80.82   
>> 82.41  83.52
>> [141] 84.41  85.06  85.05  84.66  84.50  85.03  85.93  87.51   
>> 89.21  90.82
>> [151] 92.52  94.20  95.14  96.15  96.72  97.87  98.90  100.00  
>> 102.70 106.56
>> [161] 110.97 115.01 118.45 119.48 119.95 120.94 123.08 125.66  
>> 128.58 131.16
>> [171] 133.27 134.10 134.38 134.09 132.64 130.95 129.15 128.60  
>> 128.01 126.99
>> [181] 125.47 125.13 126.06 128.79 132.62 136.07 139.35 141.02  
>> 141.93 142.29
>> [191] 142.74 143.06 142.40 141.90 142.19 143.00 144.69 145.53  
>> 146.53 147.75
>> [201] 148.72 150.25 151.75 153.36 154.62 155.93 158.11 160.90  
>> 164.65 167.76
>> [211] 171.51 173.85 175.89 178.15 180.75 183.15 185.72 189.35  
>> 193.50 198.30
>> [221] 201.86 205.52 208.92 211.56 212.86 214.73 215.55 215.70  
>> 215.11 214.78
>> [231] 215.50 216.04 217.52 218.37 218.12 217.63 217.22 216.37  
>> 215.42 213.84
>> [241] 212.13 211.78 210.78 211.09 211.47 210.89 209.48 208.64  
>> 208.15 206.46
>> [251] 202.03 195.49 189.23 183.81 174.54 168.38 164.63 162.70  
>> 159.83 156.88
>> [261] 151.42 145.53 139.44 135.28 130.12
>> 256 Levels: 100.00 102.70 106.56 110.97 115.01 118.45 119.48  
>> 119.95 ... SFXR
>>
>> However, as.numeric(CA.San.Francisco) does have it!
>>
>> > as.numeric(CA.San.Francisco)
>> [1] 256 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122  
>> 123 124
>> [19] 125 126 127 128 129 130 131 132 133 134 140 164 185 199 205  
>> 211 214 217
>> [37] 215 213 214 219 224 227 228 226 223 221 218 212 207 203 199  
>> 190 187 198
>> [55] 201 206 208 209 204 200 197 192 188 184 185 194 196 195 191  
>> 189 186 183
>> [73] 173 164 154 149 156 166 168 158 157 155 150 144 140 138 135  
>> 141 147 160
>> [91] 171 175 176 173 172 170 167 162 153 145 148 152 155 161 165  
>> 163 151 146
>> [109] 142 139 137 136 143 147 159 169 174 177 178 180 179 181 182  
>> 193 202 210
>> [127] 216 220 222 225 229 230 231 232 233 234 235 236 237 238 239  
>> 244 243 241
>> [145] 240 242 245 246 247 248 249 250 251 252 253 254 255   1   2    
>> 3   4   5
>> [163]   6   7   8   9  10  13  17  23  26  28  29  27  25  22  20   
>> 18  16  15
>> [181]  12  11  14  19  24  31  32  34  36  38  40  42  39  35  37   
>> 41  43  44
>> [199]  45  46  47  48  50  51  52  53  55  57  60  61  63  64  66   
>> 67  68  69
>> [217]  71  73  74  76  77  79  83  89  92  94  99 100  96  95  98  
>> 101 104 107
>> [235] 106 105 103 102  97  93  91  90  85  87  88  86  84  82  81   
>> 80  78  75
>> [253]  72  70  65  62  59  58  56  54  49  44  33  30  21
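
The integers above look like the factor's internal level codes rather
than the index values. Because the levels are sorted as text, "100.00"
becomes level 1 and the values below 100 come after it, which would
account for the jump. A minimal sketch with a made-up five-element
factor f (not the real column), converting through as.character()
first:

```r
# Hypothetical stand-in for a column read as a factor because of a text row.
f <- factor(c("SFXR", "46.61", "46.87", "100.00", "102.70"))

# as.numeric() on a factor returns the level codes, not the printed values.
codes <- as.numeric(f)

# Going through character recovers the values; the text row cannot be
# converted and becomes NA (the coercion warning is suppressed here).
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```

The header row ("SFXR") turns into NA, so dropping that row before
plotting seems reasonable.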
>>
>> What I'd like to get is a graph like this (the red line):
>>
>> <pastedGraphic.png>
>>
>> I'm really puzzled about what's going on here. Any help would be  
>> greatly appreciated.
>>
>> Thanks,
>>
>> -Oscar
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT



