[R] Problem with strptime generating missing values where none appear to exist

Don MacQueen macq at llnl.gov
Tue Feb 23 22:25:16 CET 2010


What happens if you do all that NA checking on dob  *before* 
subtracting 100 from dob$year?

What happens if you use difftime() before subtracting the 100?

Do you get any NAs if you convert dob to POSIXct?

(these are just investigative ideas, obviously)
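
As a rough sketch of those three checks, here is something along these lines. The sample dates, the numeric month format (%m instead of %b, to avoid locale dependence), and the "Europe/London" time zone are all assumptions made up for illustration; substitute the real datx columns and the original format string. The isdst column is printed because the conversion from POSIXlt to POSIXct takes that field into account, so it may be worth comparing between the NA and non-NA rows.

```r
# Hypothetical stand-ins for datx$BDT and datx$SDT (the real data are not shown).
# Numeric months (%m) are used so the sketch does not depend on the locale.
bdt <- c("14-07-22", "23-03-21", "03-04-27")
sdt <- c("27-08-01", "20-08-01", "28-02-02")

# Assumption: a time zone that observes daylight saving time.
tz <- "Europe/London"

dob   <- strptime(bdt, "%d-%m-%y", tz = tz)
sdate <- strptime(sdt, "%d-%m-%y", tz = tz)

# Idea 1: NA check on dob *before* subtracting 100 from dob$year
na_before <- is.na(dob)

# Idea 2: difftime() before subtracting the 100
# (the differences are negative here because %y parsed the births as 20xx)
diff_before <- difftime(sdate, dob, units = "days")

# Now shift the century, as in the original code
dob_shifted <- dob
dob_shifted$year <- dob_shifted$year - 100

# Idea 3: convert to POSIXct and look for NAs
dob_ct <- as.POSIXct(dob_shifted)

# Side-by-side summary; whether any NAs appear may depend on the
# R version, platform, and time zone.
print(data.frame(
  bdt         = bdt,
  na_before   = na_before,
  diff_before = as.numeric(diff_before),
  isdst       = dob_shifted$isdst,
  na_after    = is.na(dob_ct)
))
```

If the NAs only show up in the na_after column, that would point at the year subtraction rather than at strptime itself.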

-Don

At 6:26 PM +0000 2/23/10, Jonathan Williams wrote:
>Dear R Helpers,
>
>I am having difficulty with strptime. I wish to find the differences between
>two vectors of times. I apparently have no difficulty converting the vectors
>to the appropriate format using strptime, but difftime then does not
>calculate all of the differences.
>
>Here is the code and output:-
>
>dob=strptime(as.character(datx$BDT),'%d-%b-%y'); dob$year=dob$year-100
>sdate=strptime(as.character(datx$SDT),'%d-%b-%y')
>head(dob); head(sdate)
>[1] "1922-07-14" "1922-07-14" "1922-07-14" "1922-07-14" "1921-03-23"
>"1921-03-23"
>[1] "2001-08-27" "2001-08-27" "2001-08-27" "2001-08-27" "2001-08-20"
>"2001-08-20"
>>  str(dob)
>  POSIXlt[1:9], format: "1922-07-14" "1922-07-14" "1922-07-14" "1922-07-14"
>"1921-03-23" "1921-03-23" "1921-03-23" "1927-08-27" "1927-08-27"
>"1927-08-27" "1927-08-27" "1940-04-05" "1940-04-05" "1940-04-05"
>"1940-04-05" ...
>>  str(sdate)
>  POSIXlt[1:9], format: "2001-08-27" "2001-08-27" "2001-08-27" "2001-08-27"
>"2001-08-20" "2001-08-20" "2001-08-20" "2001-11-26" "2001-11-26"
>"2001-11-26" "2001-11-26" "2002-05-20" "2002-05-20" "2002-05-20"
>"2002-05-20" ...
>
>table(is.na(sdate))
>
>FALSE
>   812
>
>table(is.na(dob))
>
>FALSE  TRUE
>   743    69
>But if I now look at each component of dob separately, none contains any
>missing values:
>
>for (i in 1:length(dob)) {print(names(dob)[i]);
>print(table(is.na(dob[[i]])))}
>
>[1] "sec"
>
>FALSE
>   812
>[1] "min"
>
>FALSE
>   812
>[1] "hour"
>
>FALSE
>   812
>[1] "mday"
>
>FALSE
>   812
>[1] "mon"
>
>FALSE
>   812
>[1] "year"
>
>FALSE
>   812
>[1] "wday"
>
>FALSE
>   812
>[1] "yday"
>
>FALSE
>   812
>[1] "isdst"
>
>FALSE
>   812
>
>Additionally, there are no NA values in any component of dob on direct
>visual inspection. For example, here is dob$mon
>dob$mon
>[1]  6  6  6  6  2  2  2  7  7  7  7  3  3  3  3 11 11 11 11  7  7  7  7  7
>7  7  7  4  4  4  4  4  4  4  4 11 11 11 11  7  7  7  7 11 11 11 11  6  6  6
>6  9  9  9  9  3  3  3  3  6  6  6  6  8  8  8  8  7  7  7  7  4  4  4  4  4
>  [77]  4  4  4  7  7  7  7 10 10 11 11 11 11  8  8  8  8  0  0  0  0  0 10
>10 10  7  7  7  7  3  3  3  3  2  2  2  2  2  6  6  6  6  6  5  5  5  5  4
>4  4  4  4 11 11 11 11  4  4  4  4  4  3  3  3  3  3  7  7  7  7  7  7  7  7
>8  8
>[153]  8  8  8  8  8  8  7  7  7  7  6  6 10 10 10 10  4  4  4  4  4  4  4
>4 10 10 10 10 11 11 11 11  5  5  5  5  5  5  5  5  3  3  3  3  5  5  0  0  0
>0  2  2  2  2  6  6  6  6  0  0  0  0  3  3  3  3  6  6  6  6  8  8  8  8  8
>7
>[229]  7  7  7  7  7  7  7  7  8  8  8  8  4  4  4  4 10 10 10 10  2  2  2
>2  0  0  0  0  0  0  1  1  1  1  4  4  4  4  2  2  2  2  2  8  8  8  8 11 11
>11 11  8  8  8  8  4  4  4  4  5  5  5  5  8  8  8  8  0  0  0  0  1  1  1
>1  1
>[305]  1  1  1  4  4  4  4  5  5  5  5  7  7  7  7  5  5  5  5  3  3  3  3
>1  1  1  1  0  0  0  0  3  3  3  3  6  6  6  6  3  3  3  5  5 11 11 11  5  5
>5  5  0  0  0  0 10 10 10 10  4  4  4  4  6  6  6  6  7  7  7  7  4  4  4  4
>1
>[381]  1  1  1  7  7  7  7  3  3  3  3  7  7  7  7  5  5  5  5  9  9  9  9
>11 11 11 11 10 10 10 10  0  0  0  0  5  5  5  5  3  3  3  3  7  7  7  7  0
>0  0  0  6  6  6  6  8  8  8  8  8  8  8  8  3  3  3  3  5  5  5  5 10 10 10
>10  3
>[457]  3  3  3  8  8  8  8  0  0  0  0 11 11 11 11  2  2  2  2  7  7  7  7
>0  0  0  0  0  1  1  1  1  5  5  5  5  7  7  7  7  7  7  7  7  5  5  5  5  9
>9  9  9  5  5  5  5  6  6  6  6  8  8  8  8 11 11 11 11  3  3  3  3  6  6  6
>6
>[533]  3  3  3  3  6  6  6  6  8  8  8  8  9  9  9  9  2  2  2  2  1  1  1
>1  2  2  2  2  4  4  4  7  7  7  8  8  8  8  3  3  3  3  1  1  1  1  1  9  9
>9  9  8  8  8  8 11 11 11 11  6  6  6  6  3  3  3  3 10 10 10  8  8  8  0  0
>0
>[609]  0  3  3  3  3  3  0  0  0  0  3  3  3  3  5  5  5  5 10 10 10 10 10
>10 10 10  2  2  2  2  2  3  3  3  3  4  4  4 10 10 10 10  2  2  2  2  3  3
>3  3  2  2  2  2  2  2  6  6  6  6  4  4  4  4 11 11 11 11  0  0  0  0 11 11
>11 11
>[685]  5  5  5  5  8  8  8  8  8  8  8  8  7  7  7  7  3  3  3  3  5  5  5
>5 11 11 11 11  3  3  3  9  9  5  5  5  5  8  8  8  8  2  2  2  2  5  5  5  5
>2  2  2  2 10 10 10 10  4  4  4 11 11 11 11  8  8  8  9  9  9  1  1  1  1  8
>8
>[761]  8  8  2  2  2 11 11 11 11  2  2  2  2  2  2  2  2  6  6  6  6 11 11
>11 11  2  2  2 11 11 11  9  9  9  9  2  2  2  2  7  7  7  7 11 11 11  2  2
>2  3  3  3
>
>All the dob components are equally complete, including isdst.
>
>However, when I then try to compute difftime(sdate,dob), 69 values are
>missing:-
>Time differences in days
>   [1] 28899.00 28899.00 28899.00 28899.00 29369.96 29369.96 29369.96
>27120.04 27120.04 27120.04 27120.04 22690.00 22690.00 22690.00 22690.00
>28905.00 28905.00 28905.00 28905.00 31207.04 31207.04 31207.04 31207.04
>31209.04 31209.04
>  [26] 31209.04 31209.04 26323.00 26323.00 26323.00 26323.00 26338.00
>26338.00 26338.00 26338.00 27310.96 27310.96 27310.96 27310.96 23588.04
>23588.04 23588.04 23588.04 25255.00 25255.00 25255.00 25255.00 23752.00
>23752.00 23752.00
>  [51] 23752.00 29607.04 29607.04 29607.04 29607.04 27993.04 27993.04
>27993.04 27993.04 28384.04 28384.04 28384.04 28384.04 26176.00 26176.00
>26176.00 26176.00 28986.04 28986.04 28986.04 28986.04 28689.04 28689.04
>28689.04 28689.04
>  [76] 23722.00 23722.00 23722.00 23722.00 27353.00 27353.00 27353.00
>27353.00 26303.00 26303.00 28803.96 28803.96 28803.96 28803.96 28564.04
>28564.04 28564.04 28564.04 29826.96 29826.96 29826.96 29826.96 29826.96
>30410.00 30410.00
>[101] 30410.00 26490.04 26490.04 26490.04 26490.04       NA       NA
>NA       NA 29765.96 29765.96 29765.96 29765.96 29765.96 26325.00 26325.00
>26325.00 26325.00 26325.00 28824.00 28824.00 28824.00 28824.00 26808.00
>26808.00
>[126] 26808.00 26808.00 26808.00 28628.96 28628.96 28628.96 28628.96
>23807.00 23807.00 23807.00 23807.00 23807.00       NA       NA       NA
>NA       NA 25668.04 25668.04 25668.04 25668.04 28654.04 28654.04 28654.04
>28654.04
>[151] 21711.04 21711.04 21711.04 21711.04 27167.04 27167.04 27167.04
>27167.04 24296.04 24296.04 24296.04 24296.04 30540.04 30540.04 25330.00
>25330.00 25330.00 25330.00 25579.00 25579.00 25579.00 25579.00 29127.04
>29127.04 29127.04
>[176] 29127.04 29896.96 29896.96 29896.96 29896.96 25992.00 25992.00
>25992.00 25992.00 26625.00 26625.00 26625.00 26625.00 30121.04 30121.04
>30121.04 30121.04 21801.04 21801.04 21801.04 21801.04 31274.04 31274.04
>25907.00 25907.00
>[201] 25907.00 25907.00 28516.00 28516.00 28516.00 28516.00 28943.00
>28943.00 28943.00 28943.00 29847.96 29847.96 29847.96 29847.96 30529.04
>30529.04 30529.04 30529.04 30527.04 30527.04 30527.04 30527.04 29434.00
>29434.00 29434.00
>[226] 29434.00 29434.00 28631.04 28631.04 28631.04 28631.04 25761.04
>25761.04 25761.04 25761.04 25761.04 26127.04 26127.04 26127.04 26127.04
>26027.00 26027.00 26027.00 26027.00 28987.00 28987.00 28987.00 28987.00
>29232.00 29232.00
>[251] 29232.00 29232.00 26109.96 26109.96 26109.96 26109.96 31339.00
>31339.00 29235.00 29235.00 29235.00 29235.00 28092.00 28092.00 28092.00
>28092.00 30209.00 30209.00 30209.00 30209.00 30209.00 30281.00 30281.00
>30281.00 30281.00
>[276] 26880.96 26880.96 26880.96 26880.96 25691.04 25691.04 25691.04
>25691.04 22938.04 22938.04 22938.04 22938.04 25878.00 25878.00 25878.00
>25878.00 24470.00 24470.00 24470.00 24470.00 26046.96 26046.96 26046.96
>26046.96 26763.96
>[301] 26763.96 26763.96 26763.96 25720.96 25720.96 25720.96 25720.96
>29214.00 29214.00 29214.00 29214.00 26992.00 26992.00 26992.00 26992.00
>30659.00 30659.00 30659.00 30659.00 25600.00 25600.00 25600.00 25600.00
>26842.00 26842.00
>[326] 26842.00 26842.00 25541.00 25541.00 25541.00 25541.00 27386.00
>27386.00 27386.00 27386.00 30302.04 30302.04 30302.04 30302.04 28059.00
>28059.00 28059.00 28059.00       NA       NA       NA 25657.00 25657.00
>NA       NA
>[351]       NA 24835.00 24835.00 24835.00 24835.00 29340.96 29340.96
>29340.96 29340.96 26473.96 26473.96 26473.96 26473.96 28873.00 28873.00
>28873.00 28873.00 27690.00 27690.00 27690.00 27690.00 26554.00 26554.00
>26554.00 26554.00
>[376] 28876.00 28876.00 28876.00 28876.00 27156.96 27156.96 27156.96
>27156.96 26577.00 26577.00 26577.00 26577.00 27471.00 27471.00 27471.00
>27471.00 27323.00 27323.00 27323.00 27323.00 29232.00 29232.00 29232.00
>29232.00       NA
>[401]       NA       NA       NA 26523.96 26523.96 26523.96 26523.96
>26538.96 26538.96 26538.96 26538.96 24374.96 24374.96 24374.96 24374.96
>30798.00 30798.00 30798.00 30798.00       NA       NA       NA       NA
>22775.04 22775.04
>[426] 22775.04 22775.04 28464.00 28464.00 28464.00 28464.00 25763.04
>25763.04 25763.04 25763.04 30114.04 30114.04 30114.04 30114.04 26864.04
>26864.04 26864.04 26864.04       NA       NA       NA       NA 26945.04
>26945.04 26945.04
>[451] 26945.04 29528.96 29528.96 29528.96 29528.96 29058.04 29058.04
>29058.04 29058.04 29456.00 29456.00 29456.00 29456.00 26450.96 26450.96
>26450.96 26450.96 22837.96 22837.96 22837.96 22837.96 24222.96 24222.96
>24222.96 24222.96
>[476] 29592.00 29592.00 29592.00 29592.00 26573.00 26573.00 26573.00
>26573.00 26573.00 24811.00 24811.00 24811.00 24811.00 24834.00 24834.00
>24834.00 24834.00 31312.00 31312.00 31312.00 31312.00 23337.00 23337.00
>23337.00 23337.00
>[501] 26422.00 26422.00 26422.00 26422.00 22664.04 22664.04 22664.04
>22664.04 23192.04 23192.04 23192.04 23192.04 27557.04 27557.04 27557.04
>27557.04 23449.04 23449.04 23449.04 23449.04 27799.00 27799.00 27799.00
>27799.00 28747.04
>[526] 28747.04 28747.04 28747.04 24660.04 24660.04 24660.04 24660.04
>NA       NA       NA       NA 24683.04 24683.04 24683.04 24683.04 26576.00
>26576.00 26576.00 26576.00       NA       NA       NA       NA 28897.96
>28897.96
>[551] 28897.96 28897.96 25997.96 25997.96 25997.96 25997.96 24594.96
>24594.96 24594.96 24594.96 25965.00 25965.00 25965.00 30139.04 30139.04
>30139.04 26104.04 26104.04 26104.04 26104.04 26255.04 26255.04 26255.04
>26255.04 28887.00
>[576] 28887.00 28887.00 28887.00 28887.00       NA       NA       NA
>NA 25470.00 25470.00 25470.00 25470.00 20677.96 20677.96 20677.96 20677.96
>29227.00 29227.00 29227.00 29227.00       NA       NA       NA       NA
>29543.96
>[601] 29543.96 29543.96 31080.00 31080.00 31080.00 27710.00 27710.00
>27710.00 27710.00       NA       NA       NA       NA       NA 29903.00
>29903.00 29903.00 29903.00       NA       NA       NA       NA 24147.00
>24147.00 24147.00
>[626] 24147.00 23316.96 23316.96 23316.96 23316.96 27096.00 27096.00
>27096.00 27096.00 25543.00 25543.00 25543.00 25543.00 25543.00       NA
>NA       NA       NA 25131.04 25131.04 25131.04 29565.96 29565.96 29565.96
>29565.96
>[651] 28070.00 28070.00 28070.00 28070.00 28774.04 28774.04 28774.04
>28774.04 28073.00 28073.00 28130.00 28130.00 28130.00 28130.00 20038.00
>20038.00 20038.00 20038.00 27298.04 27298.04 27298.04 27298.04 27793.00
>27793.00 27793.00
>[676] 27793.00 25586.00 25586.00 25586.00 25586.00 26000.00 26000.00
>26000.00 26000.00 30577.04 30577.04 30577.04 30577.04 27194.04 27194.04
>27194.04 27194.04 23156.04 23156.04 23156.04 23156.04 23978.04 23978.04
>23978.04 23978.04
>[701]       NA       NA       NA       NA 24391.00 24391.00 24391.00
>24391.00 27152.96 27152.96 27152.96 27152.96 28852.00 28852.00 28852.00
>25419.00 25419.00 29212.00 29212.00 29212.00 29212.00 23660.00 23660.00
>23660.00 23660.00
>[726] 26022.96 26022.96 26022.96 26022.96 25566.00 25566.00 25566.00
>25566.00 25336.96 25336.96 25336.96 25336.96 26931.96 26931.96 26931.96
>26931.96 26758.00 26758.00 26758.00 26537.96 26537.96 26537.96 26537.96
>27026.00 27026.00
>[751] 27026.00       NA       NA       NA 24349.96 24349.96 24349.96
>24349.96 25960.00 25960.00 25960.00 25960.00 27276.00 27276.00 27276.00
>26826.96 26826.96 26826.96 26826.96 26428.96 26428.96 26428.96 26428.96
>26780.96 26780.96
>[776] 26780.96 26780.96 26301.00 26301.00 26301.00 26301.00 28385.96
>28385.96 28385.96 28385.96 27210.96 27210.96 27210.96 23704.00 23704.00
>23704.00 24160.04 24160.04 24160.04 24160.04 25703.96 25703.96 25703.96
>25703.96 25269.00
>[801] 25269.00 25269.00 25269.00 29886.96 29886.96 29886.96       NA
>NA       NA       NA       NA       NA
>attr(,"tzone")
>[1] ""
>
>Here are the values of sdate and dob that relate to the missing values in
>difftime(sdate,dob)
>
>>  sdate[is.na(difftime(sdate,dob))]
>  [1] "2002-02-28" "2002-02-28" "2002-02-28" "2002-02-28" "2002-07-30"
>"2002-07-30" "2002-07-30" "2002-07-30" "2002-07-30" "2003-06-17"
>"2003-06-17" "2003-06-17" "2003-10-30" "2003-10-30" "2003-10-30"
>"2002-07-22" "2002-07-22"
>[18] "2002-07-22" "2002-07-22" "2002-12-18" "2002-12-18" "2002-12-18"
>"2002-12-18" "2003-03-10" "2003-03-10" "2003-03-10" "2003-03-10"
>"2003-02-05" "2003-02-05" "2003-02-05" "2003-02-05" "2003-03-19"
>"2003-03-19" "2003-03-19"
>[35] "2003-03-19" "2003-05-29" "2003-05-29" "2003-05-29" "2003-05-29"
>"2003-08-13" "2003-08-13" "2003-08-13" "2003-08-13" "2003-11-03"
>"2003-11-03" "2003-11-03" "2003-11-03" "2003-11-03" "2002-06-25"
>"2002-06-25" "2002-06-25"
>[52] "2002-06-25" "2003-04-10" "2003-04-10" "2003-04-10" "2003-04-10"
>"2003-04-03" "2003-04-03" "2003-04-03" "2003-04-03" "2003-10-15"
>"2003-10-15" "2003-10-15" "2003-11-21" "2003-11-21" "2003-11-21"
>"2003-12-04" "2003-12-04"
>[69] "2003-12-04"
>>  dob[is.na(difftime(sdate,dob))]
>  [1] "1927-04-03" "1927-04-03" "1927-04-03" "1927-04-03" "1925-04-11"
>"1925-04-11" "1925-04-11" "1925-04-11" "1925-04-11" "1939-04-03"
>"1939-04-03" "1939-04-03" "1940-12-30" "1940-12-30" "1940-12-30"
>"1917-10-14" "1917-10-14"
>[18] "1917-10-14" "1917-10-14" "1925-04-16" "1925-04-16" "1925-04-16"
>"1925-04-16" "1927-04-05" "1927-04-05" "1927-04-05" "1927-04-05"
>"1939-04-08" "1939-04-08" "1939-04-08" "1939-04-08" "1938-10-24"
>"1938-10-24" "1938-10-24"
>[35] "1938-10-24" "1930-10-16" "1930-10-16" "1930-10-16" "1930-10-16"
>"1923-04-17" "1923-04-17" "1923-04-17" "1923-04-17" "1929-04-17"
>"1929-04-17" "1929-04-17" "1929-04-17" "1929-04-17" "1925-04-11"
>"1925-04-11" "1925-04-11"
>[52] "1925-04-11" "1931-04-02" "1931-04-02" "1931-04-02" "1931-04-02"
>"1929-04-18" "1929-04-18" "1929-04-18" "1929-04-18" "1917-10-22"
>"1917-10-22" "1917-10-22" "1928-03-28" "1928-03-28" "1928-03-28"
>"1928-04-09" "1928-04-09"
>[69] "1928-04-09"
>
>The values of dob here do not differ in any obvious way from those in the
>rest of the dob vector, where difftime(sdate,dob) gives sensible results.
>
>If I try to recompute the difftime, the result is the same.
>
>s1=sdate[is.na(difftime(sdate,dob))]
>d1=dob[is.na(difftime(sdate,dob))]
>difftime(s1,d1)
>Time differences in secs
>  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>attr(,"tzone")
>[1] ""
>
>However, if I now create the first value of each missing vector manually,
>then difftime works:-
>
>js1=strptime('2002-02-28','%Y-%m-%d'); js1
>#[1] "2002-02-28"
>jb1=strptime('1927-04-03','%Y-%m-%d'); jb1
>#[1] "1927-04-03"
>difftime(js1,jb1)
>#Time difference of 27360 days
>
>So, it appears that strptime is handling these values differently in the
>vector, but manages them correctly one by one.
>
>I'm sorry if I'm being silly, but I can't see the problem. I'd be VERY
>grateful if someone could help me to find it and fix it.
>
>With many thanks in advance for your thoughts,
>
>Jonathan Williams
>


-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062


