[R] Question on creating Date variable

David Winsemius dwinsemius at comcast.net
Tue Jan 1 17:22:43 CET 2013


On Dec 31, 2012, at 9:40 PM, Christofer Bogaso wrote:

> On 01 January 2013 03:00:18, David Winsemius wrote:
>>
>> On Dec 31, 2012, at 11:57 AM, David Winsemius wrote:
>>
>>>
>>> On Dec 31, 2012, at 11:54 AM, Christofer Bogaso wrote:
>>>
>>>> On 01 January 2013 01:29:53, David Winsemius wrote:
>>>>>
>>>>> On Dec 31, 2012, at 11:35 AM, Christofer Bogaso wrote:
>>>>>
>>>>>> On 01 January 2013 00:17:50, David Winsemius wrote:
>>>>>>>
>>>>>>> On Dec 31, 2012, at 9:12 AM, Christofer Bogaso wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> Let's say I have the following (numeric) vector:
>>>>>>>>
>>>>>>>> > x
>>>>>>>>  [1] 11.00 11.25 11.35 12.01 11.14 13.00 13.25 13.35 14.01 13.14
>>>>>>>> [11] 14.50 14.75 14.85 15.51 14.64
>>>>>>>>
>>>>>>>> Now, I want to create a 'Date' variable (i.e. I should be able to
>>>>>>>> do all calculations pertaining to date/time and also time-series
>>>>>>>> plotting etc.) like
>>>>>>>>
>>>>>>>> 2012-12-31 11:00:00 AM, 2012-12-31 11:25:00 AM,
>>>>>>>> 2012-12-31 11:35:00 AM, 2012-12-31 12:01:00 PM, . . . .
>>>>>>>>
>>>>>>>
>>>>>>> Those _times_ (_not_ Dates) cannot possibly be in "%M.%S" format,
>>>>>>> given the number of values to the right of the decimal point that
>>>>>>> exceed 59. So I will proceed on the arguably more likely assumption
>>>>>>> that they are in fractional minutes. To recover from that problem,
>>>>>>> one might consider:
>>>>>>>
>>>>>>> > as.POSIXct(paste( floor(x), round(60*(x-floor(x))) ),
>>>>>>> format="%M %S")
>>>>>>> [1] "2012-12-31 00:11:00 PST" "2012-12-31 00:11:15 PST"
>>>>>>> [3] "2012-12-31 00:11:21 PST" "2012-12-31 00:12:01 PST"
>>>>>>> [5] "2012-12-31 00:11:08 PST" "2012-12-31 00:13:00 PST"
>>>>>>> [7] "2012-12-31 00:13:15 PST" "2012-12-31 00:13:21 PST"
>>>>>>> [9] "2012-12-31 00:14:01 PST" "2012-12-31 00:13:08 PST"
>>>>>>> [11] "2012-12-31 00:14:30 PST" "2012-12-31 00:14:45 PST"
>>>>>>> [13] "2012-12-31 00:14:51 PST" "2012-12-31 00:15:31 PST"
>>>>>>> [15] "2012-12-31 00:14:38 PST"
>>>>>>>
>>>>>>
>>>>>> I understand that some of those elements are not "dates". However,
>>>>>> what I want is the ***"PM/AM" suffix*** on those elements which are
>>>>>> considered as Dates.
>>>>>>
>>>>>> ***Getting those suffixes*** and doing calculations on those changed
>>>>>> variables is my primary concern.
>>>>>
>>>>> That's the first time that AM/PM has been mentioned, and I suppose
>>>>> that if those were fractional hours rather than my guess of
>>>>> fractional minutes, there might be representatives of both in the
>>>>> numeric data you offered. Why don't you clarify what these numbers
>>>>> do in fact represent? And what problem are you trying to solve?
>>>>>
>>>>
>>>> Basically those are artificial data! Actually I do not have the
>>>> right to give out the original data in any public forum. So I
>>>> created those artificial data so that I can get the fundamental idea
>>>> ...........
>>>>
>>>> Each element (assuming they are legitimate times) represents the time
>>>> on a particular day when some event pops up, like 11AM, 11.30AM,
>>>> 12.05PM etc. I could work with something like 11.00, 11.30, 12.05,
>>>> 15.00 etc.; however, I believe adding an AM/PM suffix will make my
>>>> report more eye-catching.
>>>>
>>>> Please let me know if you need more clarification.
>>>
>>> So what's with the values above 59 in the minutes?
>>
>> Failing an answer to that question, this code shows how to input
>> date-time vectors from character vectors and then output them from
>> date-time class to character class:
>>
>> x <- scan(text="11.00 11.25 11.35 12.01 11.14 13.00 13.25 13.35 14.01
>> 13.14 14.50 14.75 14.85 15.51 14.64")  # comes in as a numeric vector
>>
>> ?strptime     # for the available format specifications
>> format( as.POSIXct(as.character(x), format="%H.%M"),  # the input format
>>             format="%I.%M %p")     # the output format
>> [1] NA         "11.25 AM" "11.35 AM" "12.01 PM" "11.14 AM" NA
>> [7] "01.25 PM" "01.35 PM" "02.01 PM" "01.14 PM" "02.05 PM" NA
>> [13] NA         "03.51 PM" NA
>>
>> I suspect that the NA when minutes are ".00" comes from the implicit
>> loss of the trailing digits:
>>
>> > as.character(0.00)
>> [1] "0"
>>
>> The claim that this data is proprietary and cannot be presented in its
>> original form sounds somewhat ridiculous.  Simply post:
>>
>> dput(head(dfrm$time_data_column_name, 20))
>>
>> How could that represent any disclosure of proprietary information if
>> presented with no context?
>>
>
> 'How could that represent any disclosure of proprietary information
> if presented with no context?' I must agree with you. But I just
> don't want to take any risk! (The job scenario in my country is not very
> optimistic and I want to give my boss minimal chance/reason to fire me!)
>
> And secondly, with your approach, I can't do any calculation. Take the
> following example:
>
> y <- format( as.POSIXct(as.character(x), format="%H.%M"),  # the input format
>            format="%I.%M %p")

That was my code. format() returns a character vector, which is why the
subtraction fails. If you need to apply arithmetic operators, then you
should store the POSIXct value as an intermediate:

dt_time <- as.POSIXct(as.character(x), format="%H.%M")

out_time <- format( dt_time,
            format="%I.%M %p")
dt_time[3] - dt_time[2]
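
A minimal sketch of the same idea that also works around the trailing-zero
problem, assuming the values really are clock times written as HH.MM. The
sprintf() call, the fixed date "2012-12-31", tz="UTC", and the shortened
example vector are my own assumptions for illustration and were not part of
the original data. as.character(11.00) yields "11" and as.character(14.50)
yields "14.5", so "%H.%M" either fails or reads the latter as 14:05;
sprintf() keeps both decimal digits.

```r
# Hypothetical subset of values whose "minutes" part is a valid 00-59
x <- c(11.00, 11.25, 11.35, 12.01, 13.00, 14.50)

# Keep two decimal digits so that 11 becomes "11.00" and 14.5 becomes "14.50"
chr <- sprintf("%05.2f", x)

# Parse with an explicit (assumed) date and time zone so the result
# does not depend on the current date or the local time zone
dt_time <- as.POSIXct(paste("2012-12-31", chr),
                      format = "%Y-%m-%d %H.%M", tz = "UTC")

# Character version, for display only; the AM/PM text produced by %p
# depends on the locale
out_time <- format(dt_time, format = "%I.%M %p")

# Arithmetic is done on the POSIXct values, not on the formatted strings
gap <- difftime(dt_time[3], dt_time[2], units = "mins")
print(gap)
```

Keep dt_time for calculations and plotting, and call format() only at the
point where the values go into the report.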

-- 
David.


>
> y[3] - y[2]
>
> This gives me the following error:
>
> Error in y[3] - y[2] : non-numeric argument to binary operator
>
> I am having the same error with David's approach as well:
>
>> y <- as.POSIXct(paste( floor(x), round(60*(x-floor(x))) ),  
>> format="%H %M")
>> z <- format(y, format="%Y-%m-%d %I:%M %p")
>> z[2] - z[1]
> Error in z[2] - z[1] : non-numeric argument to binary operator.
>
> Thanks and regards,
>
>
>

David Winsemius, MD
Alameda, CA, USA



