[R] converting a data set to a format for time series analysis
Ricardo Pietrobon
pietr007 at gmail.com
Mon Jun 9 19:51:15 CEST 2008
Jim, thanks a lot. This does the trick for the dates, but what I have
been struggling with most is the conversion from one subject per row
to one row per hospital per month. I didn't explain that well in my
previous email, so let me try again. The current data set has one
subject per row. I would like it to have one row per hospital per
month. For example, the new data set would look like this:
month year site      number_enrolled_subjects hospital_beds
1     2002 hospitalA 22                       300
meaning that hospital A enrolled 22 subjects in 01/2002 and that hospital
A has 300 beds. The beds variable is one of a vector of site-level
covariates for my ARIMA model.
Your suggestion solved the problem for the dates, but the command I am
looking for now is something that would count the number of subjects
per site per month of each year and then display the result in the
format above. Any thoughts?
I really appreciate your help
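
One possible way to get that count, sketched in base R. It assumes the column names from the example quoted below (subject, hospital, date_enrollment, hospital_beds) and that hospital_beds is constant within a site; the object name counts is only illustrative.

```r
# Example data in the same layout as the quoted message below
x <- data.frame(
  subject = 1:4,
  hospital = c("hospitalA", "hospitalA", "hospitalB", "hospitalC"),
  date_enrollment = c("1/3/2002", "1/6/2002", "2/4/2002", "3/2/2002"),
  hospital_beds = c(300, 300, 150, 200),
  stringsAsFactors = FALSE
)

# Derive month and year from the enrollment date (month/day/year format)
d <- as.Date(x$date_enrollment, "%m/%d/%Y")
x$month <- as.integer(format(d, "%m"))
x$year <- as.integer(format(d, "%Y"))

# Count subjects per month, year and site. hospital_beds is assumed
# constant within a site, so grouping on it only carries it along.
counts <- aggregate(subject ~ month + year + hospital + hospital_beds,
                    data = x, FUN = length)

# Rename and reorder the columns to match the requested layout
names(counts)[names(counts) == "hospital"] <- "site"
names(counts)[names(counts) == "subject"] <- "number_enrolled_subjects"
counts <- counts[, c("month", "year", "site",
                     "number_enrolled_subjects", "hospital_beds")]
print(counts)
```

Note that aggregate() only returns site/month combinations that actually occur in the data, so months with no enrollment at a site would still have to be filled in with zeros before fitting a time series model.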
On Mon, Jun 9, 2008 at 1:04 PM, jim holtman <jholtman at gmail.com> wrote:
> Will something like this work for you:
>
>> x <- read.table(textConnection("subject hospital date_enrollment hospital_beds
> + 1 hospitalA 1/3/2002 300
> + 2 hospitalA 1/6/2002 300
> + 3 hospitalB 2/4/2002 150
> + 4 hospitalC 3/2/2002 200"), header=TRUE)
>> closeAllConnections()
>> y <- as.Date(x$date_enrollment, "%m/%d/%Y")
>> cbind(x, year=format(y, "%Y"), month=format(y, "%m"))
> subject hospital date_enrollment hospital_beds year month
> 1 1 hospitalA 1/3/2002 300 2002 01
> 2 2 hospitalA 1/6/2002 300 2002 01
> 3 3 hospitalB 2/4/2002 150 2002 02
> 4 4 hospitalC 3/2/2002 200 2002 03
>>
>>
>
>
> On Mon, Jun 9, 2008 at 12:45 PM, Ricardo Pietrobon <pietr007 at gmail.com>
> wrote:
>>
>> I currently have a data set describing human subjects enrolled into an
>> international clinical trial, the name of the hospital enrolling this
>> human subject, the date when the subject was enrolled, and a vector
>> with variables representing characteristics of the site (e.g., number
>> of beds in a hospital). My data set looks like this:
>>
>> subject hospital date_enrollment hospital_beds
>> 1 hospitalA 1/3/2002 300
>> 2 hospitalA 1/6/2002 300
>> 3 hospitalB 2/4/2002 150
>> 4 hospitalC 3/2/2002 200
>>
>> to perform a time series analysis I am now trying to get to a format
>> that would give me the following variables:
>>
>> month year site number_enrolled_subjects hospital_beds
>>
>> the data would be displayed on one-month intervals, and number of
>> subjects clustered around sites.
>>
>> any help would be greatly appreciated
>>
>> thanks
>>
>>
>> Ricardo
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?