[R] Advice on data format

Michael Dewey lists at dewey.myzen.co.uk
Wed Aug 5 14:50:17 CEST 2015


Dear Trent

If you want them side-by-side in one data frame then you could use merge,
making sure it merges by date only. Before merging, I would use sub to
make the antibiotic column names unique per hospital by adding "h1", "h2"
and so on. Then you can sum an antibiotic over hospitals by using grep to
select all the columns containing antibiotic1. The side-by-side solution
has some advantages over stacking them vertically and some disadvantages.
You may need to do both for different purposes.
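
A minimal sketch of both layouts follows, assuming two small made-up
data frames in the format you describe. The object names (h1, h2, wide,
long), the "_h1" suffix style and all the numbers are illustrative
assumptions, not your real data.

```r
# Two made-up hospital data frames (values and names are assumptions).
h1 <- data.frame(date = c("1-Jan-06", "1-Feb-06"),
                 antibiotic1 = c(10, 12),
                 antibiotic2 = c(5, 7))
h2 <- data.frame(date = c("1-Jan-06", "1-Feb-06"),
                 antibiotic1 = c(3, 4),
                 antibiotic2 = c(8, 6))

# --- Side by side ---------------------------------------------------
# Work on copies and tag every antibiotic column with a hospital suffix,
# e.g. antibiotic1 becomes antibiotic1_h1; the date column is untouched.
h1w <- h1
h2w <- h2
names(h1w) <- sub("^(antibiotic.*)$", "\\1_h1", names(h1w))
names(h2w) <- sub("^(antibiotic.*)$", "\\1_h2", names(h2w))

# Merge the two, matching rows on date only.
wide <- merge(h1w, h2w, by = "date")

# Select the antibiotic1 columns with grep and sum them row by row.
# The trailing "_" stops the pattern from also matching antibiotic10 etc.
ab1_cols <- grep("^antibiotic1_", names(wide))
wide$antibiotic1_total <- rowSums(wide[, ab1_cols, drop = FALSE])

# --- Stacked vertically ---------------------------------------------
# Add a hospital column to each data frame and bind the rows together.
long <- rbind(data.frame(hospital = "h1", h1),
              data.frame(hospital = "h2", h2))

# Sum antibiotic1 over hospitals for each date.
ab1_by_date <- aggregate(antibiotic1 ~ date, data = long, FUN = sum)
```

With five hospitals the merge step would be repeated (or wrapped in
Reduce), and the dates would probably be converted with as.Date first so
that rows sort chronologically rather than alphabetically.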

If regular expressions are not already familiar to you, you would need
to learn about them to get the best out of sub and grep.
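
As a small illustration of why the regular expression matters, here is a
sketch using hypothetical column names of the kind a merge might
produce. The names are assumptions chosen to show one pitfall: with about
40 antibiotics, a loose pattern for antibiotic1 also picks up
antibiotic10.

```r
# Hypothetical column names (an assumption, not the real data).
cols <- c("date", "antibiotic1_h1", "antibiotic10_h1", "antibiotic1_h2")

# sub() replaces the first match in each element. "^" anchors the pattern
# to the start of the string and "\\1" re-inserts the captured group.
renamed <- sub("^antibiotic(.*)$", "ab\\1", cols)

# grep() returns positions by default, or the matching elements
# themselves when value = TRUE.
loose  <- grep("antibiotic1", cols, value = TRUE)    # also catches antibiotic10_h1
strict <- grep("^antibiotic1_", cols, value = TRUE)  # antibiotic1 columns only
```

See ?regex in R for the full pattern syntax.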

On 05/08/2015 07:30, Trent Yarwood wrote:
> Hi all,
>
> I'm responsible for collating data on antibiotic use at my local group of
> hospitals.  I have data for five different hospitals, about 40 different
> antibiotics and monthly data going back to 2006.
>
> At the moment, I have this stored in 5 datafiles, one for each hospital,
> formatted as follows:
>
> date, antibiotic1, antibiotic2, antibiotic3....
> 1-mmm-yy, ab11, ab21, ab31....
> 1-mmm-yy, ab12, ab22, ab32...
>
> This works most of the time for me, because the most common thing I need to
> do is to track a particular hospital's antibiotic use over time (sum of
> columns, as a time series by row).
>
> What I would like to do is to amalgamate the data so that, instead of
> analysing an individual hospital (ie a datasheet in the current format),
> I am able to look at a particular antibiotic across the five hospitals.
>
> The best way I can visualise this is having the data in a data cube, with
> each hospital as a single plane. Currently, my hospitals are (x,y,1),
> (x,y,2) etc. What I'd like to do is look at (2,y,z) - for example, the sum
> of antibiotic1 in all hospitals.
>
> I imagine one way of doing this is having a hospital column in the data:
>
> date, hospital, antibiotic1, antibiotic2, antibiotic3...
> 1-mmm-yy, hospital1, a11, a21, a31...
> 1-mmm-yy, hospital2, a11, a21, a31... etc
>
> Two questions:
>
> 1) Is there a better way of storing the data than this?
> 2) Is there an easy way to turn what I have into what I want?
>
> I know that once I have the data sorted, I'll be able to dplyr it into the
> categories I currently use - it's the getting from here to there I need
> help with, please.
>
> Cheers,
>
> Trent.

-- 
Michael
http://www.dewey.myzen.co.uk/home.html


