[R] OK - I got the data - now what? :-)

Mark Wardle mark at wardle.org
Mon Jul 6 12:13:16 CEST 2009


Hi. As I said in my first email, converting your data into a "long"
format makes a lot of sense. I'm sorry that you find it "hard ... to
understand why this would make plotting easier".

Wide format:
Subject ID, Experiment ID, humidity, light, whatever, T1, T2, T3, T4, ...

is much better reshaped into long format:
Subject ID, Experiment ID, humidity, light, whatever, time, result

So you end up with multiple rows per patient/individual/experiment. It
is much easier to analyse and plot data like this, particularly if the
original data are ragged, i.e. you have a different number of
measurements per patient/individual/experiment.
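
To make the rotation concrete, here is a minimal sketch using only base
R's reshape(). The data frame and its column names (subject, light,
T1..T3) are made up for illustration, and I am assuming, as in your
example, that a zero means "no measurement":

```r
# Hypothetical wide data: one row per subject, one column per time point.
# A zero stands for "no measurement" (an assumption borrowed from the thread).
wide <- data.frame(
  subject = 1:3,
  light   = c("low", "high", "low"),
  T1 = c(0.5, 0.7, 0.2),
  T2 = c(0.6, 0.0, 0.4),
  T3 = c(0.8, 0.0, 0.0)
)

# Rotate to long format: one row per subject per time point.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("T1", "T2", "T3"),  # the columns to stack
  v.names   = "result",             # name of the stacked value column
  timevar   = "time",               # name of the new time column
  times     = 1:3,
  idvar     = "subject"
)

# Drop the padding zeros, so ragged rows simply contribute fewer rows.
long <- subset(long, result > 0)
long <- long[order(long$subject, long$time), ]
rownames(long) <- NULL
long
```

Hadley's melt() from the reshape package does the same job with less
typing; the base R version is shown only so that nothing extra needs to
be installed.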

Many plotting functions can connect related data points (e.g. those
sharing a particular identifier) and support much of what you are
likely to want (different plotting symbols, panelled plots depending
on experimental conditions, etc.) without you having to work through
the data manually as you are suggesting.
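
As a rough sketch of what I mean, the lattice package (which normally
ships with R) can draw one connected line per identifier and one panel
per experimental condition from long-format data. The data frame below
is invented for illustration and the column names are only assumptions:

```r
library(lattice)

# Hypothetical long-format data: one row per subject per time point.
# Subjects have different numbers of measurements (ragged data).
long <- data.frame(
  subject = c(1, 1, 1, 2, 3, 3),
  light   = c("low", "low", "low", "high", "low", "low"),
  time    = c(1, 2, 3, 1, 1, 2),
  result  = c(0.5, 0.6, 0.8, 0.7, 0.2, 0.4)
)

# One line per subject (groups), one panel per light level (| light).
# type = "b" draws both the points and the connecting lines.
p <- xyplot(result ~ time | light, data = long,
            groups = subject, type = "b",
            xlab = "Time", ylab = "Result")
```

Calling print(p) draws the plot. Colour and plotting symbol are then
assigned per subject automatically, with no need to index the data by
hand.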

Best wishes,

Mark


2009/7/6 Mark Knecht <markknecht at gmail.com>:
> On Sun, Jul 5, 2009 at 1:44 PM, hadley wickham<h.wickham at gmail.com> wrote:
>>>   I think the root cause of a number of my coding problems in R right
>>> now is my lack of skills in reading and grabbing portions of the data
>>> out of arrays. I'm new at this. (And not a programmer) I need to find
>>> some good examples to read and test on that subject. If I could locate
>>> which column was called C1, then read row 3 from C1 up to the last
>>> value before a 0, I'd have proper data to plot for one line. Repeat as
>>> necessary through the array and I get all the lines. Doing the lines
>>> one at a time should allow me the opportunity to apply color or not
>>> plot based on values in the first few columns.
>>>
>>> Thanks,
>>> Mark
>>>
>>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>>> test<-round(test,2)
>>>
>>> #Make array ragged
>>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>>> test$C6[7]<-0
>>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>>
>>> #Print array
>>> test
>>
>> Are the zeros always going to be arranged like this? i.e. for each
>> experiment, is there a point after which all later values are zero?  If
>> so, the following is a much simpler way of getting to the core of your
>> data, without fussing with overly complicated matrix indexing:
>>
>> library(reshape)
>> testm <- melt(test, id = c("A", "B"))
>> subset(testm, value > 0)
>>
>> I suspect you will also find this form easier to plot and analyse.
>>
>> Hadley
>>
>> --
>> http://had.co.nz/
>>
>
> Hi Hadley,
>   I wanted to look at reshape.
>
>   Yes, there exists a point in each row (unless I get to the end with
> all numbers) where I get to a zero and everything to the right is
> zero.
>
>   I'm looking at reshape. It's interesting but I clearly don't
> understand it yet, so I'm reading your "Reshaping data with the reshape
> package" from 11/07. Interesting.
>
>   I know so little about R that I'm sort of drowning at this point,
> so it's hard for me to understand why this would make plotting
> easier. Analysis, possibly. That's just the way it goes when you get
> started with something new.
>
>   In ReShape lingo I think I have ID's. They cover things like time,
> date, success/failure and a few other things of interest. Once the
> data starts on a row it is all data from there on to the end of the
> row.
>
>   My initial goal is to make a line plot of the data on a single row.
> All the data points should connect together. There is no real
> interaction planned with data on other rows, at least at this time.
>
>   Thanks for the pointers and the code stub. I'll be looking at this.
>
> Cheers,
> Mark
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK



