[R] OK - I got the data - now what? :-)

Mark Knecht markknecht at gmail.com
Mon Jul 6 18:13:18 CEST 2009


Hi Mark,
   Don't be the least bit sorry that I'm finding any of this hard to
understand. That's my problem. I ordered Phil Spector's "Data
Manipulation with R (Use R)" book last night as I realize I need to go
through some sort of training. Hopefully that will help clear up some
of my questions about the language in general without burdening this
list so much.

   This morning, taking your input to heart, I started working more
with Hadley's code example. The reshape package is pretty slick. I added

MyExperiments <- cast(MyResults, A ~ variable)

and got a new data.frame that looks like it's more or less ready to
print. Note that I'm not attached to data.frames. It's just that I get
one with read.csv and then don't know when to change it to something
else.

   I then used cast to put the molten data back into a data.frame.
(Maybe this is the point to switch to a list or some other type?) With
that done,

MyExperiments[1,]

gives me back the data for experiment #1 with the experiment number in
column 1. If I can figure out how to get rid of that then I think I
can get the experiment plotted. Put that in a loop and I should get
1000 experiments plotted which is my goal.
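
A minimal sketch of that loop in base R, assuming a wide data frame
shaped like MyExperiments (experiment number in column A, then the
measurements, with NA where a row ran out of data). The data frame
`wide` and the helper `row_values` are made up for illustration and are
not part of reshape.

```r
# Hypothetical stand-in for MyExperiments: id column A, then the measurements
wide <- data.frame(A  = 1:3,
                   C1 = c(0.41, 0.12, 0.77),
                   C2 = c(0.93, 0.58, 0.25),
                   C3 = c(0.37, NA,   0.64),
                   C4 = c(0.80, NA,   NA))

# Return one experiment's measurements as a plain numeric vector,
# leaving out the id column and dropping any NA entries
row_values <- function(df, i, id = "A") {
  vals <- unlist(df[i, names(df) != id], use.names = FALSE)
  vals[!is.na(vals)]
}

# One line plot per experiment, titled with its experiment number
for (i in seq_len(nrow(wide))) {
  y <- row_values(wide, i)
  plot(seq_along(y), y, type = "b",
       xlab = "measurement", ylab = "value",
       main = paste("Experiment", wide$A[i]))
}
```

On a plain data frame, negative indexing such as wide[1, -1] also drops
the first column, which may be all that is needed here.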

   This is all very cool as it turns out to be very few lines of code
to dig through the array. I'll have a couple of other problems (for
me) in working with the real array, as there are many more column names
and I need to learn how to build things like ("C1","C2","C3",
...,"C1200") automatically, but I'm sure there's a way to do that.
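
One way to build such a vector of names is paste0, which is vectorised
over its arguments. This is only a sketch; the variable name col_names
is made up here.

```r
# Build "C1", "C2", ..., "C1200" without typing them out
col_names <- paste0("C", 1:1200)

head(col_names, 3)   # first few names
tail(col_names, 1)   # last name
length(col_names)    # how many were built
```

On R versions that predate paste0, paste("C", 1:1200, sep = "") gives
the same result. Note also that melt(test, id = c("A", "B")) in the code
below melts every non-id column without naming any of them, so the full
list of names may not be needed for that step.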

   Note that the code below can 'fail' in the sense of having NA's in
the middle of a row, because runif rounded to 2 places can produce a 0
that is not at the right-hand end. My real data won't have that problem.
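
A quick way to check that assumption on any row, sketched in base R.
The function name zeros_only_trailing and the matrix `rows` are
hypothetical and only there to illustrate the check.

```r
# TRUE if, once a zero appears in x, every later value is zero too
zeros_only_trailing <- function(x) {
  is_zero <- as.integer(x == 0)
  all(diff(is_zero) >= 0)
}

# Hypothetical rows: the second one has a stray zero in the middle
rows <- rbind(c(0.35, 0.81, 0.00, 0.00),
              c(0.35, 0.00, 0.62, 0.00),
              c(0.35, 0.81, 0.62, 0.44))

# One TRUE/FALSE per row
apply(rows, 1, zeros_only_trailing)
```

Rows that fail the check are the ones that would end up with an NA in
the middle after subset(..., value > 0) and cast.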

Thanks,
Mark



library(reshape)

test <- data.frame(A = 1:10, B = 100, C1 = runif(10), C2 = runif(10),
                   C3 = runif(10), C4 = runif(10), C5 = runif(10), C6 = runif(10))
test <- round(test, 2)

# Make array ragged: zero out the trailing columns of rows 2, 3, 7 and 8
test[2, c("C3", "C4", "C5", "C6")] <- 0
test[3, c("C4", "C5", "C6")] <- 0
test[7, "C6"] <- 0
test[8, c("C4", "C5", "C6")] <- 0

#Print array
test

#Display column names
names(test)

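# Melt to long format: one row per (A, B, variable) with its value
ReShapeX <- melt(test, id = c("A", "B"))

# Drop the zero padding
MyResults <- subset(ReShapeX, value > 0)

names(MyResults)

MyResults

# Cast back to wide format: one row per experiment A, one column per variable
MyExperiments <- cast(MyResults, A ~ variable)

class(MyExperiments)

# Rows for experiments 1 to 3
MyExperiments[1,]
MyExperiments[2,]
MyExperiments[3,]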
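
For comparison, the molten data can also be plotted directly, without
casting it back to wide format. This is a base R sketch on a small
hand-made data frame `long` shaped like MyResults (the B column is left
out for brevity); it assumes the rows for each experiment are already
in C1, C2, ... order.

```r
# Hypothetical molten data shaped like MyResults, zeros already removed
long <- data.frame(
  A        = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  variable = c("C1", "C2", "C3", "C4", "C1", "C2", "C1", "C2", "C3"),
  value    = c(0.41, 0.93, 0.37, 0.80, 0.12, 0.58, 0.77, 0.25, 0.64)
)

# One numeric vector per experiment, in row order
by_experiment <- split(long$value, long$A)

# Empty frame wide and tall enough for every experiment
n_max <- max(sapply(by_experiment, length))
plot(c(1, n_max), range(long$value), type = "n",
     xlab = "measurement", ylab = "value")

# One connected line per experiment, coloured by its position in the list
for (k in seq_along(by_experiment)) {
  y <- by_experiment[[k]]
  lines(seq_along(y), y, type = "b", col = k)
}
```

Because each experiment is just a vector of its own length, the ragged
rows need no special handling here.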


On Mon, Jul 6, 2009 at 3:13 AM, Mark Wardle<mark at wardle.org> wrote:
> Hi. As I said in my first email, converting your data into a "long"
> format makes a lot of sense. I'm sorry that you find it "hard ... to
> understand why this would make plotting easier".
>
> Wide format:
> Subject ID, Experiment ID, humidity, light, whatever, T1, T2,T3,T4.....
>
> is much better rotated to be
> Subject ID, Experiment ID, humidity, light, whatever, time, result
>
> So you end up with multiple rows per patient/individual/experiment. It
> is much easier to analyse and plot data like this, particularly if the
> original data is ragged. ie. you have a different number of
> measurements per patient/individual/experiment.
>
> Many plotting functions will support connecting related data (e.g. by
> virtue of a particular identifier) and support much of what you are
> likely to want (different plotting symbols, panelled plots depending
> on experimental conditions etc) without you having to manually work
> through data as you are suggesting.
>
> Best wishes,
>
> Mark
>
>
> 2009/7/6 Mark Knecht <markknecht at gmail.com>:
>> On Sun, Jul 5, 2009 at 1:44 PM, hadley wickham<h.wickham at gmail.com> wrote:
>>>>   I think the root cause of a number of my coding problems in R right
>>>> now is my lack of skills in reading and grabbing portions of the data
>>>> out of arrays. I'm new at this. (And not a programmer) I need to find
>>>> some good examples to read and test on that subject. If I could locate
>>>> which column was called C1, then read row 3 from C1 up to the last
>>>> value before a 0, I'd have proper data to plot for one line. Repeat as
>>>> necessary through the array and I get all the lines. Doing the lines
>>>> one at a time should allow me the opportunity to apply color or not
>>>> plot based on values in the first few columns.
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>>>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>>>> test<-round(test,2)
>>>>
>>>> #Make array ragged
>>>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>>>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>>>> test$C6[7]<-0
>>>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>>>
>>>> #Print array
>>>> test
>>>
>>> Are the zeros always going to be arranged like this? i.e. for each
>>> experiment there is a point at which all later values are zero?  If
>>> so, the following is a much simpler way of getting to the core of your
>>> data, without fussing with overly complicated matrix indexing:
>>>
>>> library(reshape)
>>> testm <- melt(test, id = c("A", "B"))
>>> subset(testm, value > 0)
>>>
>>> I suspect you will also find this form easier to plot and analyse.
>>>
>>> Hadley
>>>
>>> --
>>> http://had.co.nz/
>>>
>>
>> Hi Hadley,
>>   I wanted to look at reshape.
>>
>>   Yes, there exists a point in each row (unless I get to the end with
>> all numbers) where I get to a zero and everything to the right is
>> zero.
>>
>>   I'm looking at reshape. It's interesting but I clearly don't
>> understand it yet so I'm reading your "Reshaping data with the reshape
>> package" from 11/07. Interesting.
>>
>>   I know so little about R that I'm sort of drowning at this point,
>> so it's hard for me to understand why this would make plotting
>> easier. Analysis possibly. That's just the way it goes when you get
>> started with something new.
>>
>>   In ReShape lingo I think I have ID's. They cover things like time,
>> date, success/failure and a few other things of interest. Once the
>> data starts on a row it is all data from there on to the end of the
>> row.
>>
>>   My initial goal is to make a line plot of the data on a single row.
>> All the data points should connect together. There is no real
>> interaction planned with data on other rows, at least at this time.
>>
>>   Thanks for the pointers and the code stub. I'll be looking at this.
>>
>> Cheers,
>> Mark
>>
>
>
>
> --
> Dr. Mark Wardle
> Specialist registrar, Neurology
> Cardiff, UK
>



