[R] graphs, need urgent help (deadline :( )

Rosa Oliveira
Wed Jun 10 23:18:49 CEST 2015

Dear Don, thank you very much.

I really wasn’t able to figure out the problem.

You were a big (huge) help.

Seeing the graphs, I think I’ll try to put the 3 settings (sample sizes) in different graphs.

I’ll try to use trellis graphs :) using sample size as the “factor”
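
A minimal sketch of that trellis idea, assuming the lattice package (it ships with standard R installations) and a made-up data frame, demo.df, that only borrows the column names of therapy.df. The values are random, so the picture itself means nothing; the point is the formula, which asks for one panel per sample size and one coloured line per factor.

```r
library(lattice)  # recommended package, installed with R

# Hypothetical stand-in data: same column names as therapy.df, random values
set.seed(1)
demo.df <- data.frame(
  Region   = rep(1:10, times = 3),
  sample   = rep(c(50, 250, 1000), each = 10),
  factor.a = runif(30),
  factor.b = runif(30),
  factor.c = runif(30)
)

# "y1 + y2 + y3 ~ x | g": the three factors become groups (one line each),
# and factor(sample) splits the plot into one panel per sample size
p <- xyplot(factor.a + factor.b + factor.c ~ Region | factor(sample),
            data = demo.df, type = "l",
            col = c(4, 2, 3),                 # blue, red, green as in Don's code
            xlab = "Region", ylab = "factor",
            layout = c(3, 1),                 # three panels side by side
            key = list(space = "top", columns = 3,
                       lines = list(col = c(4, 2, 3)),
                       text = list(c("factor.a", "factor.b", "factor.c"))))
print(p)  # lattice objects are drawn when printed
```

With one panel per sample size, the line type no longer has to encode anything, so all lines can stay solid.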

Thank you very much ;)


Rosa Oliveira


> On 10 Jun 2015, at 20:07, Don McKenzie <dmck at u.washington.edu> wrote:
> Here is code that IS tested.  I am sending Rosa the (ugly) output in a separate file.  Crazy problems with argument order; I never figured out
> exactly what was wrong.
> # therapy plot
>  plot(therapy.df$Region[therapy.df$sample==50],therapy.df$factor.a[therapy.df$sample==50],xlab="Region",ylab="factor",type="l",col=4,ylim=c(0,1.5))
> lines(therapy.df$Region[therapy.df$sample==50],therapy.df$factor.b[therapy.df$sample==50],col=2)
> lines(therapy.df$Region[therapy.df$sample==50],therapy.df$factor.c[therapy.df$sample==50],col=3)
> lines(therapy.df$Region[therapy.df$sample==250],therapy.df$factor.a[therapy.df$sample==250],col=4,lty=2)
> lines(therapy.df$Region[therapy.df$sample==250],therapy.df$factor.b[therapy.df$sample==250],col=2,lty=2)
> lines(therapy.df$Region[therapy.df$sample==250],therapy.df$factor.c[therapy.df$sample==250],col=3,lty=2)
> lines(therapy.df$Region[therapy.df$sample==1000],therapy.df$factor.a[therapy.df$sample==1000],col=4,lty=3)
> lines(therapy.df$Region[therapy.df$sample==1000],therapy.df$factor.b[therapy.df$sample==1000],col=2,lty=3)
> lines(therapy.df$Region[therapy.df$sample==1000],therapy.df$factor.c[therapy.df$sample==1000],col=3,lty=3)
> legend(7,1.4,c("factor.a","factor.b","factor.c"),col=c(4,2,3),lty=1)
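
The nine calls above follow one pattern, so they could probably be collapsed into two nested loops. This is only a sketch under assumptions: therapy.df is rebuilt here with random values (only the column names come from the thread), and the second legend for the line types is an addition, not part of the code above.

```r
# Hypothetical stand-in for therapy.df: same column names, random values
set.seed(42)
therapy.df <- data.frame(
  Region   = rep(1:10, times = 3),
  sample   = rep(c(50, 250, 1000), each = 10),
  factor.a = runif(30),
  factor.b = runif(30),
  factor.c = runif(30)
)

sizes <- c(50, 250, 1000)                      # one line type per sample size
facs  <- c("factor.a", "factor.b", "factor.c") # one colour per factor
cols  <- c(4, 2, 3)                            # blue, red, green

# Draw an empty frame first, then add the nine lines
plot(range(therapy.df$Region), c(0, 1.5), type = "n",
     xlab = "Region", ylab = "factor")
n.lines <- 0
for (i in seq_along(sizes)) {
  rows <- therapy.df$sample == sizes[i]
  for (j in seq_along(facs)) {
    lines(therapy.df$Region[rows], therapy.df[[facs[j]]][rows],
          col = cols[j], lty = i)
    n.lines <- n.lines + 1
  }
}
legend("topright", facs, col = cols, lty = 1)      # colours = factors
legend("topleft", paste("n =", sizes), lty = 1:3)  # line types = sample sizes
```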
>> On Jun 10, 2015, at 11:03 AM, Rosa Oliveira <rosita21 at gmail.com> wrote:
>> Sorry,
>> I thought I attached the csv file :)
>> <therapy.csv>
>> Don,
>> I tried, but I got an error:
>> > my.data$Region
>>  [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10
>> > my.data$sample
>>  [1]   50   50   50   50   50   50   50   50   50   50  250  250  250  250  250  250  250  250  250  250 1000 1000 1000 1000 1000 1000 1000 1000
>> [29] 1000 1000
>> > my.data$factor.a
>>  [1] 0.895 0.811 0.685 0.777 0.600 0.466 0.446 0.392 0.256 0.198 0.136 0.121 0.875 0.777 0.685 0.626 0.550 0.466 0.384 0.330 0.060 0.138 0.065
>> [24] 0.034 0.931 0.124 0.060 0.028 0.017 0.014
>> > plot(my.data$Region[my.data$sample==50],my.data$factor.a[my.data$sample==50],col=4,type=“l”,xlab=“Region”,ylab=“factor")
>> Error: unexpected input in "plot(my.data$Region[my.data$sample==50],my.data$factor.a[my.data$sample==50],col=4,type=�”
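
The "unexpected input" here most likely comes from the typographic quotes (“ ”) that a mail client or word processor substituted for plain ASCII quotes; R's parser does not accept them as string delimiters. A small sketch, with a made-up my.data that only reuses the column names printed above:

```r
# Hypothetical stand-in data: column names from the thread, made-up values
my.data <- data.frame(Region   = rep(1:10, times = 3),
                      sample   = rep(c(50, 250, 1000), each = 10),
                      factor.a = seq(0.9, 0.1, length.out = 30))

# A call written with typographic quotes fails at the parsing stage
bad <- 'plot(1:3, type=“l”)'
res <- try(parse(text = bad), silent = TRUE)

# The same kind of call retyped with plain double quotes
s50 <- my.data$sample == 50
plot(my.data$Region[s50], my.data$factor.a[s50],
     col = 4, type = "l", xlab = "Region", ylab = "factor")
```

Retyping the quotes by hand in the R console or a plain-text editor is usually enough.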
>> I’m really naive, right?
>> Best,
RO
-- 
____________________________________________________________________________
>>> On 10 Jun 2015, at 18:10, Don McKenzie <dmck at u.washington.edu> wrote:
>>> For a legend, try (untested)
>>> legend(0.15,0.9,c("factora","factorb","factorc"),col=c(4,2,3),lty=1)
>>> If it overlaps data points move the first two arguments (0.15 and 0.9) around, or change the “ylim” argument in the plot() to ~1.2.
>>> To avoid clutter, put the line-type information in the figure caption (IMO).
>>>> On Jun 10, 2015, at 10:03 AM, Don McKenzie <dmck at u.washington.edu> wrote:
>>>>> On Jun 10, 2015, at 9:08 AM, Rosa Oliveira <rosita21 at gmail.com> wrote:
>>>>> Dear All,
>>>>> I attach my data.
>>>>> Dear Jim, 
>>>>> when I run your code (even the version you sent me, not on my data), I get: 
>>>>> Don't know how to automatically pick scale for object of type function. Defaulting to continuous
>>>>> Error in data.frame(x = c(0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1,  : 
>>>>>   arguments imply differing number of rows: 24, 0
>>>>> Dear Don,
>>>>> The idea is that I will have 12 lines: 
>>>>> 3 factors - line colors
>>>>> with 3 different values of “sample” for each - line types
>>>>> [Three colors, one for each factor,
>>>>> and three line types (lty=1,2,3), one for each value of “sample”, preferably dashed, thin and thick.]
>>>>> On the x-axis I should have region (because I have 10 regions);
>>>>> for each region I have the outcome of 3 different treatments (factor);
>>>>> for each region and each treatment I have 3 different sample sizes.
>>>> But in your original post you had 4 sample sizes: 10,20,30,40.
>>>>> I need to “see” the influence of the region on the treatment outcome for each sample size.
>>>>> So, at the end I should have 9 lines
>>>>> 3 red (1 dash, 1 thin, 1 thick) - concerning factor a (dash for sample size 50, thin for sample size 250 and thick for sample size 1000)
>>>>> 3 blue (1 dash, 1 thin, 1 thick) - concerning factor b (dash for sample size 50, thin for sample size 250 and thick for sample size 1000)
>>>>> 3 green (1 dash, 1 thin, 1 thick) - concerning factor c (dash for sample size 50, thin for sample size 250 and thick for sample size 1000)
>>>>> I hope it is clear this time.
>>>>> I also thought about doing 3 different graphs, one for each sample size, and in that case I should have 3 graphs, each with 3 lines:
>>>>> 1 red for factor a, 1 blue for factor b and 1 green for factor c.
>>>>> Do you all think that is better?
>>>> A matter of style perhaps but I would use dotplots because you have only two data points for each “line”.  The lines will be misleading.  You also could use 
>>>> panel plots, but given your skill set (unless someone wants to spend a fair bit of time with you), it’s probably best to stay as simple as possible.
>>>> But given your original post (cleaned up)   # untested: apologies for any typos
>>>>> region  sample  factora  factorb  factorc
>>>>>    0.1      10    0.895    0.903    0.378
>>>>>    0.2      10    0.811    0.865    0.688
>>>>>    0.1      20    0.735    0.966    0.611
>>>>>    0.2      20    0.777    0.732    0.653
>>>>>    0.1      30    0.600    0.778    0.694
>>>>>    0.2      30    0.466  174.592    0.461
>>>>>    0.1      40    0.446    0.432    0.693
>>>>>    0.2      40    0.392    0.294    0.686
>>>> plot(my.data$region[my.data$sample==10],my.data$factora[my.data$sample==10],col=4,type="l",ylim=c(0,1),xlab="region",ylab="factor")
>>>> lines(my.data$region[my.data$sample==10],my.data$factorb[my.data$sample==10],col=2)
>>>> lines(my.data$region[my.data$sample==10],my.data$factorc[my.data$sample==10],col=3)
>>>> lines(my.data$region[my.data$sample==20],my.data$factora[my.data$sample==20],col=4,lty=2)
>>>> lines(my.data$region[my.data$sample==20],my.data$factorb[my.data$sample==20],col=2,lty=2)
>>>> lines(my.data$region[my.data$sample==20],my.data$factorc[my.data$sample==20],col=3,lty=2)
>>>> #  Now do two more groups of 3, changing the parameter “lty” to 3 and then 4
>>>> # Look at the syntax and note what changes and what stays constant. Do you see how this works?
>>>> # there will be what looks like a vertical line where sample = 30 and factorb = 174.592.  Do you see why?
>>>> # then you will need a legend
>>>>> Nonetheless I can’t do it :(
>>>>> best,
RO
>>>>>> On 10 Jun 2015, at 14:13, John Kane <jrkrideau at inbox.com> wrote:
>>>>>> Hi Jim,
>>>>>> I was looking at that last night and had the same problem of visualizing what Rosa needed.  
>>>>>> Hi Rosa
>>>>>> This is nothing like what you wanted and I really don't understand your data but would something like this work as a substitute or am I completely lost?
>>>>>> dat1  <-  structure(list(region = c(0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1, 
>>>>>> 0.2), sample = c(10L, 10L, 20L, 20L, 30L, 30L, 40L, 40L), factora = c(0.895, 
>>>>>> 0.811, 0.735, 0.777, 0.6, 0.466, 0.446, 0.392), factorb = c(0.903,
>>>>>> 0.865, 0.966, 0.732, 0.778, 0.592, 0.432, 0.294), factorc = c(0.37, 
>>>>>> 0.688, 0.611, 0.653, 0.694, 0.461, 0.693, 0.686)), .Names = c("region", 
>>>>>> "sample", "factora", "factorb", "factorc"), class = "data.frame", row.names = c(NA, 
>>>>>> -8L))
>>>>>> library(reshape2)  # provides melt()
>>>>>> library(ggplot2)   # provides ggplot()
>>>>>> mdat1  <-   melt(dat1, id.vars = c("region", "sample"),
>>>>>>                    variable.name = "factor",
>>>>>>                    value.name = "value")
>>>>>> str(mdat1)
>>>>>> ggplot(mdat1, aes(region, value, colour = factor)) +
>>>>>>                geom_line() + facet_grid(sample ~ .)
>>>>>>> Hi Rosa,
>>>>>>> Like Don, I can't work out what you want and I don't even have the
>>>>>>> picture. For example, your specification of color and line type leaves
>>>>>>> only one point for each color and line type, and the line from one
>>>>>>> point to the same point is not going to show up. Here is a possibility
>>>>>>> that may lead (eventually) to a solution.
>>>>>>> library(plotrix)
>>>>>>> par(tcl=-0.1)
>>>>>>> gap.plot(x=rep(seq(10,45,by=5),3),
>>>>>>> y=unlist(my.data[,c("factora","factorb","factorc")]),
>>>>>>> main="A plot of factorial mystery",
>>>>>>> gap=c(1.1,174),ylim=c(0,175),ylab="factor score",xlab="Group",
>>>>>>> xticlab=c(" \n0.1\n10"," \n0.2\n10"," \n0.1\n20"," \n0.2\n20",
>>>>>>>  " \n0.1\n30"," \n0.2\n30"," \n0.1\n40"," \n0.2\n40"),
>>>>>>> ytics=c(0,0.5,1,174.59),pch=rep(1:3,each=8),col=rep(c(4,2,3),each=8))
>>>>>>> mtext(c("Region","Sample"),side=1,at=6,line=c(0,1))
>>>>>>> lines(seq(10,45,by=5),my.data$factora,col=4)
>>>>>>> lines(seq(10,45,by=5),my.data$factorb[c(1:5,NA,7,8)],col=2)
>>>>>>> lines(seq(10,45,by=5),my.data$factorc,col=3)
>>>>>>> Jim
>>>>>>> On Wed, Jun 10, 2015 at 10:53 AM, Rosa Oliveira <rosita21 at gmail.com>
>>>>>>> wrote:
>>>>>>>> Dear Don and all,
>>>>>>>> I’ve read the tutorial and tried several codes before posting :)
>>>>>>>> I’m really naive.
>>>>>>>> What I was trying to do is something like the graph in the picture I
>>>>>>>> drew.
>>>>>>>> Is it more clear now?
>>>>>>>>> On 09 Jun 2015, at 19:23, Don McKenzie <dmck at u.washington.edu> wrote:
>>>>>>>>> The answer lies in learning to use the help (and knowing where to
>>>>>>>>> start).  Did you look at the tutorial that comes with the R
>>>>>>>>> installation?
>>>>>>>>> ?plot
>>>>>>>>> ?lines
>>>>>>>>> ?par
>>>>>>>>> In the last, look for the descriptions of “col” and “lty”.
>>>>>>>>> Using plot() and lines(), and subsetting the four unique values of
>>>>>>>>> “sample”, you can create your lines.
>>>>>>>>> Here is a crude start, assuming your columns are part of a data frame
>>>>>>>>> called “my.data”.   Untested...
>>>>>>>>> plot(my.data$region[my.data$sample==10],my.data$factora[my.data$sample==10],col=4)
>>>>>>>>> # blue line, not dashed
>>>>>>>>> .
>>>>>>>>> .
>>>>>>>>> .
>>>>>>>>> lines(my.data$region[my.data$sample==20],my.data$factorb[my.data$sample==20],col=2,lty=2)
>>>>>>>>> # red dashed line
>>>>>>>>>> On Jun 9, 2015, at 10:36 AM, Rosa Oliveira <rosita21 at gmail.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> another naive question (I’m pretty sure :( )
>>>>>>>>>> I’m trying to plot a multiple line graph:
>>>>>>>>>> region  sample  factora  factorb  factorc
>>>>>>>>>>    0.1      10    0.895    0.903    0.378
>>>>>>>>>>    0.2      10    0.811    0.865    0.688
>>>>>>>>>>    0.1      20    0.735    0.966    0.611
>>>>>>>>>>    0.2      20    0.777    0.732    0.653
>>>>>>>>>>    0.1      30    0.600    0.778    0.694
>>>>>>>>>>    0.2      30    0.466  174.592    0.461
>>>>>>>>>>    0.1      40    0.446    0.432    0.693
>>>>>>>>>>    0.2      40    0.392    0.294    0.686
>>>>>>>>>> The first column should be the independent variable; the second should
>>>>>>>>>> give a bold line for sample 10 and a dashed line for sample 20.
>>>>>>>>> What about the other two values of “sample”?
>>>>>>>>>> The other variables are outcomes for each of the first scenarios, so
>>>>>>>>>> the 3rd, 4th and 5th columns should be blue, red and
>>>>>>>>>> green respectively.
>>>>>>>>>> In summary :)
>>>>>>>>>> I should have a graph with the region on the x-axis and the factor
>>>>>>>>>> on the y-axis.
>>>>>>>>>> Lines:
>>>>>>>>>>     1 - blue and bold for region 0.1, sample 10 and factor a
>>>>>>>>>>     2 - blue and dash for region 0.2, sample 10 and factor a
>>>>>>>>>>     3 - red and bold for region 0.1, sample 10 and factor b
>>>>>>>>>>     4 - red and dash for region 0.2, sample 10 and factor b
>>>>>>>>>>     5 - green and bold for region 0.1, sample 10 and factor c
>>>>>>>>>>     6 - green and dash for region 0.2, sample 10 and factor c
>>>>>>>>> Not consistent with what you said above. These are no longer lines, but
>>>>>>>>> points.
>>>>>>>>>> Even though the independent variable is nominal, I should plot a line
>>>>>>>>>> graph.
>>>>>>>>>> Can anyone help me please?
>>>>>>>>>> I have my file as a csv file, so I first read that file (that I know
>>>>>>>>>> how to do :)).
>>>>>>>>>> But I have it in that format.
RO
