[R] OK - I got the data - now what? :-)

David Winsemius dwinsemius at comcast.net
Sun Jul 5 16:35:42 CEST 2009


On Jul 5, 2009, at 9:53 AM, Mark Knecht wrote:

> On Sat, Jul 4, 2009 at 5:22 PM, jim holtman<jholtman at gmail.com> wrote:
>> See if this example helps; it shows how to plot either the rows or the
>> columns of a data frame:
>>
>>> test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
>>> test
>>           C1        C2        C3
>> 1  0.91287592 0.3390729 0.4346595
>> 2  0.29360337 0.8394404 0.7125147
>> 3  0.45906573 0.3466835 0.3999944
>> 4  0.33239467 0.3337749 0.3253522
>> 5  0.65087047 0.4763512 0.7570871
>> 6  0.25801678 0.8921983 0.2026923
>> 7  0.47854525 0.8643395 0.7111212
>> 8  0.76631067 0.3899895 0.1216919
>> 9  0.08424691 0.7773207 0.2454885
>> 10 0.87532133 0.9606180 0.1433044
>>> # this will plot each column (C1, C2, C3)
>>> matplot(test, type='o')
>>> # plot each row
>>> matplot(t(test), type='o')
>>
>>
>> On Sat, Jul 4, 2009 at 8:02 PM, Mark Knecht<markknecht at gmail.com> wrote:
>>> OK, I guess I'm getting better at the data part of R. I wrote a
>>> program outside of R this morning to dump a bunch of experimental
>>> data. It's a sort of ragged array - about 700 rows and 400 columns,
>>> but the amount of data in each row varies based on the length of
>>> the experiment. The real data ends with a 0 following some non-zero
>>> value. It might be as short as 5 to 10 columns or as many as 390. The
>>> first 9 columns contain some data about when the experiment was run
>>> and a few other things I thought I might be interested in later. All
>>> the data starts in column 10 and has headers saying C1, C2, C3, C4,
>>> etc., up to C390. The first value for every experiment is some value I
>>> will normalize and then the values following are above and below the
>>> original tracing out the path that the experiment took, ending
>>> somewhere to the right but not a fixed number of readings.
>>>
>>> R reads it in fine and it looks good so far.
>>>
>>> Now, what I thought I might do with R is plot all 700 rows as
>>> individual lines, giving them some color based on info in columns 1-9,
>>> but suddenly I'm lost again in plots which I think should be fairly
>>> easy. How would I go about creating a plot for even one line, much
>>> less all of them? I don't have a row with 1,2,3,4 to use as the X axis
>>> values. I could go back and put one in the data but then I don't think
>>> that should really be required, or I could go back and make the
>>> headers for the whole array 1:400 and then plot from 10:400 but I
>>> thought I read that headers cannot start with numbers.
>>>
>>> Maybe the X axis values for a plot can actually be non-numeric C1, C2,
>>> C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or maybe
>>> I should strip the C from C1 and be left with 1? Maybe the best thing
>>> is to copy the data for one line to another data.frame or array and
>>> then plot that?
>>>
>>> Just sort of lost looking at help files. Thanks for any ideas you can
>>> send along. Ask questions if I didn't explain my problem well enough.
>>> Not looking for anyone to do my work, just trying to get the concepts
>>> right.
>>>
>>> Cheers,
>>> Mark
>>>
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>
> Hey Jim,
>   Thanks for the pointers on matplot. I suspect that will be useful
> one of these days.
>
>   I'm attaching a little code to make a test case closer to what I
> have to deal with at the bottom. My problem with your data was that
> you plot everything. In my data I need to plot only a portion of it,
> and in the array not every cell is valid - I don't want to plot cells
> that have 0.00 as a value. In the array 'test' I need to plot the
> general area defined by C1:C6, each row as a line, but stop plotting
> each row when I run into a 0. Keep in mind that I don't know what
> column C1 starts in. It is likely to change over time.
>
>   I think the root cause of a number of my coding problems in R right
> now is my lack of skills in reading and grabbing portions of the data
> out of arrays. I'm new at this. (And not a programmer) I need to find
> some good examples to read and test on that subject. If I could locate
> which column was called C1, then read row 3 from C1 up to the last
> value before a 0, I'd have proper data to plot for one line. Repeat as
> necessary through the array and I get all the lines. Doing the lines
> one at a time should allow me the opportunity to apply color or not
> plot based on values in the first few columns.
>
> Thanks,
> Mark
>
> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
> test<-round(test,2)
>
> #Make array ragged
> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
> test$C6[7]<-0
> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>
> #Print array
> test

See ?"[" for the help page on Extract, which is a gold mine of useful methods.

A single row can be extracted with:
test[3, ]

Two rows:
test[3:4, ]

And individual columns of that one-row data frame can be further specified
by position:
 > test[3,][4:5]
     C2   C3
3 0.66 0.51
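
The same selection can be made by column name instead of position, which helps when the position of C1 may move. A minimal sketch on a stand-in data frame built like Mark's (the set.seed() call and the object names row3, by_pos and by_name are my additions for illustration):

```r
# Stand-in for Mark's test data; set.seed() is added only for reproducibility
set.seed(1)
test <- data.frame(A = 1:10, B = 100, C1 = runif(10), C2 = runif(10),
                   C3 = runif(10))
test <- round(test, 2)

row3    <- test[3, ]             # a one-row data frame
by_pos  <- row3[4:5]             # columns 4 and 5, i.e. C2 and C3
by_name <- row3[c("C2", "C3")]   # the same columns, selected by name

names(by_pos)                    # "C2" "C3"
identical(by_pos, by_name)       # TRUE
```

Selecting by name keeps working if more descriptive columns are later added in front of C1.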

You can then find positions by applying which() to a logical test:
which(names(test)=="C1")   # 3 (names() returns the column names in order)
which(test[3,] == 0.0)     # 6 7 8

(Note: one of the most frequent newbie questions is why some seemingly
obvious equality expressions are FALSE):
 > sqrt(2)*sqrt(2) == 2
[1] FALSE
So if your values are calculated from other values, consider using
isTRUE(all.equal(x, y)), which compares within a small tolerance.
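
A short sketch of the difference between exact and tolerant comparison (the variable name x and the 1e-9 threshold are arbitrary choices for illustration):

```r
x <- sqrt(2) * sqrt(2)

exact    <- (x == 2)                  # FALSE: doubles are compared bit for bit
tolerant <- isTRUE(all.equal(x, 2))   # TRUE: equal within the default tolerance
manual   <- abs(x - 2) < 1e-9         # TRUE: explicit, hand-picked tolerance

print(c(exact = exact, tolerant = tolerant, manual = manual))
```

all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() before being used in a test.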

These tests can be combined to pull out one row from C1 up to the last
value before its first zero:

test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
     C1   C2   C3
3 0.52 0.66 0.51

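(and a warning:)

In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
   numerical expression has 3 elements: only the first used

The warning is accurate. Row 3 contains three zeros, so which(test[3,] == 0.0)
returns three positions, and the ":" operator uses only the first element of
each operand. Here the first zero is exactly the end point wanted, so the
result is correct, but taking that element explicitly avoids the warning:

test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)[1]-1)]

Rows with no zero need separate handling, since which() then returns an
empty vector.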
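
Putting the pieces together for the original question, here is a minimal sketch that extracts every row from C1 up to its first zero and draws each one as a line. The helper trace_of(), the set.seed() call, the lower bound of 0.05 in runif() (so that no reading rounds to 0 by accident) and the colour rule based on column A are all my assumptions for illustration, not part of Mark's data.

```r
# Test case modelled on Mark's; the seed and the 0.05 lower bound are assumptions
set.seed(42)
test <- data.frame(A = 1:10, B = 100,
                   C1 = runif(10, 0.05, 1), C2 = runif(10, 0.05, 1),
                   C3 = runif(10, 0.05, 1), C4 = runif(10, 0.05, 1),
                   C5 = runif(10, 0.05, 1), C6 = runif(10, 0.05, 1))
test <- round(test, 2)

# Make the array ragged, as in Mark's example
test[2, c("C3", "C4", "C5", "C6")] <- 0
test[3, c("C4", "C5", "C6")] <- 0
test[7, "C6"] <- 0
test[8, c("C4", "C5", "C6")] <- 0

first <- which(names(test) == "C1")          # column where the readings start
vals  <- as.matrix(test[, first:ncol(test)]) # readings only, as a numeric matrix

# Hypothetical helper: readings up to, but not including, the first 0
trace_of <- function(v) {
  z <- which(v == 0)
  if (length(z) == 0) v else v[seq_len(z[1] - 1)]
}
traces <- lapply(seq_len(nrow(vals)), function(i) trace_of(vals[i, ]))

# Empty plot sized to the data, then one line per row
plot(NULL, xlim = c(1, ncol(vals)), ylim = range(vals),
     xlab = "Reading", ylab = "Value")
for (i in seq_along(traces)) {
  lines(seq_along(traces[[i]]), traces[[i]], type = "o",
        col = if (test$A[i] <= 5) "blue" else "red")  # arbitrary colour rule
}
```

Drawing the lines one at a time keeps the door open for choosing a colour, or skipping a row entirely, from the values in the leading columns.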


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



