[R] OK - I got the data - now what? :-)

Mark Knecht markknecht at gmail.com
Sun Jul 5 17:11:02 CEST 2009


On Sun, Jul 5, 2009 at 7:35 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Jul 5, 2009, at 9:53 AM, Mark Knecht wrote:
>
>> On Sat, Jul 4, 2009 at 5:22 PM, jim holtman <jholtman at gmail.com> wrote:
>>>
>>> See if this example helps; it shows how to plot either the rows or the
>>> columns of a data frame:
>>>
>>>> test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
>>>> test
>>>
>>>          C1        C2        C3
>>> 1  0.91287592 0.3390729 0.4346595
>>> 2  0.29360337 0.8394404 0.7125147
>>> 3  0.45906573 0.3466835 0.3999944
>>> 4  0.33239467 0.3337749 0.3253522
>>> 5  0.65087047 0.4763512 0.7570871
>>> 6  0.25801678 0.8921983 0.2026923
>>> 7  0.47854525 0.8643395 0.7111212
>>> 8  0.76631067 0.3899895 0.1216919
>>> 9  0.08424691 0.7773207 0.2454885
>>> 10 0.87532133 0.9606180 0.1433044
>>>>
>>>> # this will plot each column (C1, C2, C3)
>>>> matplot(test, type='o')
>>>> # plot each row
>>>> matplot(t(test), type='o')
>>>
>>>
>>> On Sat, Jul 4, 2009 at 8:02 PM, Mark Knecht <markknecht at gmail.com> wrote:
>>>>
>>>> OK, I guess I'm getting better at the data part of R. I wrote a
>>>> program outside of R this morning to dump a bunch of experimental
>>>> data. It's a sort of ragged array - about 700 rows and 400 columns,
>>>> but the amount of data in each column varies based on the length of
>>>> the experiment. The real data ends with a 0 following some non-zero
>>>> value. It might be as short as 5 to 10 columns or as many as 390. The
>>>> first 9 columns contain some data about when the experiment was run
>>>> and a few other things I thought I might be interested in later. All
>>>> the data starts in column 10 and has headers saying C1, C2, C3, C4,
>>>> etc., up to C390. The first value for every experiment is some value I
>>>> will normalize and then the values following are above and below the
>>>> original tracing out the path that the experiment took, ending
>>>> somewhere to the right but not a fixed number of readings.
>>>>
>>>> R reads it in fine and it looks good so far.
>>>>
>>>> Now, what I thought I might do with R is plot all 700 rows as
>>>> individual lines, giving them some color based on info in columns 1-9,
>>>> but suddenly I'm lost again in plots which I think should be fairly
>>>> easy. How would I go about creating a plot for even one line, much
>>>> less all of them? I don't have a row with 1,2,3,4 to use as the X axis
>>>> values. I could go back and put one in the data but then I don't think
>>>> that should really be required, or I could go back and make the
>>>> headers for the whole array 1:400 and then plot from 10:400 but I
>>>> thought I read that headers cannot start with numbers.
>>>>
>>>> Maybe the X axis values for a plot can actually be non-numeric C1, C2,
>>>> C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or maybe
>>>> I should strip the C from C1 and be left with 1? Maybe the best thing
>>>> is to copy the data for one line to another data.frame or array and
>>>> then plot that?
>>>>
>>>> Just sort of lost looking at help files. Thanks for any ideas you can
>>>> send along. Ask questions if I didn't explain my problem well enough.
>>>> Not looking for anyone to do my work, just trying to get the concepts
>>>> right.
>>>>
>>>> Cheers,
>>>> Mark
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>
>> Hey Jim,
>>  Thanks for the pointers on matplot. I suspect that will be useful
>> one of these days.
>>
>>  At the bottom I'm attaching a little code that builds a test case
>> closer to what I have to deal with. My problem with your data was that
>> you plot everything. In my data I need to plot only a portion of it,
>> and in the array not every cell is valid - I don't want to plot cells
>> that have 0.00 as a value. In the array 'test' I need to plot the
>> general area defined by C1:C6, each row as a line, but stop plotting
>> each row when I run into a 0. Keep in mind that I don't know what
>> column C1 starts in. It is likely to change over time.
>>
>>  I think the root cause of a number of my coding problems in R right
>> now is my lack of skills in reading and grabbing portions of the data
>> out of arrays. I'm new at this (and not a programmer). I need to find
>> some good examples to read and test on that subject. If I could locate
>> which column was called C1, then read row 3 from C1 up to the last
>> value before a 0, I'd have proper data to plot for one line. Repeat as
>> necessary through the array and I get all the lines. Doing the lines
>> one at a time should allow me the opportunity to apply color or not
>> plot based on values in the first few columns.
>>
>> Thanks,
>> Mark
>>
>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>> test<-round(test,2)
>>
>> #Make array ragged
>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>> test$C6[7]<-0
>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>
>> #Print array
>> test
>
> ?"[" opens the help page on Extract, which is a gold mine of useful methods.
>
> A single row can be extracted with:
> test[3, ]
>
> Two rows:
> test[3:4, ]
>
> And individual columns of that one-row result can be further selected:
> > test[3,][4:5]
>    C2   C3
> 3 0.66 0.51
>
> You can then access or determine numerical values with logical functions
> such as which:
> which(names(test)=="C1")   # 3; names() gives an ordered listing of column names
> which(test[3,] == 0.0)     # 6 7 8 (C4, C5 and C6 are 0 in row 3)
>
> (Note: one of the most frequent newbie questions is why some seemingly
> obvious equality expressions are FALSE.)
> > sqrt(2)*sqrt(2) == 2
> [1] FALSE
> So if your values are calculated from other values, compare them with
> all.equal() (wrapped in isTRUE() when you need a plain TRUE/FALSE) rather than ==.
>
> And combining these tests inside the indexing brackets is effective:
>
> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>    C1   C2   C3
> 3 0.52 0.66 0.51
>
> (It also produces a warning that does not seem accurate to me:)
>
> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>  numerical expression has 3 elements: only the first used
>
> It seems to me that all of the elements were used. I cannot explain that
> warning but am pretty sure it can be ignored.
>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>

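Really GREAT examples, and they are giving me lots of ideas. In fact, with
a little study they helped me sort out your warning message. The expression

which(test[3,] == 0)

returns a vector of integer indices (three of them for row 3), and the
colon operator uses only the first element of each argument, which is
exactly what the warning says. The result was still right because the
first 0 is the one we want. Choosing that first value explicitly with [1]
gives the same result, and the warning disappears: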

> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
    C1   C2  C3
3 0.01 0.37 0.4
Warning message:
In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
  numerical expression has 3 elements: only the first used


> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)[1]-1)]
    C1   C2  C3
3 0.01 0.37 0.4
>
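The same idea can be packaged as a small function so it can be applied to any row. This is only a sketch, assuming the readings run from the column named C1 to the last column and that each row's data ends at its first 0; row_readings() is a hypothetical name, and the test data is rebuilt with runif(10, 0.01, 1) so that rounding cannot create an accidental 0. It differs from the one-liner above in two ways: the search for a 0 starts at C1, so a 0 in one of the information columns cannot cut the row short, and a row that never reaches 0 returns all of its readings instead of failing, since which(...)[1] is NA in that case.

```r
# Hypothetical helper: return one row's readings, starting at the column
# named first_col and stopping just before the row's first 0.
# Assumes the readings run from first_col to the last column.
row_readings <- function(df, row, first_col = "C1") {
  start <- which(names(df) == first_col)
  vals <- unlist(df[row, start:ncol(df)])  # one-row data frame -> named vector
  first_zero <- which(vals == 0)[1]        # NA when the row never reaches 0
  if (is.na(first_zero)) vals else vals[seq_len(first_zero - 1)]
}

# Rebuild the ragged test case (min = 0.01 so rounding never creates a stray 0)
set.seed(1)
test <- data.frame(A = 1:10, B = 100)
for (nm in paste0("C", 1:6)) test[[nm]] <- round(runif(10, 0.01, 1), 2)
test[2, c("C3", "C4", "C5", "C6")] <- 0
test[3, c("C4", "C5", "C6")] <- 0
test[7, "C6"] <- 0
test[8, c("C4", "C5", "C6")] <- 0

row_readings(test, 3)  # C1 to C3 of row 3
row_readings(test, 1)  # no 0 in row 1, so all six readings
```

Because the result keeps the column names, names() on it shows which columns were used.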

LOTS more study to do but I think this helps me move forward.
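For the original goal of drawing every row as its own line, one option is to combine Jim's matplot() suggestion with the zero test: set every value from a row's first 0 onward to NA, because matplot() stops drawing a line at NA instead of dropping it to zero. This is a sketch under the same assumptions as the test case (readings from C1 to the last column); the colour rule based on column A is purely hypothetical and stands in for whatever the information columns hold. No extra column of 1, 2, 3, ... is needed, since matplot() uses the row position as the x value when x is not given.

```r
# Rebuild the ragged test case (min = 0.01 so rounding never creates a stray 0)
set.seed(42)
test <- data.frame(A = 1:10, B = 100)
for (nm in paste0("C", 1:6)) test[[nm]] <- round(runif(10, 0.01, 1), 2)
test[2, c("C3", "C4", "C5", "C6")] <- 0
test[3, c("C4", "C5", "C6")] <- 0
test[7, "C6"] <- 0
test[8, c("C4", "C5", "C6")] <- 0

# Locate the readings by name rather than by position
first <- which(names(test) == "C1")
readings <- as.matrix(test[, first:ncol(test)])

# Blank out each row from its first 0 onward; cumsum() turns
# "is this a zero?" into "have we reached a zero yet?"
after_end <- t(apply(readings == 0, 1, cumsum)) > 0
readings[after_end] <- NA

# One line per row: matplot() draws columns, so transpose; NA ends a line.
# The x axis is simply the reading number 1, 2, 3, ...
row_col <- ifelse(test$A %% 2 == 0, "red", "blue")  # hypothetical colour rule
matplot(t(readings), type = "o", lty = 1, pch = 20, col = row_col,
        xlab = "Reading number", ylab = "Value")
```

Drawing the rows one at a time, with plot(..., type = "n") to set up the axes followed by lines() in a loop, works too and makes it easy to skip rows based on the first few columns.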

Thanks!

- Mark



