[R] OK - I got the data - now what? :-)

Uwe Ligges ligges at statistik.tu-dortmund.de
Sun Jul 5 16:50:16 CEST 2009



David Winsemius wrote:
> 
> On Jul 5, 2009, at 9:53 AM, Mark Knecht wrote:
> 
>> On Sat, Jul 4, 2009 at 5:22 PM, jim holtman<jholtman at gmail.com> wrote:
>>> See if this example helps; it shows how to plot either the rows or the
>>> columns of a data frame:
>>>
>>>> test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
>>>> test
>>>           C1        C2        C3
>>> 1  0.91287592 0.3390729 0.4346595
>>> 2  0.29360337 0.8394404 0.7125147
>>> 3  0.45906573 0.3466835 0.3999944
>>> 4  0.33239467 0.3337749 0.3253522
>>> 5  0.65087047 0.4763512 0.7570871
>>> 6  0.25801678 0.8921983 0.2026923
>>> 7  0.47854525 0.8643395 0.7111212
>>> 8  0.76631067 0.3899895 0.1216919
>>> 9  0.08424691 0.7773207 0.2454885
>>> 10 0.87532133 0.9606180 0.1433044
>>>> # this will plot each column (C1, C2, C3)
>>>> matplot(test, type='o')
>>>> # plot each row
>>>> matplot(t(test), type='o')
>>>
>>>
>>> On Sat, Jul 4, 2009 at 8:02 PM, Mark Knecht<markknecht at gmail.com> wrote:
>>>> OK, I guess I'm getting better at the data part of R. I wrote a
>>>> program outside of R this morning to dump a bunch of experimental
>>>> data. It's a sort of ragged array - about 700 rows and 400 columns,
>>>> but the amount of data in each column varies based on the length of
>>>> the experiment. The real data ends with a 0 following some non-zero
>>>> value. It might be as short as 5 to 10 columns or as many as 390. The
>>>> first 9 columns contain some data about when the experiment was run
>>>> and a few other things I thought I might be interested in later. All
>>>> the data starts in column 10 and has headers saying C1, C2, C3, C4,
>>>> etc., up to C390. The first value for every experiment is some value I
>>>> will normalize and then the values following are above and below the
>>>> original tracing out the path that the experiment took, ending
>>>> somewhere to the right but not a fixed number of readings.
>>>>
>>>> R reads it in fine and it looks good so far.
>>>>
>>>> Now, what I thought I might do with R is plot all 700 rows as
>>>> individual lines, giving them some color based on info in columns 1-9,
>>>> but suddenly I'm lost again in plots which I think should be fairly
>>>> easy. How would I go about creating a plot for even one line, much
>>>> less all of them? I don't have a row with 1,2,3,4 to use as the X axis
>>>> values. I could go back and put one in the data but then I don't think
>>>> that should really be required, or I could go back and make the
>>>> headers for the whole array 1:400 and then plot from 10:400 but I
>>>> thought I read that headers cannot start with numbers.
>>>>
>>>> Maybe the X axis values for a plot can actually be non-numeric C1, C2,
>>>> C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or maybe
>>>> I should strip the C from C1 and be left with 1? Maybe the best thing
>>>> is to copy the data for one line to another data.frame or array and
>>>> then plot that?
>>>>
>>>> Just sort of lost looking at help files. Thanks for any ideas you can
>>>> send along. Ask questions if I didn't explain my problem well enough.
>>>> Not looking for anyone to do my work, just trying to get the concepts
>>>> right.
>>>>
>>>> Cheers,
>>>> Mark
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> -- 
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>
>> Hey Jim,
>>   Thanks for the pointers on matplot. I suspect that will be useful
>> one of these days.
>>
>>   I'm attaching a little code to make a test case closer to what I
>> have to deal with at the bottom. My problem with your data was that
>> you plot everything. In my data I need to plot only a portion of it,
>> and in the array not every cell is valid - I don't want to plot cells
>> that have 0.00 as a value. In the array 'test' I need to plot the
>> general area defined by C1:C6, each row as a line, but stop plotting
>> each row when I run into a 0. Keep in mind that I don't know what
>> column C1 starts in. It is likely to change over time.
>>
>>   I think the root cause of a number of my coding problems in R right
>> now is my lack of skills in reading and grabbing portions of the data
>> out of arrays. I'm new at this. (And not a programmer) I need to find
>> some good examples to read and test on that subject. If I could locate
>> which column was called C1, then read row 3 from C1 up to the last
>> value before a 0, I'd have proper data to plot for one line. Repeat as
>> necessary through the array and I get all the lines. Doing the lines
>> one at a time should allow me the opportunity to apply color or not
>> plot based on values in the first few columns.
>>
>> Thanks,
>> Mark
>>
>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>> test<-round(test,2)
>>
>> #Make array ragged
>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>> test$C6[7]<-0
>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>
>> #Print array
>> test
> 
> See ?"[" for the help page on Extract, which is a gold mine of useful methods.
> 
> A single row can be extracted with:
> test[3, ]
> 
> Two rows:
> test[3:4, ]
> 
> And individual columns of that one-row data frame can be selected further:
>  > test[3,][4:5]
>     C2   C3
> 3 0.66 0.51
> 
> You can then find positions with functions such as which(), which returns
> the indices where a logical test is TRUE:
> which(names(test)=="C1")   # 3; names() gives the column names in order
> which(test[3,] == 0.0)     # 6 7 8
> 
> (Note:  one of the most frequent newbie questions is why some seemingly 
> obvious equality expressions are FALSE):
>  > sqrt(2)*sqrt(2) == 2
> [1] FALSE
> So if your values are calculated from other values then consider using 
> isTRUE(all.equal(x, y)) rather than ==.
> 
> And these tests can be combined in a single indexing expression:
> 
> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>     C1   C2   C3
> 3 0.52 0.66 0.51
> 
> (and a warning that does not seem accurate to me.)
> 
> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>   numerical expression has 3 elements: only the first used


David,

# which(test[3,] == 0.0)
[1] 6 7 8

In a:b, both a and b must be length-1 vectors (scalars); otherwise only the 
first element (in this case 6) is used, and R issues that warning.

So the line above is either not the cleanest way to write this, or you 
intended something different.
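
One way to make the intent explicit is to take the first zero position 
yourself before building the index. What follows is only a sketch under some 
assumptions: values_before_zero() is a made-up helper name, the data are a 
reconstruction of Mark's test case (with runif() bounded away from 0 so that 
the only zeros are the ones assigned), and the colour rule on column A is 
arbitrary. The comparison with 0 is exact, which is fine here because the 
zeros are assigned rather than computed.

```r
# Reconstruction of Mark's test data; min = 0.1 is an assumption so that
# rounding never produces an accidental 0.
test <- data.frame(A = 1:10, B = 100,
                   C1 = runif(10, 0.1, 1), C2 = runif(10, 0.1, 1),
                   C3 = runif(10, 0.1, 1), C4 = runif(10, 0.1, 1),
                   C5 = runif(10, 0.1, 1), C6 = runif(10, 0.1, 1))
test <- round(test, 2)

# Make the array ragged, as in the original post
test[2, c("C3", "C4", "C5", "C6")] <- 0
test[3, c("C4", "C5", "C6")] <- 0
test[7, "C6"] <- 0
test[8, c("C4", "C5", "C6")] <- 0

# Hypothetical helper: values of one row from first_col up to, but not
# including, the first 0.
values_before_zero <- function(df, row, first_col = "C1") {
  start <- which(names(df) == first_col)       # length-1 position of C1
  vals  <- unlist(df[row, start:ncol(df)])     # named numeric vector
  zero  <- which(vals == 0)                    # may have length 0, 1 or more
  if (length(zero) > 0) {
    vals <- vals[seq_len(zero[1] - 1)]         # use the first zero explicitly
  }
  vals
}

values_before_zero(test, 3)   # columns C1, C2, C3 of row 3

# For all rows at once: blank out everything from the first 0 onwards with
# NA, then let matplot() draw one line per row (NA values are not drawn).
cols <- which(names(test) == "C1"):ncol(test)
m <- as.matrix(test[, cols])
for (i in seq_len(nrow(m))) {
  z <- which(m[i, ] == 0)
  if (length(z) > 0) m[i, z[1]:ncol(m)] <- NA
}
# Arbitrary colour rule based on column A, for illustration only
line_col <- ifelse(test$A > 5, "red", "blue")
matplot(t(m), type = "o", pch = 1, lty = 1, col = line_col,
        xlab = "reading", ylab = "value")
```

Since both ends of a:b are now scalars, the warning does not occur, and rows 
without any 0 are handled as well.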

Best,
Uwe



> Seems to me that all of the elements were used. I cannot explain that 
> warning but am pretty sure it can be ignored.
> 
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 



