[R] Quickstart #2

Sun Oct 8 07:17:17 CEST 2000

Phil Rhoades asks:

> 
> I have a file of the form:
> 
>       y01 y02 y03 y04 y05  . . . y0n
>       y11 y12 y13 y14 y15  . . . y1n
>       y21 y22 y23 y24 y25  . . . y2n
>       y31 y32 y33 y34 y35  . . . y3n
>       .
>       .
> 
> I want to read in the file and then for each line, I want to be able to:

OK, so you have to read in the whole file first into a matrix.

R> n <- <whatever it is>
R> Y <- matrix(scan("myInput.txt"), ncol = n, byrow = T)

It would be handy to know how big the matrix Y then is.  How best to
do the next steps will depend on this.

> 
> - calculate the variance on the first two results (eg y01 and y02), and 
> then the variance on the first three results (eg y01 - y03) and then the 
> variance on the first four results  (eg y01 - y04) etc

I must say that statistically I am VERY dubious about doing this, (I
could just about understand a moving variance, but not a cumulative
one), but let's just look at it as a computational problem.

Here is the way I would do it, assuming the number of columns in Y was
not too big but letting the number of rows be quite large:

R> B <- contr.helmert(ncol(Y))
R> B <- B/rep(sqrt(diag(crossprod(B))), rep(nrow(B), ncol(B))) 
R> Y1 <- Y %*% B
R> B <- diag(ncol(Y1))
R> B[] <- as.numeric(row(B) <= col(B))/col(B)
R> Y1 <- (Y1^2) %*% B

Y1 then holds the variances.  (Exercise for the reader: Why is it so?)

On the other hand if you had a huge number of columns and relatively
few rows in Y I would probably do it as follows:

R> Y1 <- t(apply(Y - apply(Y, 1, mean), 1, 
         function(x) ((cumsum(x^2) - cumsum(x)^2 / 1:length(x)) /
          (1:length(x) - 1))))[, -1]

(Instead Y - apply(Y, 1, mean) you could just use Y but the numerical
performance may not be as good.)

The tricks are (a) to work with the whole object as much as possible
and (b) to choose your biggest guns for where they will do most
damage.

> - then graph the results so I can see the successive change in the variance 
> for each line

To do this really well you might just about have to re-write Trellis
but here's is the bones of a very simple solution:

R> X11() 
R> par(ask = T, mfcol = c(4,3))  # 12 at a time
R> yr <- range(Y1) # for a common y-axis
R> for(i in 1:nrow(Y1))
R+     plot(Y1[i, ], type = "b", ylim = yr, main = i)

> 
> ie I want the graphs printed on the same page under each other so I can see 
> compare the lines (there will be n-1 points on each graph).
> 

Fitting them all on one page in a way that you can see them is a tall
order if you have hundreds of rows.  This solution above gives you a
dozen at a time, but you could squeeze more in by fiddling with the
margin widths chopping out the axes and changing the mfcol setting.
Over to you...

-- 
Bill Venables,      Statistician,     CMIS Environmetrics Project
CSIRO Marine Labs, PO Box 120, Cleveland, Qld,  AUSTRALIA.   4163
Tel: +61 7 3826 7251           Email: Bill.Venables at cmis.csiro.au    
Fax: +61 7 3826 7304      http://www.cmis.csiro.au/bill.venables/

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._