[R] A comment about R:

Fri Jan 6 18:26:06 CET 2006

Hi

Just a note on the difference between a matrix and a data.frame:

 > str(data.frame(mat))
`data.frame':   4 obs. of  5 variables:
 $ X1: num  -0.1940 -0.7629  0.0446 -0.5408
 $ X2: num  -1.092 -0.040  1.070  0.868
 $ X3: num  0.634 0.823 0.693 1.152
 $ X4: num   0.0258 -1.6507  1.2052  0.9714
 $ X5: num   0.673  0.380 -1.531 -0.426

> str((mat))
 num [1:4, 1:5] -0.1940 -0.7629  0.0446 -0.5408 -1.0925 ...

A matrix is a vector (here numeric) with a dim attribute; a data frame is
a matrix-like structure (internally a list of columns of equal length)
which can hold variables of different types in its columns.
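To make the difference concrete, here is a small sketch. The data are hypothetical: the original mat was not posted, so a random 4 x 5 matrix stands in for it, and the mixed-type data frame is an invented example.

```r
# Hypothetical stand-in for 'mat' (the original data were not posted)
set.seed(1)
mat <- matrix(rnorm(20), nrow = 4, ncol = 5)
df  <- data.frame(mat)

is.list(mat)    # FALSE: a matrix is an atomic vector...
length(mat)     # 20: ...holding all 4 * 5 values...
dim(mat)        # 4 5: ...plus a "dim" attribute
is.list(df)     # TRUE: a data frame is a list of columns...
length(df)      # 5: ...one list element per column

# Columns of a data frame may have different types; a matrix cannot
mixed <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
sapply(mixed, class)        # "integer" "character"
typeof(as.matrix(mixed))    # "character": everything is coerced to one type
```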

sd is a function built on var:

> sd
function (x, na.rm = FALSE) 
{
    if (is.matrix(x)) 
        apply(x, 2, sd, na.rm = na.rm)
    else if (is.vector(x)) 
        sqrt(var(x, na.rm = na.rm))
    else if (is.data.frame(x)) 
        sapply(x, sd, na.rm = na.rm)
    else sqrt(var(as.vector(x), na.rm = na.rm))
}
<environment: namespace:stats>

and therefore it behaves the same way (column by column) for data
frames and matrices, whereas mean only has methods for data frames,
numeric vectors and dates:

Arguments:

       x: An R object.  Currently there are methods for numeric data
          frames, numeric vectors and dates.  A complex vector is
          allowed for 'trim = 0', only.

So a matrix is treated as a single numeric vector by mean, but as a
set of column vectors by sd.

I don't know why.
I believe it is because var(matrix) is expected to return a variance
(covariance) matrix, so sd cannot simply take sqrt(var(x)) and has to
handle the columns of a matrix itself.
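The covariance-matrix explanation can be checked directly. A sketch with a hypothetical random matrix: var() on a matrix returns the covariance matrix of its columns, so the column standard deviations are the square roots of its diagonal, and they should agree with applying sd() to each column.

```r
# Hypothetical 4 x 5 data matrix
set.seed(1)
mat <- matrix(rnorm(20), nrow = 4, ncol = 5)

v <- var(mat)                       # 5 x 5 covariance matrix of the columns
sd_from_var   <- sqrt(diag(v))      # column sds taken from its diagonal
sd_by_columns <- apply(mat, 2, sd)  # sd() applied to each column vector

all.equal(sd_from_var, sd_by_columns)  # TRUE, up to floating-point tolerance

# sqrt(var(mat)) would instead be a 5 x 5 matrix, which is why sd()
# needs its own matrix branch rather than just sqrt(var(x)).
```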

Maybe somebody can explain it better.

If you want mean to behave like sd on matrices, you can
try

mymean <- function(x, na.rm = FALSE)
{
    # column means for a matrix (like sd), ordinary mean otherwise
    if (is.matrix(x))
        colMeans(x, na.rm = na.rm)
    else mean(x, na.rm = na.rm)
}

> mymean(mat)
[1] -0.3632682  0.2013843  0.8251625  0.1379205 -0.2259909
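mymean only checks is.matrix(), so a data frame falls through to mean(). The behaviour of mean() and sd() on data frames and matrices has changed in later R releases, so the sketch below (mymean2 and mysd are hypothetical names) treats both matrices and data frames column-wise without relying on it. colMeans() accepts a matrix or a numeric data frame.

```r
# Hypothetical extension of mymean: treat data frames column-wise too
mymean2 <- function(x, na.rm = FALSE)
{
    if (is.matrix(x) || is.data.frame(x))
        colMeans(x, na.rm = na.rm)
    else mean(x, na.rm = na.rm)
}

# Matching column-wise sd, written with apply() so it does not depend
# on how sd() itself treats matrices in a given R version
mysd <- function(x, na.rm = FALSE)
{
    if (is.matrix(x) || is.data.frame(x))
        apply(as.matrix(x), 2, sd, na.rm = na.rm)
    else sd(x, na.rm = na.rm)
}

set.seed(1)
mat <- matrix(rnorm(20), nrow = 4, ncol = 5)  # hypothetical data
mymean2(mat)
mymean2(data.frame(mat))   # same numbers, now named X1 ... X5
mysd(data.frame(mat))
```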

HTH
Petr

On 6 Jan 2006 at 16:18, Stefan Eichenberger wrote:

From:           	"Stefan Eichenberger" <Stefan.Eichenberger at se-kleve.com>
To:             	<r-help at stat.math.ethz.ch>
Date sent:      	Fri, 6 Jan 2006 16:18:16 +0100
Subject:        	[R]   A comment about R:

> ~~~~~~~~~~~~~~~
> ... blame me for not having sent the message below in plain text
> format the first time. Sorry!
> ~~~~~~~~~~~~~~~
> 
> I spent most of the Christmas holidays getting into R and was about to
> ask for pointers on how to get a hold of it when I came across this
> thread. I've read through most of it and would like to comment from a
> novice user's point of view. I have a strong programming background but
> limited statistical experience and no knowledge of competing
> packages. I work as a senior engineer in electronics.
> 
> Yes, the learning curve is steep. Most of the documentation is extremely
> terse. Learning is mostly from examples (a wiki was proposed in another
> mail...), and the documentation uses no graphical elements at all. So,
> when it comes to things like xyplot in lattice: where would I learn the
> concepts behind panels, superpanels, and the like?
> 
> OK, this is steep and terse, but after a while I'll get over it...
> That's life. The general concept is great; things can be expressed
> very densely. The potential is here... I quickly had 200 lines of my
> own code together, doing what it should, or so I believed.
> 
> Next I did:
>   matrix<-matrix(1:100, 10, 10)
>   image(matrix)
>   locator()
> Great: I can interactively work with my graphs... But then:
>   filled.contour(matrix)
>   locator()
> Oops, wrong coordinates returned. Bug. Apparently, locator() doesn't
> realize that filled.contour() has a color bar on the right, and it
> scales x wrongly...
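A likely explanation, based on the help page of filled.contour: its output combines two plots (the filled contour and the colour key), each with its own coordinate system, and those coordinate systems are lost once the function returns, so a later locator() works in the wrong coordinates. The help page suggests adding to the main plot through the plot.axes argument instead. The sketch below draws to a temporary PDF file so it runs non-interactively; in an interactive session one could presumably call locator() inside plot.axes in the same way (an assumption, not tested here).

```r
m <- matrix(1:100, 10, 10)

f <- tempfile(fileext = ".pdf")
pdf(f)                         # non-interactive device for this sketch
filled.contour(m,
    plot.axes = {
        # Evaluated while the main panel's coordinates are active.
        # With only a matrix given, x and y default to
        # seq(0, 1, length.out = ...), so both axes run from 0 to 1.
        axis(1)
        axis(2)
        points(0.5, 0.5, pch = 19)   # lands in the centre of the main panel
    })
invisible(dev.off())
```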
> 
> Here is what really shocked me:
> 
> > str(bar)
> `data.frame':   206858 obs. of  12 variables:
>   ...
> > str(mean(bar[,6:12]))
>   Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
>   ...
> > str(sd(bar[,6:12]))
>   Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
>   ...
> > prcomp(bar[,6:12])->foo
> > str(foo$x)
>   num [1:206858, 1:7] -0.4187 -0.4015  0.0218 -0.4438 -0.3650 ... ...
> > str(mean(foo$x))
>   num -1.07e-13
> > str(sd(foo$x))
>   Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
>   ...
> 
> So, sd returns a vector regardless of whether the argument is a
> matrix or a data.frame, but mean behaves differently and returns a
> vector only for a data.frame?
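The prcomp transcript above can be reproduced with hypothetical data (bar itself was not posted). mean() of the score matrix is a single grand mean; the per-column means are all essentially zero because prcomp() centres the data by default, and the column standard deviations of the scores should equal the sdev component that prcomp() returns.

```r
# Hypothetical stand-in for bar[, 6:12]: 1000 rows, 7 numeric columns
set.seed(42)
dat <- as.data.frame(matrix(rnorm(7000), ncol = 7))
foo <- prcomp(dat)

mean(foo$x)                        # one number: grand mean of all scores
col_means <- colMeans(foo$x)       # 7 values, all essentially zero
col_sds   <- apply(foo$x, 2, sd)   # 7 values, named PC1 ... PC7
all.equal(unname(col_sds), foo$sdev)   # TRUE
```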
> 
> The problem here is not that this is difficult to learn; the problem
> is the complete absence of a concept. Is a data.frame an 'extended'
> matrix with columns of different types, or something different? Since
> mean returns a single number (I expected a vector), which is silently
> recycled when used in a vector context, debugging code becomes close
> to impossible. Since sd does return a vector, expressions like
> mean + 4*sd vary enough across the data elements that I assume the
> code works... I don't get any warning that something is wrong here.
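The silent recycling is easy to reproduce. One way to guard against it is to compute per-column statistics explicitly and check their lengths; check_per_column below is a hypothetical helper, not part of R.

```r
set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)   # hypothetical data

# Hypothetical helper: stop unless v has one value per column of x
check_per_column <- function(v, x) {
    stopifnot(length(v) == ncol(x))
    v
}

mu <- colMeans(x)          # one mean per column
s  <- apply(x, 2, sd)      # one sd per column
upper <- check_per_column(mu, x) + 4 * check_per_column(s, x)

grand <- mean(x)           # a single number for the whole matrix
length(grand + 4 * s)      # 4: recycled silently, no warning
caught <- tryCatch(check_per_column(grand, x),
                   error = function(e) "caught")   # the check does fire
```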
> 
> A case in point is the behavior of locator() on a filled.contour()
> plot: things have apparently been programmed and debugged from
> examples rather than from a concept.
> 
> Now, in another posting I read that all this is a feature to discourage
> inexperienced users from doing statistics and to force you to think
> before you do things. While I support this idea of thinking: did I miss
> something in statistics? I was under the impression that mean and sd
> were conceptually fairly close to each other... (here, they are even in
> different packages...)
> 
> I will continue using R for the time being. But whether I can
> recommend it to my colleagues at work remains to be seen: how could I
> ever trust the results it returns?
> 
> I'm still impressed by some of the efficiency, but my trust is deeply
> shaken...
> ----------------------------------------------------------------------
> - Stefan Eichenberger        mailto:Stefan.Eichenberger at se-kleve.com
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
petr.pikal at precheza.cz