[R] How to calculate variance on multiple numbers at once?

John Kane jrkrideau at inbox.com
Fri Nov 13 01:04:25 CET 2015


I still don't understand what is happening, partly because the code has problems and partly because I don't understand why you are trying to take the variance of a single number.

The first problem, albeit a relatively minor one, is that your data.frame() call uses a variable that is not among the vectors you provide.

my.data <- data.frame(well, clock, targ, flu, stringsAsFactors = FALSE)

There is no "well" in the vectors. Substituting "samp" results in a data.frame that corresponds to the rest of your code. This is one reason why we recommend using dput(). See ?dput for information.
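
As a rough sketch of what that looks like, assuming "samp" is what was meant by "well", the posted vectors can be assembled and dumped as below. The rep() calls are just my shorthand for the posted vectors; the output of dput() is what should be pasted into a message.

```r
# Rebuild the posted data, assuming "samp" was intended where "well" appears.
# rep(1:5, each = 5) holds the same values as the posted samp vector
# (stored as integer rather than double).
samp  <- rep(1:5, each = 5)
clock <- rep(1:5, 5)
targ  <- rep(c("A", "B", "A", "A", "B"), each = 5)
flu   <- c(-0.012, -0.01, -0.008, -0.002, -0.001, -0.007, -0.008,
           -0.009, -0.009, -0.012, -0.002, -0.003, -0.003, 0.001, 0.002,
           -0.006, -0.001, 0.001, 0.002, 0.002, -0.002, -0.003, -0.003,
           -0.002, -0.001)
my.data <- data.frame(samp, clock, targ, flu, stringsAsFactors = FALSE)

# Prints a structure(...) expression that recreates my.data exactly
dput(my.data)
```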

I do not know what is happening with the loops: they run and output something, but if you look at the code, "clock <= i" cannot be subsetting properly in either case.

Also, if you look at "sub" in the "# variance with individual numbers" section,

sub <- subset(my.data, samp == 1) 

it only has one instance for each value of clock, and var(X$flu) should be returning NA as one cannot take the variance of a single number.
So, as far as I can see you cannot trust any of the numbers in either set of calculations.

Your code:
# variance with individual numbers. 
sub <- subset(my.data, samp == 1)
for (i in 2:nrow(sub)) {
  X <- subset(sub, clock <= i)
  V <- var(X$flu)
  cat("variance at clock ", i, " = ", V, "\n", sep="")
}
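
A small check of the loop above, and this is only my reading of it: "clock <= i" keeps every row up to and including i, so the loop may be printing a cumulative variance over clock 1..i rather than a variance at clock i. The "sub" below is typed in by hand from the posted sample 1 values.

```r
# Sample 1 only, copied from the posted data
sub <- data.frame(clock = 1:5,
                  flu   = c(-0.012, -0.01, -0.008, -0.002, -0.001))

# One reading per clock value: the variance of a single number is NA
single <- var(sub$flu[sub$clock == 1])
is.na(single)

# "clock <= 3" keeps three rows, so var() has something to work with
X <- subset(sub, clock <= 3)
nrow(X)
cumulative <- var(X$flu)   # agrees with the posted "variance at clock 3"
cumulative
```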

I tried:
library(plyr)
sub <- subset(my.data, samp == 1)
ddply(sub, .(clock), summarize, variance  = var(flu))
and got the expected
  clock variance
1     1       NA
2     2       NA
3     3       NA
4     4       NA
5     5       NA

Assuming I understand what the "# variance with multiple numbers" loop is doing (and I am by no means sure I do), I get:

subtarg <- subset(my.data, targ == 'A')
ddply(subtarg, .(clock), summarize, variance  = var(flu))
  clock     variance
1     1 2.533333e-05
2     2 2.233333e-05
3     3 2.033333e-05
4     4 4.333333e-06
5     5 3.000000e-06
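
If you would rather not load plyr, the same per-clock variances can be sketched in base R. This assumes the data frame is built with "samp" in place of "well"; the rep() calls are my shorthand for the posted vectors.

```r
# Rebuild the posted data (assumption: "samp" replaces the missing "well")
samp  <- rep(1:5, each = 5)
clock <- rep(1:5, 5)
targ  <- rep(c("A", "B", "A", "A", "B"), each = 5)
flu   <- c(-0.012, -0.01, -0.008, -0.002, -0.001, -0.007, -0.008,
           -0.009, -0.009, -0.012, -0.002, -0.003, -0.003, 0.001, 0.002,
           -0.006, -0.001, 0.001, 0.002, 0.002, -0.002, -0.003, -0.003,
           -0.002, -0.001)
my.data <- data.frame(samp, clock, targ, flu, stringsAsFactors = FALSE)

subtarg <- subset(my.data, targ == "A")

# Named vector: one variance per clock value, across the three "A" samples
v <- tapply(subtarg$flu, subtarg$clock, var)
v

# The same thing returned as a data frame
aggregate(flu ~ clock, data = subtarg, FUN = var)
```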

John Kane
Kingston ON Canada


> -----Original Message-----
> From: marongiu.luigi at gmail.com
> Sent: Thu, 12 Nov 2015 14:19:58 +0000
> To: jrkrideau at inbox.com, r-help at r-project.org
> Subject: Re: [R] How to calculate variance on multiple numbers at once?
> 
> Essentially the problem is related to this problem. I am measuring the
> fluorescence (flu) generated over time (clock) from 5 samples (samp)
> which are grouped by two targets (targ).
> I can calculate the variance of the fluorescence for each sample at a
> given time (in this example, for sample 1); but if I consider a target
> at the time there would be multiple readings at once to consider.
> would the variance formula still be OK to use.
> In this example:
>>>> 
> samp <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4,
> 5, 5, 5, 5, 5)
> clock <- rep(1:5,5)
> targ <- c(rep("A", 5), rep("B", 5), rep("A", 5), rep("A", 5), rep("B",
> 5))
> flu <- c(-0.012, -0.01, -0.008, -0.002, -0.001, -0.007, -0.008,
> -0.009, -0.009, -0.012, -0.002, -0.003, -0.003, 0.001, 0.002, -0.006,
> -0.001, 0.001, 0.002, 0.002, -0.002, -0.003, -0.003, -0.002, -0.001)
> my.data <- data.frame(well, clock, targ, flu, stringsAsFactors = FALSE)
> 
> # variance with individual numbers
> sub <- subset(my.data, samp == 1)
> plot(sub$flu ~ sub$clock)
> abline(lm(sub$flu ~ sub$clock))
> for (i in 2:nrow(sub)) {
>   X <- subset(sub, clock <= i)
>   V <- var(X$flu)
>   cat("variance at clock ", i, " = ", V, "\n", sep="")
> }
> # variance with multiple numbers
> sub <- subset(my.data, targ == 'A')
> plot(sub$flu ~ sub$clock)
> abline(lm(sub$flu ~ sub$clock))
> for (i in 2:max(sub$clock)) {
>   X <- subset(sub, clock <= i)
>   V <- var(X$flu)
>   cat("variance at clock ", i, " = ", V, "\n", sep="")
> }
> 
> the results for the individual numbers are:
> variance at clock 2 = 2e-06
> variance at clock 3 = 4e-06
> variance at clock 4 = 1.866667e-05
> variance at clock 5 = 2.38e-05
> 
> while the results for the multiple numbers are:
> variance at clock 2 = 2.026667e-05
> variance at clock 3 = 1.911111e-05
> variance at clock 4 = 2.026515e-05
> variance at clock 5 = 1.995238e-05
> 
> shall I accept these latter values?
> Thanks
> 
> On Thu, Nov 12, 2015 at 10:59 AM, John Kane <jrkrideau at inbox.com> wrote:
>> I still don't understand what you are doing. I think we need to see your
>> actual data and the code you are using.  If S.Ellison's post has not
>> shown up it is below my signature.
>> 
>> Have a look at
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>> and/or http://adv-r.had.co.nz/Reproducibility.html for suggestions on
>> how to pose a question here. In particular data should be supplied in
>> dput() format as it gives us a copy of exactly how the data is formatted
>> on your machine.
>> 
>> 
>> John Kane
>> Kingston ON Canada
>> 
>> ## ===========================================
>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Luigi
>>> Marongiu
>>> if I have a sample set of the following numbers x1=0.09, x2=0.94,
>>> x3=0.48,
>>> x4=0.74, x5=0.04 I can calculate the variance easily.
>> Not without concatenating them into a vector, you can't. You need them
>> in a vector, as in
>> var( c(x1, x2, x3, x4, x5) )
>> 
>>> But if each x is actually a subset of multiple values, what would be
>>> the formula
>>> to calculate the variance? and it is possible to implement such
>>> mathematical
>>> function in R?
>> This is what R wants anyway, so the function you are looking for is
>> var()
>> 
>>> For instance if I have the following: x1=(0.77, 0.22, 0.44), x2=(0.26,
>>> 0.89, 0.58),
>>> x3=(0.20, 0.25, 0.91), x4=(0.06, 0.13, 0.26) and x5=(0.65, 0.16, 0.72)
>>> how can i
>>> calculate the variance for each x?
>> var(x1)
>> var(x2)
>> ....
>> 
>> or, if you want to be a bit more slick about it and do it in one line
>> 
>> lapply(list( x1, x2, x3, ...), var  )
>> 
>> (or sapply() if you want a vector result)
>> 
>> ## ===========================================
>> 
>> 
>>> -----Original Message-----
>>> From: marongiu.luigi at gmail.com
>>> Sent: Wed, 11 Nov 2015 23:20:26 +0000
>>> To: jrkrideau at inbox.com
>>> Subject: Re: [R] How to calculate variance on multiple numbers at once?
>>> 
>>> Thank you for the reply. For clarification, let's say that I can
>>> calculate the sum of variances of the individual x numbers in five
>>> consecutive steps (although of course there are better
>>> implementations) where each step the sum is incremented by (x -
>>> mean)^2.
>>> In the case I am handling, at each step i have to consider 3 values at
>>> once and I don't know how to relate them neither with the mathematical
>>> formula nor with the R implementation.
>>> Besides I haven't seen Ellison's answer...
>>> Best regards
>>> L
>>> 
>>> On Wed, Nov 11, 2015 at 1:04 PM, John Kane <jrkrideau at inbox.com> wrote:
>>>> I really don't understand what you are looking for but if S. Ellison's
>>>> answer is not what you want what about this where your various x
>>>> vectors
>>>> are in a data frame
>>>> 
>>>> library(reshape2)
>>>> library(plyr)
>>>> 
>>>> dat1  <-  structure(list(x1 = c(0.77, 0.22, 0.44), x2 = c(0.26, 0.89,
>>>> 0.58
>>>> ), x3 = c(0.2, 0.25, 0.91), x4 = c(0.06, 0.13, 0.26), x5 = c(0.65,
>>>> 0.16, 0.72)), .Names = c("x1", "x2", "x3", "x4", "x5"), row.names =
>>>> c(NA,
>>>> -3L), class = "data.frame")
>>>> 
>>>> m1  <-   melt(dat1)
>>>> 
>>>> ddply(m1, .(variable), summarize, variance = var(value))
>>>> 
>>>> John Kane
>>>> Kingston ON Canada
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: marongiu.luigi at gmail.com
>>>>> Sent: Wed, 11 Nov 2015 11:26:25 +0000
>>>>> To: r-help at r-project.org
>>>>> Subject: [R] How to calculate variance on multiple numbers at once?
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> if I have a sample set of the following numbers x1=0.09, x2=0.94,
>>>>> x3=0.48, x4=0.74, x5=0.04 I can calculate the variance easily.
>>>>> But if each x is actually a subset of multiple values, what would be
>>>>> the formula to calculate the variance? and it is possible to
>>>>> implement
>>>>> such mathematical function in R?
>>>>> 
>>>>> For instance if I have the following: x1=(0.77, 0.22, 0.44),
>>>>> x2=(0.26,
>>>>> 0.89, 0.58), x3=(0.20, 0.25, 0.91), x4=(0.06, 0.13, 0.26) and
>>>>> x5=(0.65, 0.16, 0.72) how can i calculate the variance for each x?
>>>>> 
>>>>> Thank you
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> ____________________________________________________________
>>>> FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks & orcas
>>>> on
>>>> your desktop!
>>>> Check it out at http://www.inbox.com/marineaquarium
>>>> 
>>>> 
>> 
>> 
>>
