[BioC] problem normalizeMedians (limma)

Marcus Davy mdavy at hortresearch.co.nz
Wed Oct 19 02:49:33 CEST 2005


Hi,
Doe this include modifying normalizeMedianDeviations aswell for consistency.
I suspect normalizeMedianDeviations doesn't reach R's limit as quickly.

Marcus


MA <- new("RGList")

n     <- 200
genes <- 1000

MA$M <- matrix(runif(n*genes, -8, 8), nr=genes, nc=n)
MA$A <- matrix(runif(n*genes, 1,16), nr=genes, nc=n)

summary(c(MA$M))
summary(c(MA$A))

any(is.infinite(normalizeMedians(MA$A)))

normalizeMedians2 <- function(x)
{
  narrays <- NCOL(x)
  if (narrays == 1)
    return(x)
  a.med <- log(apply(x, 2, median, na.rm = TRUE))
  a.med <- exp(a.med - mean(a.med))
  t(t(x)/a.med)
}

# Subset
normalizeMedians(MA$A[1:5,1:5])
normalizeMedians2(MA$A[1:5,1:5])

# Entire dataset
any(is.infinite(normalizeMedians(MA$A)))
any(is.infinite(normalizeMedians2(MA$A)))

normalizeMedianDeviations2 <- function(x)
{
  narrays <- NCOL(x)
  if (narrays == 1)
    return(x)
  medabs <- function(x) median(abs(as.numeric(x[!is.na(x)])))
  xmat.mav <- log(apply(x, 2, medabs))
  denom <- mean(xmat.mav)
  si <- exp(xmat.mav - denom)
  t(t(x)/si)
}

normalizeMedianDeviations(MA$M[1:5, 1:5])
normalizeMedianDeviations2(MA$M[1:5, 1:5])

# Entire dataset
any(is.infinite(normalizeMedianDeviations(MA$M)))
any(is.infinite(normalizeMedianDeviations2(MA$M)))



On 10/19/05 1:19 PM, "Gordon Smyth" <smyth at wehi.edu.au> wrote:

> 
>>> Date: Mon, 17 Oct 2005 18:24:47 -0400
>>> From: Francois Pepin <fpepin at cs.mcgill.ca>
>>> Subject: [BioC] problem normalizeMedians (limma)
>>> To: bioconductor at stat.math.ethz.ch
>>> 
>>> Hi everyone,
>>> 
>>> I'm not sure what should be done about it, but I've been surprised with
>>> the following behavior with normalizeBetweenArrays:
>>> 
>>> (dat is an MAList here)
>>>> dat$A[1,1]
>>> [1] 8.796361
>>>> dat <- normalizeBetweenArrays(dat)
>>>> dat$A[1,1]
>>> [1] Inf
>>> 
>>> This is due to the fact that I've got almost 400 arrays here. The
>>> normalizeMedians(x) (which is called by normalizeBetweenArrays) has the
>>> following behavior:
>>> 
>>>     a.med <- apply(x, 2, median, na.rm = TRUE)
>>>     a.med <- a.med/(prod(a.med))^(1/narrays)
>>> 
>>> multiplying 400 relatively small numbers can easily reach R's limit as
>>> far as doubles are concerned.
>>> 
>>>> 6^400
>>> [1] Inf
>>> 
>>> Wouldn't it be better to be working in log space instead?
> 
> Yes, you're right. In limma 2.3.2 I will change
> 
>     (prod(a.med))^(1/narrays)
> 
> to
> 
>     exp(mean(log(a.med)))
> 
> which should make floating underflow very unlikely.
> 
> BTW, the "scale" method has been the default for normalizeBetweenArrays()
> for historic reasons, inherited from the older sma package. I'm going to
> change the default to "Aquantile".
> 
> Gordon
> 
>>> Francois
>>> 
>>>> sessionInfo()
>>> R version 2.1.0, 2005-04-18, x86_64-unknown-linux-gnu
>>> 
>>> attached base packages:
>>> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"
>>> "datasets"
>>> [7] "base"
>>> 
>>> other attached packages:
>>>   limma
>>> "1.9.6"
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor


______________________________________________________

The contents of this e-mail are privileged and/or confidenti...{{dropped}}



More information about the Bioconductor mailing list